Software performance testing

In software quality assurance, performance testing is in general a testing practice performed to determine how a system performs in terms of responsiveness and stability under a particular workload. It can also serve to investigate, measure, validate or verify other quality attributes of the system, such as scalability, reliability and resource usage.
Performance testing, a subset of performance engineering, is a computer science practice which strives to build performance standards into the implementation, design and architecture of a system.

Testing types

Tests examining the behavior under load are categorized into six basic types: Baseline test, load test, stress test, soak test, smoke test or isolation test. Additionally to these basic types, configuration testing and Internet testing can be done.

Baseline testing

Baseline testing is used to create a comparison point for other types of tests, e.g., for a stress test. By measuring how the system reacts in a "best case", for example only 5 parallel users, the other test types can be used to compare how the performance degrades in the worst case.

Load testing

Load testing is the simplest form of performance testing. A load test is usually conducted to understand the behavior of the system under a specific expected load. This load can be the expected concurrent number of users on the application performing a specific number of transactions within the set duration. This test will give out the response times of all the important business critical transactions. The database, application server, etc. are also monitored during the test, this will assist in identifying bottlenecks in the application software and the hardware that the software is installed on.

Stress testing

Stress testing is normally used to understand the upper limits of capacity within the system. This kind of test is done to determine the system's robustness in terms of extreme load and helps application administrators to determine if the system will perform sufficiently if the current load goes well above the expected maximum.
Spike testing is a special form of stress testing, and is done by suddenly increasing or decreasing the load generated by a very large number of users, and observing the behavior of the system. The goal is to determine whether performance will suffer, the system will fail, or it will be able to handle dramatic changes in load.
Breakpoint testing is also a form of stress testing. An incremental load is applied over time while the system is monitored for predetermined failure conditions. Breakpoint testing is sometimes referred to as Capacity Testing because it can be said to determine the maximum capacity below which the system will perform to its required specifications or Service Level Agreements. The results of breakpoint analysis applied to a fixed environment can be used to determine the optimal scaling strategy in terms of required hardware or conditions that should trigger scaling-out events in a cloud environment.

Soak testing

Soak testing, also known as endurance testing or stability testing, is usually done to determine if the system can sustain the continuous expected load. During soak tests, memory utilization is monitored to detect potential leaks. Also important, but often overlooked is performance degradation, i.e. to ensure that the throughput and/or response times after some long period of sustained activity are as good as or better than at the beginning of the test. It essentially involves applying a significant load to a system for an extended, significant period of time. The goal is to discover how the system behaves under sustained use.

Isolation testing

Isolation testing is not unique to performance testing but involves repeating a test execution that resulted in a system problem. Such testing can often isolate and confirm the fault domain.

Configuration testing

Rather than testing for performance from a load perspective, tests are created to determine the effects of configuration changes to the system's components on the system's performance and behavior. A common example would be experimenting with different methods of load-balancing.

Internet testing

This is a relatively new form of performance testing when global applications such as Facebook, Google and Wikipedia, are performance tested from load generators that are placed on the actual target continent whether physical machines or cloud VMs. These tests usually requires an immense amount of preparation and monitoring to be executed successfully.

Setting performance goals

Performance testing can serve different purposes:

It can demonstrate that the system meets performance criteria.
It can compare two systems to find which performs better.
It can measure which parts of the system or workload cause the system to perform badly.

Many performance tests are undertaken without setting sufficiently realistic, goal-oriented performance goals. The first question from a business perspective should always be, "why are we performance-testing?". These considerations are part of the business case of the testing. Performance goals will differ depending on the system's technology and purpose, but should always include some of the following:

Concurrency and throughput

If a system identifies end-users by some form of log-in procedure then a concurrency goal is highly desirable. By definition this is the largest number of concurrent system users that the system is expected to support at any given moment. The work-flow of a scripted transaction may impact true concurrency especially if the iterative part contains the log-in and log-out activity.
If the system has no concept of end-users, then performance goal is likely to be based on a maximum throughput or transaction rate.

Server response time

This refers to the time taken for one system node to respond to the request of another. A simple example would be a HTTP 'GET' request from browser client to web server. In terms of response time this is what all load testing tools actually measure. It may be relevant to set server response time goals between all nodes of the system.

Render response time

Load-testing tools have difficulty measuring render-response time, since they generally have no concept of what happens within a node apart from recognizing a period of time where there is no activity 'on the wire'. To measure render response time, it is generally necessary to include functional test scripts as part of the performance test scenario. Many load testing tools do not offer this feature.

Performance specifications

It is critical to detail performance specifications and document them in any performance test plan. Ideally, this is done during the requirements development phase of any system development project, prior to any design effort. See Performance Engineering for more details.
The performance specification in terms of response time, concurrency, etc. is usually captured in a Service Level Agreement. Alternatively, the performance of a test case can be compared against its previous execution in order to identify regressions. Since performance measurements are non-deterministic, capturing performance regressions requires appropriate repetition of the performance tests workload and appropriate statistical testing.
Additionally, performance testing is frequently used as part of the process of performance profile tuning. The idea is to identify the bottleneck – the part of the system which, if it is made to respond faster, will result in the overall system running faster. It is sometimes a difficult task to identify which part of the system represents this critical path, and some test tools include instrumentation that runs on the server and reports transaction times, database access times, network overhead, and other server monitors, which can be analyzed together with the raw performance statistics. Without such instrumentation one might have to have to rely on system monitoring.
Performance testing can be performed across the web, and even done in different parts of the country, since it is known that the response times of the internet itself vary regionally. It can also be done in-house, although routers would then need to be configured to introduce the lag that would typically occur on public networks. Loads should be introduced to the system from realistic points. For example, if 50% of a system's user base will be accessing the system via a 56K modem connection and the other half over a T1, then the load injectors should either inject load over the same mix of connections or simulate the network latency of such connections, following the same user profile.
It is always helpful to have a statement of the likely peak number of users that might be expected to use the system at peak times. If there can also be a statement of what constitutes the maximum allowable 95 percentile response time, then an injector configuration could be used to test whether the proposed system met that specification.

Questions to ask

Performance specifications should ask the following questions, at a minimum:

In detail, what is the performance test scope? What subsystems, interfaces, components, etc. are in and out of scope for this test?
For the user interfaces involved, how many concurrent users are expected for each ?
What does the target system look like ?
What is the Application Workload Mix of each system component?.

Prerequisites

A stable build of the system which must resemble the production environment as closely as is possible.
To ensure consistent results, the performance testing environment should be isolated from other environments, such as user acceptance testing or development. As a best practice it is always advisable to have a separate performance testing environment resembling the production environment as much as possible.

Test conditions

In performance testing, it is often crucial for the test conditions to be similar to the expected actual use. However, in practice this is hard to arrange and not wholly possible, since production systems are subjected to unpredictable workloads. Test workloads may mimic occurrences in the production environment as far as possible, but only in the simplest systems can one exactly replicate this workload variability.
Loosely-coupled architectural implementations have created additional complexities with performance testing. To truly replicate production-like states, enterprise services or assets that share a common infrastructure or platform require coordinated performance testing, with all consumers creating production-like transaction volumes and load on shared infrastructures or platforms. Because this activity is so complex and costly in money and time, some organizations now use tools to monitor and simulate production-like conditions in their performance testing environments to understand capacity and resource requirements and verify / validate quality attributes.

Timing

It is critical to the cost performance of a new system that performance test efforts begin at the inception of the development project and extend through to deployment. The later a performance defect is detected, the higher the cost of remediation. This is true in the case of functional testing, but even more so with performance testing, due to the end-to-end nature of its scope. It is crucial for a performance test team to be involved as early as possible, because it is time-consuming to acquire and prepare the testing environment and other key performance requisites.

Tools

Performance testing is mainly divided into two main categories:

Performance scripting

This part of performance testing mainly deals with creating/scripting the work flows of key identified business processes. This can be done using a wide variety of tools.
Each of the tools mentioned in the above list either employs a scripting language or some form of visual representation to create and simulate end user work flows. Most of the tools allow for something called "Record & Replay", where in the performance tester will launch the testing tool, hook it on a browser or thick client and capture all the network transactions which happen between the client and server. In doing so a script is developed which can be enhanced/modified to emulate various business scenarios.

Performance monitoring

This forms the other face of performance testing. With performance monitoring, the behavior and response characteristics of the application under test are observed. The below parameters are usually monitored during the a performance test execution
Server hardware Parameters

CPU Utilization
Memory Utilization
Disk utilization
Network utilization

As a first step, the patterns generated by these 4 parameters provide a good indication on where the bottleneck lies. To determine the exact root cause of the issue, software engineers use tools such as profilers to measure what parts of a device or software contribute most to the poor performance, or to establish throughput levels for maintained acceptable response time.

Technology

Performance testing technology employs one or more PCs or Unix servers to act as injectors, each emulating the presence of numbers of users and each running an automated sequence of interactions with the host whose performance is being tested. Usually, a separate PC acts as a test conductor, coordinating and gathering metrics from each of the injectors and collating performance data for reporting purposes. The usual sequence is to ramp up the load: to start with a few virtual users and increase the number over time to a predetermined maximum. The test result shows how the performance varies with the load, given as number of users vs. response time. Various tools are available to perform such tests. Tools in this category usually execute a suite of tests which emulate real users against the system. Sometimes the results can reveal oddities, e.g., that while the average response time might be acceptable, there are outliers of a few key transactions that take considerably longer to complete – something that might be caused by inefficient database queries, pictures, etc.
Performance testing can be combined with stress testing, in order to see what happens when an acceptable load is exceeded. Does the system crash? How long does it take to recover if a large load is reduced? Does its failure cause collateral damage?
Analytical Performance Modeling is a method to model the behavior of a system in a spreadsheet. The model is fed with measurements of transaction resource demands, weighted by the transaction-mix. The weighted transaction resource demands are added up to obtain the hourly resource demands and divided by the hourly resource capacity to obtain the resource loads. Using the response time formula, response times can be calculated and calibrated with the results of the performance tests. Analytical performance modeling allows evaluation of design options and system sizing based on actual or anticipated business use. It is therefore much faster and cheaper than performance testing, though it requires thorough understanding of the hardware platforms.

Tasks to undertake

Tasks to perform such a test would include:

Decide whether to use internal or external resources to perform the tests, depending on inhouse expertise.
Gather or elicit performance requirements from users and/or business analysts.
Develop a high-level plan, including requirements, resources, timelines and milestones.
Develop a detailed performance test plan.
Choose test tool.
Specify test data needed and charter effort.
Develop proof-of-concept scripts for each application/component under test, using chosen test tools and strategies.
Develop detailed performance test project plan, including all dependencies and associated timelines.
Install and configure injectors/controller.
Configure the test environment, router configuration, quiet network, deployment of server instrumentation, database test sets developed, etc.
Dry run the tests - before actually executing the load test with predefined users, a dry run is carried out in order to check the correctness of the script.
Execute tests – probably repeatedly in order to see whether any unaccounted-for factor might affect the results.
Analyze the results - either pass/fail, or investigation of critical path and recommendation of corrective action.

Methodology

Performance testing web applications

According to the Microsoft Developer Network the Performance Testing Methodology consists of the following activities:

Identify the Test Environment. Identify the physical test environment and the production environment as well as the tools and resources available to the test team. The physical environment includes hardware, software, and network configurations. Having a thorough understanding of the entire test environment at the outset enables more efficient test design and planning and helps you identify testing challenges early in the project. In some situations, this process must be revisited periodically throughout the project's life cycle.
Identify Performance Acceptance Criteria. Identify the response time, throughput, and resource-use goals and constraints. In general, response time is a user concern, throughput is a business concern, and resource use is a system concern. Additionally, identify project success criteria that may not be captured by those goals and constraints; for example, using performance tests to evaluate which combination of configuration settings will result in the most desirable performance characteristics.
Plan and Design Tests. Identify key scenarios, determine variability among representative users and how to simulate that variability, define test data, and establish metrics to be collected. Consolidate this information into one or more models of system usage to implemented, executed, and analyzed.
Configure the Test Environment. Prepare the test environment, tools, and resources necessary to execute each strategy, as features and components become available for test. Ensure that the test environment is instrumented for resource monitoring as necessary.
Implement the Test Design. Develop the performance tests in accordance with the test design.
Execute the Test. Run and monitor your tests. Validate the tests, test data, and results collection. Execute validated tests for analysis while monitoring the test and the test environment.
Analyze Results, Tune, and Retest. Analyze, consolidate, and share results data. Make a tuning change and retest. Compare the results of both tests. Each improvement made will return smaller improvement than the previous improvement. When do you stop? When you reach a CPU bottleneck, the choices then are either improve the code or add more CPU.