Software performance testing


In software quality assurance, performance testing is a testing practice performed to determine how a system performs in terms of responsiveness and stability under a particular workload. It can also serve to investigate, measure, validate or verify other quality attributes of the system, such as scalability, reliability and resource usage.
Performance testing, a subset of performance engineering, is a computer science practice which strives to build performance standards into the implementation, design and architecture of a system.

Testing types

Tests examining behavior under load fall into five basic types: baseline test, load test, stress test, soak test, and isolation test. In addition to these basic types, configuration testing and Internet testing can also be performed.

Baseline testing

Baseline testing is used to create a comparison point for other types of tests, e.g., for a stress test. By measuring how the system reacts in a "best case", for example with only 5 parallel users, the other test types can show how performance degrades toward the worst case.

Load testing

Load testing is the simplest form of performance testing. A load test is usually conducted to understand the behavior of the system under a specific expected load. This load can be the expected number of concurrent users on the application performing a specific number of transactions within a set duration. The test reports the response times of all the important business-critical transactions. The database, application server, etc. are also monitored during the test; this assists in identifying bottlenecks in the application software and in the hardware the software is installed on.
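As a minimal illustration, the following Python sketch simulates a fixed number of concurrent users issuing sequential requests and reports response times. The endpoint URL and the user and request counts are illustrative assumptions; a real project would typically use a dedicated tool such as JMeter or Locust.

    import statistics
    import time
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    URL = "http://localhost:8080/"   # hypothetical system under test
    USERS = 10                       # simulated concurrent users
    REQUESTS_PER_USER = 50

    def one_user():
        """One simulated user issuing sequential requests, recording each response time."""
        times = []
        for _ in range(REQUESTS_PER_USER):
            start = time.perf_counter()
            with urllib.request.urlopen(URL) as resp:
                resp.read()          # consume the full body
            times.append(time.perf_counter() - start)
        return times

    with ThreadPoolExecutor(max_workers=USERS) as pool:
        futures = [pool.submit(one_user) for _ in range(USERS)]
        results = [t for f in futures for t in f.result()]

    print(f"requests: {len(results)}")
    print(f"mean response time: {statistics.mean(results):.3f}s")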

Stress testing

Stress testing is normally used to understand the upper limits of capacity within the system. This kind of test is done to determine the system's robustness under extreme load and helps application administrators determine whether the system will perform sufficiently if the current load goes well above the expected maximum.
Spike testing is a special form of stress testing, and is done by suddenly increasing or decreasing the load generated by a very large number of users, and observing the behavior of the system. The goal is to determine whether performance will suffer, the system will fail, or it will be able to handle dramatic changes in load.
Breakpoint testing is also a form of stress testing. An incremental load is applied over time while the system is monitored for predetermined failure conditions. Breakpoint testing is sometimes called capacity testing, because it determines the maximum capacity below which the system performs to its required specifications or service level agreements. The results of breakpoint analysis applied to a fixed environment can be used to determine the optimal scaling strategy in terms of required hardware, or the conditions that should trigger scaling-out events in a cloud environment.
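A sketch of the incremental-load idea, in Python: concurrency is stepped up until a predetermined failure condition (here, a request error or an assumed response-time requirement being exceeded) is met. The URL, step size and thresholds are illustrative assumptions.

    import time
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor
    from statistics import mean

    URL = "http://localhost:8080/"   # hypothetical system under test
    SLA_SECONDS = 0.5                # assumed response-time requirement
    STEP, MAX_USERS, REQUESTS = 5, 100, 20

    def timed_get(_):
        start = time.perf_counter()
        with urllib.request.urlopen(URL) as resp:
            resp.read()
        return time.perf_counter() - start

    for users in range(STEP, MAX_USERS + 1, STEP):
        try:
            with ThreadPoolExecutor(max_workers=users) as pool:
                latencies = list(pool.map(timed_get, range(users * REQUESTS)))
        except Exception as exc:     # any failed request is a failure condition
            print(f"breakpoint: errors appeared at {users} users ({exc})")
            break
        if mean(latencies) > SLA_SECONDS:
            print(f"breakpoint: SLA exceeded at {users} users")
            break
        print(f"{users} users ok, mean latency {mean(latencies):.3f}s")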

Soak testing

Soak testing, also known as endurance testing or stability testing, is usually done to determine whether the system can sustain the continuous expected load. During soak tests, memory utilization is monitored to detect potential leaks. Also important, but often overlooked, is performance degradation: ensuring that throughput and/or response times after a long period of sustained activity are as good as or better than at the beginning of the test. A soak test essentially applies a significant load to a system for an extended period of time, with the goal of discovering how the system behaves under sustained use.
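A minimal sketch of the degradation check described above, assuming a hypothetical URL, an illustrative one-hour duration and an assumed 20% degradation budget:

    import time
    import urllib.request
    from statistics import mean

    URL = "http://localhost:8080/"    # hypothetical system under test
    DURATION_S = 3600                 # e.g. a one-hour soak
    SLICE = 100                       # samples compared at each end

    samples = []
    deadline = time.monotonic() + DURATION_S
    while time.monotonic() < deadline:
        start = time.perf_counter()
        with urllib.request.urlopen(URL) as resp:
            resp.read()
        samples.append(time.perf_counter() - start)

    early, late = samples[:SLICE], samples[-SLICE:]
    print(f"mean response time: start {mean(early):.3f}s, end {mean(late):.3f}s")
    if mean(late) > 1.2 * mean(early):   # assumed 20% degradation budget
        print("warning: degradation under sustained load")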

Isolation testing

Isolation testing is not unique to performance testing but involves repeating a test execution that resulted in a system problem. Such testing can often isolate and confirm the fault domain.

Configuration testing

Rather than testing for performance from a load perspective, tests are created to determine the effects of configuration changes to the system's components on the system's performance and behavior. A common example would be experimenting with different methods of load-balancing.

Internet testing

This is a relatively new form of performance testing, in which global applications such as Facebook, Google and Wikipedia are tested from load generators placed on the actual target continent, whether physical machines or cloud VMs. These tests usually require an immense amount of preparation and monitoring to be executed successfully.

Setting performance goals

Performance testing can serve different purposes:
  • It can demonstrate that the system meets performance criteria.
  • It can compare two systems to find which performs better.
  • It can measure which parts of the system or workload cause the system to perform badly.
Many performance tests are undertaken without setting sufficiently realistic performance goals. The first question from a business perspective should always be: "Why are we performance-testing?" These considerations are part of the business case for the testing. Performance goals will differ depending on the system's technology and purpose, but should always include some of the following:

Concurrency and throughput

If a system identifies end-users by some form of log-in procedure, then a concurrency goal is highly desirable. By definition, this is the largest number of concurrent system users that the system is expected to support at any given moment. The workflow of a scripted transaction may impact true concurrency, especially if the iterative part contains the log-in and log-out activity.
If the system has no concept of end-users, then the performance goal is likely to be based on a maximum throughput or transaction rate.
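One standard way to connect the two kinds of goal is Little's law, N = X × R: the number of concurrent users N equals the throughput X multiplied by the time R each user spends per transaction, including think time. A small worked example, with purely illustrative figures:

    # Little's law: N = X * R, so a concurrency goal implies a throughput goal.
    concurrent_users = 500           # assumed concurrency goal (N)
    seconds_per_transaction = 10.0   # response time plus think time (R)

    throughput = concurrent_users / seconds_per_transaction   # X = N / R
    print(f"implied throughput: {throughput:.1f} transactions/second")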

Server response time

This refers to the time taken for one system node to respond to the request of another. A simple example would be an HTTP 'GET' request from a browser client to a web server. In terms of response time, this is what all load-testing tools actually measure. It may be relevant to set server response time goals between all nodes of the system.
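A sketch of such a measurement for a single HTTP GET, against a placeholder URL: the time until the response headers arrive approximates the server response time, while the remainder is body-transfer time.

    import time
    import urllib.request

    URL = "http://localhost:8080/"   # placeholder target

    start = time.perf_counter()
    resp = urllib.request.urlopen(URL)        # blocks until response headers arrive
    response_time = time.perf_counter() - start
    body = resp.read()                        # transfer the full body
    total_time = time.perf_counter() - start
    resp.close()

    print(f"server response time: {response_time:.3f}s, total: {total_time:.3f}s")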

Render response time

Load-testing tools have difficulty measuring render-response time, since they generally have no concept of what happens within a node apart from recognizing a period of time where there is no activity 'on the wire'. To measure render response time, it is generally necessary to include functional test scripts as part of the performance test scenario. Many load testing tools do not offer this feature.

Performance specifications

It is critical to detail performance specifications and document them in any performance test plan. Ideally, this is done during the requirements development phase of any system development project, prior to any design effort. See Performance Engineering for more details.
The performance specification, in terms of response time, concurrency, etc., is usually captured in a service level agreement. Alternatively, the performance of a test case can be compared against its previous execution in order to identify regressions. Since performance measurements are non-deterministic, capturing performance regressions requires appropriate repetition of the performance test workload and appropriate statistical testing.
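For example, a regression check might compare repeated samples from a baseline run and the current run with a non-parametric test. The sketch below uses SciPy's Mann-Whitney U test on illustrative data; this is one of several reasonable statistical choices, not a prescribed method.

    from scipy.stats import mannwhitneyu

    # Illustrative response-time samples (seconds) from repeated runs.
    baseline = [0.112, 0.118, 0.109, 0.121, 0.115, 0.110, 0.117, 0.113]
    current  = [0.130, 0.127, 0.133, 0.125, 0.129, 0.131, 0.126, 0.128]

    # alternative='less': test whether baseline times tend to be smaller,
    # i.e. whether the current run is a regression.
    stat, p_value = mannwhitneyu(baseline, current, alternative="less")
    print(f"p = {p_value:.4f}")
    if p_value < 0.05:               # assumed significance level
        print("performance regression detected")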
Additionally, performance testing is frequently used as part of the process of performance profile tuning. The idea is to identify the bottleneck – the part of the system which, if it is made to respond faster, will result in the overall system running faster. It is sometimes difficult to identify which part of the system represents this critical path, and some test tools include instrumentation that runs on the server and reports transaction times, database access times, network overhead, and other server monitors, which can be analyzed together with the raw performance statistics. Without such instrumentation one might have to rely on system monitoring.
Performance testing can be performed across the web, and even done in different parts of the country, since it is known that the response times of the internet itself vary regionally. It can also be done in-house, although routers would then need to be configured to introduce the lag that would typically occur on public networks. Loads should be introduced to the system from realistic points. For example, if 50% of a system's user base will be accessing the system via a 56K modem connection and the other half over a T1, then the load injectors should either inject load over the same mix of connections or simulate the network latency of such connections, following the same user profile.
It is always helpful to have a statement of the likely peak number of users that might be expected to use the system at peak times. If there can also be a statement of what constitutes the maximum allowable 95th percentile response time, then an injector configuration could be used to test whether the proposed system met that specification.
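Checking such a goal is then straightforward once response times have been collected. A sketch with illustrative data and an assumed 0.5-second limit:

    import random
    import statistics

    # Illustrative sample; in practice these come from the load injectors.
    measured = [random.uniform(0.1, 0.6) for _ in range(1000)]

    p95 = statistics.quantiles(measured, n=100)[94]   # 95th percentile
    print(f"95th percentile response time: {p95:.3f}s")
    if p95 > 0.5:                                     # assumed specification
        print("specification not met")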

Questions to ask

Performance specifications should address at least the following questions:
  • In detail, what is the performance test scope? What subsystems, interfaces, components, etc. are in and out of scope for this test?
  • For the user interfaces involved, how many concurrent users are expected for each?
  • What does the target system look like?
  • What is the application workload mix of each system component?

Prerequisites

A stable build of the system, resembling the production environment as closely as possible, is a prerequisite.
To ensure consistent results, the performance testing environment should be isolated from other environments, such as those used for user acceptance testing or development; as a best practice, it should be a dedicated environment that mirrors production as closely as possible.