TCP congestion control
uses a congestion control algorithm that includes various aspects of an additive increase/multiplicative decrease (AIMD) scheme, along with other schemes including slow start and a congestion window, to achieve congestion avoidance. The TCP congestion-avoidance algorithm is the primary basis for congestion control in the Internet. Per the end-to-end principle, congestion control is largely a function of Internet hosts, not the network itself. There are several variations and versions of the algorithm implemented in the protocol stacks of operating systems of computers that connect to the Internet.
To avoid congestive collapse, TCP uses a multi-faceted congestion-control strategy. For each connection, TCP maintains a congestion window (CWND), limiting the total number of unacknowledged packets that may be in transit end-to-end. This is somewhat analogous to TCP's sliding window used for flow control.
Additive increase/multiplicative decrease
The additive increase/multiplicative decrease (AIMD) algorithm is a closed-loop control algorithm. AIMD combines linear growth of the congestion window with an exponential reduction when congestion occurs. Multiple flows using AIMD congestion control will eventually converge to use equal amounts of a contended link. This is the algorithm described for the congestion-avoidance state.
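The AIMD rule can be sketched in a few lines. This is an illustrative model only, assuming a window measured in whole segments, growth of one segment per RTT, and halving on congestion; real stacks count bytes and add many refinements.

```python
def aimd_update(cwnd: float, congestion_detected: bool) -> float:
    """Return the congestion window after one RTT under plain AIMD."""
    if congestion_detected:
        # Multiplicative decrease: halve the window, never below 1 segment.
        return max(cwnd / 2.0, 1.0)
    # Additive increase: grow by one segment per round trip.
    return cwnd + 1.0

w = 16.0
w = aimd_update(w, congestion_detected=False)  # grows to 17.0
w = aimd_update(w, congestion_detected=True)   # halves to 8.5
```

Because every flow cuts its window by the same factor but grows by the same constant, repeated cycles of this rule push competing flows toward equal shares of the bottleneck.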
Congestion window
In TCP, the congestion window is one of the factors that determines the number of bytes that can be sent out at any time. The congestion window is maintained by the sender and is a means of preventing a link between the sender and the receiver from becoming overloaded with too much traffic. This should not be confused with the sliding window maintained by the sender, which exists to prevent the receiver from becoming overloaded. The congestion window is calculated by estimating how much congestion there is on the link.

When a connection is set up, the congestion window, a value maintained independently at each host, is set to a small multiple of the maximum segment size (MSS) allowed on that connection. Further variance in the congestion window is dictated by an additive increase/multiplicative decrease approach. This means that if all segments are received and the acknowledgments reach the sender on time, some constant is added to the window size. The window then grows according to different algorithms, depending on the congestion-control variant in use.
A system administrator may adjust the maximum window size limit or adjust the constant added during additive increase, as part of TCP tuning.
The flow of data over a TCP connection is also controlled by the receive window advertised by the receiver. A sender may transmit no more data than the lesser of its own congestion window and the receive window.
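The interaction between the two windows can be sketched as follows; `usable_window` is a hypothetical helper illustrating that the effective send window is the minimum of the congestion window and the advertised receive window.

```python
def usable_window(cwnd: int, rwnd: int, bytes_in_flight: int) -> int:
    """Bytes the sender may still put on the wire right now."""
    # The effective send window is the smaller of the two limits,
    # reduced by data already sent but not yet acknowledged.
    return max(0, min(cwnd, rwnd) - bytes_in_flight)

usable_window(cwnd=40000, rwnd=65535, bytes_in_flight=30000)  # 10000 bytes left
```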
Slow start
Slow start is part of the congestion control strategy used by TCP in conjunction with other algorithms to avoid sending more data than the network is capable of forwarding, that is, to avoid causing network congestion. Slow start begins initially with a congestion window size of 1, 2, 4 or 10 MSS. The congestion window size is increased by 1 MSS with each acknowledgment received, effectively doubling the window size each round-trip time (RTT).
The slow-start algorithm increases the transmission rate until a packet loss is detected, the receiver's advertised window becomes the limiting factor, or the slow-start threshold (ssthresh) is reached. The ssthresh value limits slow start and determines whether the slow-start or congestion-avoidance algorithm is used.
If the CWND reaches ssthresh, TCP switches to the congestion-avoidance algorithm, in which the window should be increased by up to 1 MSS for each RTT. A common formula is that each new ACK increases the CWND by MSS × MSS / CWND. This increases the window almost linearly and provides an acceptable approximation.
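The per-ACK growth in both states can be sketched as below, using the common approximation formulas with a byte-based window; `MSS = 1460` and the function name are illustrative.

```python
MSS = 1460  # illustrative maximum segment size in bytes

def on_ack(cwnd: int, ssthresh: int) -> int:
    """Window growth applied for each new (non-duplicate) ACK."""
    if cwnd < ssthresh:
        # Slow start: +1 MSS per ACK, which doubles the window each RTT.
        return cwnd + MSS
    # Congestion avoidance: +MSS*MSS/CWND per ACK, roughly +1 MSS per RTT.
    return cwnd + max(1, MSS * MSS // cwnd)
```

With a full window of ACKs arriving each RTT, the congestion-avoidance branch adds about one MSS per round trip in total, matching the linear-growth behavior described above.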
If a loss event occurs, TCP assumes that it is due to network congestion and takes steps to reduce the offered load on the network. These measures depend on the exact TCP congestion avoidance algorithm used.
When a TCP sender detects segment loss using the retransmission timer and the given segment has not yet been resent, the value of ssthresh must be set to no more than half of the amount of data that has been sent but not yet cumulatively acknowledged or 2 * MSS, whichever value is greater.
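That reset rule can be written directly; the function name is illustrative, and `flight_size` stands for the amount of sent-but-unacknowledged data.

```python
def ssthresh_on_timeout(flight_size: int, mss: int) -> int:
    """New ssthresh after a retransmission timeout:
    half the unacknowledged data, floored at 2 * MSS."""
    return max(flight_size // 2, 2 * mss)

ssthresh_on_timeout(flight_size=100_000, mss=1460)  # -> 50000
ssthresh_on_timeout(flight_size=1_000, mss=1460)    # -> 2920 (the 2 * MSS floor)
```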
; TCP Tahoe : When a loss occurs, fast retransmit is sent, half of the current CWND is saved as ssthresh, and slow start begins again from its initial CWND.
; TCP Reno : A fast retransmit is sent, half of the current CWND is saved as ssthresh and as the new CWND, thus skipping slow start and going directly to the congestion-avoidance algorithm. The overall algorithm here is called fast recovery.
Slow start assumes that unacknowledged segments are due to network congestion. While this is an acceptable assumption for many networks, segments may be lost for other reasons, such as poor data link layer transmission quality. Thus, slow start can perform poorly in situations with poor reception, such as wireless networks.
The slow start protocol also performs badly for short-lived connections. Older web browsers would create many consecutive short-lived connections to the web server, and would open and close the connection for each file requested. This kept most connections in the slow start mode, which resulted in poor response time. To avoid this problem, modern browsers either open multiple connections simultaneously or reuse one connection for all files requested from a particular web server. Connections, however, cannot be reused for the multiple third-party servers used by websites to implement web advertising, sharing features of social networking services, and counter scripts of web analytics.
Fast retransmit
Fast retransmit is an enhancement to TCP that reduces the time a sender waits before retransmitting a lost segment. A TCP sender normally uses a simple timer to recognize lost segments. If an acknowledgment is not received for a particular segment within a specified time, the sender will assume the segment was lost in the network and will retransmit the segment.

Duplicate acknowledgment is the basis for the fast retransmit mechanism. After receiving a packet, an acknowledgement is sent for the last in-order byte of data received. For an in-order packet, this is effectively the last packet's sequence number plus the current packet's payload length. If the next packet in the sequence is lost but a third packet in the sequence is received, then the receiver can only acknowledge the last in-order byte of data, which is the same value as was acknowledged for the first packet. The second packet is lost and the third packet is not in order, so the last in-order byte of data remains the same as before. Thus a duplicate acknowledgment occurs. The sender continues to send packets, and a fourth and fifth packet are received by the receiver. Again, the second packet is missing from the sequence, so the last in-order byte has not changed. Duplicate acknowledgments are sent for both of these packets.
When a sender receives three duplicate acknowledgments, it can be reasonably confident that the segment carrying the data that followed the last in-order byte specified in the acknowledgment was lost. A sender with fast retransmit will then retransmit this packet immediately without waiting for its timeout. On receipt of the retransmitted segment, the receiver can acknowledge the last in-order byte of data received. In the above example, this would acknowledge to the end of the payload of the fifth packet. There is no need to acknowledge intermediate packets since TCP uses cumulative acknowledgments by default.
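The duplicate-ACK trigger described above can be sketched as a small state machine; the class name and constant are illustrative, though three duplicates is the conventional threshold.

```python
DUP_ACK_THRESHOLD = 3  # conventional fast-retransmit trigger

class FastRetransmitDetector:
    """Tracks cumulative ACK numbers and signals when fast retransmit fires."""

    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_no: int) -> bool:
        """Return True exactly when the third duplicate ACK arrives."""
        if ack_no == self.last_ack:
            self.dup_count += 1
            return self.dup_count == DUP_ACK_THRESHOLD
        # A new cumulative ACK advances the window and resets the counter.
        self.last_ack = ack_no
        self.dup_count = 0
        return False

# The example above: the ACK for the first packet repeats three times
# (for packets 3, 4, and 5); the third repeat triggers retransmission.
det = FastRetransmitDetector()
signals = [det.on_ack(a) for a in [1000, 1000, 1000, 1000, 2000]]
# signals == [False, False, False, True, False]
```

The final `2000` models the cumulative ACK sent after the retransmitted segment arrives, which resets the duplicate counter.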
Algorithms
The names Reno and Tahoe are the names of releases of the BSD UNIX operating system, and were used to refer to the congestion control algorithms at least as early as a 1996 paper by Kevin Fall and Sally Floyd. One possible classification is according to the following properties:
- the type and amount of feedback received from the network
- incremental deployability on the current Internet
- the aspect of performance it aims to improve: high bandwidth-delay product networks; lossy links; fairness; advantage to short flows; variable-rate links; speed of convergence
- the fairness criterion it uses
| Variant | Feedback | Required changes | Benefits | Fairness |
| --- | --- | --- | --- | --- |
| Reno | Loss | | | Delay |
| Vegas | Delay | Sender | Less loss | Proportional |
| High Speed | Loss | Sender | High bandwidth | |
| BIC | Loss | Sender | High bandwidth | |
| CUBIC | Loss | Sender | High bandwidth | |
| C2TCP | Loss/Delay | Sender | Ultra-low latency and high bandwidth | |
| NATCP | Multi-bit signal | Sender | Near Optimal Performance | |
| Elastic-TCP | Loss/Delay | Sender | High bandwidth/short & long-distance | |
| Agile-TCP | Loss | Sender | High bandwidth/short-distance | |
| H-TCP | Loss | Sender | High bandwidth | |
| FAST | Delay | Sender | High bandwidth | Proportional |
| Compound TCP | Loss/Delay | Sender | High bandwidth | Proportional |
| Westwood | Loss/Delay | Sender | Lossy links | |
| Jersey | Loss/Delay | Sender | Lossy links | |
| BBR | Delay | Sender | BLVC, Bufferbloat | |
| CLAMP | Multi-bit signal | Receiver, Router | Variable-rate links | Max-min |
| TFRC | Loss | Sender, Receiver | No Retransmission | Minimum delay |
| XCP | Multi-bit signal | Sender, Receiver, Router | BLFC | Max-min |
| VCP | 2-bit signal | Sender, Receiver, Router | BLF | Proportional |
| MaxNet | Multi-bit signal | Sender, Receiver, Router | BLFSC | Max-min |
| JetMax | Multi-bit signal | Sender, Receiver, Router | High bandwidth | Max-min |
| RED | Loss | Router | Reduced delay | |
| Prague | Single-bit signal | Sender, Receiver, Router | Low latency, low loss, scalable throughput | |
| ECN | Single-bit signal | Sender, Receiver, Router | Reduced loss | |