TCP provides an effective abstraction of a reliable network running over an unreliable channel:
- retransmission of data
- in-order delivery
- congestion control & avoidance
- data integrity
TCP Fast Open: Allows data transfer within the SYN packet (Linux 3.7+ kernels)
Receive window (rwnd): Each side of the TCP connection has its own rwnd, which communicates the size of the buffer space available to hold incoming data. Every ACK packet carries the sender's latest rwnd value.
TCP window scaling: The 16-bit window field in the TCP header caps the rwnd at 65,535 bytes (64 KB). The "window scaling" option raises the max rwnd to 1 GB by shifting the advertised value left by up to 14 bits.
> Check if window scaling is enabled: sysctl net.ipv4.tcp_window_scaling
> Enable window scaling: sysctl -w net.ipv4.tcp_window_scaling=1
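The scaling arithmetic can be sketched in a few lines of Python. This is only an illustration of the shift described above; the function name `effective_window` is made up for this note, and the values 229 and 128 are the ones from the sfchronicle.pcap capture discussed further down.

```python
# Window scaling arithmetic: the 16-bit window field from the TCP header is
# shifted left by a scale factor that both sides announce in the handshake.
MAX_WINDOW_FIELD = 2**16 - 1   # largest value the 16-bit header field can hold
MAX_SHIFT = 14                 # largest shift count allowed by the window scale option


def effective_window(window_field: int, shift: int) -> int:
    """Return the receive window in bytes for a raw header value and a shift count."""
    if not 0 <= window_field <= MAX_WINDOW_FIELD:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    return window_field << shift


print(effective_window(MAX_WINDOW_FIELD, 0))   # 65535: the limit without scaling
print(effective_window(MAX_WINDOW_FIELD, 14))  # 1073725440: just under 1 GB
print(effective_window(229, 7))                # 29312: a factor of 128 is a shift of 7
```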
Congestion window (cwnd): Sender-side limit on the amount of data the sender can have in flight before receiving an ACK from the receiver. cwnd is not exchanged between sender and receiver. It is a private variable of the sender.
The max amount of data in flight is the min of the receive window and the congestion window.
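As a rough sketch of that rule in Python: the helper `max_in_flight` is a hypothetical name, the cwnd is counted in segments of 1,460 bytes (1,500-byte MTU), and the example numbers (rwnd of 29312 bytes, cwnd of 10 segments) are taken from the capture described later in these notes.

```python
MSS_BYTES = 1460  # payload bytes per segment, assuming a 1500-byte MTU


def max_in_flight(rwnd_bytes: int, cwnd_segments: int, mss: int = MSS_BYTES) -> int:
    """Bytes the sender may have unacknowledged: the smaller of rwnd and cwnd."""
    return min(rwnd_bytes, cwnd_segments * mss)


print(max_in_flight(29312, 10))  # 14600: the congestion window is the limit
print(max_in_flight(29312, 40))  # 29312: the receive window is the limit
```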
Slow Start: Avoids overwhelming the underlying network. The cwnd starts at 4 segments, or at 10 segments since the increase specified in April 2013 (RFC 6928; in Linux since kernel 2.6.39). One segment carries 1,460 bytes when the Maximum Transmission Unit is 1,500 bytes. For every received ACK the sender can increment its cwnd by one segment, which roughly doubles the cwnd every round trip. No matter the available bandwidth, every TCP connection must go through the slow start phase.
Upon packet loss the cwnd is adjusted to avoid overwhelming the network.
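A minimal simulation of the slow start growth, under loud assumptions: no packet loss, every segment is ACKed individually (no delayed ACKs), no slow start threshold, and the receive window never becomes the limit. The function name `slow_start_bursts` is invented for this note.

```python
def slow_start_bursts(total_segments: int, initial_cwnd: int = 10) -> list[int]:
    """Return how many segments are sent in each round trip during slow start.

    Assumes no loss, one ACK per segment, and an unlimited receive window.
    """
    if total_segments < 0 or initial_cwnd < 1:
        raise ValueError("need total_segments >= 0 and initial_cwnd >= 1")
    bursts = []
    cwnd = initial_cwnd
    remaining = total_segments
    while remaining > 0:
        burst = min(cwnd, remaining)  # cannot send more than cwnd per round trip
        bursts.append(burst)
        remaining -= burst
        cwnd += burst                 # +1 segment for every ACK received
    return bursts


print(slow_start_bursts(100))      # [10, 20, 40, 30]
print(slow_start_bursts(100, 4))   # [4, 8, 16, 32, 40]
```

With an initial cwnd of 10 the 100 segments need four round trips, with an initial cwnd of 4 they need five.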
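Slow Start Restart (SSR): Resets the cwnd of a connection after it has been idle for a defined period of time. Should be disabled on a server, because it hurts long-lived connections that go idle between bursts (e.g. HTTP keep-alive connections).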
> Check SSR: sysctl net.ipv4.tcp_slow_start_after_idle
> Disable SSR: sysctl -w net.ipv4.tcp_slow_start_after_idle=0
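The same settings can also be read from a script. On Linux, sysctl values are exposed as files below /proc/sys, with the dots of the name replaced by slashes. The helper below is a small sketch with a made-up name (`read_sysctl`); it only reads values and prints a note on systems where the files do not exist.

```python
from pathlib import Path


def read_sysctl(name: str, root: str = "/proc/sys") -> str:
    """Read a sysctl value from its file, e.g. net.ipv4.x -> /proc/sys/net/ipv4/x."""
    return (Path(root) / name.replace(".", "/")).read_text().strip()


for name in ("net.ipv4.tcp_window_scaling", "net.ipv4.tcp_slow_start_after_idle"):
    try:
        print(name, "=", read_sysctl(name))
    except OSError as exc:
        # Not on Linux, or the setting is not available on this kernel.
        print(name, "unavailable:", exc)
```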
Head of line blocking: If one packet is lost, all subsequent packets must be held in the receiver's TCP buffer until the lost packet is retransmitted and arrives.
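A toy model of the receiver side makes the effect visible. It is a simplification: segments are numbered 1, 2, 3, ... instead of using byte sequence numbers, and the function name `deliver_in_order` is invented for this note.

```python
def deliver_in_order(arrivals: list[int], first_expected: int = 1) -> list[tuple[int, list[int]]]:
    """For each arriving segment, list which segments can be handed to the application."""
    buffered = set()
    next_expected = first_expected
    timeline = []
    for seq in arrivals:
        buffered.add(seq)
        delivered = []
        # Hand over data only while there is no gap in the sequence.
        while next_expected in buffered:
            buffered.remove(next_expected)
            delivered.append(next_expected)
            next_expected += 1
        timeline.append((seq, delivered))
    return timeline


# Segment 2 is lost and only arrives after its retransmission.
for seq, delivered in deliver_in_order([1, 3, 4, 5, 2]):
    print(f"segment {seq} arrived, delivered to application: {delivered}")
```

Segments 3, 4 and 5 sit in the buffer and are only delivered, together with segment 2, once the retransmission arrives.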
The file "sfchronicle.pcap" is a Wireshark capture of the network traffic of one HTTP GET request (http://www.sfchronicle.com/) between Munich/Germany and San Francisco/USA.
Opened in Wireshark, it shows that:
- The latency/round-trip time is about 210ms between the "SYN" and the "SYN ACK" packet
- The first receive window size specified by the client was 29312 bytes (229 multiplied by the specified window scaling factor of 128)
- In later "ACK"s the receive window size was dynamically increased by the client
- The server has an initial congestion window of 10 segments. After sending 10 packets the server had to wait for an "ACK" before being able to send more packets - this introduces another 200ms latency.
- Then the server increased its cwnd with every received "ACK" and could send 20 packets in the next burst before it had to wait again for "ACK"s - another 185ms latency
- In the next burst it could send about 40 packets and then had to wait once more for 165ms. With the following burst it finally managed to transmit all the data
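The numbers from the list above can be added up in a short Python snippet. It only sums the waits that are listed here and ignores the transmission time of the packets themselves, so it is a lower bound rather than the full load time. The variable names are made up for this note.

```python
HANDSHAKE_RTT_MS = 210          # "SYN" -> "SYN ACK"
ACK_WAITS_MS = [200, 185, 165]  # pauses after the first three bursts
BURST_PACKETS = [10, 20, 40]    # approximate packets sent before each pause

waiting_for_acks = sum(ACK_WAITS_MS)
total_waiting = HANDSHAKE_RTT_MS + waiting_for_acks

print(f"waiting for ACKs during slow start: {waiting_for_acks} ms")  # 550 ms
print(f"including the handshake:            {total_waiting} ms")     # 760 ms

# Growth of the burst size from one round trip to the next.
growth = [cur / prev for prev, cur in zip(BURST_PACKETS, BURST_PACKETS[1:])]
print(f"burst growth per round trip: {growth}")  # [2.0, 2.0]
```

About three quarters of a second is spent just waiting on round trips, and the burst size doubles each time, as expected during slow start.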
I used "Captcp" to create a TCP throughput diagram from the pcap file.