In this series, you will learn the crucial details of TCP, one of the most important (and most complicated) protocols. For your reference, below is a list of the articles in this series:
- Part 1: Connection Establishment (this article)
- Part 2: Connection Termination
- Part 3: Congestion Control
TCP is a transport layer protocol with 3 main properties: reliability, flow control and congestion control (which also distinguish it from UDP). TCP is reliable because the protocol ensures that all data is fully transmitted and can be assembled by the receiver in the correct order. Reliability is provided through the TCP connection, checksums and retransmission.
In this article, we will discuss TCP connection establishment.
In a nutshell, a TCP connection is defined by a pair of endpoints (sockets), where each endpoint is identified by an (IP address, port number) pair. If you terminate a connection and then reconnect with the same endpoints, you have established 2 instantiations of the same connection.
At the kernel level, a socket is created on each device with the socket() call and bound to an address with the bind() call (on the client, the kernel binds the socket implicitly during connect() if bind() was not called). The passive opener calls listen() to wait for connections. The active opener initiates the connection with connect(), and the passive opener then accepts it with accept(). A network socket is an open file, so the kernel hands the process a file descriptor for it.
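The call sequence above can be tried on the loopback interface. This is a minimal sketch, not a production server: the address 127.0.0.1, the backlog of 5 and the payload are assumptions chosen for illustration, and port 0 asks the kernel to pick any free port.

```python
import socket

# Passive opener: socket() -> bind() -> listen()
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))        # port 0: let the kernel pick a free port
server.listen(5)
server_addr = server.getsockname()   # (ip, port) actually assigned

# Active opener: socket() -> connect()
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(server_addr)

# accept() returns a new socket dedicated to this one connection
conn, peer = server.accept()

client.sendall(b"hello")
received = conn.recv(1024)

# Each socket is an open file, so each one has its own file descriptor
fds = (server.fileno(), client.fileno(), conn.fileno())
print("file descriptors:", fds)
print("received:", received)

for s in (conn, client, server):
    s.close()
```

Note that the listening socket and the accepted socket have different file descriptors: the first keeps waiting for new connections while the second carries the data of one established connection.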
Since the client (the active opener) initiates the connection, it needs to know the IP address and port number of the server, which is usually a well-known, predefined port. The server (the passive opener), on the other hand, does not need to know the client’s port in advance: it can read the client’s port from the TCP segment header and the client’s IP address from the IP header of the first packet the client sends. Hence, the client system usually assigns its own socket a temporary (ephemeral) port number.
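To see the ephemeral port in practice, the sketch below connects a client that never calls bind() and compares the port the client was given with the port the server sees. The helper name ephemeral_port_demo is hypothetical, and the server here also uses a kernel-chosen port as a stand-in for a well-known one.

```python
import socket

def ephemeral_port_demo():
    """Return (port assigned to the client, client port as seen by the server)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))   # stand-in for a well-known server port
        server.listen(1)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            # No bind() on the client: the kernel assigns an ephemeral port
            client.connect(server.getsockname())
            _client_ip, client_port = client.getsockname()
            # The server learns the client's address from the incoming packets
            conn, (_seen_ip, seen_port) = server.accept()
            conn.close()
    return client_port, seen_port

client_port, seen_port = ephemeral_port_demo()
print("client was assigned port", client_port)
print("server saw client port  ", seen_port)
```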
In preparation for the connection, both TCP devices also set up a data structure called the transmission control block (TCB), which holds information about the connection: IP addresses and port numbers, pointers to the buffers where incoming and outgoing data are held, variables that keep track of the number of bytes received and acknowledged and of bytes received and not yet acknowledged, the current window size, etc. Each device maintains its own TCB.
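As a rough mental model, a TCB can be pictured as a record like the one below. This is an illustrative sketch only: the class and its layout are hypothetical and do not mirror any kernel structure. The field names snd_una, snd_nxt and rcv_nxt follow the variable names used in the TCP specification (SND.UNA, SND.NXT, RCV.NXT).

```python
from dataclasses import dataclass, field

@dataclass
class TransmissionControlBlock:
    local_addr: tuple                 # (IP address, port) of this endpoint
    remote_addr: tuple                # (IP address, port) of the peer
    state: str = "CLOSED"             # current state in the TCP state machine
    send_buffer: bytearray = field(default_factory=bytearray)  # outgoing data
    recv_buffer: bytearray = field(default_factory=bytearray)  # incoming data
    snd_una: int = 0                  # oldest sent byte not yet acknowledged
    snd_nxt: int = 0                  # next sequence number to send
    rcv_nxt: int = 0                  # next sequence number expected from peer
    window: int = 65535               # current window size in bytes

    def bytes_in_flight(self) -> int:
        """Bytes sent but not yet acknowledged (sequence space wraps at 2**32)."""
        return (self.snd_nxt - self.snd_una) % 2**32

tcb = TransmissionControlBlock(("10.0.0.1", 50000), ("10.0.0.2", 80))
tcb.snd_una, tcb.snd_nxt = 1000, 1500
print(tcb.bytes_in_flight())
```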
Finally, you can look at the finite state machine model of TCP state transitions to get a better understanding of the whole process.
Courtesy of IBM.com 1
TCP Connection Establishment
From the network’s point of view, a TCP connection is established with the well-known 3-way handshake.
- The active opener (client) initiates the connection with a SYN packet. In the TCP segment header, only the SYN flag is set. The client also sends its initial sequence number (ISN) in the Sequence Number field of the header. The client moves to the SYN_SENT state.
SYN: 1, ACK: 0 Sequence Number: ISN_client
If the SYN packet is not acknowledged, the client retries, and the interval between retries doubles every time: it resends at the 1s, 3s, 7s, 15s, 31s and 63s marks (the wait starts at 1s and then doubles each time: 2s, 4s and so on). With the default of 6 retries, the waits add up to 1 + 2 + 4 + 8 + 16 + 32 + 64 = 127 seconds (roughly 130 seconds in practice) before the kernel gives up with the ETIMEDOUT errno. The variable tcp_syn_retries defines the number of retries of the SYN packet.
$ cat /proc/sys/net/ipv4/tcp_syn_retries
6
- The passive opener (server) replies with an ACK+SYN packet. In the TCP segment header, both the ACK and SYN flags are set. Since ACK is now set, an Acknowledgment Number with value ISN_client + 1 is put in the header. The server also generates and sends its own ISN. The server now moves to the SYN_RCVD state.
SYN: 1, ACK: 1 Acknowledgment Number: ISN_client + 1 Sequence Number: ISN_server
The variable tcp_synack_retries in the proc filesystem defines the number of retries of the ACK+SYN packet.
$ cat /proc/sys/net/ipv4/tcp_synack_retries
5
- Finally, the client sends an ACK packet. Only the ACK flag needs to be set this time. The Sequence Number is the client’s own ISN + 1, and the Acknowledgment Number is the server’s ISN + 1. Both client and server move to the ESTABLISHED state.
SYN: 0, ACK: 1 Acknowledgment Number: ISN_server + 1 Sequence Number: ISN_client + 1
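The three steps above, together with the SYN retry timing, can be sketched as plain arithmetic. This is a simplified model, not kernel code: the function names are hypothetical, the segments are plain dictionaries, and the timing assumes an initial timeout of 1 second that doubles on every retry, so measured totals on a real system may differ slightly.

```python
MOD = 2**32  # sequence numbers are 32-bit values and wrap around

def three_way_handshake(isn_client, isn_server):
    """Return the three handshake segments as dictionaries."""
    syn = {"SYN": 1, "ACK": 0, "seq": isn_client}
    syn_ack = {"SYN": 1, "ACK": 1, "seq": isn_server,
               "ack": (syn["seq"] + 1) % MOD}       # acknowledges the client's SYN
    ack = {"SYN": 0, "ACK": 1, "seq": (isn_client + 1) % MOD,
           "ack": (syn_ack["seq"] + 1) % MOD}       # acknowledges the server's SYN
    return [syn, syn_ack, ack]

def syn_retry_marks(retries=6, initial_timeout=1):
    """Return (times of each retry, time at which the client gives up), in seconds."""
    marks, elapsed, wait = [], 0, initial_timeout
    for _ in range(retries):
        elapsed += wait          # wait, then resend the SYN
        marks.append(elapsed)
        wait *= 2                # exponential backoff
    return marks, elapsed + wait  # final wait after the last retry

for segment in three_way_handshake(isn_client=1000, isn_server=5000):
    print(segment)
print(syn_retry_marks())
```

With these inputs the last segment carries seq 1001 and ack 5001, and the retry marks come out as 1, 3, 7, 15, 31 and 63 seconds with a give-up time of 127 seconds.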
Several fields of the TCP segment header are used during establishment (you can view these fields with the tcpdump or Wireshark tools):
- TCP Flags: there are 9 1-bit TCP flags, 6 of which are commonly used: SYN, ACK, FIN, RST, URG, PSH
- Sequence Number: a 32-bit field used to keep the receiver from accepting duplicate packets and to guarantee packet ordering. The ISN is chosen randomly by the system for each connection. Sequence numbers are chosen so as to minimize two issues:
- Sequence numbers used in one instantiation of a connection must not overlap with those of a new instantiation of the same connection. Otherwise, if a connection with delayed segments is closed and then reopened with the same 4-tuple, the delayed segments could re-enter the new instantiation’s data stream as valid data.
- The sequence number is designed to be hard to guess in order to prevent spoofing: an intruder who knows the IP addresses, port numbers and sequence numbers can hijack the connection.
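- Acknowledgment Number: a 32-bit field that holds the next sequence number the receiver expects, i.e. one more than the largest byte received in order at the receiver. This is how the sender figures out that packets were lost in transit, e.g. when duplicate ACKs are received or when no ACK arrives before the retransmission timer expires.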
- The “Options” field is also used. For instance, modern TCP implementations support the Selective Acknowledgment (SACK) option, which allows the receiver to report out-of-order data it has received. SACK works by appending to a duplicate acknowledgment packet a TCP option that lists the ranges of noncontiguous data received.
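To make the fields above concrete, the sketch below packs and unpacks the fixed 20-byte part of a TCP header with Python’s struct module. The helper names build_header and parse_header are hypothetical, the checksum is left at 0 as a placeholder, and options are not handled, so this only illustrates the layout and is not a packet generator.

```python
import struct

# Flag bits in the low 9 bits of the 16-bit "data offset + flags" word
FLAG_BITS = {"FIN": 0x001, "SYN": 0x002, "RST": 0x004, "PSH": 0x008,
             "ACK": 0x010, "URG": 0x020, "ECE": 0x040, "CWR": 0x080, "NS": 0x100}

# src port, dst port, sequence number, acknowledgment number,
# offset+flags, window, checksum, urgent pointer (network byte order)
HEADER = struct.Struct("!HHIIHHHH")

def build_header(src_port, dst_port, seq, ack, flags, window=65535):
    offset_and_flags = 5 << 12            # data offset: 5 words = 20 bytes
    for name in flags:
        offset_and_flags |= FLAG_BITS[name]
    # checksum and urgent pointer are placeholders (0) in this sketch
    return HEADER.pack(src_port, dst_port, seq, ack, offset_and_flags, window, 0, 0)

def parse_header(raw):
    src, dst, seq, ack, off_flags, window, _checksum, _urgent = HEADER.unpack(raw[:20])
    return {
        "src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
        "header_len": (off_flags >> 12) * 4,   # data offset is in 32-bit words
        "flags": sorted(n for n, bit in FLAG_BITS.items() if off_flags & bit),
        "window": window,
    }

syn = build_header(50000, 80, seq=1000, ack=0, flags=["SYN"])
syn_ack = build_header(80, 50000, seq=5000, ack=1001, flags=["SYN", "ACK"])
print(parse_header(syn))
print(parse_header(syn_ack))
```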
TCP connection queues
A concurrent server invokes a new process or thread to handle each client, so the listening server should always be ready to handle the next incoming connection request. There is still a chance that multiple connection requests will arrive while the listening server is creating a new process, or while the operating system is busy running other higher-priority processes, or worse yet, that the server is being attacked with bogus connection requests that are never allowed to be established.
Modern Linux uses two queues to handle incoming TCP connections:
- SYN queue: stores incoming SYN segments, i.e. connections whose handshake has not completed yet. The system-wide parameter tcp_max_syn_backlog specifies the length of this queue. If the number of connections in the SYN_RCVD state (a SYN has been received) would exceed this threshold, the incoming connection is rejected.
$ cat /proc/sys/net/ipv4/tcp_max_syn_backlog
128
# newer kernel
$ cat /proc/sys/net/core/somaxconn
128
- Accept queue: stores the connections that have completed the 3-way handshake and are ready to be dequeued by the application with the accept() system call. The system uses the backlog parameter in listen(fd, backlog), capped by the system-wide somaxconn value, to determine the size of this queue for one endpoint (this value, however, has no effect on the maximum number of established connections allowed by the system, or on the number of clients that a concurrent server can handle concurrently).
If the accept queue is full, the system reacts in one of 2 ways:
- If tcp_abort_on_overflow=1, the server responds with a reset (RST) packet, but this can cause the client to think that no server is present and abort the operation altogether.
- If tcp_abort_on_overflow=0, the server delays responding to the SYN packet, which can cause a timeout on the client’s side, after which the client can choose to retry.
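The accept queue can be observed from user space: on the loopback interface, connect() returns as soon as the kernel has completed the handshake, even though the server application has not called accept() yet. The sketch below shows this and also reads the two queue-related settings from the proc filesystem. The read_sysctl helper is hypothetical and returns None on systems where the file is not available; the backlog of 4 is an arbitrary choice.

```python
import socket

def read_sysctl(path):
    """Return an integer setting from the proc filesystem, or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None

syn_backlog = read_sysctl("/proc/sys/net/ipv4/tcp_max_syn_backlog")
somaxconn = read_sysctl("/proc/sys/net/core/somaxconn")
print("tcp_max_syn_backlog:", syn_backlog, "somaxconn:", somaxconn)

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(4)                      # backlog: requested accept queue size

clients = []
for _ in range(2):
    c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    c.connect(server.getsockname())   # succeeds before accept() is ever called
    clients.append(c)
client_addrs = {c.getsockname() for c in clients}

# The completed connections were waiting in the accept queue; dequeue them now
accepted_addrs = set()
for _ in clients:
    conn, peer = server.accept()
    accepted_addrs.add(peer)
    conn.close()

print("all queued connections accepted:", accepted_addrs == client_addrs)
for s in clients + [server]:
    s.close()
```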
TCP Fast Open
Traditionally, the client and server did not send data until the 3-way handshake completed. However, the newer TCP Fast Open (TFO) proposal allows payload to be transferred during connection establishment, which eliminates a round trip and lowers latency.
- In the first connection, the client and server perform the normal 3-way handshake. However, the server puts a TFO cookie in the TCP Options field and sends it along with the ACK+SYN packet.
- When the client later reconnects, it sends data and the cookie in the SYN packet. If the server validates the cookie successfully, it can pass the data to the application right away and start sending data in the ACK+SYN packet, and the rest follows normal TCP operation. Without a valid cookie, the specification requires that data in the SYN not be passed to the application until the 3-way handshake is complete.
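The cookie exchange above can be modelled in a few lines. This is a conceptual sketch only: the function names are hypothetical, and deriving the cookie as a truncated HMAC-SHA256 of the client IP is an assumption made for illustration, not the construction used by any particular TCP stack.

```python
import hashlib
import hmac
import os

SERVER_KEY = os.urandom(16)   # secret known only to the server

def make_cookie(client_ip, key=SERVER_KEY):
    """Derive an 8-byte cookie bound to the client's IP address (illustrative scheme)."""
    return hmac.new(key, client_ip.encode(), hashlib.sha256).digest()[:8]

def handle_syn(client_ip, cookie=None, data=b"", key=SERVER_KEY):
    """Return (cookie to hand out or None, data accepted before the handshake ends)."""
    if cookie is not None and hmac.compare_digest(cookie, make_cookie(client_ip, key)):
        return None, data                     # valid cookie: accept the early data
    return make_cookie(client_ip, key), b""   # normal handshake, issue a cookie

# First connection: no cookie yet, so no early data; the server issues a cookie
cookie, early = handle_syn("192.0.2.10")
print(len(cookie), early)

# Reconnect: the cookie validates, so the data in the SYN is accepted
print(handle_syn("192.0.2.10", cookie, b"GET / HTTP/1.1"))

# Same cookie from a different address: rejected, falls back to normal handshake
print(handle_syn("192.0.2.99", cookie, b"GET / HTTP/1.1")[1])
```

Binding the cookie to the client’s IP address is what lets the server trust data in a SYN: a sender that spoofs another address cannot present a cookie that validates for it.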
Once the connection is established, the hosts can start transferring data. In the next post, we will discuss connection termination.