When developing software at a higher level, making two systems communicate is relatively easy. The go-to strategy is to use an HTTP client-server architecture, probably exchanging content in JSON, and maybe using WebSockets. In this case, the only situations we need to handle are when a message fails to be delivered and when the connection is lost. Under the hood, the libraries and the operating system do a lot of work to allow this communication. This article details some of these hidden operations, discovered while developing a decentralized messaging app.

What is transferred in TCP and UDP connections

Before getting into the details, we must understand how people solved the core problem: transferring information between two systems.

First, it must be clear that information is always transferred as bytes, both in TCP and UDP. The next steps are not that relevant for now, so we should focus on the first problem, which is converting the information to bytes that can be sent through the network.

Some solutions to this problem are the following:

  • Sending the information as text, encoded with UTF-8 or some other encoding. That’s similar to how the HTTP protocol works (which is text-based with some structure). This method is mostly used to transfer text-based information such as HTML pages (before there were CSS, JavaScript and Flash). Imagine this as sending the raw content <html><body><h1>Hello</h1> .
  • Using a higher level protocol to encode the information, not necessarily as text. This method is mostly used to transfer information that doesn’t fit well as text content. For example, the current position of a player on the map, or the contents of a PNG image. Thinking of a player position on the map, this technique makes sense, because it’s more compact to send two 32-bit floating point numbers (8 bytes total) than to send the positions as the text 9.1923182 10.2931231 , which takes 20 bytes encoded in ASCII (2.5 times as many bytes as the two 32-bit floating point numbers). See the sketch after this list.
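
As a rough illustration of the two approaches, here is a minimal Python sketch (the example values are made up):

    import struct

    x, y = 9.1923182, 10.2931231

    # Text-based encoding: human readable, but larger
    as_text = f"{x} {y}".encode("utf-8")   # ~20 bytes for these values

    # Binary encoding: two 32-bit floats packed back to back
    as_binary = struct.pack("!ff", x, y)   # always 8 bytes

    print(len(as_text), len(as_binary))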

Dividing information into discrete parts

Regardless of the technique used to convert the information into binary data, there is another problem: how can we split that information into finite segments that can be sent through the network, and how can we detect the start and end of a segment of information?

This problem is different in TCP and UDP, so a dedicated section is necessary for each of them.

UDP

In general, we start any explanation of networks with the UDP protocol, because it has the least amount of embedded technology compared to TCP. In short, we can think of UDP packets as individual envelopes with sender and recipient labels and a letter inside. The only difference is that there is no guarantee that the envelope will be delivered to the recipient.

This analogy works very well because, like an actual envelope, there’s a limit on how much text you can fit in one, and that limit depends on the size of the envelope and on the capacity of the logistics involved in transporting it.

Bringing this to the actual UDP protocol: it doesn’t have a connection (we don’t know if the other side of the “connection” is able to receive our messages) and has no delivery confirmation or retry (we can’t be sure that the other side received our message; if the message fails to be delivered, we are not notified, and the system doesn’t try to send it again).

UDP does have error checking (a checksum) by default, which makes it improbable for the received message to be anything other than exactly the message sent.

Of course, this can work well when we don’t require these conveniences, but for a lot of tasks we do require a connection and delivery confirmation. To “fix” this, the TCP protocol was created.

If the use of UDP is required, there are ways to add the concept of connections and confirmation on top of the protocol, but they require implementing these strategies manually. For example, the systems can exchange a HELLO, ARE YOU ALIVE? message each second to make sure both sides are up and receiving messages, and there are known protocols that use acknowledgements for each sent message to make sure they were received. For example, BitTorrent uses the uTP protocol, which uses an acknowledgement system to provide delivery confirmation, sequential data transfer, retry and congestion control (adjustment of the data rate).
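
To make the heartbeat idea concrete, here is a minimal Python sketch (the message texts and the peer address are made up for the example; real protocols such as uTP are far more elaborate):

    import socket
    import time

    PEER = ("198.51.100.7", 9000)  # hypothetical peer address

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(1.0)  # don't wait forever for a reply

    while True:
        sock.sendto(b"HELLO, ARE YOU ALIVE?", PEER)
        try:
            data, addr = sock.recvfrom(508)
            if data == b"YES, I AM":
                pass  # the peer is up and receiving our messages
        except socket.timeout:
            pass  # no reply: the peer may be down, or the packet was lost
        time.sleep(1)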

In UDP it’s relatively easy to split the information. We can send at most the maximum allowed UDP packet size each time, which means there’s a defined amount of data sent in each UDP packet. Since the maximum safe size of the payload inside a UDP packet is 508 bytes, we can send each message of a system in one separate UDP packet, and if the size is greater than 508 bytes, we split it into multiple packets.

Each packet also contains the length of the payload, so we don’t need to manually insert a byte sequence indicating that the content has ended, nor manually send the size of the content to be read.
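
A sketch of this splitting in Python (the destination address is made up, and a real protocol would also number the chunks so the receiver can reorder them and detect missing ones):

    import socket

    MAX_PAYLOAD = 508  # safe maximum UDP payload discussed above
    DEST = ("198.51.100.7", 9000)  # hypothetical destination

    def send_message(sock, message: bytes):
        # Each sendto() call becomes exactly one UDP datagram
        for i in range(0, len(message), MAX_PAYLOAD):
            sock.sendto(message[i:i + MAX_PAYLOAD], DEST)

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    send_message(sock, b"some serialized message " * 100)

    # On the receiving side, each recvfrom() returns one whole datagram,
    # so the boundaries of the chunks are preserved automatically:
    # data, addr = sock.recvfrom(508)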

TCP

TCP is a more robust protocol, implementing:

  • Stateful connection
  • Guarantee of delivery
  • Retry
  • Sequential delivery
  • Congestion control

When developing a system, all of the previous items help make the communication more reliable, but these conveniences come with some drawbacks:

  • The socket opening process is slower, because it requires multiple send-receive operations before the connection is considered open and communication can start
  • The total size of each packet is bigger, because it must contain all of the information necessary to order, acknowledge and resend the information
  • The latency is higher, because data is delivered in order, so a lost portion has to be retransmitted and confirmed before the rest of the sequence can be processed.

Also, when developing programs on top of the operating system, the interface exposed by the OS is just a continuous stream of bytes, with no divisions. To send a message, we must somehow mark the start and end of the content ourselves. Unlike UDP, TCP has no concept of “you’re sending a discrete packet with a maximum size of 508 bytes” (at least for programs using the socket interface of the OS); it’s rather a continuous transmission of data that ends only when the socket closes.

Since most of the information we care about has a start and an end, we must implement that delimitation manually, using one of the two following methods:

  • Before sending each piece of information, we first send the total length of the content, and then the receiving side processes that amount of data as a discrete “part” (see the sketch after this list).
  • We make a convention that a specific sequence of bits represents the end of a “part”. Of course, this requires escaping that exact sequence of bits if it’s present in the actual content.
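
A minimal Python sketch of the first method (length-prefix framing); the 4-byte big-endian length prefix used here is just one common convention, not something mandated by TCP:

    import socket
    import struct

    def send_part(sock: socket.socket, payload: bytes) -> None:
        # Prefix the payload with its length as a 4-byte big-endian integer
        sock.sendall(struct.pack("!I", len(payload)) + payload)

    def recv_exactly(sock: socket.socket, n: int) -> bytes:
        # recv() may return fewer bytes than requested, so loop until we have n
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("socket closed in the middle of a part")
            buf += chunk
        return buf

    def recv_part(sock: socket.socket) -> bytes:
        (length,) = struct.unpack("!I", recv_exactly(sock, 4))
        return recv_exactly(sock, length)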

General considerations

Based on my experience, trying to implement the conveniences of TCP on top of UDP requires a big overhead in the amount of metadata that needs to be transferred, and creates the need for a dedicated module in the program to implement these features. When using the safe maximum UDP payload size of 508 bytes, I can easily imagine at least 32 bytes being spent just to allow some of the features of TCP to work.

Also, when using TCP, the application should implement a method to define the beginning and end of each “part” of information.

Of course, UDP still makes sense in a lot of cases where low latency is required and some data loss is acceptable.

Buffering

Since the OS presents the interface of a TCP socket as a stream of bytes with no start and end, and we can’t transfer an infinite amount of data in each TCP packet, it’s up to the operating system to determine where to split the information into packets, and that can cause unwanted latency.

For example, if our program is sending 1 GB of data to the operating system one byte at a time, it makes sense for the kernel to pack as many bytes as possible into each packet before “encapsulating” and sending it. It wouldn’t make sense to send one byte per packet (keep in mind that each packet needs around 64 bytes of headers plus the payload).

On the other hand, if our program is sending only 500 bytes of data, the operating system could wait a long time before realizing there’s no more information to accumulate, and only then encapsulate and send the information in one packet.

The operating system handles this automatically, balancing packet segmentation for large transfers against latency for small transfers, and these parameters can be fine-tuned in Linux. By default, this behavior causes delays for small contents transferred over TCP; it comes from the Nagle algorithm (an algorithm that coalesces small writes into fewer packets, generally used by operating systems), described in RFC 896.

The Linux kernel offers a socket option called TCP_NODELAY, which disables this buffering that can cause delay. This option is documented in the tcp(7) man page of Linux.
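
Enabling it from an application is a one-line socket option; a Python sketch:

    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Disable the Nagle algorithm: small writes are sent immediately
    # instead of being buffered while waiting for outstanding ACKs
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)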

Since everything in the stream-based TCP protocol is buffered, there are TCP flags for when some information needs to be delivered with the lowest possible latency: the PSH and URG flags. For the details, there’s a precise explanation on Stack Overflow: Difference between push and urgent flags in TCP.

For example, some of the parameters we can tweak are the buffer size (on the operating system side) and the amount of data our application reads from the OS each time. If you have ever programmed raw TCP sockets, maybe you wondered why we need to specify the amount of data we want to read from a TCP stream, as in socket.recv(1024), and how we can know that number if we don’t yet know the size of the data we’re receiving. That number is simply the amount of data that will be transferred from the operating system’s network stack to the application buffer.
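
A small Python sketch of those two knobs (the 256 KiB value is arbitrary, and on Linux the kernel may double the requested buffer size):

    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    # Size of the OS receive buffer for this socket
    current = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 256 * 1024)

    # After connecting, recv(1024) only controls how much data is moved from
    # the OS buffer to the application per call; it does not need to match
    # the size of the message being received.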

TCP and UDP tuning deserve their own article on this blog, so to keep this one short I won’t go further here; if you want to deep dive into Linux and low-level TCP tuning, there are awesome references for that.

Throttling and Congestion control

An important advantage of TCP over UDP is that it automatically handles congestion control, which means there’s a built-in way of detecting that a network has a limited speed and adjusting the transmission to that speed.

When sending UDP packets, since there’s no confirmation of delivery, we don’t know when a packet doesn’t reach the destination. For example, if the network is congested, if you’re sending data at a rate higher than the system supports, or if you simply can’t reach the destination, there will be no evidence of that. As an alternative, the uTP protocol uses its acknowledgements as a way to tell whether packets are being discarded, and adjusts the rate accordingly.

TCP has a congestion control algorithm, handled at the operating system level, that detects a congested network and estimates the allowed speed using multiple metrics extracted during the transmission.
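
On Linux, you can check which congestion control algorithm the kernel is currently using; a small sketch (the procfs path below is the standard location on Linux):

    # Prints the congestion control algorithm in use, e.g. "cubic" or "bbr"
    with open("/proc/sys/net/ipv4/tcp_congestion_control") as f:
        print(f.read().strip())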

Handshake

There’s another big difference between the protocols. If you start a connection between two hosts in UDP, there’s no confirmation that the connection is established; you rather just start sending and receiving data.

TCP uses a handshake process composed of 3 steps (SYN, SYN-ACK, ACK) to start the connection. So when we want to send and receive data, we first open a connection, and if that succeeds, we start sending information. A similar process occurs when gracefully closing the connection (a four-step exchange of FIN and ACK segments).
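
In code, this is visible in where the handshake happens: connect() only returns after the three-way handshake has completed, before any application data flows. A Python sketch (the address is made up):

    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # connect() returns only after SYN, SYN-ACK and ACK have been exchanged
    sock.connect(("198.51.100.7", 9000))
    sock.sendall(b"hello")
    # close() triggers the graceful FIN/ACK teardown
    sock.close()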

Where to go from here

If you want to get a deeper general understanding of networking, intercepting connections and analyzing them with WireShark is a good starting point. In this post we used the tool in its simplest form, just to show the sequence of packets, but WireShark is a complete and complex solution, and you can easily spend a lifetime discovering the peculiarities of each protocol.

Also, if you want to take a random protocol to see how people handle low-level transfer of information, you can find the complete documentation of the high-level protocol used by Minecraft to transfer game state such as player positions. And if you have access to the game, you can intercept the packets with WireShark and inspect the information (if encryption is not in use).