Despite its long and successful history, TCP is ill-suited for modern datacenters. Every significant element of TCP, from its stream orientation to its expectation of in-order packet delivery, is inadequate for the datacenter environment. The fundamental issues with TCP are too interrelated to be fixed incrementally; the only way to harness the full performance potential of modern networks is to introduce a new transport protocol. Homa, a novel transport protocol, demonstrates that it is possible to avoid all of TCP’s problems. Although Homa is not API-compatible with TCP, it can be integrated with RPC frameworks to bring it into widespread usage.
Introduction
TCP, designed in the late 1970s, has been phenomenally successful and adaptable. Originally created for a network with about 100 hosts and link speeds of tens of kilobits per second, TCP has scaled to billions of hosts and link speeds of 100 Gbit/second or more. However, datacenter computing presents unprecedented challenges for TCP. With millions of cores in close proximity and applications harnessing thousands of machines that interact on microsecond timescales, TCP performs poorly: its overheads limit application-level performance and contribute significantly to the "datacenter tax."
This position paper argues that TCP’s challenges in the datacenter are insurmountable. Each major design decision in TCP is wrong for the datacenter, leading to significant negative consequences. These problems impact systems at multiple levels, including the network, kernel software, and applications. For instance, TCP interferes with load balancing, a critical aspect of datacenter operations.
Requirements for Datacenter Transport Protocols
Before discussing TCP’s problems, it’s essential to understand the challenges that any transport protocol for datacenters must address:
- Reliable Delivery: The protocol must ensure data is delivered reliably from one host to another, despite transient failures.
- Low Latency: Modern networking hardware enables round-trip times of a few microseconds for short messages. The transport protocol must not add significantly to this latency.
- High Throughput: The protocol must support high data throughput and high message throughput, essential for communication patterns like broadcast and shuffle.
- Congestion Control: The protocol must limit the buildup of packets in network queues to provide low latency.
- Efficient Load Balancing: With rapidly increasing network speeds, the protocol must distribute load across multiple cores to keep up with high-speed links.
- NIC Offload: Software-based transport protocols are becoming obsolete. Future protocols must move to special-purpose NIC hardware to provide high performance at an acceptable cost.
Everything about TCP is Wrong
TCP’s key properties, including stream orientation, connection orientation, bandwidth sharing, sender-driven congestion control, and in-order packet delivery, are all wrong for datacenter transport. Each of these decisions has serious negative consequences:
- Stream Orientation: TCP’s byte stream model is a poor match for datacenter applications, which typically exchange discrete messages. Layering messages on a stream introduces complexity and overhead, such as maintaining state for partially-received messages (see the framing sketch after this list).
- Connection Orientation: TCP requires long-lived connection state for each peer, resulting in high overheads. This is problematic for datacenter environments where applications can have hundreds or thousands of connections.
- Bandwidth Sharing: TCP’s fair scheduling approach performs poorly under load, discriminating heavily against short messages, which are critical in datacenter environments.
- Sender-Driven Congestion Control: TCP senders detect congestion only indirectly, through signals such as buffer occupancy that appear after queues have already formed, and TCP makes no use of the priority queues in modern switches. The result is a dilemma in which it is difficult to achieve both low latency and high throughput.
- In-Order Packet Delivery: TCP’s assumption of in-order packet delivery restricts load balancing, leading to hot spots in both hardware and software, and consequently high tail latency.
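To make the cost of the stream model concrete, the sketch below shows length-prefixed message framing over a TCP-style byte stream in Go. It is a minimal illustration rather than code from any particular framework; the 4-byte length prefix and the helper names are assumptions made for the example. The underlying point holds for any framing scheme: the receiver must keep per-connection reassembly state, and a message cannot be dispatched until its last byte has arrived.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"net"
)

// readMessage reassembles one length-prefixed message from a byte stream.
// Until the final byte arrives, the partially-received message is state the
// receiver must hold, and nothing can be handed off for processing.
func readMessage(conn net.Conn) ([]byte, error) {
	var hdr [4]byte
	// Even the 4-byte length prefix can arrive split across packets.
	if _, err := io.ReadFull(conn, hdr[:]); err != nil {
		return nil, err
	}
	size := binary.BigEndian.Uint32(hdr[:])
	buf := make([]byte, size)
	// A long message here delays every message queued behind it on the
	// same stream (head-of-line blocking).
	if _, err := io.ReadFull(conn, buf); err != nil {
		return nil, err
	}
	return buf, nil
}

func main() {
	// net.Pipe stands in for a real TCP connection so the example runs anywhere.
	client, server := net.Pipe()
	go func() {
		defer client.Close()
		msg := []byte("ping")
		var hdr [4]byte
		binary.BigEndian.PutUint32(hdr[:], uint32(len(msg)))
		client.Write(hdr[:])
		client.Write(msg) // a real network may deliver this in arbitrary fragments
	}()
	msg, err := readMessage(server)
	fmt.Println(string(msg), err)
}
```

With a message-based transport, this framing and buffering logic, and the head-of-line blocking that comes with it, disappears from the application.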
TCP is Beyond Repair
Incremental fixes to TCP are unlikely to succeed due to the deeply embedded and interrelated nature of its problems. For example, TCP’s congestion control has been extensively studied, and while improvements like DCTCP have been made, significant additional improvements will only be possible by breaking some of TCP’s fundamental assumptions.
Homa: A Clean-Slate Redesign
Homa represents a clean-slate redesign of network transport for the datacenter. Its design differs from TCP in every significant aspect:
- Messages: Homa is message-based, implementing remote procedure calls (RPCs). This enables more efficient load balancing and run-to-completion scheduling.
- No Connections: Homa is connectionless, eliminating connection setup overhead and allowing a single socket to manage any number of concurrent RPCs.
- SRPT: Homa implements Shortest Remaining Processing Time (SRPT) scheduling to favor shorter messages, using priority queues in modern switches.
- Receiver-Driven Congestion Control: Homa manages congestion from the receiver rather than the sender. Because the receiver knows about all of the messages arriving for it, it is better positioned to control its downlink (see the scheduling sketch after this list).
- Out-of-Order Packets: Homa can tolerate out-of-order packet arrivals, providing more flexibility for load balancing and potentially eliminating congestion in the network core.
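To make the receiver-driven SRPT idea concrete, here is a simplified Go sketch. It is not Homa's actual protocol or packet format; the message list, the grant size, and the field names are invented for illustration. The receiver tracks every partially received inbound message and repeatedly grants more bytes to the message with the fewest bytes still outstanding, so short messages finish quickly even while long ones are in flight.

```go
package main

import "fmt"

// inboundMsg is the per-message state a receiver might keep (invented for
// this sketch): the number of bytes it has not yet granted to the sender.
type inboundMsg struct {
	id        int
	remaining int
}

// nextGrant picks the message to grant next under SRPT:
// the message with the shortest remaining size goes first.
func nextGrant(msgs []*inboundMsg) *inboundMsg {
	var best *inboundMsg
	for _, m := range msgs {
		if m.remaining > 0 && (best == nil || m.remaining < best.remaining) {
			best = m
		}
	}
	return best
}

func main() {
	// Three concurrent inbound messages of very different sizes.
	msgs := []*inboundMsg{
		{id: 1, remaining: 500_000},
		{id: 2, remaining: 2_000},
		{id: 3, remaining: 40_000},
	}
	const grantSize = 10_000 // bytes authorized per scheduling decision (illustrative)
	for m := nextGrant(msgs); m != nil; m = nextGrant(msgs) {
		n := grantSize
		if m.remaining < n {
			n = m.remaining
		}
		fmt.Printf("grant %d bytes to message %d\n", n, m.id)
		m.remaining -= n
	}
}
```

Because the receiver sees all of its inbound traffic at once, it can apply this policy across senders; a real implementation would also cap the amount of granted-but-unreceived data in flight to keep queues short.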
Getting There from Here
Replacing TCP will be difficult due to its entrenched status. However, integrating Homa with major RPC frameworks like gRPC and Apache Thrift can bring it into widespread usage. This approach allows applications using these frameworks to switch to Homa with little or no work.
Conclusion
TCP is the wrong protocol for datacenter computing. Every aspect of its design is inadequate for the datacenter environment. To eliminate the "datacenter tax," we must move to a radically different protocol like Homa. Integrating Homa with RPC frameworks is the best way to bring it into widespread usage. For more information, you can refer to the whitepaper "It's Time to Replace TCP in the Datacenter."