Anuj Kalia
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
Datacenter network latencies are approaching their microsecond-scale speed-of-light limit, and network bandwidths continue to grow beyond 100 Gbps. These improvements call for rethinking the design of communication-intensive distributed systems for datacenters, whose performance has historically been limited by slow networks. With the slowing down of Moore’s law, a popular approach is to redesign distributed systems to use network hardware devices and technologies that offload communication or data access from commodity CPUs, such as smart network cards (NICs), lossless networks, programmable NICs, and programmable switches.
In this dissertation, we show that we can continue to use end-to-end software-only communication mechanisms to build high-performance distributed systems, i.e., we bring the speed of fast networks to distributed systems without an expensive redesign with in-network hardware offloads. We show that the ubiquitous Remote Procedure Call (RPC) communication mechanism, when rearchitected specially for the capabilities of modern commodity datacenter hardware, is a fast, scalable, flexible, and simple communication choice for distributed systems. We make three contributions. First, we present a detailed analysis of datacenter communication hardware (ranging from the peripheral bus that connects CPUs to NICs, to the datacenter’s switched network) that informs our choice of the communication mechanism. Second, we lay out the advantages of RPCs over network hardware offloads through the design and evaluation of two new systems, a key-value store called HERD, and a distributed transaction processing system called FaSST. Third, we combine the lessons learned from the first two steps with new insights about datacenter packet loss and congestion control to create a new RPC library called eRPC, and show how existing distributed system codebases perform well over eRPC. In many cases, these systems substantially outperform offloads because they use less communication, and their end-to-end design provides flexibility and simplicity.
KEYWORDS: datacenter networks, distributed systems, Remote Procedure Calls, Remote Direct Memory Access, eRPC
FULL TR: pdf