Mitigating Compute Congestion for Low Latency Datacenter RPCs
Author(s)
Cho, Inho
Advisor
Belay, Adam M.
Alizadeh, Mohammad
Abstract
Latency-sensitive applications in recent datacenter workloads, such as interactive machine learning inference, high-frequency algorithmic trading, cloud gaming, and interactive AR/VR applications, impose stringent latency requirements. These applications rely heavily on low-latency RPCs as an essential building block, often executed in mere microseconds through parallel computations and in-memory operations. Given the high fan-out RPC traffic patterns typical of these applications, it is imperative to minimize tail latency to keep end-to-end latency within its service level objectives (SLOs).
With the innovations in datacenter networks and the end of Dennard scaling, congestion is now moving from networks to compute resources. This thesis introduces two systems, Breakwater and LDB, designed to mitigate and diagnose compute congestion, each targeting different sources of tail latency. Breakwater aims to alleviate CPU congestion and lock contention during intermittent server overload, while LDB furnishes developers with a tool to diagnose the functions causing high tail latency with low overhead.
Date issued
2023-09
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology