A hardware and software architecture for efficient datacenters

Kasture, Harshad

Author(s)

Kasture, Harshad

DownloadFull printable version (25.79Mb)

Other Contributors

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.

Advisor

Daniel Sanchez.

Terms of use

MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582

Metadata

Show full item record

Abstract

Datacenters host an increasing amount of the world's compute, powering a diverse set of applications that range from scientific computing and business analytics to massive online services such as social media and online maps. Despite their growing importance, however, datacenters suffer from low resource and energy efficiency, using only 10-30% of their compute capacity on average. This overprovisioning adds billions of dollars annually to datacenter equipment costs, and wastes significant energy. This low efficiency stems from two sources. First, latency-critical applications, which form the backbone of user-facing, interactive services, need guaranteed low response times, often a few tens of milliseconds or less. By contrast, current systems are architected to maximize long-term, average performance (e.g., throughput over a period of seconds), and cannot provide the short-term performance guarantees needed by these applications. The stringent performance requirements of latency-critical applications make power management challenging, and make it hard to colocate them with other applications, as interference in shared resources hurts their responsiveness. Second, throughput-oriented batch applications, while easier to colocate, experience performance degradation as multiple colocated applications compete for shared resources on servers. This thesis presents novel hardware and software techniques that improve resource and energy efficiency for both classes of applications. First, Ubik is a dynamic cache partitioning technique that allows latency-critical and batch applications to safely share the last-level cache, maximizing batch throughput while providing latency guarantees for latency-critical applications. Ubik accurately predicts the transients that result when caches are reconfigured, and can thus mitigate latency degradation due to performance inertia, i.e., the loss of performance as an application transitions between steady states. Second, Rubik is a fine-grain voltage and frequency scaling scheme that quickly and accurately adapts to short-term load variations in latency-critical applications to minimize dynamic power consumption without hurting latency. Rubik uses a novel, lightweight statistical model that accurately predicts queued work, and accounts for variations in per-request compute requirements as well as queuing delays. Further, Rubik improves system utilization by allowing latency-critical and batch applications to safely share cores, using frequency scaling to mitigate performance degradation due to interference in per-core resources such as private caches. Third, Shepherd is a cluster scheduler that uses per-node cache-partitioning decisions to drive application placement across machines. Shepherd uses detailed application profiling data to partition the last-level cache on each machine and to predict the performance of colocated applications, and uses randomized search to find a schedule that maximizes throughput. A common theme across these techniques is the use of lightweight, general-purpose architectural support to provide performance isolation and fast state transitions, coupled with intelligent software runtimes that configure the hardware to meet application performance requirements. Unlike prior work, which often relies on heuristics, these techniques use accurate analytical modeling to guide resource allocation, boosting efficiency while satisfying applications' disparate performance goals. Ubik allows latency-critical and batch applications to be safely and efficiently colocated, improving batch throughput by an average of 17% over a static partitioning scheme while guaranteeing tail latency. Rubik further allows these two classes of applications to share cores, reducing datacenter power consumption by up to 31% while using 41% fewer machines over a scheme that segregates these applications. Shepherd improves batch throughput by 39% over a randomly scheduled, unpartitioned baseline, and significantly outperforms scheduling-only and partitioning-only approaches.

Description

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.

Cataloged from PDF version of thesis.

Includes bibliographical references (pages 121-131).

Date issued

2017

URI

http://hdl.handle.net/1721.1/109005

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Keywords

Electrical Engineering and Computer Science.

Collections

Doctoral Theses