Show simple item record

dc.contributor.advisor	Daniel Sanchez.	en_US
dc.contributor.author	Kasture, Harshad	en_US
dc.contributor.other	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.	en_US
dc.date.accessioned	2017-05-11T20:00:13Z
dc.date.available	2017-05-11T20:00:13Z
dc.date.copyright	2017	en_US
dc.date.issued	2017	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/109005
dc.description	Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.	en_US
dc.description	Cataloged from PDF version of thesis.	en_US
dc.description	Includes bibliographical references (pages 121-131).	en_US
dc.description.abstract	Datacenters host an increasing amount of the world's compute, powering a diverse set of applications that range from scientific computing and business analytics to massive online services such as social media and online maps. Despite their growing importance, however, datacenters suffer from low resource and energy efficiency, using only 10-30% of their compute capacity on average. This overprovisioning adds billions of dollars annually to datacenter equipment costs, and wastes significant energy. This low efficiency stems from two sources. First, latency-critical applications, which form the backbone of user-facing, interactive services, need guaranteed low response times, often a few tens of milliseconds or less. By contrast, current systems are architected to maximize long-term, average performance (e.g., throughput over a period of seconds), and cannot provide the short-term performance guarantees needed by these applications. The stringent performance requirements of latency-critical applications make power management challenging, and make it hard to colocate them with other applications, as interference in shared resources hurts their responsiveness. Second, throughput-oriented batch applications, while easier to colocate, experience performance degradation as multiple colocated applications compete for shared resources on servers. This thesis presents novel hardware and software techniques that improve resource and energy efficiency for both classes of applications. First, Ubik is a dynamic cache partitioning technique that allows latency-critical and batch applications to safely share the last-level cache, maximizing batch throughput while providing latency guarantees for latency-critical applications. Ubik accurately predicts the transients that result when caches are reconfigured, and can thus mitigate latency degradation due to performance inertia, i.e., the loss of performance as an application transitions between steady states. Second, Rubik is a fine-grain voltage and frequency scaling scheme that quickly and accurately adapts to short-term load variations in latency-critical applications to minimize dynamic power consumption without hurting latency. Rubik uses a novel, lightweight statistical model that accurately predicts queued work, and accounts for variations in per-request compute requirements as well as queuing delays. Further, Rubik improves system utilization by allowing latency-critical and batch applications to safely share cores, using frequency scaling to mitigate performance degradation due to interference in per-core resources such as private caches. Third, Shepherd is a cluster scheduler that uses per-node cache-partitioning decisions to drive application placement across machines. Shepherd uses detailed application profiling data to partition the last-level cache on each machine and to predict the performance of colocated applications, and uses randomized search to find a schedule that maximizes throughput. A common theme across these techniques is the use of lightweight, general-purpose architectural support to provide performance isolation and fast state transitions, coupled with intelligent software runtimes that configure the hardware to meet application performance requirements. Unlike prior work, which often relies on heuristics, these techniques use accurate analytical modeling to guide resource allocation, boosting efficiency while satisfying applications' disparate performance goals. Ubik allows latency-critical and batch applications to be safely and efficiently colocated, improving batch throughput by an average of 17% over a static partitioning scheme while guaranteeing tail latency. Rubik further allows these two classes of applications to share cores, reducing datacenter power consumption by up to 31% while using 41% fewer machines than a scheme that segregates these applications. Shepherd improves batch throughput by 39% over a randomly scheduled, unpartitioned baseline, and significantly outperforms scheduling-only and partitioning-only approaches.	en_US
dc.description.statementofresponsibility	by Harshad Kasture.	en_US
dc.format.extent	131 pages	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582	en_US
dc.subject	Electrical Engineering and Computer Science.	en_US
dc.title	A hardware and software architecture for efficient datacenters	en_US
dc.type	Thesis	en_US
dc.description.degree	Ph. D.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc	986529222	en_US

