Redesigning the memory hierarchy to exploit static and dynamic application information
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Memory hierarchies are crucial to performance and energy efficiency, but current systems adopt rigid, hardware-managed cache hierarchies that cause needless data movement. The root cause of these inefficiencies is that cache hierarchies adopt a legacy interface and ignore most application information. Specifically, they are structured as a rigid hierarchy of progressively larger and slower cache levels, with sizes and policies fixed at design time. Caches expose a flat address space that hides the hierarchy's structure from programs, and they transparently move data across cache levels in fixed-size blocks using simple, fixed heuristics. Besides squandering valuable application-level information, this design is very costly: providing the illusion of a flat address space requires complex address translation machinery, such as associative lookups in caches. This thesis proposes to redesign the memory hierarchy to better exploit application information. We take a cross-layer approach that redesigns the hardware-software interface to put software in control of the hierarchy and naturally convey application semantics. We focus on two main directions. First, we design reconfigurable cache hierarchies that exploit dynamic application information to optimize their structure on the fly, approaching the performance of the best application-specific hierarchy. Hardware monitors application memory behavior at low overhead, and a software runtime uses this information to periodically reconfigure the system.
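As a rough illustration of the monitor-and-reconfigure loop described above, the following sketch allocates cache banks to applications greedily by marginal miss reduction. This is an assumed, simplified model for exposition (the function names, the miss-curve representation, and the greedy policy are all hypothetical, not the thesis's actual runtime):

```python
# Hypothetical sketch: a software runtime reads per-application miss curves
# gathered by low-overhead hardware monitors, then repartitions cache banks.

def misses(curve, size):
    """Monitored misses for an app given `size` banks (non-increasing in size)."""
    return curve.get(size, curve[max(curve)])

def reconfigure(apps, total_banks):
    """Greedily assign each bank to the app whose miss count drops the most."""
    alloc = {app: 0 for app in apps}
    for _ in range(total_banks):
        best = max(apps, key=lambda a: misses(apps[a], alloc[a]) -
                                       misses(apps[a], alloc[a] + 1))
        alloc[best] += 1
    return alloc

# Hypothetical monitored miss curves: misses at each candidate allocation.
monitors = {
    "A": {0: 100, 1: 40, 2: 35, 3: 34},
    "B": {0: 80, 1: 70, 2: 30, 3: 29},
}
print(reconfigure(monitors, 3))  # → {'A': 1, 'B': 2}
```

In this toy run, app A gets one bank (its first bank saves 60 misses) and app B gets two (its second bank saves 40), mirroring how a runtime can specialize shared resources to each application's measured behavior.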
This approach enables software to (i) build single- or multi-level virtual cache hierarchies tailored to the needs of each application, making effective use of spatially distributed and heterogeneous (e.g., SRAM and stacked DRAM) cache banks; (ii) replicate shared data near-optimally to minimize on-chip and off-chip traffic; and (iii) schedule computation across systems with heterogeneous hierarchies (e.g., systems with near-data processors). Specializing the memory system to each application improves performance and energy efficiency, since applications can avoid using resources that they do not benefit from and use the remaining resources to hold their data at minimum latency and energy. For example, virtual cache hierarchies improve full-system energy-delay product (EDP) by up to 85% over a combination of state-of-the-art techniques. Second, we redesign the memory hierarchy to exploit static application information by managing variable-sized objects, the natural unit of data access in programs, instead of fixed-size cache lines. We present the Hotpads object-based hierarchy, which leverages object semantics to hide the memory layout and dispense with the flat address space interface. Similarly to how memory-safe languages abstract the memory layout, Hotpads exposes an interface based on object pointers that disallows arbitrary address arithmetic. This avoids the need for associative caches. Instead, Hotpads moves objects across a hierarchy of directly addressed memories. It rewrites pointers to avoid most associative lookups, provides hardware support for memory management, and unifies hierarchical garbage collection and data placement. Hotpads also enables many new optimizations. For instance, we have designed Zippads, a memory hierarchy that leverages Hotpads to compress objects.
Leveraging object semantics and the ability to rewrite pointers in Hotpads, Zippads compresses and stores objects more compactly, with a novel compression algorithm that exploits redundancy across objects. Though object-based languages are often seen as sacrificing performance for productivity, this work shows that hardware can exploit this abstraction to improve performance and efficiency over cache hierarchies: Hotpads reduces dynamic memory hierarchy energy by 2.6x and improves performance by 34%; and Zippads reduces main memory footprint by 2x while improving performance by 30%.
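To make the cross-object redundancy idea concrete, here is a minimal sketch in the spirit of Zippads, under assumed simplifications (fixed-size objects of one type, byte-level diffs against a per-type base object; the function names and encoding are hypothetical, not the thesis's actual compression algorithm):

```python
# Hypothetical sketch: objects of the same type are often near-identical,
# so each object is stored as a sparse diff against a representative base.

def compress(obj: bytes, base: bytes) -> list:
    """Encode obj as (offset, byte) pairs where it differs from base."""
    assert len(obj) == len(base)  # simplification: fixed-size objects
    return [(i, b) for i, (b, bb) in enumerate(zip(obj, base)) if b != bb]

def decompress(diff: list, base: bytes) -> bytes:
    """Rebuild the object by patching the base with the stored diff."""
    out = bytearray(base)
    for i, b in diff:
        out[i] = b
    return bytes(out)

base = bytes([0, 1, 2, 3, 4, 5, 6, 7])  # representative base object
obj  = bytes([0, 1, 9, 3, 4, 5, 6, 8])  # differs from base in two bytes
diff = compress(obj, base)
assert decompress(diff, base) == obj
print(len(diff))  # → 2 diff entries instead of 8 raw bytes
```

The key enabler is that an object-based hierarchy knows object types and boundaries, and can rewrite pointers when objects move or are re-encoded; a line-based cache hierarchy sees only opaque fixed-size blocks and cannot exploit this structure.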
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019. Cataloged from PDF version of thesis. Includes bibliographical references (pages 163-177).