Design and analysis of spatially-partitioned shared caches
Author(s)Beckmann, Nathan (Nathan Zachary)
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
MetadataShow full item record
Data movement is a growing problem in modern chip-multiprocessors (CMPs). Processors spend the majority of their time, energy, and area moving data, not processing it. For example, a single main memory access takes hundreds of cycles and costs the energy of a thousand floating-point operations. Data movement consumes more than half the energy in current processors, and CMPs devote more than half their area to on-chip caches. Moreover, these costs are increasing as CMPs scale to larger core counts. Processors rely on the on-chip caches to limit data movement, but CMP cache design is challenging. For efficiency reasons, most cache capacity is shared among cores and distributed in banks throughout the chip. Distribution makes cores sensitive to data placement, since some cache banks can be accessed at lower latency and lower energy than others. Yet because applications require sufficient capacity to fit their working sets, it is not enough to just use the closest cache banks. Meanwhile, cores compete for scarce capacity, and the resulting interference, left unchecked, produces many unnecessary cache misses. This thesis presents novel architectural techniques that navigate these complex tradeoffs and reduce data movement. First, virtual caches spatially partition the shared cache banks to fit applications' working sets near where they are used. Virtual caches expose the distributed banks to software, and let the operating system schedule threads and their working sets to minimize data movement. Second, analytical replacement policies make better use of scarce cache capacity, reducing expensive main memory accesses: Talus eliminates performance cliffs by guaranteeing convex performance, and EVA uses planning theory to derive the optimal replacement metric under uncertainty. These policies improve performance and make qualitative contributions: Talus is cheap to predict, and so lets cache partitioning techniques (including virtual caches) work with high-performance cache replacement; and EVA shows that the conventional approach to practical cache replacement is sub-optimal. Designing CMP caches is difficult because architects face many options with many interacting factors. Unlike most prior caching work that employs best-effort heuristics, we reason about the tradeoffs through analytical models. This analytical approach lets us achieve the performance and efficiency of application-specific designs across a broad range of applications, while further providing a coherent theoretical framework to reason about data movement. Compared to a 64-core CMP with a conventional cache design, these techniques improve end-to-end performance by up to 76% and an average of 46%, save 36% of system energy and reduce cache area by 10%, while adding small area, energy, and runtime overheads.
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015.Cataloged from PDF version of thesis.Includes bibliographical references (pages 167-176).
DepartmentMassachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Massachusetts Institute of Technology
Electrical Engineering and Computer Science.