Scaling Distributed Cache Hierarchies through Computation and Data Co-Scheduling
Author(s): Beckmann, Nathan Zachary; Tsai, Po-An; Sanchez, Daniel
Cache hierarchies are increasingly non-uniform, so for systems to scale efficiently, data must be close to the threads that use it. Moreover, cache capacity is limited and contended among threads, introducing complex capacity/latency tradeoffs. Prior NUCA schemes have focused on managing data to reduce access latency, but have ignored thread placement; and applying prior NUMA thread placement schemes to NUCA is inefficient, as capacity, not bandwidth, is the main constraint. We present CDCS, a technique to jointly place threads and data in multicores with distributed shared caches. We develop novel monitoring hardware that enables fine-grained space allocation on large caches, and data movement support to allow frequent full-chip reconfigurations. On a 64-core system, CDCS outperforms an S-NUCA LLC by 46% on average (up to 76%) in weighted speedup and saves 36% of system energy. CDCS also outperforms state-of-the-art NUCA schemes under different thread scheduling policies.
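The abstract describes jointly placing threads and their data to trade off access latency against contended cache capacity. As a rough illustration of that co-scheduling idea (not the paper's actual CDCS algorithm, whose monitoring hardware and reconfiguration mechanisms are far more involved), the sketch below greedily places each thread on a mesh tile and spills its data into the nearest cache banks with free capacity, weighting distance by access rate. All names, the cost model, and the greedy ordering are invented for this example.

```python
# Hypothetical greedy sketch of joint thread/data placement on a tiled
# mesh with distributed cache banks. NOT the CDCS algorithm from the
# paper -- just an illustration of the capacity/latency tradeoff it targets.
from itertools import product


def manhattan(a, b):
    # Hop distance between two mesh tiles (x, y).
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def co_schedule(threads, mesh_dim, bank_capacity):
    """Greedily place threads and their data footprints.

    threads: dict name -> (footprint, access_rate); footprint in cache
    lines, access_rate weights the latency cost. Assumes total footprint
    fits in aggregate capacity. Returns (thread_tile, data_placement).
    """
    tiles = list(product(range(mesh_dim), range(mesh_dim)))
    free = {t: bank_capacity for t in tiles}  # remaining lines per bank
    thread_tile = {}
    data_placement = {}

    # Place the most access-intensive threads first, since they gain the
    # most from nearby capacity.
    for name, (footprint, rate) in sorted(
        threads.items(), key=lambda kv: -kv[1][1]
    ):
        best = None  # (cost, tile, allocation)
        for t in tiles:
            if t in thread_tile.values():
                continue  # one thread per core/tile
            # Fill the nearest banks first and compute the weighted cost.
            remaining, cost, alloc = footprint, 0, []
            for b in sorted(tiles, key=lambda b: manhattan(t, b)):
                if remaining <= 0:
                    break
                take = min(free[b], remaining)
                if take > 0:
                    alloc.append((b, take))
                    cost += rate * take * manhattan(t, b)
                    remaining -= take
            if remaining > 0:
                continue  # chip-wide capacity exhausted for this thread
            if best is None or cost < best[0]:
                best = (cost, t, alloc)
        cost, t, alloc = best
        thread_tile[name] = t
        data_placement[name] = alloc
        for b, take in alloc:
            free[b] -= take
    return thread_tile, data_placement
```

A usage example: on a 2x2 mesh with 4-line banks, a hot thread ("a") gets a tile whose local bank holds its whole footprint, while a cold thread ("b") is placed on a different tile and uses whatever capacity remains nearby.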
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Proceedings of the 21st IEEE Symposium on High Performance Computer Architecture
Institute of Electrical and Electronics Engineers (IEEE)
Beckmann, Nathan, Po-An Tsai, and Daniel Sanchez. "Scaling Distributed Cache Hierarchies through Computation and Data Co-Scheduling." 21st IEEE Symposium on High Performance Computer Architecture (February 2015).
Author's final manuscript