Scaling Distributed Cache Hierarchies through Computation and Data Co-Scheduling
Author(s)
Beckmann, Nathan Zachary; Tsai, Po-An; Sanchez, Daniel
Open Access Policy
Creative Commons Attribution-Noncommercial-Share Alike
Terms of use
Abstract
Cache hierarchies are increasingly non-uniform, so for systems to scale efficiently, data must be close to the threads that use it. Moreover, cache capacity is limited and contended among threads, introducing complex capacity/latency tradeoffs. Prior NUCA schemes have focused on managing data to reduce access latency, but have ignored thread placement; and applying prior NUMA thread placement schemes to NUCA is inefficient, as capacity, not bandwidth, is the main constraint. We present CDCS, a technique to jointly place threads and data in multicores with distributed shared caches. We develop novel monitoring hardware that enables fine-grained space allocation on large caches, and data movement support to allow frequent full-chip reconfigurations. On a 64-core system, CDCS outperforms an S-NUCA LLC by 46% on average (up to 76%) in weighted speedup and saves 36% of system energy. CDCS also outperforms state-of-the-art NUCA schemes under different thread scheduling policies.
Date issued
2015-02
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal
Proceedings of the 21st IEEE Symposium on High Performance Computer Architecture
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Citation
Beckmann, Nathan, Po-An Tsai, and Daniel Sanchez. "Scaling Distributed Cache Hierarchies through Computation and Data Co-Scheduling." 21st IEEE Symposium on High Performance Computer Architecture (February 2015).
Version: Author's final manuscript