IMP: Indirect Memory Prefetcher
Author(s)
Yu, Xiangyao; Hughes, Christopher J.; Satish, Nadathur; Devadas, Srinivas
Download: IMP Indirect memory prefetcher.pdf (812.7 KB)
Open Access Policy
Creative Commons Attribution-Noncommercial-Share Alike
Terms of use
Metadata
Abstract
Machine learning, graph analytics and sparse linear algebra-based applications are dominated by irregular memory accesses resulting from following edges in a graph or non-zero elements in a sparse matrix. These accesses have little temporal or spatial locality, and thus incur long memory stalls and large bandwidth requirements. A traditional streaming or striding prefetcher cannot capture these irregular access patterns.
A majority of these irregular accesses come from indirect patterns of the form A[B[i]]. We propose an efficient hardware indirect memory prefetcher (IMP) to capture this access pattern and hide latency. We also propose a partial cacheline accessing mechanism for these prefetches to reduce the network and DRAM bandwidth pressure from the lack of spatial locality.
Evaluated on 7 applications, IMP shows 56% speedup on average (up to 2.3×) compared to a baseline 64 core system with streaming prefetchers. This is within 23% of an idealized system. With partial cacheline accessing, we see another 9.4% speedup on average (up to 46.6%).
Date issued
2015-12
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal
Proceedings of the 48th International Symposium on Microarchitecture - MICRO-48
Publisher
Association for Computing Machinery (ACM)
Citation
Yu, Xiangyao, Christopher J. Hughes, Nadathur Satish, and Srinivas Devadas. “IMP: Indirect Memory Prefetcher.” Proceedings of the 48th International Symposium on Microarchitecture - MICRO-48 (2015).
Version: Author's final manuscript
ISBN
9781450340342