Optimizing throughput architectures for speculative parallelism
Author(s)
Abeydeera, Maleen Hasanka (Weeraratna Patabendige Maleen Hasanka)
DownloadFull printable version (6.852Mb)
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Daniel Sanchez.
Terms of use
Metadata
Show full item recordAbstract
Throughput-oriented architectures, like GPUs, use a large number of simple cores and rely on application-level parallelism, using multithreading to keep the cores busy. These architectures work well when parallelism is plentiful but work poorly when its not. Therefore, it is important to combine these techniques with other hardware support for parallelizing challenging applications. Recent work has shown that speculative parallelism is plentiful for a large class of applications that have traditionally been hard to parallelize. However, adding hardware support for speculative parallelism to a throughput-oriented system leads to a severe pathology: aborted work consumes scarce resources and hurts the throughput of useful work. This thesis develops a technique to optimize throughput-oriented architectures for speculative parallelism: tasks should be prioritized according to how speculative they are. This focuses resources on work that is more likely to commit, reducing aborts and using speculation resources more efficiently. We identify two on-chip resources where this prioritization is most likely to help, the core pipeline and the memory controller. First, this thesis presents speculation-aware multithreading (SAM), a simple policy that modifies a multithreaded processor pipeline to prioritize instructions from less speculative tasks. Second, we modify the on-chip memory controller to prioritize requests issued by tasks that are earlier in the conflict resolution order. We evaluate SAM on systems with up to 64 SMT cores. With SAM, 8-threaded in-order cores outperform single-threaded cores by 2.41 x on average, while a speculation-oblivious policy yields a 1.91 x speedup. SAM also reduces wasted work by 43%. Unlike at the core, we find little performance benefit from prioritizing requests at the memory controller. The reason is that speculative execution works as a very effective prefetching mechanism, and most requests, even those from tasks that are ultimately aborted, do end up being useful.
Description
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. Cataloged from PDF version of thesis. Includes bibliographical references (pages 57-62).
Date issued
2017Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer SciencePublisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.