Optimizing throughput architectures for speculative parallelism

Abeydeera, Maleen Hasanka (Weeraratna Patabendige Maleen Hasanka)

Author(s)

Abeydeera, Maleen Hasanka (Weeraratna Patabendige Maleen Hasanka)

DownloadFull printable version (6.852Mb)

Other Contributors

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.

Advisor

Daniel Sanchez.

Terms of use

MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582

Metadata

Show full item record

Abstract

Throughput-oriented architectures, like GPUs, use a large number of simple cores and rely on application-level parallelism, using multithreading to keep the cores busy. These architectures work well when parallelism is plentiful but work poorly when its not. Therefore, it is important to combine these techniques with other hardware support for parallelizing challenging applications. Recent work has shown that speculative parallelism is plentiful for a large class of applications that have traditionally been hard to parallelize. However, adding hardware support for speculative parallelism to a throughput-oriented system leads to a severe pathology: aborted work consumes scarce resources and hurts the throughput of useful work. This thesis develops a technique to optimize throughput-oriented architectures for speculative parallelism: tasks should be prioritized according to how speculative they are. This focuses resources on work that is more likely to commit, reducing aborts and using speculation resources more efficiently. We identify two on-chip resources where this prioritization is most likely to help, the core pipeline and the memory controller. First, this thesis presents speculation-aware multithreading (SAM), a simple policy that modifies a multithreaded processor pipeline to prioritize instructions from less speculative tasks. Second, we modify the on-chip memory controller to prioritize requests issued by tasks that are earlier in the conflict resolution order. We evaluate SAM on systems with up to 64 SMT cores. With SAM, 8-threaded in-order cores outperform single-threaded cores by 2.41 x on average, while a speculation-oblivious policy yields a 1.91 x speedup. SAM also reduces wasted work by 43%. Unlike at the core, we find little performance benefit from prioritizing requests at the memory controller. The reason is that speculative execution works as a very effective prefetching mechanism, and most requests, even those from tasks that are ultimately aborted, do end up being useful.

Description

Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.

Cataloged from PDF version of thesis.

Includes bibliographical references (pages 57-62).

Date issued

2017

URI

http://hdl.handle.net/1721.1/111930

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Keywords

Electrical Engineering and Computer Science.

Collections

Graduate Theses