DSpace@MIT

Compiler-Hardware Co-Design for Pervasive Parallelization

Author(s)
Ying, Victor A.
Download Thesis PDF (2.516 MB)
Advisor
Sanchez, Daniel
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Modern computer systems have hundreds of processor cores, so highly parallel programs are critical to achieve high performance. But parallel programming remains difficult on current systems, so many programs are still sequential. This dissertation presents new compilers and hardware architectures that can parallelize complex programs while retaining the simplicity of sequential code. Our new systems allow real-world programs to use hundreds of cores without burdening programmers with concurrency, deadlock, or data races.

This dissertation follows a novel approach that eliminates the burden of explicit parallel programming to make parallel execution pervasive. This approach relies on four guiding principles. First, exploiting implicit parallelism preserves the simplicity of sequential execution. Second, dividing computation into tiny tasks, as short as tens of instructions each, unlocks plentiful fine-grained parallelism in challenging programs. Hardware-compiler co-design techniques can create many tasks in parallel and reduce per-task overheads to make tiny tasks scale to many cores. Third, new hardware and software mechanisms can compose parallelism across entire programs, removing serializing barriers to overlap executions of nested parallel subroutines. Finally, exploiting static and dynamic information for data locality reduces data movement costs while maintaining load balance on large multicore systems.

This dissertation presents three systems that embody these four principles. First, T4 introduces automatic program transformations that exploit a novel hardware architecture to parallelize sequential programs. As a result, T4 scales hard-to-parallelize real-world programs to tens of cores, resulting in order-of-magnitude speedups. Second, S5 builds on T4 with novel transformations to remove needless serialization in a broad class of challenging data structures. Thus, S5 scales complex real-world programs to hundreds of cores, delivers additional order-of-magnitude speedups over T4, and outperforms manually parallelized code tuned by experts. Finally, ASH is an accelerator that demonstrates that the same approach can be applied with simpler mechanisms tailored for digital circuit simulation. A small ASH implementation is 32x faster than a large multicore CPU running a state-of-the-art parallel simulator.
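To make the "tiny tasks" idea in the abstract concrete, the sketch below is a purely hypothetical C++ illustration, not code from the thesis: it contrasts a sequential linked-list walk (one long dependence chain) with the same work decomposed into small tasks on an explicit worklist. The Task struct, worklist, and function names are invented for this sketch; in T4/S5 the decomposition is done by compiler transformations and the tasks are scheduled speculatively by hardware, not drained by a single software loop as here.

```cpp
// Hypothetical sketch: decomposing a sequential traversal into tiny tasks.
// Not the thesis's actual compiler output or hardware interface.
#include <cstdio>
#include <deque>
#include <vector>

struct Node {
    int value;
    Node* next;
};

// Sequential version: one long dependence chain, hard to parallelize directly.
long sum_sequential(Node* head) {
    long total = 0;
    for (Node* n = head; n != nullptr; n = n->next)
        total += n->value;
    return total;
}

// Task-based version: each iteration becomes a tiny task (tens of instructions).
// A hardware task scheduler could run independent tasks from such a queue in
// parallel; here a single loop drains the queue just to show the decomposition.
struct Task {
    Node* node;
};

long sum_as_tiny_tasks(Node* head) {
    std::deque<Task> worklist;
    if (head) worklist.push_back({head});

    long total = 0;
    while (!worklist.empty()) {
        Task t = worklist.front();
        worklist.pop_front();
        total += t.node->value;                   // the task's tiny body
        if (t.node->next)
            worklist.push_back({t.node->next});   // spawn the successor task
    }
    return total;
}

int main() {
    // Build a small 8-node list with values 1..8.
    std::vector<Node> nodes(8);
    for (int i = 0; i < 8; ++i) {
        nodes[i].value = i + 1;
        nodes[i].next = (i + 1 < 8) ? &nodes[i + 1] : nullptr;
    }
    std::printf("sequential: %ld\n", sum_sequential(&nodes[0]));
    std::printf("tiny tasks: %ld\n", sum_as_tiny_tasks(&nodes[0]));
    return 0;
}
```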
Date issued
2023-09
URI
https://hdl.handle.net/1721.1/164488
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses
