
dc.contributor.advisor: Sanchez, Daniel
dc.contributor.author: Ying, Victor A.
dc.date.accessioned: 2026-01-12T19:40:00Z
dc.date.available: 2026-01-12T19:40:00Z
dc.date.issued: 2023-09
dc.date.submitted: 2023-09-21T14:26:39.410Z
dc.identifier.uri: https://hdl.handle.net/1721.1/164488
dc.description.abstract: Modern computer systems have hundreds of processor cores, so highly parallel programs are critical to achieving high performance. But parallel programming remains difficult on current systems, so many programs are still sequential. This dissertation presents new compilers and hardware architectures that can parallelize complex programs while retaining the simplicity of sequential code. Our new systems allow real-world programs to use hundreds of cores without burdening programmers with concurrency, deadlock, or data races.

This dissertation follows a novel approach that eliminates the burden of explicit parallel programming to make parallel execution pervasive. This approach relies on four guiding principles. First, exploiting implicit parallelism preserves the simplicity of sequential execution. Second, dividing computation into tiny tasks, as short as tens of instructions each, unlocks plentiful fine-grained parallelism in challenging programs. Hardware-compiler co-design techniques can create many tasks in parallel and reduce per-task overheads to make tiny tasks scale to many cores. Third, new hardware and software mechanisms can compose parallelism across entire programs, removing serializing barriers to overlap executions of nested parallel subroutines. Finally, exploiting static and dynamic information for data locality reduces data movement costs while maintaining load balance on large multicore systems.

This dissertation presents three systems that embody these four principles. First, T4 introduces automatic program transformations that exploit a novel hardware architecture to parallelize sequential programs. As a result, T4 scales hard-to-parallelize real-world programs to tens of cores, resulting in order-of-magnitude speedups. Second, S5 builds on T4 with novel transformations to remove needless serialization in a broad class of challenging data structures. Thus, S5 scales complex real-world programs to hundreds of cores, delivers additional order-of-magnitude speedups over T4, and outperforms manually parallelized code tuned by experts. Finally, ASH is an accelerator that demonstrates that the same approach can be applied with simpler mechanisms tailored for digital circuit simulation. A small ASH implementation is 32x faster than a large multicore CPU running a state-of-the-art parallel simulator.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Compiler-Hardware Co-Design for Pervasive Parallelization
dc.type: Thesis
dc.description.degree: Ph.D.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.orcid: https://orcid.org/0000-0001-9660-7082
mit.thesis.degree: Doctoral
thesis.degree.name: Doctor of Philosophy

