Performance engineering of multicore software : developing a science of fast code for the post-Moore era
Author(s)
Schardl, Tao Benjamin
Download: Full printable version (6.289 MB)
Alternative title
Developing a science of fast code for the post-Moore era
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Charles E. Leiserson.
Abstract
The end of Moore's Law, which experts predict to occur in as few as 5 years, means that even average programmers will need to be able to write fast code. Software performance engineering offers great promise to provide computer performance gains in the post-Moore era, but developing efficient software today requires substantial expertise and arcane knowledge of hardware and software systems. Multicore processors are particularly challenging to use efficiently, because doing so requires programmers to engage in parallel programming and to deal with nondeterministic program behavior and parallel scalability concerns. I contend that we can remedy the ad hoc and unprincipled nature of software performance engineering by creating simple and integrated programming technologies for writing fast code. This thesis studies how such technologies can be built by examining nine artifacts that enable principled approaches to tackling nondeterminism and scalability concerns in writing efficient multicore software.

Five artifacts develop programming models and theories of performance for writing multicore programs that are efficient both in theory and in practice:
- PBFS, a work-efficient parallel breadth-first search algorithm.
- The Prism chromatic-scheduling algorithm, which executes dynamic data-graph computations deterministically in parallel.
- Ordering heuristics for parallel greedy graph-coloring algorithms.
- The pedigree mechanism and DotMix algorithm for generating pseudorandom numbers deterministically in parallel in dynamic multithreaded programs.
- The Cilk-P concurrency platform, which provides linguistic and runtime support for deterministic on-the-fly pipeline parallelism.

Three artifacts strive to embed abstract programming and performance models into tools and compilers:
- Cilkprof, a profiler that efficiently measures how each call site in a Cilk program contributes to the program's scalability.
- Rader, a provably good race detector for Cilk programs that use reducer hyperobjects.
- The Tapir compiler intermediate representation, which enables existing compiler optimizations for serial code to optimize across parallel control flow with minimal changes.

The final artifact tackles the complexity of creating efficient diagnostic tools:
- CSI, a framework that provides comprehensive static instrumentation for efficient dynamic-analysis tools.

Together, these artifacts contribute to developing a more coherent science of fast code for multicores than exists today.
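For readers unfamiliar with the dynamic-multithreading model that these artifacts build on, the following is a minimal, illustrative sketch of Cilk-style fork-join parallelism using the standard cilk_spawn and cilk_sync keywords. It is not code from the thesis; the classic parallel Fibonacci routine is used only because it is the canonical example of the model.

/*
 * Minimal sketch (not from the thesis) of Cilk fork-join parallelism.
 * Compile with a Cilk-capable compiler, e.g. OpenCilk:
 *   clang -fopencilk fib.c -o fib
 */
#include <cilk/cilk.h>
#include <stdio.h>
#include <stdlib.h>

long fib(long n) {
    if (n < 2)
        return n;
    long x = cilk_spawn fib(n - 1);  /* spawned child may run in parallel */
    long y = fib(n - 2);             /* parent continues concurrently */
    cilk_sync;                       /* wait for the spawned child before using x */
    return x + y;
}

int main(int argc, char *argv[]) {
    long n = (argc > 1) ? atol(argv[1]) : 35;
    printf("fib(%ld) = %ld\n", n, fib(n));
    return 0;
}

In this model, cilk_spawn and cilk_sync only expose logical parallelism; a runtime scheduler maps the resulting computation onto processors. The Tapir representation described in the abstract encodes such parallel control flow directly in the compiler's intermediate representation so that existing serial optimizations can apply across it.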
Description
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 303-328).
Date issued
2016
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.