Constructing and evaluating weak memory models
Author(s)
Zhang, Sizhuo.
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Arvind.
Abstract
A memory model for an instruction set architecture (ISA) specifies all the legal behaviors of multithreaded programs, and consequently constrains processor implementations. Weak memory models arise from architects' desire to keep, when building a shared-memory multiprocessor, the flexibility to implement the optimizations used in uniprocessors. Commercial weak memory models like those of ARM and POWER are extremely complicated: it has taken over a decade to formalize their definitions. These formalization efforts are mostly empirical (they try to capture behaviors observed in commercial processors) and do not provide any insight into the reasons for the complications in weak-memory-model definitions. This thesis takes a constructive approach to studying weak memory models. We first construct a base model for weak memory models by considering how a multiprocessor is formed by connecting uniprocessors to a shared memory system. We try to minimize the constraints in the base model as long as the model enforces single-threaded correctness and matches the common assumptions made in multithreaded programs. With the base model, we can show not only the differences among weak memory models, but also the implications of these differences, e.g., more definitional complexity, more implementation flexibility, or failures to match programming assumptions. The construction of the base model also reveals that allowing load-store reordering (i.e., a younger store is executed before an older load) is the source of the definitional complexity of weak memory models. We construct a new weak memory model, WMM, that disallows load-store reordering and consequently has a much simpler definition. We show that WMM has almost the same performance as existing weak memory models. To evaluate the performance/power/area (PPA) of weak memory models versus that of strong memory models like TSO, we build an out-of-order superscalar cache-coherent multiprocessor. Our evaluation considers small out-of-order multiprocessors and benchmark programs written using portable multithreaded libraries and compiler built-ins. We find that the PPA of an optimized TSO implementation can match the PPA of implementations of weak memory models. These results provide a key insight: load execution in TSO processors can be as aggressive as, or even more aggressive than, that in weak-memory-model processors. Based on this insight, we further conjecture that weak memory models cannot provide better performance than TSO in the case of high-performance out-of-order processors. However, whether weak memory models have advantages over TSO in the case of energy-efficient in-order processors or embedded microcontrollers remains an open question.
Description
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 211-224).
Date issued
2019
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.