Constructing and evaluating weak memory models

Zhang, Sizhuo.

Author(s)

Zhang, Sizhuo.

Download1124763012-MIT.pdf (6.316Mb)

Other Contributors

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.

Advisor

Arvind.

Terms of use

MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582

Metadata

Show full item record

Abstract

A memory model for an instruction set architecture (ISA) specifies all the legal multithreaded-program behaviors, and consequently constrains processor implementations. Weak memory models are a consequence of the desire of architects to preserve the flexibility of implementing optimizations that are used in uniprocessors, while building a shared-memory multiprocessor. Commercial weak memory models like ARM and POWER are extremely complicated: it has taken over a decade to formalize their definitions. These formalization efforts are mostly empirical--they try to capture empirically observed behaviors in commercial processors--and do not provide any insights into the reasons for the complications in weak-memory-model definitions. This thesis takes a constructive approach to study weak memory models. We first construct a base model for weak memory models by considering how a multiprocessor is formed by connecting uniprocessors to a shared memory system.

We try to minimize the constraints in the base model as long as the model enforces single-threaded correctness and matches the common assumptions made in multithreaded programs. With the base model, we can show not only the differences among different weak memory models, but also the implications of these differences, e.g., more definitional complexity or more implementation flexibility or failures to match programming assumptions. The construction of the base model also reveals that allowing load-store reordering (i.e., a younger store is executed before an older load) is the source of definitional complexity of weak memory models. We construct a new weak memory model WMM that disallows load-store reordering, and consequently, has a much simpler definition. We show that WMM has almost the same performance as existing weak memory models.

To evaluate the performance/power/area (PPA) of weak memory models versus that of strong memory models like TSO, we build an out-of-order superscalar cachecoherent multiprocessor. Our evaluation considers out-of-order multiprocessors of small sizes and benchmark programs written using portable multithreaded libraries and compiler built-ins. We find that the PPA of an optimized TSO implementation can match the PPA of implementations of weak memory models. These results provide a key insight that load execution in TSO processors can be as aggressive as, or even more aggressive than, that in weak-memory-model processors. Based on this insight, we further conjecture that weak memory models cannot provide better performance than TSO in case of high-performance out-of-order processors. However, whether weak memory models have advantages over TSO in case of energy-efficient in-order processors or embedded microcontrollers remains an open question.

Description

This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019

Cataloged from student-submitted PDF version of thesis.

Includes bibliographical references (pages 211-224).

Date issued

2019

URI

https://hdl.handle.net/1721.1/122690

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Keywords

Electrical Engineering and Computer Science.

Collections

Doctoral Theses