Hardware-level fine-grained thread migration

Lis, Mieszko N. (Mieszko Norbert), 1977-

Other Contributors

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.

Advisor

Srinivas Devadas.

Terms of use

M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582

Abstract

Although thread migration has long been employed to satisfy load-balancing goals in operating systems for symmetric multiprocessing hardware, the high cost of OS-mediated migration has made more fine-grained applications impractical. With only a few cores per processor, and high overheads due to moving threads across processors and loss of cache affinity, assigning threads to specific processor cores for long periods has remained the default strategy for ensuring maximum performance. Massive-scale single-chip multiprocessors dramatically alter this picture. On-chip data transfer latencies, even across a 100+-core chip, rarely exceed tens of cycles, making the potential cost of thread migration as low as executing several instructions. At the same time, all cores are placed on the same die and typically share one last-level cache distributed on chip, obviating cache affinity concerns. In this dissertation, we explore the limits of fine-grained thread migration by developing an autonomous mechanism for migrating threads implemented entirely in hardware. We then employ migration to implement the unified shared memory abstraction without a cache coherence protocol (a particularly demanding application that requires fast and fine-grained thread movement) and show that performance is competitive with traditional shared memory mechanisms. Finally, we describe a real-world implementation of both concepts in a 110-core single-chip multiprocessor in 45 nm ASIC technology.

Description

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014.

Cataloged from PDF version of thesis.

Includes bibliographical references (pages 109-113).

Date issued

2014

URI

http://hdl.handle.net/1721.1/93066

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Keywords

Electrical Engineering and Computer Science.

Collections

Doctoral Theses