DSpace@MIT

Hardware-level fine-grained thread migration

Author(s)
Lis, Mieszko N. (Mieszko Norbert), 1977-
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Srinivas Devadas.
Terms of use
M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582
Abstract
Although thread migration has long been employed to satisfy load-balancing goals in operating systems for symmetric multiprocessing hardware, the high cost of OS-mediated migration has made more fine-grained applications impractical. With only a few cores per processor, and high overheads due to moving threads across processors and loss of cache affinity, assigning threads to specific processor cores for long periods has remained the default strategy for ensuring maximum performance. Massive-scale single-chip multiprocessors dramatically alter this picture. On-chip data transfer latencies, even across a 100+-core chip, rarely exceed tens of cycles, making the potential cost of thread migration as low as executing several instructions. At the same time, all cores are placed on the same die and typically share one last-level cache distributed on chip, obviating cache affinity concerns. In this dissertation, we explore the limits of fine-grained thread migration by developing an autonomous mechanism for migrating threads implemented entirely in hardware. We then employ migration to implement the unified shared memory abstraction without a cache coherence protocol (a particularly demanding application that requires fast and fine-grained thread movement) and show that performance is competitive with traditional shared memory mechanisms. Finally, we describe a real-world implementation of both concepts in a 110-core single-chip multiprocessor in 45nm ASIC technology.
Description
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014.
 
Cataloged from PDF version of thesis.
 
Includes bibliographical references (pages 109-113).
 
Date issued
2014
URI
http://hdl.handle.net/1721.1/93066
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.
