Show simple item record

dc.contributor.author: Shim, Keun Sup
dc.contributor.author: Lis, Mieszko
dc.contributor.author: Cho, Myong Hyon
dc.contributor.author: Devadas, Srinivas
dc.contributor.author: Lebedev, Ilia A.
dc.date.accessioned: 2014-04-14T18:39:06Z
dc.date.available: 2014-04-14T18:39:06Z
dc.date.issued: 2013-10
dc.identifier.isbn: 978-1-4799-2987-0
dc.identifier.uri: http://hdl.handle.net/1721.1/86166
dc.description.abstract: As transistor technology continues to scale, the architecture community has experienced exponential growth in design complexity and significantly increasing implementation and verification costs. Moreover, Moore's law has led to a ubiquitous trend of an increasing number of cores on a single chip. Often, these large-core-count chips provide a shared memory abstraction via directories and coherence protocols, which have become notoriously error-prone and difficult to verify because of subtle data races and state space explosion. Although a very simple hardware shared memory implementation can be achieved by simply not allowing ad-hoc data replication and relying on remote accesses for remotely cached data (i.e., requiring no directories or coherence protocols), such remote-access-based directoryless architectures cannot take advantage of any data locality, and therefore suffer in both performance and energy. Our recently taped-out 110-core shared-memory processor, the Execution Migration Machine (EM²), establishes a new design point. On the one hand, EM² supports shared memory but does not automatically replicate data, and thus preserves the simplicity of directoryless architectures. On the other hand, it significantly improves performance and energy over remote-access-only designs by exploiting data locality at remote cores via fast hardware-level thread migration. In this paper, we describe the design choices made in the EM² chip as well as our choice of design methodology, and discuss how they combine to achieve design simplicity and verification efficiency. Even though EM² is a fairly large design (110 cores using a total of 357 million transistors), the entire chip design and implementation process (RTL, verification, physical design, tapeout) took only 18 man-months. [en_US]
dc.language.iso: en_US
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE) [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1109/ICCD.2013.6657037 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [en_US]
dc.source: MIT web domain [en_US]
dc.title: Design tradeoffs for simplicity and efficient verification in the Execution Migration Machine [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Shim, Keun Sup, Mieszko Lis, Myong Hyon Cho, Ilia Lebedev, and Srinivas Devadas. “Design Tradeoffs for Simplicity and Efficient Verification in the Execution Migration Machine.” 2013 IEEE 31st International Conference on Computer Design (ICCD) (n.d.). [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.contributor.mitauthor: Shim, Keun Sup [en_US]
dc.contributor.mitauthor: Lis, Mieszko [en_US]
dc.contributor.mitauthor: Cho, Myong Hyon [en_US]
dc.contributor.mitauthor: Lebedev, Ilia A. [en_US]
dc.contributor.mitauthor: Devadas, Srinivas [en_US]
dc.relation.journal: Proceedings of the 2013 IEEE 31st International Conference on Computer Design (ICCD) [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dspace.orderedauthors: Shim, Keun Sup; Lis, Mieszko; Cho, Myong Hyon; Lebedev, Ilia; Devadas, Srinivas [en_US]
dc.identifier.orcid: https://orcid.org/0000-0001-8253-7714
mit.license: OPEN_ACCESS_POLICY [en_US]
mit.metadata.status: Complete

