Brief announcement: Distributed shared memory based on computation migration

Lis, Mieszko; Shim, Keun Sup; Cho, Myong Hyon; Fletcher, Christopher W.; Kinsy, Michel; Lebedev, Ilia; Khan, Omer; Devadas, Srinivas

Author(s)

Lis, Mieszko; Shim, Keun Sup; Cho, Myong Hyon; Fletcher, Christopher Wardlaw; Kinsy, Michel A.; ... Show more

DownloadDevadas-Brief announcement.pdf (176.1Kb)

OPEN_ACCESS_POLICY

Terms of use

Creative Commons Attribution-Noncommercial-Share Alike 3.0 http://creativecommons.org/licenses/by-nc-sa/3.0/

Metadata

Show full item record

Abstract

Driven by increasingly unbalanced technology scaling and power dissipation limits, microprocessor designers have resorted to increasing the number of cores on a single chip, and pundits expect 1000-core designs to materialize in the next few years [1]. But how will memory architectures scale and how will these next-generation multicores be programmed? One barrier to scaling current memory architectures is the offchip memory bandwidth wall [1,2]: off-chip bandwidth grows with package pin density, which scales much more slowly than on-die transistor density [3]. To reduce reliance on external memories and keep data on-chip, today’s multicores integrate very large shared last-level caches on chip [4]; interconnects used with such shared caches, however, do not scale beyond relatively few cores, and the power requirements and access latencies of large caches exclude their use in chips on a 1000-core scale. For massive-scale multicores, then, we are left with relatively small per-core caches. Per-core caches on a 1000-core scale, in turn, raise the question of memory coherence. On the one hand, a shared memory abstraction is a practical necessity for general-purpose programming, and most programmers prefer a shared memory model [5]. On the other hand, ensuring coherence among private caches is an expensive proposition: bus-based and snoopy protocols don’t scale beyond relatively few cores, and directory sizes needed in cache-coherence protocols must equal a significant portion of the combined size of the per-core caches as otherwise directory evictions will limit performance [6]. Moreover, directory-based coherence protocols are notoriously difficult to implement and verify [7].

Date issued

2011-06

URI

http://hdl.handle.net/1721.1/72358

Department

Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Journal

Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '11)

Publisher

Association for Computing Machinery (ACM)

Citation

Mieszko Lis, Keun Sup Shim, Myong Hyon Cho, Christopher W. Fletcher, Michel Kinsy, Ilia Lebedev, Omer Khan, and Srinivas Devadas. 2011. Brief announcement: distributed shared memory based on computation migration. In Proceedings of the 23rd ACM symposium on Parallelism in algorithms and architectures (SPAA '11). ACM, New York, NY, USA, 253-256.

Version: Author's final manuscript

ISBN

978-1-4503-0743-7

Collections

MIT Open Access Articles

DSpace@MIT