Brief announcement: Distributed shared memory based on computation migration
Author(s)
Lis, Mieszko; Shim, Keun Sup; Cho, Myong Hyon; Fletcher, Christopher Wardlaw; Kinsy, Michel A.; Lebedev, Ilia A.; Khan, Omer; Devadas, Srinivas
Open Access Policy
Creative Commons Attribution-Noncommercial-Share Alike
Abstract
Driven by increasingly unbalanced technology scaling and power
dissipation limits, microprocessor designers have resorted to increasing
the number of cores on a single chip, and pundits expect
1000-core designs to materialize in the next few years [1]. But how
will memory architectures scale and how will these next-generation
multicores be programmed?
One barrier to scaling current memory architectures is the off-chip
memory bandwidth wall [1,2]: off-chip bandwidth grows with
package pin density, which scales much more slowly than on-die
transistor density [3]. To reduce reliance on external memories and
keep data on-chip, today’s multicores integrate very large shared
last-level caches on chip [4]; interconnects used with such shared
caches, however, do not scale beyond relatively few cores, and the
power requirements and access latencies of large caches exclude
their use in chips on a 1000-core scale. For massive-scale multicores,
then, we are left with relatively small per-core caches.
Per-core caches on a 1000-core scale, in turn, raise the question
of memory coherence. On the one hand, a shared memory abstraction
is a practical necessity for general-purpose programming, and
most programmers prefer a shared memory model [5]. On the other
hand, ensuring coherence among private caches is an expensive
proposition: bus-based and snoopy protocols don’t scale beyond
relatively few cores, and directory sizes needed in cache-coherence
protocols must equal a significant portion of the combined size of
the per-core caches, as otherwise directory evictions will limit performance
[6]. Moreover, directory-based coherence protocols are
notoriously difficult to implement and verify [7].
Date issued
2011-06
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal
Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '11)
Publisher
Association for Computing Machinery (ACM)
Citation
Mieszko Lis, Keun Sup Shim, Myong Hyon Cho, Christopher W. Fletcher, Michel Kinsy, Ilia Lebedev, Omer Khan, and Srinivas Devadas. 2011. Brief announcement: distributed shared memory based on computation migration. In Proceedings of the 23rd ACM symposium on Parallelism in algorithms and architectures (SPAA '11). ACM, New York, NY, USA, 253-256.
Version: Author's final manuscript
ISBN
978-1-4503-0743-7