Show simple item record

dc.contributor.author: Golden, Courtney
dc.contributor.author: Feldmann, Axel
dc.contributor.author: Emer, Joel
dc.contributor.author: Sanchez, Daniel
dc.date.accessioned: 2025-11-26T16:37:49Z
dc.date.available: 2025-11-26T16:37:49Z
dc.date.issued: 2025-10-17
dc.identifier.isbn: 979-8-4007-1573-0
dc.identifier.uri: https://hdl.handle.net/1721.1/164073
dc.description: MICRO ’25, Seoul, Republic of Korea
dc.description.abstract: Iterative sparse matrix computations lie at the heart of many scientific computing and graph analytics algorithms. On conventional systems, their irregular memory accesses and low arithmetic intensity create challenging memory bandwidth bottlenecks. To overcome such bottlenecks, distributed-SRAM architectures are structured as an array of tiles, each with a processing element (PE) and a small local memory, to achieve very high aggregate memory bandwidth. However, current distributed-SRAM architectures suffer from either poor programmability due to over-specialized PEs or poor compute performance due to inefficient general-purpose PEs. We propose Quartz, a new architecture that uses short dataflow tasks and reconfigurable PEs in a distributed-SRAM system to deliver both high performance and high programmability. Unlike traditional sparse CGRAs or on-die reconfigurable engines, Quartz allows reconfigurable compute to be highly utilized and scaled by (1) providing high memory bandwidth to each processing element and (2) introducing a task-level dataflow execution model that fits this new setting. Our execution model dynamically reconfigures each tile’s PE in response to inter-tile messages to execute tasks on local data. This execution model enables fine-grained data partitioning across tiles. To make execution efficient, we explore novel data partitioning techniques that use graph and hypergraph partitioning to minimize network traffic and balance load in the face of both static-static and static-dynamic operand sparsity. To ensure programmability, we show how a wide range of Einsum-expressible computations and flexible data distributions can be systematically captured in small tasks for execution on Quartz.
Quartz’s architecture, data partitioning techniques, and programming model together achieve a gmean 21.4× speedup over a prior state-of-the-art system for six different iterative sparse applications from scientific computing and graph analytics.
dc.publisher: ACM | 58th IEEE/ACM International Symposium on Microarchitecture
dc.relation.isversionof: https://doi.org/10.1145/3725843.3756035
dc.rights: Creative Commons Attribution
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.source: Association for Computing Machinery
dc.title: Quartz: A Reconfigurable, Distributed-Memory Accelerator for Sparse Applications
dc.type: Article
dc.identifier.citation: Courtney Golden, Axel Feldmann, Joel Emer, and Daniel Sanchez. 2025. Quartz: A Reconfigurable, Distributed-Memory Accelerator for Sparse Applications. In 58th IEEE/ACM International Symposium on Microarchitecture (MICRO ’25), October 18–22, 2025, Seoul, Republic of Korea. ACM, New York, NY, USA, 15 pages.
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.identifier.mitlicense: PUBLISHER_POLICY
dc.eprint.version: Final published version
dc.type.uri: http://purl.org/eprint/type/ConferencePaper
eprint.status: http://purl.org/eprint/status/NonPeerReviewed
dc.date.updated: 2025-11-01T07:49:01Z
dc.language.rfc3066: en
dc.rights.holder: The author(s)
dspace.date.submission: 2025-11-01T07:49:02Z
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed

