Show simple item record

dc.contributor.author: Golden, Courtney
dc.contributor.author: Feldmann, Axel
dc.contributor.author: Emer, Joel
dc.contributor.author: Sanchez, Daniel
dc.date.accessioned: 2025-11-26T16:37:49Z
dc.date.available: 2025-11-26T16:37:49Z
dc.date.issued: 2025-10-17
dc.identifier.isbn: 979-8-4007-1573-0
dc.identifier.uri: https://hdl.handle.net/1721.1/164073
dc.description: MICRO ’25, Seoul, Republic of Korea
dc.description.abstract: Iterative sparse matrix computations lie at the heart of many scientific computing and graph analytics algorithms. On conventional systems, their irregular memory accesses and low arithmetic intensity create challenging memory bandwidth bottlenecks. To overcome such bottlenecks, distributed-SRAM architectures are structured as an array of tiles, each with a processing element (PE) and a small local memory, to achieve very high aggregate memory bandwidth. However, current distributed-SRAM architectures suffer from either poor programmability due to over-specialized PEs or poor compute performance due to inefficient general-purpose PEs. We propose Quartz, a new architecture that uses short dataflow tasks and reconfigurable PEs in a distributed-SRAM system to deliver both high performance and high programmability. Unlike traditional sparse CGRAs or on-die reconfigurable engines, Quartz allows reconfigurable compute to be highly utilized and scaled by (1) providing high memory bandwidth to each processing element and (2) introducing a task-level dataflow execution model that fits this new setting. Our execution model dynamically reconfigures each tile’s PE in response to inter-tile messages to execute tasks on local data. This execution model enables fine-grained data partitioning across tiles. To make execution efficient, we explore novel data partitioning techniques that use graph and hypergraph partitioning to minimize network traffic and balance load in the face of both static-static and static-dynamic operand sparsity. To ensure programmability, we show how a wide range of Einsum-expressible computations and flexible data distributions can be systematically captured in small tasks for execution on Quartz.
Quartz’s architecture, data partitioning techniques, and programming model together achieve a gmean 21.4× speedup over a prior state-of-the-art system for six different iterative sparse applications from scientific computing and graph analytics.
dc.publisher: ACM | 58th IEEE/ACM International Symposium on Microarchitecture
dc.relation.isversionof: https://doi.org/10.1145/3725843.3756035
dc.rights: Creative Commons Attribution
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.source: Association for Computing Machinery
dc.title: Quartz: A Reconfigurable, Distributed-Memory Accelerator for Sparse Applications
dc.type: Article
dc.identifier.citation: Courtney Golden, Axel Feldmann, Joel Emer, and Daniel Sanchez. 2025. Quartz: A Reconfigurable, Distributed-Memory Accelerator for Sparse Applications. In 58th IEEE/ACM International Symposium on Microarchitecture (MICRO ’25), October 18–22, 2025, Seoul, Republic of Korea. ACM, New York, NY, USA, 15 pages.
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.identifier.mitlicense: PUBLISHER_POLICY
dc.eprint.version: Final published version
dc.type.uri: http://purl.org/eprint/type/ConferencePaper
eprint.status: http://purl.org/eprint/status/NonPeerReviewed
dc.date.updated: 2025-11-01T07:49:01Z
dc.language.rfc3066: en
dc.rights.holder: The author(s)
dspace.date.submission: 2025-11-01T07:49:02Z
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed

