Quartz: A Reconfigurable, Distributed-Memory Accelerator for Sparse Applications
Author(s)
Golden, Courtney; Feldmann, Axel; Emer, Joel; Sanchez, Daniel
Download: 3725843.3756035.pdf (1.488Mb)
Publisher with Creative Commons License
Creative Commons Attribution
Abstract
Iterative sparse matrix computations lie at the heart of many scientific computing and graph analytics algorithms. On conventional systems, their irregular memory accesses and low arithmetic intensity create challenging memory bandwidth bottlenecks. To overcome such bottlenecks, distributed-SRAM architectures are structured as an array of tiles, each with a processing element (PE) and a small local memory, to achieve very high aggregate memory bandwidth. However, current distributed-SRAM architectures suffer from either poor programmability due to over-specialized PEs or poor compute performance due to inefficient general-purpose PEs.
We propose Quartz, a new architecture that uses short dataflow tasks and reconfigurable PEs in a distributed-SRAM system to deliver both high performance and high programmability. Unlike traditional sparse CGRAs or on-die reconfigurable engines, Quartz allows reconfigurable compute to be highly utilized and scaled by (1) providing high memory bandwidth to each processing element and (2) introducing a task-level dataflow execution model that fits this new setting. Our execution model dynamically reconfigures each tile’s PE in response to inter-tile messages to execute tasks on local data. This execution model enables fine-grained data partitioning across tiles. To make execution efficient, we explore novel data partitioning techniques that use graph and hypergraph partitioning to minimize network traffic and balance load in the face of both static-static and static-dynamic operand sparsity. To ensure programmability, we show how a wide range of Einsum-expressible computations and flexible data distributions can be systematically captured in small tasks for execution on Quartz.
Quartz’s architecture, data partitioning techniques, and programming model together achieve a geometric-mean (gmean) speedup of 21.4× over a prior state-of-the-art system across six iterative sparse applications from scientific computing and graph analytics.
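As a rough illustration of the message-triggered, tile-local execution described in the abstract, the following toy Python sketch runs a sparse matrix-vector product (the Einsum y[i] = A[i,j] * x[j]) over a set of simulated tiles. It is not the Quartz design or its programming model: the function `spmv_on_tiles`, the `row_owner` and `col_owner` maps, and the message format are all hypothetical names chosen for this example. It only shows how the placement of data across tiles changes the number of inter-tile messages, which is the quantity a graph or hypergraph partitioner would try to reduce.

```python
from collections import defaultdict, deque


def spmv_on_tiles(entries, x, row_owner, col_owner, num_tiles):
    """Toy message-driven SpMV: y[i] = sum_j A[i, j] * x[j].

    entries:   list of (i, j, value) nonzeros of A
    row_owner: row_owner[i] is the tile holding row i of A and y[i] (assumed layout)
    col_owner: col_owner[j] is the tile holding x[j] (assumed layout)
    Returns (y, remote_messages).
    """
    # Local memory of each tile: the nonzeros of its rows, indexed by column.
    local_nz = [defaultdict(list) for _ in range(num_tiles)]
    for i, j, v in entries:
        local_nz[row_owner[i]][j].append((i, v))

    queue = deque()
    remote_messages = 0
    # The owner of x[j] sends one message to every tile that has a nonzero in column j.
    for j, xj in enumerate(x):
        for tile in range(num_tiles):
            if j in local_nz[tile]:
                queue.append((tile, j, xj))
                if tile != col_owner[j]:
                    remote_messages += 1

    # Each arriving message triggers a short task that touches only local data.
    y = defaultdict(float)
    while queue:
        tile, j, xj = queue.popleft()
        for i, v in local_nz[tile][j]:
            y[i] += v * xj
    return dict(y), remote_messages


# Same matrix and vector under two hypothetical row placements.
entries = [(0, 0, 2.0), (0, 1, 1.0), (1, 1, 3.0), (2, 2, 4.0)]
x = [1.0, 2.0, 3.0]
y_good, msgs_good = spmv_on_tiles(entries, x, [0, 0, 1], [0, 0, 1], 2)
y_bad, msgs_bad = spmv_on_tiles(entries, x, [0, 1, 0], [0, 0, 1], 2)
```

Both placements compute the same result, but the first keeps every x[j] on the tile that consumes it, while the second forces two values to cross the network. A real partitioner would also have to balance load across tiles, which this sketch does not model.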
Description
MICRO ’25, Seoul, Republic of Korea
Date issued
2025-10-17
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Publisher
ACM|58th IEEE/ACM International Symposium on Microarchitecture
Citation
Courtney Golden, Axel Feldmann, Joel Emer, and Daniel Sanchez. 2025. Quartz: A Reconfigurable, Distributed-Memory Accelerator for Sparse Applications. In 58th IEEE/ACM International Symposium on Microarchitecture (MICRO ’25), October 18–22, 2025, Seoul, Republic of Korea. ACM, New York, NY, USA, 15 pages.
Version: Final published version
ISBN
979-8-4007-1573-0