Repository logo
Log in(current)
Repository logoMIT Open ScholarshipDSpace@MIT
  1. Home
  2. MIT Libraries
  3. MIT Theses
  4. Graduate Theses
  5. A Reconfigurable, Distributed-Memory Accelerator for Sparse Applications

A Reconfigurable, Distributed-Memory Accelerator for Sparse Applications

Thumbnail Image
Name

golden-cgolden-sm-eecs-2025-thesis.pdf

Description
Thesis PDF
Size

930.93 KB

Format

Adobe PDF

Checksum (MD5)

cb37dc4ac6911de1788422094f6deacc

Author(s)
Golden, Courtney K.
Advisor(s)
Sanchez, Daniel
Emer, Joel S.
Date Issued
May 2025
Publisher
Massachusetts Institute of Technology
Abstract
Iterative sparse matrix computations lie at the heart of many scientific computing and graph analytics algorithms. On conventional systems, their irregular memory accesses and low arithmetic intensity create challenging memory bandwidth bottlenecks. To overcome such bottlenecks, distributed-SRAM architectures use tiled arrays of high-bandwidth local storage to achieve very high aggregate memory bandwidth. However, current distributedSRAM architectures suffer from either poor programmability due to over-specialization or poor compute performance due to inefficient general-purpose hardware. This thesis proposes Quartz, a new architecture that uses short dataflow tasks and reconfigurable compute in a distributed-SRAM system to deliver both high performance and high programmability. Unlike traditional sparse CGRAs or on-die reconfigurable engines, Quartz allows reconfigurable compute to be highly utilized and scaled by (1) providing high memory bandwidth to each processing element and (2) introducing a task-level dataflow execution model that fits this new setting. Our execution model dynamically reconfigures tile hardware based on inter-tile messages to execute tasks on local data with fine-grained data partitioning across tiles. To make execution efficient, we explore novel data partitioning techniques that use graph and hypergraph partitioning to minimize network traffic and balance load. This is especially challenging for computations where one operand’s sparsity pattern (i.e., distribution of nonzeros) exhibits dynamic behavior across iterations, and we are the first to provide techniques to address this case. To ensure programmability, we show how a wide range of computations (expressed in an extended version of tensor algebra’s Einsum notation) and flexible data distributions can be systematically captured in small tasks for execution on Quartz. We evaluate Quartz in simulation, using an 8-chiplet design with 2,048 tiles and 824 MB of SRAM per chiplet, running six different iterative sparse applications from scientific computing and graph analytics. Quartz’s architecture, data partitioning techniques, and programming model together achieve gmean 26.2× speedup over the prior state-of-the-art programmable distributed-SRAM architecture.
MIT Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Terms of Use
In Copyright - Educational Use Permitted
https://rightsstatements.org/page/InC-EDU/1.0/
Copyright retained by author(s)
Persistent DSpace Link
https://hdl.handle.net/1721.1/163707
Repository logo
PrivacyPermissionsAccessibilityContact us
Repository logo
Notify us about copyright concerns.