Characterizing and Optimizing Realistic Workloads on a Commercial Compute-in-SRAM Device

Zhang, Niansong; Zhu, Wenbo; Golden, Courtney; Ilan, Dan; Chen, Hongzheng; Batten, Christopher; Zhang, Zhiru

dc.contributor.author	Zhang, Niansong
dc.contributor.author	Zhu, Wenbo
dc.contributor.author	Golden, Courtney
dc.contributor.author	Ilan, Dan
dc.contributor.author	Chen, Hongzheng
dc.contributor.author	Batten, Christopher
dc.contributor.author	Zhang, Zhiru
dc.date.accessioned	2025-12-03T15:43:06Z
dc.date.available	2025-12-03T15:43:06Z
dc.date.issued	2025-10-17
dc.identifier.isbn	979-8-4007-1573-0
dc.identifier.uri	https://hdl.handle.net/1721.1/164117
dc.description	MICRO ’25, Seoul, Republic of Korea	en_US
dc.description.abstract	Compute-in-SRAM architectures offer a promising approach to achieving higher performance and energy efficiency across a range of data-intensive applications. However, prior evaluations have largely relied on simulators or small prototypes, limiting the understanding of their real-world potential. In this work, we present a comprehensive performance and energy characterization of a commercial compute-in-SRAM device, the GSI APU, under realistic workloads. We compare the GSI APU against established architectures, including CPUs and GPUs, to quantify its energy efficiency and performance potential. We introduce an analytical framework for general-purpose compute-in-SRAM devices that reveals fundamental optimization principles by modeling performance trade-offs, thereby guiding program optimizations. Exploiting the fine-grained parallelism of tightly integrated memory-compute architectures requires careful data management. We address this by proposing three optimizations: communication-aware reduction mapping, coalesced DMA, and broadcast-friendly data layouts. When applied to retrieval-augmented generation (RAG) over large corpora (10GB–200GB), these optimizations enable our compute-in-SRAM system to accelerate retrieval by 4.8×–6.6× over an optimized CPU baseline, improving end-to-end RAG latency by 1.1×–1.8×. The shared off-chip memory bandwidth is modeled using a simulated HBM, while all other components are measured on the real compute-in-SRAM device. Critically, this system matches the performance of an NVIDIA A6000 GPU for RAG while being significantly more energy-efficient (54.4×–117.9× reduction). These findings validate the viability of compute-in-SRAM for complex, real-world applications and provide guidance for advancing the technology.	en_US
dc.publisher	ACM|58th IEEE/ACM International Symposium on Microarchitecture	en_US
dc.relation.isversionof	https://doi.org/10.1145/3725843.3756132	en_US
dc.rights	Creative Commons Attribution	en_US
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	en_US
dc.source	Association for Computing Machinery	en_US
dc.title	Characterizing and Optimizing Realistic Workloads on a Commercial Compute-in-SRAM Device	en_US
dc.type	Article	en_US
dc.identifier.citation	Niansong Zhang, Wenbo Zhu, Courtney Golden, Dan Ilan, Hongzheng Chen, Christopher Batten, and Zhiru Zhang. 2025. Characterizing and Optimizing Realistic Workloads on a Commercial Compute-in-SRAM Device. In Proceedings of the 58th IEEE/ACM International Symposium on Microarchitecture (MICRO '25). Association for Computing Machinery, New York, NY, USA, 1011–1025.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory	en_US
dc.identifier.mitlicense	PUBLISHER_POLICY
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dc.date.updated	2025-11-01T07:49:42Z
dc.language.rfc3066	en
dc.rights.holder	The author(s)
dspace.date.submission	2025-11-01T07:49:43Z
mit.license	PUBLISHER_CC
mit.metadata.status	Authority Work and Publication Information Needed	en_US

Files in this item

Name: 3725843.3756132.pdf
Size: 2.171Mb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
