Show simple item record

dc.contributor.author: Guo, Cong
dc.contributor.author: Wei, Chiyue
dc.contributor.author: Tang, Jiaming
dc.contributor.author: Duan, Bowen
dc.contributor.author: Han, Song
dc.contributor.author: Li, Hai
dc.contributor.author: Chen, Yiran
dc.date.accessioned: 2025-09-16T19:59:10Z
dc.date.available: 2025-09-16T19:59:10Z
dc.date.issued: 2025-06-20
dc.identifier.isbn: 979-8-4007-1261-6
dc.identifier.uri: https://hdl.handle.net/1721.1/162666
dc.description: ISCA ’25, Tokyo, Japan [en_US]
dc.description.abstract: Deep Neural Networks (DNNs) and Large Language Models (LLMs) have revolutionized artificial intelligence, yet their deployment faces significant memory and computational challenges, especially in resource-constrained environments. Quantization techniques have mitigated some of these issues by reducing data precision, primarily focusing on General Matrix Multiplication (GEMM). This study introduces a novel sparsity paradigm, transitive sparsity, which leverages the reuse of previously computed results to substantially minimize computational overhead in GEMM operations. By representing transitive relations using a directed acyclic graph, we develop an efficient strategy for determining optimal execution orders, thereby overcoming inherent challenges related to execution dependencies and parallelism. Building on this foundation, we present the Transitive Array, a multiplication-free accelerator designed to exploit transitive sparsity in GEMM. Our architecture effectively balances computational workloads across multiple parallel lanes, ensuring high efficiency and optimal resource utilization. Comprehensive evaluations demonstrate that the Transitive Array achieves approximately 7.46× and 3.97× speedup and 2.31× and 1.65× energy reduction compared to state-of-the-art accelerators such as Olive and BitVert, while maintaining comparable model accuracy on LLaMA models. [en_US]
dc.publisher: ACM | Proceedings of the 52nd Annual International Symposium on Computer Architecture [en_US]
dc.relation.isversionof: https://doi.org/10.1145/3695053.3731043 [en_US]
dc.rights: Creative Commons Attribution [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en_US]
dc.source: Association for Computing Machinery [en_US]
dc.title: Transitive Array: An Efficient GEMM Accelerator with Result Reuse [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Cong Guo, Chiyue Wei, Jiaming Tang, Bowen Duan, Song Han, Hai Li, and Yiran Chen. 2025. Transitive Array: An Efficient GEMM Accelerator with Result Reuse. In Proceedings of the 52nd Annual International Symposium on Computer Architecture (ISCA '25). Association for Computing Machinery, New York, NY, USA, 990–1004. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.identifier.mitlicense: PUBLISHER_POLICY
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dc.date.updated: 2025-08-01T07:56:38Z
dc.language.rfc3066: en
dc.rights.holder: The author(s)
dspace.date.submission: 2025-08-01T07:56:38Z
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed [en_US]
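The abstract describes reusing previously computed results across GEMM rows via transitive relations. As a rough conceptual illustration only (not the paper's actual algorithm, scheduling, or hardware design), one can sketch the reuse idea for binary weight rows: when row j's set bits are a subset of row i's, the dot product for row i equals row j's result plus a few extra additions, so no multiplications are needed. The function name `transitive_gemv` and the greedy parent-selection heuristic below are illustrative assumptions:

```python
import numpy as np

def transitive_gemv(W_bits, x):
    """Compute W_bits @ x with additions only, reusing earlier row results.

    W_bits: (n, k) 0/1 matrix; x: length-k vector.
    A row j is a valid "parent" of row i when j's set bits are a subset
    of i's; then result[i] = result[j] + sum of x over the extra bits.
    """
    n, _ = W_bits.shape
    pop = W_bits.sum(axis=1)                 # additions needed from scratch
    order = np.argsort(pop, kind="stable")   # cheaper potential parents first
    results = np.zeros(n, dtype=x.dtype)
    done = []                                # rows already computed
    for i in order:
        best, best_cost = None, int(pop[i])
        for j in done:
            if np.all(W_bits[j] <= W_bits[i]):   # j's bits are a subset of i's
                delta = int(pop[i] - pop[j])     # extra additions if we reuse j
                if delta < best_cost:
                    best, best_cost = j, delta
        if best is None:
            # no useful parent: sum the selected x entries directly
            results[i] = x[W_bits[i].astype(bool)].sum()
        else:
            # reuse the parent's result, add only the difference bits
            diff = (W_bits[i] - W_bits[best]).astype(bool)
            results[i] = results[best] + x[diff].sum()
        done.append(i)
    return results
```

The paper itself organizes these subset relations into a directed acyclic graph and derives execution orders that balance work across parallel lanes; the quadratic greedy search above only demonstrates why reuse removes redundant additions.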

