SpArch: Efficient Architecture for Sparse Matrix Multiplication

Zhang, Zhekai; Wang, Hanrui; Han, Song

dc.contributor.author	Zhang, Zhekai
dc.contributor.author	Wang, Hanrui
dc.contributor.author	Han, Song
dc.date.accessioned	2021-01-19T14:03:23Z
dc.date.available	2021-01-19T14:03:23Z
dc.date.issued	2020-02
dc.identifier.isbn	9781728161501
dc.identifier.uri	https://hdl.handle.net/1721.1/129436
dc.description.abstract	Generalized Sparse Matrix-Matrix Multiplication (SpGEMM) is a ubiquitous task in various engineering and scientific applications. However, inner product based SpGEMM introduces redundant input fetches for mismatched nonzero operands, while the outer product based approach suffers from poor output locality due to numerous partial product matrices. Inefficient reuse of either input or output data leads to extensive and expensive DRAM access. To address this problem, this paper proposes an efficient sparse matrix multiplication accelerator architecture, SpArch, which jointly optimizes the data locality for both input and output matrices. We first design a highly parallelized streaming-based merger to pipeline the multiply and merge stages of partial matrices so that partial matrices are merged on chip immediately after they are produced. We then propose a condensed matrix representation that reduces the number of partial matrices by three orders of magnitude and thus reduces DRAM access by 5.4x. We further develop a Huffman tree scheduler to improve the scalability of the merger for larger sparse matrices, which reduces the DRAM access by another 1.8x. We also resolve the increased input matrix reads induced by the new representation using a row prefetcher with a near-optimal buffer replacement policy, further reducing the DRAM access by 1.5x. Evaluated on 20 benchmarks, SpArch reduces the total DRAM access by 2.8x over the previous state of the art. On average, SpArch achieves 4x, 19x, 18x, 17x, 1285x speedup and 6x, 164x, 435x, 307x, 62x energy savings over OuterSpace, MKL, cuSPARSE, CUSP, and ARM Armadillo, respectively.	en_US
dc.description.sponsorship	National Science Foundation (U.S.). Harnessing the Data Revolution (Award 1934700)	en_US
dc.language.iso	en
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	en_US
dc.relation.isversionof	10.1109/HPCA47549.2020.00030	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	arXiv	en_US
dc.title	SpArch: Efficient Architecture for Sparse Matrix Multiplication	en_US
dc.type	Article	en_US
dc.identifier.citation	Zhang, Zhekai et al. “SpArch: Efficient Architecture for Sparse Matrix Multiplication.” Paper presented at the 2020 IEEE International Symposium on High Performance Computer Architecture, HPCA 2020, San Diego, CA, February 22-26, 2020, IEEE © 2019 The Author(s)	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science	en_US
dc.relation.journal	Proceedings - 2020 IEEE International Symposium on High Performance Computer Architecture, HPCA 2020	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dc.date.updated	2020-12-17T17:09:39Z
dspace.orderedauthors	Zhang, Z; Wang, H; Han, S; Dally, WJ	en_US
dspace.date.submission	2020-12-17T17:09:41Z
mit.license	OPEN_ACCESS_POLICY
mit.metadata.status	Complete

Files in this item

Name: 2002.08947.pdf
Size: 3.124Mb
Format: PDF
Description: Accepted version


This item appears in the following Collection(s)

MIT Open Access Articles
