SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning

Wang, Hanrui; Zhang, Zhekai; Han, Song

dc.contributor.author	Wang, Hanrui
dc.contributor.author	Zhang, Zhekai
dc.contributor.author	Han, Song
dc.date.accessioned	2022-07-12T14:08:01Z
dc.date.available	2022-07-12T14:08:01Z
dc.date.issued	2021
dc.identifier.uri	https://hdl.handle.net/1721.1/143674
dc.language.iso	en
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	en_US
dc.relation.isversionof	10.1109/HPCA51647.2021.00018	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	arXiv	en_US
dc.title	SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning	en_US
dc.type	Article	en_US
dc.identifier.citation	Wang, Hanrui, Zhang, Zhekai and Han, Song. 2021. "SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning." 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA).
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.relation.journal	2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA)	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dc.date.updated	2022-07-12T13:54:17Z
dspace.orderedauthors	Wang, H; Zhang, Z; Han, S	en_US
dspace.date.submission	2022-07-12T13:54:19Z
mit.license	OPEN_ACCESS_POLICY
mit.metadata.status	Authority Work and Publication Information Needed	en_US

Files in this item

Name: 2012.09852.pdf
Size: 2.525Mb
Format: PDF
Description: Accepted version

This item appears in the following Collection(s)

MIT Open Access Articles
