SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
Author(s)
Wang, Hanrui; Zhang, Zhekai; Han, Song
Open Access Policy
Creative Commons Attribution-Noncommercial-Share Alike
Date issued
2021
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal
2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA)
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Citation
Wang, Hanrui, Zhang, Zhekai and Han, Song. 2021. "SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning." 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA).
Version: Author's final manuscript