MIT Libraries logoDSpace@MIT

MIT
View Item 
  • DSpace@MIT Home
  • MIT Open Access Articles
  • MIT Open Access Articles
  • View Item
  • DSpace@MIT Home
  • MIT Open Access Articles
  • MIT Open Access Articles
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Symphony: Orchestrating Sparse and Dense Tensors with Hierarchical Heterogeneous Processing

Author(s)
Pellauer, Michael; Clemons, Jason; Balaji, Vignesh; Crago, Neal; Jaleel, Aamer; Lee, Donghyuk; O'Connor, Mike; Parashar, Angshuman; Treichler, Sean; Tsai, Po-An; Keckler, Stephen; Emer, Joel; ... Show more Show less
Thumbnail
Download3630007.pdf (868.0Kb)
Publisher Policy

Publisher Policy

Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.

Terms of use
Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
Metadata
Show full item record
Abstract
Sparse tensor algorithms are becoming widespread, particularly in the domains of deep learning, graph and data analytics, and scientific computing. Current high-performance broad-domain architectures, such as GPUs, often suffer memory system inefficiencies by moving too much data or moving it too far through the memory hierarchy. To increase performance and efficiency, proposed domain-specific accelerators tailor their architectures to the data needs of a narrow application domain, but as a result cannot be applied to a wide range of algorithms or applications that contain a mix of sparse and dense algorithms. This paper proposes Symphony, a hybrid programmable/specialized architecture which focuses on the orchestration of data throughout the memory hierarchy to simultaneously reduce the movement of unnecessary data and data movement distances. Key elements of the Symphony architecture include (1) specialized reconfigurable units aimed not only at roofline floating-point computations, but at supporting data orchestration features such as address generation, data filtering, and sparse metadata processing; and (2) distribution of computation resources (both programmable and specialized) throughout the on-chip memory hierarchy. We demonstrate that Symphony can match non-programmable ASIC performance on sparse tensor algebra, and provide 31× improved runtime and 44× improved energy over a comparably provisioned GPU for these applications.
URI
https://hdl.handle.net/1721.1/152619
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Journal
ACM Transactions on Computer Systems
Publisher
ACM
Citation
Pellauer, Michael, Clemons, Jason, Balaji, Vignesh, Crago, Neal, Jaleel, Aamer et al. "Symphony: Orchestrating Sparse and Dense Tensors with Hierarchical Heterogeneous Processing." ACM Transactions on Computer Systems.
Version: Final published version

Collections
  • MIT Open Access Articles

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

My Account

Login

Statistics

OA StatisticsStatistics by CountryStatistics by Department
MIT Libraries
PrivacyPermissionsAccessibilityContact us
MIT
Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.