Symphony: Orchestrating Sparse and Dense Tensors with Hierarchical Heterogeneous Processing

Pellauer, Michael; Clemons, Jason; Balaji, Vignesh; Crago, Neal; Jaleel, Aamer; Lee, Donghyuk; O'Connor, Mike; Parashar, Angshuman; Treichler, Sean; Tsai, Po-An; Keckler, Stephen; Emer, Joel

Download: 3630007.pdf (868.0Kb)

Publisher Policy

Terms of use

Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.

Abstract

Sparse tensor algorithms are becoming widespread, particularly in the domains of deep learning, graph and data analytics, and scientific computing. Current high-performance broad-domain architectures, such as GPUs, often suffer memory system inefficiencies by moving too much data or moving it too far through the memory hierarchy. To increase performance and efficiency, proposed domain-specific accelerators tailor their architectures to the data needs of a narrow application domain, but as a result cannot be applied to a wide range of algorithms or applications that contain a mix of sparse and dense algorithms. This paper proposes Symphony, a hybrid programmable/specialized architecture which focuses on the orchestration of data throughout the memory hierarchy to simultaneously reduce the movement of unnecessary data and data movement distances. Key elements of the Symphony architecture include (1) specialized reconfigurable units aimed not only at roofline floating-point computations, but at supporting data orchestration features such as address generation, data filtering, and sparse metadata processing; and (2) distribution of computation resources (both programmable and specialized) throughout the on-chip memory hierarchy. We demonstrate that Symphony can match non-programmable ASIC performance on sparse tensor algebra, and provide 31× improved runtime and 44× improved energy over a comparably provisioned GPU for these applications.

URI

https://hdl.handle.net/1721.1/152619

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory

Journal

ACM Transactions on Computer Systems

Publisher

ACM

Citation

Pellauer, Michael, Clemons, Jason, Balaji, Vignesh, Crago, Neal, Jaleel, Aamer et al. "Symphony: Orchestrating Sparse and Dense Tensors with Hierarchical Heterogeneous Processing." ACM Transactions on Computer Systems.

Version: Final published version

Collections

MIT Open Access Articles