Spatial Accelerator Generation and Optimization for Tensor Applications
Author(s)
Zhang, Zhekai
Advisor
Han, Song
Abstract
Modern foundation models and generative AI applications require multiple input modalities (both vision and language), which increases the demand for flexible accelerator architectures.
Existing frameworks face a trade-off between design flexibility and the productivity of RTL generation: they are either limited to a few hand-written templates or cannot generate RTL automatically.
To address this challenge, we propose the LEGO framework, which automatically generates and optimizes spatial architecture designs in the front end and outputs synthesizable RTL code in the back end without RTL templates. The LEGO front end finds all possible interconnections between function units and determines the shape of the memory system by solving integer linear equations. It then establishes the connections using a minimum-spanning-tree-based algorithm, together with a breadth-first-search-based heuristic algorithm for merging different spatial dataflow designs. The LEGO back end translates the hardware into a primitive-level graph to perform lower-level optimizations, and applies a set of linear-programming algorithms to optimally insert pipeline registers and to reduce the overhead of unused logic when switching spatial dataflows.
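As an illustration of the minimum-spanning-tree idea mentioned above, the following is a minimal sketch, not the algorithm used in LEGO. It assumes that the candidate interconnections between function units are already known and that each carries a single integer cost; the unit names (`fu0` to `fu3`), the costs, and the function `minimum_spanning_tree` are all hypothetical. The sketch uses Kruskal's algorithm with a union-find structure to keep the cheapest set of links that still connects every unit.

```python
from typing import Dict, List, Tuple

# (cost, unit_a, unit_b): a hypothetical candidate link between two function units.
Edge = Tuple[int, str, str]


def minimum_spanning_tree(nodes: List[str], edges: List[Edge]) -> List[Edge]:
    """Return the edges of a minimum spanning tree using Kruskal's algorithm."""
    parent: Dict[str, str] = {n: n for n in nodes}

    def find(n: str) -> str:
        # Follow parent pointers to the set representative, halving the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    chosen: List[Edge] = []
    for cost, a, b in sorted(edges):  # cheapest candidate links first
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the link joins two separate groups, so keep it
            parent[root_a] = root_b
            chosen.append((cost, a, b))
    return chosen


# Hypothetical function units and candidate interconnections (costs are made up).
units = ["fu0", "fu1", "fu2", "fu3"]
candidates = [
    (1, "fu0", "fu1"),
    (4, "fu0", "fu2"),
    (2, "fu1", "fu2"),
    (3, "fu2", "fu3"),
    (5, "fu1", "fu3"),
]

tree = minimum_spanning_tree(units, candidates)
total_cost = sum(cost for cost, _, _ in tree)
print(tree, total_cost)
```

A spanning tree over four units always has three links, so two of the five candidates are dropped here. What the cost of a link represents in a real design (wire length, delay, or something else) is left open in this sketch.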
Our evaluation demonstrates that LEGO achieves a 3.2× speedup and 2.4× better energy efficiency than the prior work Gemmini, and that it can generate a single architecture for diverse modern foundation models in generative AI applications.
Date issued
2023-09
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology