DSpace@MIT (MIT Libraries)


LoopTree: Enabling Systematic and Flexible Exploration of Fused-layer Dataflow Accelerators

Author(s)
Gilbert, Michael
Advisor
Sze, Vivienne
Emer, Joel
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Deep neural network (DNN) accelerators exploit data reuse to reduce memory traffic. Typically, they exploit reuse within a layer, but reuse also exists between layers. To exploit this inter-layer reuse, fused-layer dataflow accelerators tile the intermediate data between layers and buffer it on-chip, which captures the reuse while keeping the buffer small. To reduce buffer space requirements further, some fused-layer dataflows leave part of the tile unbuffered and recompute the unbuffered data when it is needed. The design space of fused-layer dataflow accelerators is large, but prior work considers only a subset of it. Prior work is limited in four ways: (1) tiling only in certain dimensions, which leaves some designs unexplored; (2) offering limited reuse/recompute choices that are applied uniformly to all layers, which increases recomputation; (3) not exploring the interaction between tiling and reuse/recompute choices; and (4) applying the same design choices to all layers in the DNN, even though diverse layer shapes call for different choices. To address these limitations, we propose (1) a more extensive design space, (2) a taxonomy that introduces structure into the design space, and (3) a fast, flexible analytical model, called LoopTree, that evaluates the latency, energy consumption, buffer space requirements, and bandwidth requirements of designs in this space. Finally, we present case studies enabled by LoopTree that show how exploring this larger space yields designs that require less buffer space (e.g., up to 7.6× less buffer space for the same off-chip transfers).
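The trade-off the abstract describes between buffering intermediate data and recomputing it can be illustrated with a rough count. The Python sketch below is not LoopTree's model. The function name `fused_tile_costs`, its parameters, and the cost formulas are all assumptions made for illustration. It assumes two fused convolution layers with stride 1 and no padding, square kernels in the second layer, output tiles processed in raster order, and a simple upper-bound estimate for the buffer needed to retain overlapping data.

```python
def fused_tile_costs(H, W, K, Th, Tw, retain):
    """Estimate intermediate-layer work and buffer space for two fused layers.

    Hypothetical, simplified model (not LoopTree):
      H, W   : height and width of the final output
      K      : kernel size of the second layer (stride 1, no padding)
      Th, Tw : height and width of each final-output tile
      retain : True  -> keep overlapping intermediate data between tiles
               False -> recompute the overlap for every tile
    Returns (intermediate elements computed, buffer size in elements).
    """
    if H % Th or W % Tw:
        raise ValueError("tile size must divide the output size evenly")

    halo = K - 1  # extra intermediate rows/columns each output tile needs
    tile_elems = (Th + halo) * (Tw + halo)  # intermediate data per tile
    num_tiles = (H // Th) * (W // Tw)

    if retain:
        # Every intermediate element is computed exactly once.
        computed = (H + halo) * (W + halo)
        # Assumed upper bound: the working tile plus `halo` full-width rows
        # kept for the next row of tiles.
        buffer_elems = tile_elems + halo * (W + halo)
    else:
        # Each tile recomputes its full intermediate footprint.
        computed = num_tiles * tile_elems
        buffer_elems = tile_elems
    return computed, buffer_elems


for retain in (False, True):
    computed, buffer_elems = fused_tile_costs(8, 8, 3, 4, 4, retain)
    policy = "retain" if retain else "recompute"
    print(f"{policy}: computed={computed}, buffer={buffer_elems}")
```

Under these assumptions, retaining the overlap lowers the number of intermediate elements computed but raises the buffer estimate, while recomputing does the reverse. Choosing tile sizes and a retain/recompute policy per layer is the kind of decision the abstract says prior work explores only partially.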
Date issued
2023-09
URI
https://hdl.handle.net/1721.1/152743
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
