DSpace@MIT

Flexible Energy-Aware Image and Transformer Processors for Edge Computing

Author(s)
Ji, Alex
Advisor
Chandrakasan, Anantha P.
Terms of use
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) Copyright retained by author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract
Machine learning inference on edge devices for image and language processing has become increasingly common in recent years, but faces challenges associated with high memory and computation requirements, coupled with limited energy resources. This work applies different quantization schemes and training techniques to reduce the cost of running these models and provide flexibility in the hardware. Energy scalability is achieved through bit width scaling, as well as model size scaling. These techniques are applied to three neural network accelerators, which have been taped out and tested, to enable efficient inference for a variety of applications. The first chip is a CNN accelerator that simplifies computation using nonlinearly quantized weights by reordering multiplication and accumulation. This modified computation requires additional storage elements compared to a conventional approach. To minimize the area overhead, a custom accumulator array layout is designed. The second chip targets moderately sized Transformer models (e.g., ALBERT) using piecewise-linear quantization (PWLQ) for both weights and activations. Lastly, an energy-adaptive accelerator for natural language understanding based on lightweight Transformer models is presented. The model size can be adjusted by sampling the weights of the full model to obtain differently sized submodels, without the memory overhead of storing multiple models.
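The abstract does not spell out how the first chip reorders multiplication and accumulation. One common reading is that each weight is stored as an index into a small table of nonlinear levels, so activations sharing a level can be summed first and multiplied by that level once. The Python sketch below illustrates only that general idea. The codebook values, the function names, and the sample data are all hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence

# Hypothetical 2-bit nonlinear codebook (assumption, not from the thesis).
CODEBOOK: List[float] = [-1.0, -0.25, 0.25, 1.0]


def dot_conventional(activations: Sequence[float],
                     weight_idx: Sequence[int],
                     codebook: Sequence[float]) -> float:
    """Reference dot product: one multiplication per activation."""
    return sum(a * codebook[i] for a, i in zip(activations, weight_idx))


def dot_reordered(activations: Sequence[float],
                  weight_idx: Sequence[int],
                  codebook: Sequence[float]) -> float:
    """Accumulate first, multiply last.

    Each activation is added to the accumulator selected by its weight
    index, so only len(codebook) multiplications remain at the end.
    The price is one accumulator per codebook level.
    """
    bins = [0.0] * len(codebook)
    for a, i in zip(activations, weight_idx):
        bins[i] += a
    return sum(b * level for b, level in zip(bins, codebook))


# Illustrative data only.
activations = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
weight_idx = [0, 3, 2, 2, 1, 3]

print(dot_conventional(activations, weight_idx, CODEBOOK))
print(dot_reordered(activations, weight_idx, CODEBOOK))
```

Both functions compute the same sum. The reordered form replaces six multiplications with four here, and the saving grows with the length of the dot product. The per-level accumulators are the kind of extra storage the abstract attributes to the modified computation.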
Date issued
2023-09
URI
https://hdl.handle.net/1721.1/152854
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses
