DSpace@MIT


Efficient Deployment Algorithms for Large Language Models

Author(s)
Xiao, Guangxuan
Advisor
Han, Song
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Large language models (LLMs) have achieved impressive performance on various natural language tasks. However, their massive computational and memory requirements hinder widespread deployment. Additionally, deploying them on long inputs presents efficiency and accuracy challenges. This proposal introduces two techniques to enable efficient and accurate quantization and streaming deployment of LLMs, facilitating their application in real-world systems with limited resources. First, we develop SmoothQuant, an accurate post-training 8-bit quantization method for both weights and activations in LLMs of up to 530B parameters. By smoothing outliers in activations, SmoothQuant enables the use of efficient INT8 kernels on all matrix multiplications with negligible accuracy loss. Second, we present StreamingLLM, which enables LLMs to handle arbitrarily long text sequences using a fixed memory budget. It exploits "attention sinks" in LLMs to stably anchor attention computation on lengthy contexts. Experiments show StreamingLLM can model over 4 million tokens with up to 22x speedup compared to recomputation baselines. Together, these two techniques can significantly reduce the computational and memory costs of large language models, increasing their accessibility for practical usage.
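The two ideas in the abstract can be illustrated with a small, dependency-free sketch. It is not the thesis implementation. The abstract does not give formulas, so the smoothing rule below assumes the per-channel form s_j = max|X_j|^alpha / max|W_j|^(1 - alpha) described in the published SmoothQuant paper, and the cache assumes the "keep the first few tokens plus a recent window" policy described for StreamingLLM. All function names, the class name, and the default values (`alpha=0.5`, `n_sink=4`, `window=8`) are hypothetical choices for illustration.

```python
from collections import deque


def smoothing_scales(act_absmax, weight_absmax, alpha=0.5):
    """Per-input-channel scales (assumed form, not taken from the abstract):
    s_j = max|X_j| ** alpha / max|W_j| ** (1 - alpha)."""
    return [
        (a ** alpha) / (w ** (1.0 - alpha))
        for a, w in zip(act_absmax, weight_absmax)
    ]


def smooth(x_row, w_rows, scales):
    """Divide each activation channel by s_j and multiply the matching
    weight row by s_j. The product x @ W is mathematically unchanged,
    while the activation outliers shrink and become easier to quantize.

    x_row:  one activation vector of length C_in.
    w_rows: weight matrix stored as C_in rows of C_out values.
    """
    x_s = [x / s for x, s in zip(x_row, scales)]
    w_s = [[w * s for w in row] for row, s in zip(w_rows, scales)]
    return x_s, w_s


def matvec(x_row, w_rows):
    """Plain x @ W for a single activation vector."""
    n_out = len(w_rows[0])
    return [sum(x * row[j] for x, row in zip(x_row, w_rows)) for j in range(n_out)]


class SinkCache:
    """Fixed-budget cache: the first `n_sink` entries (the assumed
    "attention sinks") are kept forever; the rest is a sliding window
    holding only the most recent `window` entries."""

    def __init__(self, n_sink=4, window=8):
        self.n_sink = n_sink
        self.sinks = []
        self.recent = deque(maxlen=window)

    def append(self, entry):
        if len(self.sinks) < self.n_sink:
            self.sinks.append(entry)
        else:
            # A full deque with maxlen silently drops its oldest item.
            self.recent.append(entry)

    def entries(self):
        return self.sinks + list(self.recent)


# Toy data: channel 0 of the activation is an outlier.
x = [100.0, 1.0]
W = [[0.1, 0.2], [1.0, 2.0]]
scales = smoothing_scales(
    act_absmax=[abs(v) for v in x],
    weight_absmax=[max(abs(w) for w in row) for row in W],
)
x_s, W_s = smooth(x, W, scales)

# Stream 100 token positions through a cache with a budget of 12 entries.
cache = SinkCache(n_sink=4, window=8)
for position in range(100):
    cache.append(position)
```

In this toy run the smoothed activation has a much smaller peak than the original while `matvec(x_s, W_s)` matches `matvec(x, W)` up to floating-point error, and the cache never holds more than `n_sink + window` entries regardless of how many positions are streamed. A real deployment would store key/value tensors per layer rather than integers, and would calibrate the activation maxima on sample data.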
Date issued
2024-05
URI
https://hdl.handle.net/1721.1/156332
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
