Show simple item record

dc.contributor.advisorHan, Song
dc.contributor.authorXiao, Guangxuan
dc.date.accessioned2024-08-21T18:57:28Z
dc.date.available2024-08-21T18:57:28Z
dc.date.issued2024-05
dc.date.submitted2024-07-10T13:00:03.136Z
dc.identifier.urihttps://hdl.handle.net/1721.1/156332
dc.description.abstractLarge language models (LLMs) have achieved impressive performance on various natural language tasks. However, their massive computational and memory requirements hinder widespread deployment. Additionally, deploying them on extensive inputs presents efficiency and accuracy challenges. This proposal introduces two techniques to enable efficient and accurate quantization and streaming deployment of LLMs, facilitating their application in real-world systems with limited resources. First, we develop SmoothQuant, an accurate post-training 8-bit quantization method of both weights and activations in LLMs up to 530B parameters. By smoothing outliers in activations, SmoothQuant enables the use of efficient INT8 kernels on all matrix multiplications with negligible accuracy loss. Second, we present StreamingLLM, enabling LLMs to handle arbitrarily long text sequences using a fixed memory budget. It exploits ``attention sinks'' in LLMs to stably anchor attention computation on lengthy contexts. Experiments show StreamingLLM can model over 4 million tokens with up to 22x speedup compared to recomputation baselines. Together, these two techniques can significantly reduce the computational and memory costs of large language models, increasing their accessibility for practical usage.
dc.publisherMassachusetts Institute of Technology
dc.rightsIn Copyright - Educational Use Permitted
dc.rightsCopyright retained by author(s)
dc.rights.urihttps://rightsstatements.org/page/InC-EDU/1.0/
dc.titleEfficient Deployment Algorithms for Large Language Models
dc.typeThesis
dc.description.degreeS.M.
dc.contributor.departmentMassachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.orcidhttps://orcid.org/0000-0002-7182-9284
mit.thesis.degreeMaster
thesis.degree.nameMaster of Science in Electrical Engineering and Computer Science


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record