
dc.contributor.author: Yao, Xiaozhe
dc.contributor.author: Hu, Qinghao
dc.contributor.author: Klimovic, Ana
dc.date.accessioned: 2025-05-09T16:51:40Z
dc.date.available: 2025-05-09T16:51:40Z
dc.date.issued: 2025-03-30
dc.identifier.isbn: 979-8-4007-1196-1
dc.identifier.uri: https://hdl.handle.net/1721.1/159252
dc.description: EuroSys ’25, March 30–April 3, 2025, Rotterdam, Netherlands [en_US]
dc.description.abstract: Fine-tuning large language models (LLMs) greatly improves model quality for downstream tasks. However, serving many fine-tuned LLMs concurrently is challenging due to the sporadic, bursty, and varying request patterns of different LLMs. To bridge this gap, we present DeltaZip, an LLM serving system that efficiently serves multiple full-parameter fine-tuned models concurrently by aggressively compressing model deltas by up to 10× while maintaining high model quality. The key insight behind this design is that fine-tuning results in small-magnitude changes to the pre-trained model. By co-designing the serving system with the compression algorithm, DeltaZip achieves 2× to 12× improvement in throughput compared to the state-of-the-art systems. [en_US]
dc.publisher: ACM|Twentieth European Conference on Computer Systems [en_US]
dc.relation.isversionof: https://doi.org/10.1145/3689031.3717468 [en_US]
dc.rights: Creative Commons Attribution [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en_US]
dc.source: Association for Computing Machinery [en_US]
dc.title: DeltaZip: Efficient Serving of Multiple Full-Model-Tuned LLMs [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Xiaozhe Yao, Qinghao Hu, and Ana Klimovic. 2025. DeltaZip: Efficient Serving of Multiple Full-Model-Tuned LLMs. In Proceedings of the Twentieth European Conference on Computer Systems (EuroSys '25). Association for Computing Machinery, New York, NY, USA, 110–127. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Research Laboratory of Electronics [en_US]
dc.identifier.mitlicense: PUBLISHER_POLICY
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dc.date.updated: 2025-04-01T07:49:37Z
dc.language.rfc3066: en
dc.rights.holder: The author(s)
dspace.date.submission: 2025-04-01T07:49:37Z
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed [en_US]