Show simple item record

dc.contributor.advisor  Lynch, Jayson
dc.contributor.author  Li, Daniel D.
dc.date.accessioned  2025-09-18T14:29:51Z
dc.date.available  2025-09-18T14:29:51Z
dc.date.issued  2025-05
dc.date.submitted  2025-06-23T14:02:49.153Z
dc.identifier.uri  https://hdl.handle.net/1721.1/162737
dc.description.abstract  Efficient inference is a growing priority in deep learning, where large model sizes and increasing deployment demands pose challenges for latency, memory, and energy usage. This thesis presents a unified framework for evaluating approximation methods that accelerate inference by modifying weight matrices. We model each method as a function f_c(A) that approximates a weight matrix A under a compression rate c, and assess its impact on both matrix–vector accuracy and downstream task performance. We conduct empirical evaluations across two representative models, AlexNet on CIFAR-10 and DistilBERT on AG News, comparing quantization, sparsification, and low-rank approximations. Our analysis spans four perspectives: (1) how different methods trade off ℓ₂ error and compression, (2) how weight statistics and input distributions shape error, (3) how well ℓ₂ error predicts classification accuracy, and (4) how idealized compression differs from real memory savings. We find that sparsification offers a strong trade-off between storage and accuracy, particularly because it preserves task-relevant structure in the weights. We also show that ℓ₂ error is not always a reliable proxy for accuracy, especially when input data lie on low-dimensional manifolds. These results suggest that approximation quality must be evaluated not only by global distortion metrics, but also by how the method interacts with model structure and input distributions. Our findings offer practical guidance for deploying efficient deep learning models and shed light on how compression affects performance in real-world settings.
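The abstract's framing, each compression method as a function f_c(A) judged by its matrix–vector ℓ₂ error, can be sketched in a few lines. This is not the thesis code; the method implementations (truncated-SVD low-rank approximation, magnitude pruning, 8-bit uniform quantization), the compression rate, and the random test matrix are all illustrative assumptions.

```python
import numpy as np

def low_rank(A, c):
    """Truncated SVD: pick the rank whose factor storage is about c * A.size."""
    m, n = A.shape
    k = max(1, int(c * m * n / (m + n)))
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

def sparsify(A, c):
    """Magnitude pruning: keep only the largest c-fraction of entries."""
    k = max(1, int(c * A.size))
    thresh = np.sort(np.abs(A), axis=None)[-k]
    return np.where(np.abs(A) >= thresh, A, 0.0)

def quantize(A, bits=8):
    """Uniform quantization of A's range to 2**bits levels."""
    lo, hi = A.min(), A.max()
    scale = (hi - lo) / (2**bits - 1)
    return np.round((A - lo) / scale) * scale + lo

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 64))   # stand-in weight matrix
x = rng.standard_normal(64)         # stand-in input vector

for name, A_hat in [("low-rank", low_rank(A, 0.25)),
                    ("sparse", sparsify(A, 0.25)),
                    ("8-bit quant", quantize(A))]:
    # relative l2 error of the approximated matrix-vector product
    err = np.linalg.norm(A @ x - A_hat @ x) / np.linalg.norm(A @ x)
    print(f"{name}: relative l2 error {err:.3f}")
```

The thesis evaluates exactly this kind of error alongside downstream task accuracy; the sketch shows only the distortion side of that comparison.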
dc.publisher  Massachusetts Institute of Technology
dc.rights  In Copyright - Educational Use Permitted
dc.rights  Copyright retained by author(s)
dc.rights.uri  https://rightsstatements.org/page/InC-EDU/1.0/
dc.title  Efficient ML Inference via Matrix-Vector Approximations
dc.type  Thesis
dc.description.degree  M.Eng.
dc.contributor.department  Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree  Master
thesis.degree.name  Master of Engineering in Electrical Engineering and Computer Science

