Show simple item record

dc.contributor.advisor  Lynch, Jayson
dc.contributor.author  Li, Daniel D.
dc.date.accessioned  2025-09-18T14:29:51Z
dc.date.available  2025-09-18T14:29:51Z
dc.date.issued  2025-05
dc.date.submitted  2025-06-23T14:02:49.153Z
dc.identifier.uri  https://hdl.handle.net/1721.1/162737
dc.description.abstract  Efficient inference is a growing priority in deep learning, where large model sizes and increasing deployment demands pose challenges for latency, memory, and energy usage. This thesis presents a unified framework for evaluating approximation methods that accelerate inference by modifying weight matrices. We model each method as a function f_c(A) that approximates a weight matrix A under a compression rate c, and assess its impact on both matrix–vector accuracy and downstream task performance. We conduct empirical evaluations across two representative models, AlexNet on CIFAR-10 and DistilBERT on AG News, comparing quantization, sparsification, and low-rank approximations. Our analysis spans four perspectives: (1) how different methods trade off ℓ₂ error and compression, (2) how weight statistics and input distributions shape error, (3) how well ℓ₂ error predicts classification accuracy, and (4) how idealized compression differs from real memory savings. We find that sparsification offers a strong trade-off between storage and accuracy, particularly because it preserves task-relevant structure in the weights. We also show that ℓ₂ error is not always a reliable proxy for accuracy, especially when input data lie on low-dimensional manifolds. These results suggest that approximation quality must be evaluated not only by global distortion metrics, but also by how the method interacts with model structure and input distributions. Our findings offer practical guidance for deploying efficient deep learning models and shed light on how compression affects performance in real-world settings.
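The abstract's framing, each compression method as a function f_c(A) judged by its matrix–vector ℓ₂ error, can be sketched in a few lines. This is not the thesis code; the method implementations (truncated-SVD low-rank approximation, magnitude pruning, 8-bit uniform quantization), the compression rate, and the random test matrix are all illustrative assumptions.

```python
import numpy as np

def low_rank(A, c):
    """Truncated SVD: pick the rank whose factor storage is about c * A.size."""
    m, n = A.shape
    k = max(1, int(c * m * n / (m + n)))
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

def sparsify(A, c):
    """Magnitude pruning: keep only the largest c-fraction of entries."""
    k = max(1, int(c * A.size))
    thresh = np.sort(np.abs(A), axis=None)[-k]
    return np.where(np.abs(A) >= thresh, A, 0.0)

def quantize(A, bits=8):
    """Uniform quantization of A's range to 2**bits levels."""
    lo, hi = A.min(), A.max()
    scale = (hi - lo) / (2**bits - 1)
    return np.round((A - lo) / scale) * scale + lo

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 64))   # stand-in weight matrix
x = rng.standard_normal(64)         # stand-in input vector

for name, A_hat in [("low-rank", low_rank(A, 0.25)),
                    ("sparse", sparsify(A, 0.25)),
                    ("8-bit quant", quantize(A))]:
    # relative l2 error of the approximated matrix-vector product
    err = np.linalg.norm(A @ x - A_hat @ x) / np.linalg.norm(A @ x)
    print(f"{name}: relative l2 error {err:.3f}")
```

The thesis evaluates exactly this kind of error alongside downstream task accuracy; the sketch shows only the distortion side of that comparison.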
dc.publisher  Massachusetts Institute of Technology
dc.rights  In Copyright - Educational Use Permitted
dc.rights  Copyright retained by author(s)
dc.rights.uri  https://rightsstatements.org/page/InC-EDU/1.0/
dc.title  Efficient ML Inference via Matrix-Vector Approximations
dc.type  Thesis
dc.description.degree  M.Eng.
dc.contributor.department  Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree  Master
thesis.degree.name  Master of Engineering in Electrical Engineering and Computer Science

