Efficient ML Inference via Matrix-Vector Approximations
Author(s)
Li, Daniel D.
Advisor
Lynch, Jayson
Abstract
Efficient inference is a growing priority in deep learning, where large model sizes and increasing deployment demands pose challenges for latency, memory, and energy usage. This thesis presents a unified framework for evaluating approximation methods that accelerate inference by modifying weight matrices. We model each method as a function f_c(A) that approximates a weight matrix A under a compression rate c, and assess its impact on both matrix–vector accuracy and downstream task performance. We conduct empirical evaluations across two representative models, AlexNet on CIFAR-10 and DistilBERT on AG News, comparing quantization, sparsification, and low-rank approximations. Our analysis spans four perspectives: (1) how different methods trade off ℓ₂ error and compression, (2) how weight statistics and input distributions shape error, (3) how well ℓ₂ error predicts classification accuracy, and (4) how idealized compression differs from real memory savings. We find that sparsification offers a strong trade-off between storage and accuracy, particularly because it preserves task-relevant structure in the weights. We also show that ℓ₂ error is not always a reliable proxy for accuracy, especially when input data lie on low-dimensional manifolds. These results suggest that approximation quality must be evaluated not only by global distortion metrics, but also by how the method interacts with model structure and input distributions. Our findings offer practical guidance for deploying efficient deep learning models and shed light on how compression affects performance in real-world settings.
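The three approximation families compared in the abstract can be sketched as follows. This is an illustrative reimplementation, not the thesis's actual experimental code: the matrix shape, random inputs, and the specific settings (rank 32, 25% density, 8-bit quantization, each giving roughly 4× idealized compression of a 256×256 float32 matrix) are assumptions chosen for the example.

```python
import numpy as np

def low_rank(A, rank):
    # Truncated SVD: keep only the top-`rank` singular components.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

def sparsify(A, keep_frac):
    # Magnitude pruning: zero all but the largest-magnitude entries.
    k = int(A.size * keep_frac)
    thresh = np.sort(np.abs(A), axis=None)[-k]
    return np.where(np.abs(A) >= thresh, A, 0.0)

def quantize(A, bits):
    # Uniform symmetric quantization onto 2**bits - 1 levels.
    scale = np.abs(A).max() / (2 ** (bits - 1) - 1)
    return np.round(A / scale) * scale

rng = np.random.default_rng(0)
A = rng.standard_normal((256, 256))   # stand-in weight matrix
x = rng.standard_normal(256)          # stand-in input vector

for name, fA in [("low-rank r=32", low_rank(A, 32)),
                 ("sparse 25%",    sparsify(A, 0.25)),
                 ("quant 8-bit",   quantize(A, 8))]:
    # Compare distortion of the matrix itself vs. of the matrix-vector product.
    mat_err = np.linalg.norm(A - fA) / np.linalg.norm(A)
    vec_err = np.linalg.norm(A @ x - fA @ x) / np.linalg.norm(A @ x)
    print(f"{name}: matrix rel-err {mat_err:.3f}, matvec rel-err {vec_err:.3f}")
```

The loop mirrors the abstract's distinction between matrix-level ℓ₂ error and matrix–vector accuracy: for structured inputs the two can diverge, which is why the thesis argues ℓ₂ error alone is an unreliable proxy for downstream performance.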
Date issued
2025-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology