Sparse Expansion and Neuronal Disentanglement
Author(s)
Kong, Linghao
Advisor
Shavit, Nir N.
Abstract
We show how to improve the inference efficiency of an LLM by expanding it into a mixture of sparse experts, where each expert is a copy of the same weights, one-shot pruned for a specific cluster of input values. We call this approach Sparse Expansion. We show that for models like Llama 2 7B, as we increase the number of experts, Sparse Expansion outperforms all other one-shot sparsification approaches for the same FLOPs budget, and this gap grows as sparsity increases. But why? To answer this, we provide strong evidence that the mixture of sparse experts is effectively disentangling the input-output relationship of every individual neuron. Sparse experts approximate a neuron’s dense output distribution with fewer weights by decomposing the distribution into a collection of simpler ones, each covered by its own sparse dot product. Interestingly, we show that the Wasserstein distance between a neuron’s output distribution and a Gaussian distribution is an indicator of its entanglement level and of its contribution to the accuracy of the model. Every layer of an LLM has highly entangled neurons, and model performance suffers more when these are sparsified than when others are. We believe that these neurons may have implications beyond sparsity in understanding the performance of LLMs.
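
The expansion step described above can be sketched in a few lines. Below is a minimal, illustrative Python version for a single linear layer: calibration inputs are clustered with k-means, and each expert starts from the same dense weight matrix and is one-shot pruned against the inputs routed to its own cluster. A Wanda-style input-aware magnitude score stands in for the one-shot pruner used in the thesis, and all names and shapes here are assumptions for illustration, not the thesis's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans


def input_aware_prune(W: np.ndarray, X: np.ndarray, sparsity: float) -> np.ndarray:
    """One-shot prune a (d_in, d_out) weight matrix for one cluster's inputs X,
    scoring weights by |W_ij| * ||X[:, i]||_2 (a Wanda-style criterion standing
    in for the thesis's one-shot pruner)."""
    score = np.abs(W) * np.linalg.norm(X, axis=0)[:, None]
    k = int(W.size * sparsity)  # number of weights to zero out
    if k == 0:
        return W.copy()
    thresh = np.partition(score.ravel(), k - 1)[k - 1]
    return np.where(score > thresh, W, 0.0)


class SparseExpansionLayer:
    """One dense layer expanded into a mixture of sparse experts: every expert
    is a copy of the SAME dense weights, pruned separately for the cluster of
    inputs it will serve."""

    def __init__(self, W, n_experts, sparsity, calib_X):
        # Route inputs by k-means fit on a calibration set.
        self.router = KMeans(n_clusters=n_experts, n_init=10).fit(calib_X)
        labels = self.router.labels_
        # Prune one copy of W per cluster, using that cluster's inputs.
        self.experts = [
            input_aware_prune(W, calib_X[labels == e], sparsity)
            for e in range(n_experts)
        ]

    def forward(self, x):
        """Send each input through the sparse expert of its nearest cluster."""
        ids = self.router.predict(x)
        out = np.empty((x.shape[0], self.experts[0].shape[1]))
        for e, W_e in enumerate(self.experts):
            m = ids == e
            if m.any():
                out[m] = x[m] @ W_e
        return out


# Illustrative usage with random data (shapes only; not real LLM weights).
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
calib = rng.standard_normal((512, 64))
layer = SparseExpansionLayer(W, n_experts=4, sparsity=0.9, calib_X=calib)
y = layer.forward(rng.standard_normal((8, 64)))  # shape (8, 32)
```

Note the design point implied by the abstract's fixed FLOPs budget: each token is routed to exactly one expert, so the per-token cost stays that of a single sparse matrix product regardless of how many experts the layer is expanded into.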
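
The entanglement indicator can likewise be sketched: compare a neuron's empirical output distribution against a moment-matched Gaussian via the 1-D Wasserstein distance. The sketch below uses `scipy.stats.wasserstein_distance` with a sampled Gaussian reference; the function name `gaussianity_gap` and the sampling-based comparison are illustrative assumptions, not the thesis's code.

```python
import numpy as np
from scipy.stats import norm, wasserstein_distance


def gaussianity_gap(outputs: np.ndarray, n_ref: int = 100_000, seed: int = 0) -> float:
    """W1 distance between a neuron's outputs and a Gaussian fit to their
    mean and std. Per the abstract, larger values indicate a more entangled
    neuron, whose sparsification hurts the model more."""
    mu, sigma = outputs.mean(), outputs.std()
    ref = norm.rvs(loc=mu, scale=sigma, size=n_ref,
                   random_state=np.random.default_rng(seed))
    return wasserstein_distance(outputs, ref)


# Example: a bimodal output distribution scores a much larger gap than a
# near-Gaussian one -- the kind of distribution a single sparse dot product
# covers poorly but a collection of per-cluster sparse experts covers well.
rng = np.random.default_rng(1)
bimodal = np.concatenate([rng.normal(-2, 0.5, 5000), rng.normal(2, 0.5, 5000)])
print(gaussianity_gap(bimodal))                  # large gap
print(gaussianity_gap(rng.normal(0, 1, 10000)))  # near zero
```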
Date issued
2024-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology