DSpace@MIT


Sparse Expansion and Neuronal Disentanglement

Author(s)
Kong, Linghao
Download: Thesis PDF (1.497 MB)
Advisor
Shavit, Nir N.
Terms of use
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0). Copyright retained by author(s). https://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract
We show how to improve the inference efficiency of an LLM by expanding it into a mixture of sparse experts, where each expert is a copy of the same weights and one-shot pruned for a specific cluster of input values. We call this approach Sparse Expansion. We show that for models like Llama 2 7B, as we increase the number of experts, Sparse Expansion outperforms all other one-shot sparsification approaches for the same FLOPs budget, and this gap grows as sparsity increases. But why? To answer this, we provide strong evidence that the mixture of sparse experts is effectively disentangling the input-output relationship of every individual neuron. Sparse experts approximate a neuron’s dense output distribution with fewer weights by decomposing the distribution into a collection of simpler ones, each with a separate sparse dot product covering it. Interestingly, we show that the Wasserstein distance between a neuron’s output distribution and a Gaussian distribution is an indicator of its entanglement level and contribution to the accuracy of the model. Every layer of an LLM has highly entangled neurons, and model performance suffers more when these are sparsified as opposed to others. We believe that these neurons may have implications beyond sparsity in understanding the performance of LLMs.
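
The core mechanism described in the abstract can be sketched at a high level: cluster a layer's inputs, prune one copy of the dense weights per cluster, and route each input through its cluster's sparse expert. The Python sketch below is only illustrative; the class and function names are hypothetical, and the simple input-aware saliency rule is a stand-in for the one-shot pruning method actually used in the thesis.

```python
# A high-level sketch of the Sparse Expansion idea from the abstract:
# cluster a layer's inputs, keep one pruned copy of the dense weights per
# cluster, and route each input through its cluster's sparse expert.
# All names are illustrative; the saliency rule below is a simple input-aware
# stand-in, not the one-shot pruning method used in the thesis.
import numpy as np
from sklearn.cluster import KMeans


def prune_for_inputs(W: np.ndarray, X_cluster: np.ndarray,
                     sparsity: float) -> np.ndarray:
    """Zero out the weights with the smallest saliency |W_ij| * mean |x_j|,
    where the mean is taken over the cluster's calibration inputs."""
    saliency = np.abs(W) * np.mean(np.abs(X_cluster), axis=0)  # (d_out, d_in)
    k = int(saliency.size * sparsity)
    if k == 0:
        return W.copy()
    threshold = np.partition(saliency.ravel(), k - 1)[k - 1]
    return np.where(saliency > threshold, W, 0.0)


class SparseExpansionLayer:
    """One dense linear layer expanded into a mixture of sparse experts."""

    def __init__(self, W: np.ndarray, n_experts: int, sparsity: float):
        self.W = W                    # dense weights, shape (d_out, d_in)
        self.n_experts = n_experts
        self.sparsity = sparsity
        self.router = None            # k-means over calibration inputs
        self.experts = []             # one pruned copy of W per cluster

    def calibrate(self, X: np.ndarray) -> None:
        """Cluster calibration inputs and prune one expert per cluster."""
        self.router = KMeans(n_clusters=self.n_experts, n_init=10).fit(X)
        labels = self.router.labels_
        self.experts = [
            prune_for_inputs(self.W, X[labels == e], self.sparsity)
            for e in range(self.n_experts)
        ]

    def forward(self, X: np.ndarray) -> np.ndarray:
        """Route each input to the sparse expert of its nearest cluster."""
        assignments = self.router.predict(X)
        out = np.empty((X.shape[0], self.W.shape[0]))
        for e, W_e in enumerate(self.experts):
            mask = assignments == e
            if mask.any():
                out[mask] = X[mask] @ W_e.T
        return out
```

Routing here simply assigns each input to its nearest k-means centroid, so every expert only ever sees the narrower slice of the input distribution it was pruned for, which mirrors the disentangling effect the abstract describes.

The entanglement indicator mentioned in the abstract, the Wasserstein distance between a neuron's output distribution and a Gaussian, can be estimated from samples as in the minimal sketch below, assuming the reference Gaussian is matched to the empirical mean and standard deviation; the thesis may use a different reference distribution or estimator.

```python
# A minimal sketch of the entanglement indicator mentioned in the abstract:
# the Wasserstein distance between a neuron's empirical output distribution
# and a Gaussian matched to its mean and standard deviation.
import numpy as np
from scipy.stats import wasserstein_distance


def gaussian_wasserstein_score(neuron_outputs: np.ndarray,
                               n_gaussian_samples: int = 10_000,
                               seed: int = 0) -> float:
    """Larger values suggest a more 'entangled' neuron, per the abstract."""
    rng = np.random.default_rng(seed)
    mu, sigma = neuron_outputs.mean(), neuron_outputs.std()
    reference = rng.normal(mu, sigma, size=n_gaussian_samples)
    return wasserstein_distance(neuron_outputs, reference)


# Example: rank a layer's neurons by entanglement, where `outputs` has shape
# (n_inputs, n_neurons) and column j holds neuron j's outputs.
# scores = [gaussian_wasserstein_score(outputs[:, j])
#           for j in range(outputs.shape[1])]
```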
Date issued
2024-05
URI
https://hdl.handle.net/1721.1/156287
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
