
Understanding Concept Representations and their Transformations in Transformer Models

Author(s)
Kearney, Matthew
Download
Thesis PDF (13.89 MB)
Advisor
Andreas, Jacob
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
As transformer language models are used in an ever wider range of applications, methods for understanding their internal reasoning processes become more critical. One category of such methods, called neuron labeling, identifies salient directions in a model’s internal representation space and asks what features of the input these directions represent and how those features evolve. While research using these methods has focused on finding labels and automating the labeling process, a prerequisite is identifying which directions are the salient ones in the model’s computation. Theoretical arguments suggest that the activations of the first layer of the multi-layer perceptrons (MLPs) in transformers form the salient basis for representing the information the model uses in computation, but no empirical studies have compared these internal representations to others used in prior work. This thesis addresses that gap by comparing several directions in the internal representation space of transformers in terms of how well they represent basic linguistic concepts we expect the model to use in computation. We find that the empirical evidence supports the theoretical arguments: the first layer of the MLP modules is the most representative basis for these concepts. We extend this exploration by examining the connections between MLP neurons and developing a method for determining which neurons have the potential to communicate information with one another. In the process we discover specialized neurons for erasing and preserving information in the model’s hidden state and characterize this phenomenon.
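The abstract’s central object, the first-layer MLP activations, is straightforward to extract in practice. Below is a minimal sketch, assuming the Hugging Face transformers library and the gpt2 checkpoint (both illustrative choices, not specified by the thesis), that records the post-nonlinearity activations of each MLP’s first layer via forward hooks; each coordinate of these vectors is one candidate neuron direction for labeling.

```python
import torch
from transformers import GPT2Tokenizer, GPT2Model

# Illustrative model choice; the thesis abstract does not name a checkpoint.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

mlp_acts = {}  # layer index -> (batch, seq_len, 4 * hidden) activations

def make_hook(layer_idx):
    def hook(module, inputs, output):
        # `module` is the MLP's activation function, so `output` is the
        # post-nonlinearity value of the MLP's first linear layer.
        mlp_acts[layer_idx] = output.detach()
    return hook

for i, block in enumerate(model.h):
    block.mlp.act.register_forward_hook(make_hook(i))

enc = tokenizer("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    model(**enc)

# Each of the 4 * hidden coordinates is one candidate neuron direction.
print(mlp_acts[0].shape)  # e.g. torch.Size([1, 4, 3072]) for gpt2-small
```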
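The thesis’s method for finding neurons that can communicate is not spelled out in the abstract. One hedged illustration of the general idea: because consecutive MLPs read from and write to the shared residual stream, the inner product between one neuron’s output (“write”) weights and a later neuron’s input (“read”) weights gives a rough measure of potential direct communication, ignoring LayerNorm and attention-mediated mixing. The sketch below computes these scores for GPT-2; it is an assumption-laden stand-in, not the thesis’s actual procedure.

```python
import torch
from transformers import GPT2Model

model = GPT2Model.from_pretrained("gpt2")  # illustrative checkpoint

def communication_scores(model, layer):
    """Inner products between MLP write vectors at `layer` and
    MLP read vectors at `layer + 1`, via the residual stream."""
    # GPT-2's Conv1D stores weights as (in_features, out_features):
    # c_proj.weight is (4h, h), so row i is neuron i's write vector.
    w_write = model.h[layer].mlp.c_proj.weight
    # c_fc.weight is (h, 4h), so column j is neuron j's read vector.
    w_read = model.h[layer + 1].mlp.c_fc.weight
    # scores[i, j] is large in magnitude when neuron i of this layer
    # writes along a direction that neuron j of the next layer reads.
    # This sketch ignores LayerNorm scaling and attention mixing.
    return w_write @ w_read  # (4h, 4h)

scores = communication_scores(model, layer=0)
vals, idx = scores.abs().flatten().topk(5)
pairs = [(int(k) // scores.shape[1], int(k) % scores.shape[1]) for k in idx]
print(list(zip(pairs, vals.tolist())))  # top candidate neuron pairs
```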
Date issued
2023-06
URI
https://hdl.handle.net/1721.1/151276
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
