DSpace@MIT

Structure and geometry in sequence-processing neural networks

Author(s)
Del Río Fernández, Miguel Ángel.
Download: 1237411406-MIT.pdf (4.828 MB)
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
SueYeon Chung.
Terms of use
MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, available at http://dspace.mit.edu/handle/1721.1/7582
Abstract
The recent success of state-of-the-art neural models on various natural language processing (NLP) tasks has spurred interest in understanding their representation space. In the following chapters we will use various techniques of representational analysis to understand the nature of neural-network-based language modelling. To introduce the concept of linguistic probing, we explore how various language features affect model representations and long-term behavior using linear probing techniques. To tease out the geometrical properties of BERT's internal representations, we task the model with five linguistic abstractions (word, part of speech, combinatory categorial grammar, dependency parse tree depth, and semantic tag). Using a mean-field theory (MFT) manifold capacity metric, we show that BERT entangles linguistic information when contextualizing a normal sentence but disentangles the same information when it must form a token prediction. To reconcile our findings with those of previous works based on linear probing, we reproduce their results and show that linear separability between classes follows the trends we present. To show that the linguistic structure of a sentence is geometrically embedded in BERT's representations, we swap words within sentences so that the underlying tree structure is perturbed. Using canonical correlation analysis (CCA) to compare sentence representations, we find that the distance between swapped words is directly proportional to the decrease in the geometric similarity of the model's representations.
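
The linear probing setup described in the abstract can be illustrated with a minimal sketch. This is not the thesis's code: the checkpoint (bert-base-uncased), the layer index, and the toy sentence with its part-of-speech labels are all assumptions for illustration, using the Hugging Face transformers and scikit-learn APIs.

import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

# Toy sentence with hypothetical part-of-speech labels (illustrative only).
words = ["the", "cat", "sat", "on", "the", "mat"]
pos_tags = ["DET", "NOUN", "VERB", "ADP", "DET", "NOUN"]

inputs = tokenizer(words, is_split_into_words=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).hidden_states[8][0]  # one intermediate layer (arbitrary choice)

# Take the first subword vector of each word as that word's representation.
feats, labels, seen = [], [], set()
for idx, wid in enumerate(inputs.word_ids()):
    if wid is not None and wid not in seen:
        seen.add(wid)
        feats.append(hidden[idx].numpy())
        labels.append(pos_tags[wid])

# The probe itself: a linear classifier trained on frozen representations.
probe = LogisticRegression(max_iter=1000).fit(np.array(feats), labels)
print("probe training accuracy:", probe.score(np.array(feats), labels))

The design point is that the classifier is linear and the representations are frozen, so probe accuracy measures how linearly separable the labels already are in the representation space, which is the quantity the abstract's linear-separability claims refer to.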
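The word-swap experiment and the CCA comparison can be sketched in the same spirit. The sentence, the swap positions, the layer choice, and the use of scikit-learn's CCA preceded by PCA (as in SVCCA) are assumptions; the thesis may use a different CCA variant.

import numpy as np
import torch
from sklearn.cross_decomposition import CCA
from sklearn.decomposition import PCA
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def layer_reps(words, layer=8):
    """Per-token representations from one BERT layer (layer choice is arbitrary)."""
    inputs = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).hidden_states[layer][0].numpy()

original = "the dog chased the red ball".split()
swapped = list(original)
swapped[1], swapped[5] = swapped[5], swapped[1]  # swap two words four positions apart

X, Y = layer_reps(original), layer_reps(swapped)

# Reduce dimensionality first (in the spirit of SVCCA) so CCA is well conditioned
# with so few tokens, then average the canonical correlations as a similarity score.
Xr = PCA(n_components=6).fit_transform(X)
Yr = PCA(n_components=6).fit_transform(Y)
Xc, Yc = CCA(n_components=2).fit(Xr, Yr).transform(Xr, Yr)
corrs = [np.corrcoef(Xc[:, i], Yc[:, i])[0, 1] for i in range(Xc.shape[1])]
print("mean CCA similarity:", float(np.mean(corrs)))

Repeating this for swaps of increasing distance would trace out the similarity-versus-distance relationship the abstract describes.
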
Description
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, February 2020
 
Cataloged from student-submitted PDF of thesis. Pages 166 and 167 are blank.
 
Includes bibliographical references (pages 161-165).
 
Date issued
2020
URI
https://hdl.handle.net/1721.1/129881
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.

Collections
  • Graduate Theses
