dc.contributor.advisor: SueYeon Chung. (en_US)
dc.contributor.author: Del Río Fernández, Miguel Ángel. (en_US)
dc.contributor.other: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. (en_US)
dc.date.accessioned: 2021-02-19T20:36:32Z
dc.date.available: 2021-02-19T20:36:32Z
dc.date.copyright: 2020 (en_US)
dc.date.issued: 2020 (en_US)
dc.identifier.uri: https://hdl.handle.net/1721.1/129881
dc.description: Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, February, 2020 (en_US)
dc.description: Cataloged from student-submitted PDF of thesis. Pages 166 and 167 are blank. (en_US)
dc.description: Includes bibliographical references (pages 161-165). (en_US)
dc.description.abstract: Recent success of state-of-the-art neural models on various natural language processing (NLP) tasks has spurred interest in understanding their representation space. In the following chapters we use various techniques of representational analysis to understand the nature of neural-network-based language modeling. To introduce the concept of linguistic probing, we explore how various language features affect model representations and long-term behavior through the use of linear probing techniques. To tease out the geometrical properties of BERT's internal representations, we probe the model on five linguistic abstractions (word, part-of-speech, combinatory categorial grammar, dependency parse tree depth, and semantic tag). Using a manifold capacity metric backed by mean-field theory (MFT), we show that BERT entangles linguistic information when contextualizing a normal sentence but disentangles the same information when it must form a token prediction. To reconcile our findings with those of previous works that used linear probing, we reproduce the prior results and show that linear separation between classes follows the trends we present. To show that the linguistic structure of a sentence is geometrically embedded in BERT's representations, we swap words in sentences so that the underlying tree structure becomes perturbed. Using canonical correlation analysis (CCA) to compare sentence representations, we find that the distance between swapped words is directly proportional to the decrease in geometric similarity of the model's representations. (en_US)
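
As a concrete illustration of the linear-probing methodology the abstract describes, here is a minimal sketch: train a linear classifier on frozen BERT token representations to predict a linguistic label (part-of-speech here). The model name (bert-base-uncased), the probed layer index, the toy dataset, and the use of scikit-learn's LogisticRegression as the probe are all illustrative assumptions, not the thesis's exact setup.

    # Minimal linear-probing sketch (assumptions: bert-base-uncased, layer 8,
    # toy POS data, scikit-learn logistic regression as the linear probe).
    import torch
    from transformers import BertTokenizerFast, BertModel
    from sklearn.linear_model import LogisticRegression

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
    model.eval()

    # Toy labeled data: (sentence, per-word POS tags). A real probe would use a
    # treebank such as Universal Dependencies.
    sentences = [("the dog runs", ["DET", "NOUN", "VERB"]),
                 ("a cat sleeps", ["DET", "NOUN", "VERB"])]

    LAYER = 8  # which hidden layer to probe; an arbitrary choice for illustration
    features, labels = [], []
    with torch.no_grad():
        for text, tags in sentences:
            enc = tokenizer(text, return_tensors="pt")
            hidden = model(**enc).hidden_states[LAYER][0]  # (seq_len, 768)
            seen = set()
            # map each word to its first subword token ([CLS]/[SEP] have id None)
            for pos, wid in enumerate(enc.word_ids(0)):
                if wid is not None and wid not in seen:
                    seen.add(wid)
                    features.append(hidden[pos].numpy())
                    labels.append(tags[wid])

    # The probe itself: a linear classifier over frozen representations. Its
    # accuracy measures how linearly separable the POS classes are at this layer.
    probe = LogisticRegression(max_iter=1000).fit(features, labels)
    print("probe training accuracy:", probe.score(features, labels))

The probe's accuracy (on held-out tokens, in practice) is read as a measure of linear separability of the linguistic classes at that layer; the MFT manifold capacity metric mentioned in the abstract refines this linear-separability picture with a geometric account of the class manifolds.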
dc.description.statementofresponsibility: Miguel Ángel Del Río Fernández. (en_US)
dc.format.extent: 167 pages (en_US)
dc.language.iso: eng (en_US)
dc.publisher: Massachusetts Institute of Technology (en_US)
dc.rights: MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. (en_US)
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 (en_US)
dc.subject: Electrical Engineering and Computer Science. (en_US)
dc.title: Structure and geometry in sequence-processing neural networks (en_US)
dc.type: Thesis (en_US)
dc.description.degree: M. Eng. (en_US)
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (en_US)
dc.identifier.oclc: 1237411406 (en_US)
dc.description.collection: M.Eng. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science (en_US)
dspace.imported: 2021-02-19T20:36:02Z (en_US)
mit.thesis.degree: Master (en_US)
mit.thesis.department: EECS (en_US)