dc.contributor.advisor | SueYeon Chung. | en_US |
dc.contributor.author | Del Río Fernández, Miguel Ángel. | en_US |
dc.contributor.other | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. | en_US |
dc.date.accessioned | 2021-02-19T20:36:32Z | |
dc.date.available | 2021-02-19T20:36:32Z | |
dc.date.copyright | 2020 | en_US |
dc.date.issued | 2020 | en_US |
dc.identifier.uri | https://hdl.handle.net/1721.1/129881 | |
dc.description | Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, February, 2020 | en_US |
dc.description | Cataloged from student-submitted PDF of thesis. Pages 166 and 167 are blank. | en_US |
dc.description | Includes bibliographical references (pages 161-165). | en_US |
dc.description.abstract | Recent success of state-of-the-art neural models on various natural language processing (NLP) tasks has spurred interest in understanding their representation space. In the following chapters we use various techniques of representational analysis to understand the nature of neural-network-based language modelling. To introduce the concept of linguistic probing, we explore how various language features affect model representations and long-term behavior through the use of linear probing techniques. To tease out the geometric properties of BERT's internal representations, we task the model with five linguistic abstractions (word, part-of-speech, combinatory categorial grammar, dependency parse tree depth, and semantic tag). Using a mean field theory (MFT) based manifold capacity metric, we show that BERT entangles linguistic information when contextualizing a normal sentence but disentangles the same information when it must form a token prediction. To reconcile our findings with those of previous works that used linear probing, we reproduce the prior results and show that linear separation between classes follows the trends we present. To show that the linguistic structure of a sentence is geometrically embedded in BERT representations, we swap words in sentences so that the underlying tree structure becomes perturbed. Using canonical correlation analysis (CCA) to compare sentence representations, we find that the distance between swapped words is directly proportional to the decrease in geometric similarity of model representations. | en_US |
dc.description.statementofresponsibility | Miguel Ángel Del Río Fernández. | en_US |
dc.format.extent | 167 pages | en_US |
dc.language.iso | eng | en_US |
dc.publisher | Massachusetts Institute of Technology | en_US |
dc.rights | MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. | en_US |
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US |
dc.subject | Electrical Engineering and Computer Science. | en_US |
dc.title | Structure and geometry in sequence-processing neural networks | en_US |
dc.type | Thesis | en_US |
dc.description.degree | M. Eng. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | en_US |
dc.identifier.oclc | 1237411406 | en_US |
dc.description.collection | M.Eng. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science | en_US |
dspace.imported | 2021-02-19T20:36:02Z | en_US |
mit.thesis.degree | Master | en_US |
mit.thesis.department | EECS | en_US |