Show simple item record

dc.contributor.author: Malmaud, Jonathan Matthew. [en_US]
dc.contributor.other: Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences. [en_US]
dc.date.accessioned: 2021-12-17T17:04:29Z
dc.date.available: 2021-12-17T17:04:29Z
dc.date.copyright: 2020 [en_US]
dc.date.issued: 2020 [en_US]
dc.identifier.uri: https://hdl.handle.net/1721.1/138515
dc.description: Thesis: Ph. D., Massachusetts Institute of Technology, Department of Brain and Cognitive Sciences, February, 2020 [en_US]
dc.description: Manuscript. [en_US]
dc.description: Includes bibliographical references (pages 81-89). [en_US]
dc.description.abstract: The highest-performing natural language processing models generally solve language tasks by deriving statistical regularities from sequences of arbitrary tokens supplied as training data. Humans have a much richer notion of language, however. For one thing, they understand that language refers to objects and actions in the real world, which enables them to use language to efficiently transmit instructions on how to accomplish goals. For another, they learn to focus their attention on only those spans of text important for accomplishing the task at hand. In this thesis, we attempt to improve machine models of language by taking inspiration from these aspects of human language. The first half of this thesis concerns understanding instructional "how-to" language, such as "Add remaining flour. Then mix." The meaning is ambiguous without context: Add how much flour to what? Mix what, using what tools, until when? We show how to successfully parse this language by maintaining a distribution over the state of a theoretical kitchen as the instructions are parsed. We also show how interpretation can be aided when videos of the task are available, by training a joint vision-language model on over 300,000 YouTube videos about how to cook. The second half discusses taking advantage of people's ability to focus on the important parts of a passage in a multiple-choice reading comprehension task to enhance the performance of an automatic question-answering system. We record the gaze locations of hundreds of subjects as they read and answer questions about newspaper articles. We then train a state-of-the-art transformer model to predict human attention as well as correct answers, and find that this leads to a substantial boost in performance over merely training the model to predict correct answers. [en_US]
dc.description.statementofresponsibility: by Jonathan Matthew Malmaud. [en_US]
dc.format.extent: 89 pages [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Massachusetts Institute of Technology [en_US]
dc.rights: MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. [en_US]
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 [en_US]
dc.subject: Brain and Cognitive Sciences. [en_US]
dc.title: Enriching models of natural language with auxiliary data [en_US]
dc.type: Thesis [en_US]
dc.description.degree: Ph. D. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences [en_US]
dc.identifier.oclc: 1280702700 [en_US]
dc.description.collection: Ph. D. Massachusetts Institute of Technology, Department of Brain and Cognitive Sciences [en_US]
dspace.imported: 2021-12-17T17:04:29Z [en_US]
mit.thesis.degree: Doctoral [en_US]
mit.thesis.department: Brain [en_US]

