Synthetic-to-real transfer for natural language processing
Author(s)
Marzoev, Michelle Alana.
Download
1252064308-MIT.pdf (1.366Mb)
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Jacob Andreas.
Abstract
Large, human-annotated datasets are central to the development of natural language processing models. Collecting these datasets is often the most challenging part of the development process. In this thesis, I explore different strategies for learning models that can interpret natural utterances without natural training data through "simulation-to-real" transfer techniques suited to language understanding problems with a delimited set of target behaviors. Each of the transfer techniques requires access to a manually specified synthetic data generation procedure (i.e., a "synthetic grammar") as a source of unlimited but linguistically homogeneous training data. This data is used to train models that can accurately interpret utterances from the synthetic grammar. Through experiments, I demonstrate that the most effective method for sim-to-real transfer involves automatically finding projections of natural language utterances onto the support of the synthetic language, using learned sentence embeddings to define a distance metric. With only synthetic training data, the projections approach matches or outperforms state-of-the-art models trained on natural language data on grounded instruction following and semantic parsing problems. These results suggest that simulation-to-real transfer could be a practical framework for developing NLP applications with defined target behaviors in cases where natural in-domain training data is not readily available.
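The projection idea in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the four-sentence grammar, the behavior strings, and the function names are hypothetical, a bag-of-words count vector stands in for the learned sentence embeddings, and a lookup table stands in for the model trained on synthetic data.

```python
import math
from collections import Counter

# Hypothetical toy "synthetic grammar": every utterance it generates,
# paired with the target behavior that utterance denotes.
SYNTHETIC = {
    "go to the red door": "GOTO(red, door)",
    "go to the blue door": "GOTO(blue, door)",
    "pick up the red key": "PICKUP(red, key)",
    "pick up the blue key": "PICKUP(blue, key)",
}


def embed(sentence):
    """Stand-in embedding (assumption): bag-of-words token counts."""
    return Counter(sentence.lower().split())


def cosine_distance(a, b):
    """Cosine distance between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def project(utterance):
    """Map a natural utterance to its nearest synthetic utterance."""
    query = embed(utterance)
    return min(SYNTHETIC, key=lambda s: cosine_distance(query, embed(s)))


def interpret(utterance):
    """Interpret a natural utterance via its synthetic projection."""
    return SYNTHETIC[project(utterance)]


if __name__ == "__main__":
    print(project("please walk over to the blue door"))
    print(interpret("grab the red key"))
```

The sketch enumerates the whole synthetic language, which is only feasible for a toy grammar; a realistic grammar would need a search over its support rather than a full scan.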
Description
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, February, 2021
Cataloged from the official PDF version of thesis.
Includes bibliographical references (pages 41-42).
Date issued
2021
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.