dc.contributor.advisor | Glass, James R.
dc.contributor.author | Luo, Hongyin
dc.date.accessioned | 2022-08-29T16:09:38Z
dc.date.available | 2022-08-29T16:09:38Z
dc.date.issued | 2022-05
dc.date.submitted | 2022-06-21T19:15:30.622Z
dc.identifier.uri | https://hdl.handle.net/1721.1/144758
dc.description.abstract | Data annotation is critical for machine learning-based natural language processing models. Although many large-scale corpora and standard benchmarks have been annotated and published, they cannot cover all possible applications. As a result, it is difficult to transfer models trained on public corpora to tasks that require domain-specific knowledge, different inference skills, unseen text styles, or explainability. In this thesis, we explore self-training methods for mitigating the data distribution gaps between training and evaluation domains and tasks. In contrast to traditional self-training methods, which study the best practices for training models with real data and pseudo labels, we also explore automatically generating synthetic data for better explainability, robustness, and domain adaptation performance. We show the performance improvements achieved by our methods on different natural language understanding and generation tasks, including question answering, question generation, and dialog response selection.
dc.publisher | Massachusetts Institute of Technology
dc.rights | In Copyright - Educational Use Permitted
dc.rights | Copyright MIT
dc.rights.uri | http://rightsstatements.org/page/InC-EDU/1.0/
dc.title | Self-Training for Natural Language Processing
dc.type | Thesis
dc.description.degree | Ph.D.
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree | Doctoral
thesis.degree.name | Doctor of Philosophy