Transfer learning and robustness for natural language processing
Author(s): Jin, Di, Ph.D., Massachusetts Institute of Technology.
Massachusetts Institute of Technology. Department of Mechanical Engineering.
Teaching machines to understand human language is one of the most elusive and long-standing challenges in Natural Language Processing (NLP). Driven by the fast development of deep learning, state-of-the-art NLP models have already achieved human-level performance on various large benchmark datasets, such as SQuAD, SNLI, and RACE. However, when these strong models are deployed to real-world applications, they often generalize poorly in two situations: (1) only a limited amount of data is available for model training; (2) performance degrades significantly on noisy test data or on natural or artificial adversaries. In short, performance degradation on low-resource tasks/datasets and on unseen data with distribution shifts poses great challenges to the reliability of NLP models and prevents them from being widely applied in the wild. This dissertation aims to address these two issues.
For the first issue, we resort to transfer learning to leverage knowledge acquired from related data in order to improve performance on a target low-resource task/dataset. Specifically, we propose different transfer learning methods for three natural language understanding tasks (multi-choice question answering, dialogue state tracking, and sequence labeling) and one natural language generation task (machine translation). These methods are based on four basic transfer learning modalities: multi-task learning, sequential transfer learning, domain adaptation, and cross-lingual transfer. We show experimental results validating that transferring knowledge from related domains, tasks, and languages can significantly improve performance on the target task/dataset.
For the second issue, we propose methods to evaluate the robustness of NLP models on text classification and entailment tasks. On one hand, we reveal that although these models can achieve a high accuracy of over 90%, they still fail easily on paraphrases of the original samples in which only around 10% of the words are replaced with synonyms. On the other hand, by creating a new challenge set using four adversarial strategies, we find that even the best models for the aspect-based sentiment analysis task cannot reliably identify the target aspect and recognize its sentiment accordingly; instead, they are easily confused by distractor aspects. Overall, these findings raise serious concerns about the robustness of NLP models, which should be enhanced to ensure stable long-term service.
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Mechanical Engineering, 2020
Cataloged from student-submitted PDF of thesis.
Includes bibliographical references (pages 189-217).
Department: Massachusetts Institute of Technology. Department of Mechanical Engineering
Massachusetts Institute of Technology