Unsupervised Text Translation Through the Application of Generative Adversarial Networks

Author(s)

Wang, Xiaoyi

Advisor

Wornell, Gregory

Terms of use

In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/

Abstract

Text translation is a broad subfield of natural language processing that aims to generate output text with different characteristics, conditioned on some input text. More specifically, we seek a translation of the input that retains the semantic content of the original while changing its style. This encompasses many tasks, including sentiment transfer, text summarization, and language translation. One defining characteristic of these problems is the lack of access to paired training data, which inhibits training via a straightforward maximum likelihood estimation approach. This requires us to focus on unsupervised techniques for text translation that depend only on access to large domains of unpaired data. For unsupervised translation, one approach involves the use of generative adversarial techniques for sequence generation. Unfortunately, prior work using these techniques suffers from poor alignment and training instability. This thesis proposes two alternative models for unsupervised text translation that attempt to alleviate these issues through the incorporation of additional information and the introduction of a different training regime. We demonstrate several translation applications that benefit from these approaches and evaluate performance using a framework that is independent of a ground-truth paired dataset. Through the experiments, we find improvements over the baseline, particularly in the accuracy of the style transfer. We also demonstrate the efficacy of text translation as a data augmentation technique for generating new labeled data with different styles. This mechanism yields significant improvements in classifier robustness. Lastly, we evaluate performance under a semi-supervised training regime and compare against popular baselines. The results reveal significant alignment improvements from the incorporation of an extremely small amount of paired data, one order of magnitude smaller than that used in prior studies.

Date issued

2021-06

URI

https://hdl.handle.net/1721.1/139280

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Collections

Graduate Theses