A language-vision model for translation
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Machine translation, the automatic translation by computers from a source language to a target language, is a well-studied and difficult research problem. In recent years, there has been increased interest in grounding translation in vision. We introduce an unsupervised machine translation system grounded in video that can perform Chinese-English translation without the need for a parallel text corpus. In particular, we train separate Chinese and English generative language-vision models on only 267 captioned videos. We then perform translation by sampling video features for an input sentence in Chinese and finding the top-scoring English sentence, that is, the translation that best describes the sampled video frames. We found that such a system picks out the correct translation with high accuracy and is a promising step towards augmenting language understanding with video.
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, May 2020. Cataloged from the official PDF of thesis. Includes bibliographical references (pages 55-58).
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology
Electrical Engineering and Computer Science.