dc.contributor.advisor | Boris Katz. | en_US |
dc.contributor.author | Fu, Allison. | en_US |
dc.contributor.other | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. | en_US |
dc.date.accessioned | 2020-09-15T21:55:48Z | |
dc.date.available | 2020-09-15T21:55:48Z | |
dc.date.copyright | 2020 | en_US |
dc.date.issued | 2020 | en_US |
dc.identifier.uri | https://hdl.handle.net/1721.1/127397 | |
dc.description | Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, May, 2020 | en_US |
dc.description | Cataloged from the official PDF of thesis. | en_US |
dc.description | Includes bibliographical references (pages 55-58). | en_US |
dc.description.abstract | Machine translation, the automatic translation by computers from a source language to a target language, is a well-studied, difficult research problem. In recent years, there has been increased interest in grounding translation in vision. We introduce an unsupervised machine translation system grounded in video that can perform Chinese-English translation without the need for a parallel text corpus. In particular, we train separate Chinese and English generative language-vision models on only 267 captioned videos. We then perform translation by sampling video features for an input sentence in Chinese and finding the top-scoring English sentence, or translation, that describes the sampled video frames. We found that such a system picks out the correct translation with high accuracy and is a promising step towards augmenting language understanding with video. | en_US |
dc.description.statementofresponsibility | by Allison Fu. | en_US |
dc.format.extent | 58 pages | en_US |
dc.language.iso | eng | en_US |
dc.publisher | Massachusetts Institute of Technology | en_US |
dc.rights | MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. | en_US |
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US |
dc.subject | Electrical Engineering and Computer Science. | en_US |
dc.title | A language-vision model for translation | en_US |
dc.type | Thesis | en_US |
dc.description.degree | M. Eng. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | en_US |
dc.identifier.oclc | 1192544690 | en_US |
dc.description.collection | M.Eng. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science | en_US |
dspace.imported | 2020-09-15T21:55:48Z | en_US |
mit.thesis.degree | Master | en_US |
mit.thesis.department | EECS | en_US |