Show simple item record

dc.contributor.advisor: Boris Katz. (en_US)
dc.contributor.author: Fu, Allison. (en_US)
dc.contributor.other: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. (en_US)
dc.date.accessioned: 2020-09-15T21:55:48Z
dc.date.available: 2020-09-15T21:55:48Z
dc.date.copyright: 2020 (en_US)
dc.date.issued: 2020 (en_US)
dc.identifier.uri: https://hdl.handle.net/1721.1/127397
dc.description: Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, May, 2020 (en_US)
dc.description: Cataloged from the official PDF of thesis. (en_US)
dc.description: Includes bibliographical references (pages 55-58). (en_US)
dc.description.abstract: Machine translation, the automatic translation by computers from a source language to a target language, is a well-studied, difficult research problem. In recent years, there has been increased interest in grounding translation in vision. We introduce an unsupervised machine translation system grounded in video that can perform Chinese-English translation without the need for a parallel text corpus. In particular, we train separate Chinese and English generative language-vision models on only 267 captioned videos. We then perform translation by sampling video features for an input sentence in Chinese and finding the top-scoring English sentence, that is, the translation that best describes the sampled video frames. We found that such a system picks out the correct translation with high accuracy and is a promising step towards augmenting language understanding with video. (en_US)
dc.description.statementofresponsibility: by Allison Fu. (en_US)
dc.format.extent: 58 pages (en_US)
dc.language.iso: eng (en_US)
dc.publisher: Massachusetts Institute of Technology (en_US)
dc.rights: MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. (en_US)
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 (en_US)
dc.subject: Electrical Engineering and Computer Science. (en_US)
dc.title: A language-vision model for translation (en_US)
dc.type: Thesis (en_US)
dc.description.degree: M. Eng. (en_US)
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (en_US)
dc.identifier.oclc: 1192544690 (en_US)
dc.description.collection: M.Eng. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science (en_US)
dspace.imported: 2020-09-15T21:55:48Z (en_US)
mit.thesis.degree: Master (en_US)
mit.thesis.department: EECS (en_US)

