A language-vision model for translation

Fu, Allison.

dc.contributor.advisor	Boris Katz.	en_US
dc.contributor.author	Fu, Allison.	en_US
dc.contributor.other	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.	en_US
dc.date.accessioned	2020-09-15T21:55:48Z
dc.date.available	2020-09-15T21:55:48Z
dc.date.copyright	2020	en_US
dc.date.issued	2020	en_US
dc.identifier.uri	https://hdl.handle.net/1721.1/127397
dc.description	Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, May, 2020	en_US
dc.description	Cataloged from the official PDF of thesis.	en_US
dc.description	Includes bibliographical references (pages 55-58).	en_US
dc.description.abstract	Machine translation, the automatic translation by computer from a source language to a target language, is a well-studied and difficult research problem. In recent years, there has been increased interest in grounding translation in vision. We introduce an unsupervised machine translation system grounded in video that can perform Chinese-English translation without the need for a parallel text corpus. In particular, we train separate Chinese and English generative language-vision models on only 267 captioned videos. We then perform translation by sampling video features for an input sentence in Chinese and selecting, as the translation, the top-scoring English sentence that describes the sampled video frames. We found that such a system picks out the correct translation with high accuracy and is a promising step towards augmenting language understanding with video.	en_US
dc.description.statementofresponsibility	by Allison Fu.	en_US
dc.format.extent	58 pages	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582	en_US
dc.subject	Electrical Engineering and Computer Science.	en_US
dc.title	A language-vision model for translation	en_US
dc.type	Thesis	en_US
dc.description.degree	M. Eng.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science	en_US
dc.identifier.oclc	1192544690	en_US
dc.description.collection	M.Eng. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science	en_US
dspace.imported	2020-09-15T21:55:48Z	en_US
mit.thesis.degree	Master	en_US
mit.thesis.department	EECS	en_US

Files in this item

Name: 1192544690-MIT.pdf
Size: 2.687Mb
Format: PDF

This item appears in the following Collection(s)

Graduate Theses
