DSpace@MIT

Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books

Research and Teaching Output of the MIT Community


dc.contributor.author Zhu, Yukun
dc.contributor.author Kiros, Ryan
dc.contributor.author Zemel, Rich
dc.contributor.author Salakhutdinov, Ruslan
dc.contributor.author Urtasun, Raquel
dc.contributor.author Fidler, Sanja
dc.contributor.author Torralba, Antonio
dc.date.accessioned 2017-12-29T20:28:58Z
dc.date.available 2017-12-29T20:28:58Z
dc.date.issued 2016-02
dc.date.submitted 2015-12
dc.identifier.isbn 978-1-4673-8391-2
dc.identifier.uri http://hdl.handle.net/1721.1/112996
dc.description.abstract Books are a rich source of both fine-grained information (what a character, an object, or a scene looks like) and high-level semantics (what someone is thinking or feeling, and how these states evolve through a story). This paper aims to align books to their movie releases in order to provide rich descriptive explanations for visual content that go semantically far beyond the captions available in the current datasets. To align movies and books we propose a neural sentence embedding that is trained in an unsupervised way from a large corpus of books, as well as a video-text neural embedding for computing similarities between movie clips and sentences in the book. We propose a context-aware CNN to combine information from multiple sources. We demonstrate good quantitative performance for movie/book alignment and show several qualitative examples that showcase the diversity of tasks our model can be used for. en_US
dc.description.sponsorship Natural Sciences and Engineering Research Council of Canada en_US
dc.description.sponsorship Canadian Institute for Advanced Research en_US
dc.description.sponsorship Samsung (Firm) en_US
dc.description.sponsorship Google (Firm) en_US
dc.description.sponsorship United States. Office of Naval Research (ONR-N00014-14-1-0232) en_US
dc.language.iso en_US
dc.publisher Institute of Electrical and Electronics Engineers (IEEE) en_US
dc.relation.isversionof http://dx.doi.org/10.1109/ICCV.2015.11 en_US
dc.rights Creative Commons Attribution-Noncommercial-Share Alike en_US
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/4.0/ en_US
dc.source arXiv en_US
dc.title Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books en_US
dc.type Article en_US
dc.identifier.citation Zhu, Yukun, et al. "Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books." 2015 IEEE International Conference on Computer Vision (ICCV), 7-13 December, 2015, Santiago, Chile, IEEE, 2015, pp. 19–27. en_US
dc.contributor.department Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science en_US
dc.contributor.mitauthor Torralba, Antonio
dc.relation.journal 2015 IEEE International Conference on Computer Vision (ICCV) en_US
dc.identifier.mitlicense OPEN_ACCESS_POLICY en_US
dc.eprint.version Original manuscript en_US
dc.type.uri http://purl.org/eprint/type/ConferencePaper en_US
eprint.status http://purl.org/eprint/status/NonPeerReviewed en_US
dspace.orderedauthors Zhu, Yukun; Kiros, Ryan; Zemel, Rich; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja en_US
dspace.embargo.terms N en_US
dc.identifier.orcid https://orcid.org/0000-0003-4915-0256


Files in this item

Downloadable Full Text (PDF)



Except where otherwise noted, this item's license is described as Creative Commons Attribution-Noncommercial-Share Alike.
Open Access