dc.contributor.author        Zhu, Yukun
dc.contributor.author        Kiros, Ryan
dc.contributor.author        Zemel, Rich
dc.contributor.author        Salakhutdinov, Ruslan
dc.contributor.author        Urtasun, Raquel
dc.contributor.author        Torralba, Antonio
dc.contributor.author        Fidler, Sanja
dc.date.accessioned          2017-12-29T20:28:58Z
dc.date.available            2017-12-29T20:28:58Z
dc.date.issued               2016-02
dc.date.submitted            2015-12
dc.identifier.isbn           978-1-4673-8391-2
dc.identifier.uri            http://hdl.handle.net/1721.1/112996
dc.description.abstract      Books are a rich source of both fine-grained information (what a character, an object, or a scene looks like) and high-level semantics (what someone is thinking or feeling, and how these states evolve through a story). This paper aims to align books to their movie releases in order to provide rich descriptive explanations for visual content that go semantically far beyond the captions available in current datasets. To align movies and books, we propose a neural sentence embedding trained in an unsupervised way on a large corpus of books, as well as a video-text neural embedding for computing similarities between movie clips and sentences in the book. We also propose a context-aware CNN to combine information from multiple sources. We demonstrate good quantitative performance for movie/book alignment and show several qualitative examples that showcase the diversity of tasks our model can be used for.   en_US
dc.description.sponsorship   Natural Sciences and Engineering Research Council of Canada   en_US
dc.description.sponsorship   Canadian Institute for Advanced Research   en_US
dc.description.sponsorship   Samsung (Firm)   en_US
dc.description.sponsorship   Google (Firm)   en_US
dc.description.sponsorship   United States. Office of Naval Research (ONR-N00014-14-1-0232)   en_US
dc.language.iso              en_US
dc.publisher                 Institute of Electrical and Electronics Engineers (IEEE)   en_US
dc.relation.isversionof      http://dx.doi.org/10.1109/ICCV.2015.11   en_US
dc.rights                    Creative Commons Attribution-Noncommercial-Share Alike   en_US
dc.rights.uri                http://creativecommons.org/licenses/by-nc-sa/4.0/   en_US
dc.source                    arXiv   en_US
dc.title                     Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books   en_US
dc.type                      Article   en_US
dc.identifier.citation       Zhu, Yukun, et al. "Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books." 2015 IEEE International Conference on Computer Vision (ICCV), 7-13 December 2015, Santiago, Chile, IEEE, 2015, pp. 19–27.   en_US
dc.contributor.department    Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science   en_US
dc.contributor.mitauthor     Torralba, Antonio
dc.relation.journal          2015 IEEE International Conference on Computer Vision (ICCV)   en_US
dc.eprint.version            Original manuscript   en_US
dc.type.uri                  http://purl.org/eprint/type/ConferencePaper   en_US
eprint.status                http://purl.org/eprint/status/NonPeerReviewed   en_US
dspace.orderedauthors        Zhu, Yukun; Kiros, Ryan; Zemel, Rich; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja   en_US
dspace.embargo.terms         N   en_US
dc.identifier.orcid          https://orcid.org/0000-0003-4915-0256
mit.license                  OPEN_ACCESS_POLICY   en_US
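
The abstract above describes scoring similarities between movie clips and book sentences in a shared embedding space and then aligning them. The following is a minimal illustrative sketch, not the authors' implementation: the embedding models are replaced by random stand-in vectors, and the dimensions, variable names, and greedy matching step are assumptions made only to show how such clip-to-sentence similarities could be computed and used.

    # Illustrative sketch: align movie clips to book sentences by cosine
    # similarity between embeddings in a shared space. The paper's sentence
    # and video-text encoders are assumed to exist elsewhere; random vectors
    # stand in for their outputs here.
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical embeddings: 5 movie clips and 12 book sentences,
    # both mapped into a shared 300-dimensional space.
    clip_emb = rng.normal(size=(5, 300))
    sent_emb = rng.normal(size=(12, 300))

    def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        """Pairwise cosine similarities between rows of a and rows of b."""
        a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
        b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
        return a_norm @ b_norm.T

    sim = cosine_similarity_matrix(clip_emb, sent_emb)  # shape (5, 12)

    # Greedy alignment: match each clip to its most similar sentence.
    # (The paper combines such similarity cues with a context-aware CNN;
    # this greedy step is only a simplified illustration.)
    best_sentence = sim.argmax(axis=1)
    for clip_idx, sent_idx in enumerate(best_sentence):
        print(f"clip {clip_idx} -> sentence {sent_idx} "
              f"(sim={sim[clip_idx, sent_idx]:.3f})")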