Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books

Zhu, Yukun; Kiros, Ryan; Zemel, Rich; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja

Terms of use

Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/

Abstract

Books are a rich source of both fine-grained information (what a character, an object or a scene looks like) and high-level semantics (what someone is thinking and feeling, and how these states evolve through a story). This paper aims to align books to their movie releases in order to provide rich descriptive explanations for visual content that go semantically far beyond the captions available in current datasets. To align movies and books, we propose a neural sentence embedding that is trained in an unsupervised way from a large corpus of books, as well as a video-text neural embedding for computing similarities between movie clips and sentences in the book. We propose a context-aware CNN to combine information from multiple sources. We demonstrate good quantitative performance for movie/book alignment and show several qualitative examples that showcase the diversity of tasks our model can be used for.
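
As a rough illustration of what "computing similarities between movie clips and sentences" can look like once both are mapped into a shared vector space, the sketch below scores every clip against every sentence with cosine similarity and picks the best-scoring sentence per clip. It is not the paper's model: the function names, the toy 3-dimensional vectors, and the choice of cosine similarity with a simple argmax are all assumptions made for illustration, and the context-aware CNN mentioned in the abstract is not represented.

```python
import numpy as np


def cosine_similarity_matrix(clip_vecs, sent_vecs, eps=1e-12):
    """Return a (num_clips, num_sentences) matrix of cosine similarities.

    Both inputs are assumed to already live in one shared embedding space.
    """
    clip_vecs = np.asarray(clip_vecs, dtype=float)
    sent_vecs = np.asarray(sent_vecs, dtype=float)
    # Normalise each row to unit length; eps guards against zero vectors.
    clip_norms = np.linalg.norm(clip_vecs, axis=1, keepdims=True)
    sent_norms = np.linalg.norm(sent_vecs, axis=1, keepdims=True)
    clip_unit = clip_vecs / np.maximum(clip_norms, eps)
    sent_unit = sent_vecs / np.maximum(sent_norms, eps)
    return clip_unit @ sent_unit.T


def best_sentence_per_clip(similarity):
    """Index of the highest-scoring sentence for each clip (greedy, no context)."""
    return np.argmax(similarity, axis=1)


# Toy placeholder embeddings (hypothetical values, not learned features).
clip_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.1],
])
sentence_embeddings = np.array([
    [0.0, 1.0, 0.0],
    [3.0, 0.1, 0.0],
    [0.0, 0.0, 1.0],
])

similarity = cosine_similarity_matrix(clip_embeddings, sentence_embeddings)
matches = best_sentence_per_clip(similarity)
print(similarity.round(3))
print(matches)
```

A greedy per-clip argmax like this ignores the ordering of the story, which is one reason an alignment method would want to combine such scores with additional context rather than use them directly.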

Date issued

2016-02

URI

http://hdl.handle.net/1721.1/112996

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Journal

2015 IEEE International Conference on Computer Vision (ICCV)

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Citation

Zhu, Yukun, et al. "Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books." 2015 IEEE International Conference on Computer Vision (ICCV), 7-13 December, 2015, Santiago, Chile, IEEE, 2015, pp. 19–27.

Version: Original manuscript

ISBN

978-1-4673-8391-2

Collections

MIT Open Access Articles

DSpace@MIT