Show simple item record

dc.contributor.author: Vondrick, Carl
dc.contributor.author: Pirsiavash, Hamed
dc.contributor.author: Torralba, Antonio
dc.date.accessioned: 2018-02-26T21:41:57Z
dc.date.available: 2018-02-26T21:41:57Z
dc.date.issued: 2016-12
dc.date.submitted: 2016-06
dc.identifier.isbn: 978-1-4673-8851-1
dc.identifier.uri: http://hdl.handle.net/1721.1/113893
dc.description.abstract: Anticipating actions and objects before they start or appear is a difficult problem in computer vision with several real-world applications. This task is challenging partly because it requires leveraging extensive knowledge of the world that is difficult to write down. We believe that a promising resource for efficiently learning this knowledge is readily available unlabeled video. We present a framework that capitalizes on temporal structure in unlabeled video to learn to anticipate human actions and objects. The key idea behind our approach is that we can train deep networks to predict the visual representation of images in the future. Visual representations are a promising prediction target because they encode images at a higher semantic level than pixels yet are automatic to compute. We then apply recognition algorithms on our predicted representation to anticipate objects and actions. We experimentally validate this idea on two datasets, anticipating actions one second in the future and objects five seconds in the future. [en_US]
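The abstract's core recipe, regress from the current frame's feature vector to the feature vector of a frame a fixed interval in the future, then run an ordinary recognizer on the prediction, can be sketched in a few lines. The following is a minimal illustration, not the authors' released code: the feature dimension, network shape, training schedule, and the random tensors standing in for real CNN features are all assumptions.

# Minimal sketch of the paper's key idea (hypothetical, not the authors' code):
# learn to regress from the feature vector of the current frame to the feature
# vector of a frame K seconds later, then classify the *predicted* features.
import torch
import torch.nn as nn

FEAT_DIM = 4096      # e.g. fc7-sized features; the exact size is an assumption
NUM_ACTIONS = 10     # placeholder number of action classes

# Regressor: current-frame features -> anticipated future-frame features.
predictor = nn.Sequential(
    nn.Linear(FEAT_DIM, 2048), nn.ReLU(),
    nn.Linear(2048, FEAT_DIM),
)

# Stand-ins for (features at time t, features at time t + K) pairs that would
# be extracted from unlabeled video by a pretrained CNN; no labels are needed.
current = torch.randn(256, FEAT_DIM)
future = torch.randn(256, FEAT_DIM)

optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-4)
criterion = nn.MSELoss()
for _ in range(100):                       # toy training loop
    optimizer.zero_grad()
    loss = criterion(predictor(current), future)
    loss.backward()
    optimizer.step()

# A recognition model trained separately on real, labeled features is then
# applied to the predicted representation to anticipate the future action.
classifier = nn.Linear(FEAT_DIM, NUM_ACTIONS)  # assume trained elsewhere
with torch.no_grad():
    anticipated = classifier(predictor(current)).argmax(dim=1)
print(anticipated.shape)                   # one anticipated label per frame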
dc.description.sponsorship: National Science Foundation (U.S.) (Grant IIS-1524817) [en_US]
dc.description.sponsorship: Google (Firm) (Faculty Research Award) [en_US]
dc.language.iso: en_US
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE) [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1109/CVPR.2016.18 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [en_US]
dc.source: Other repository [en_US]
dc.title: Anticipating Visual Representations from Unlabeled Video [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Vondrick, Carl, Hamed Pirsiavash, and Antonio Torralba. “Anticipating Visual Representations from Unlabeled Video.” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 27-30 June 2016, Las Vegas, Nevada. IEEE, 2016, pp. 98-106. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.contributor.mitauthor: Torralba, Antonio
dc.relation.journal: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dspace.orderedauthors: Vondrick, Carl; Pirsiavash, Hamed; Torralba, Antonio [en_US]
dspace.embargo.terms: N [en_US]
dc.identifier.orcid: https://orcid.org/0000-0003-4915-0256
mit.license: OPEN_ACCESS_POLICY [en_US]

