Show simple item record

dc.contributor.author: Vondrick, Carl
dc.contributor.author: Pirsiavash, Hamed
dc.contributor.author: Torralba, Antonio
dc.date.accessioned: 2018-02-26T21:41:57Z
dc.date.available: 2018-02-26T21:41:57Z
dc.date.issued: 2016-12
dc.date.submitted: 2016-06
dc.identifier.isbn: 978-1-4673-8851-1
dc.identifier.uri: http://hdl.handle.net/1721.1/113893
dc.description.abstract: Anticipating actions and objects before they start or appear is a difficult problem in computer vision with several real-world applications. This task is challenging partly because it requires leveraging extensive knowledge of the world that is difficult to write down. We believe that a promising resource for efficiently learning this knowledge is readily available unlabeled video. We present a framework that capitalizes on temporal structure in unlabeled video to learn to anticipate human actions and objects. The key idea behind our approach is that we can train deep networks to predict the visual representation of images in the future. Visual representations are a promising prediction target because they encode images at a higher semantic level than pixels yet are automatic to compute. We then apply recognition algorithms on our predicted representation to anticipate objects and actions. We experimentally validate this idea on two datasets, anticipating actions one second in the future and objects five seconds in the future. [en_US]
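The abstract's core recipe, regress from the current frame's feature vector to the feature vector of a frame a fixed interval in the future, then run an ordinary recognizer on the prediction, can be sketched in a few lines. The following is a minimal illustration, not the authors' released code: the feature dimension, network shape, training schedule, and the random tensors standing in for real CNN features are all assumptions.

# Minimal sketch of the paper's key idea (hypothetical, not the authors' code):
# learn to regress from the feature vector of the current frame to the feature
# vector of a frame K seconds later, then classify the *predicted* features.
import torch
import torch.nn as nn

FEAT_DIM = 4096      # e.g. fc7-sized features; the exact size is an assumption
NUM_ACTIONS = 10     # placeholder number of action classes

# Regressor: current-frame features -> anticipated future-frame features.
predictor = nn.Sequential(
    nn.Linear(FEAT_DIM, 2048), nn.ReLU(),
    nn.Linear(2048, FEAT_DIM),
)

# Stand-ins for (features at time t, features at time t + K) pairs that would
# be extracted from unlabeled video by a pretrained CNN; no labels are needed.
current = torch.randn(256, FEAT_DIM)
future = torch.randn(256, FEAT_DIM)

optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-4)
criterion = nn.MSELoss()
for _ in range(100):                       # toy training loop
    optimizer.zero_grad()
    loss = criterion(predictor(current), future)
    loss.backward()
    optimizer.step()

# A recognition model trained separately on real, labeled features is then
# applied to the predicted representation to anticipate the future action.
classifier = nn.Linear(FEAT_DIM, NUM_ACTIONS)  # assume trained elsewhere
with torch.no_grad():
    anticipated = classifier(predictor(current)).argmax(dim=1)
print(anticipated.shape)                   # one anticipated label per frame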
dc.description.sponsorship: National Science Foundation (U.S.) (Grant IIS-1524817) [en_US]
dc.description.sponsorship: Google (Firm) (Faculty Research Award) [en_US]
dc.language.iso: en_US
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE) [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1109/CVPR.2016.18 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [en_US]
dc.source: Other repository [en_US]
dc.title: Anticipating Visual Representations from Unlabeled Video [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Vondrick, Carl, Hamed Pirsiavash, and Antonio Torralba. “Anticipating Visual Representations from Unlabeled Video.” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 27-30 June 2016, Las Vegas, Nevada. IEEE, 2016, pp. 98-106. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.contributor.mitauthor: Torralba, Antonio
dc.relation.journal: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dspace.orderedauthors: Vondrick, Carl; Pirsiavash, Hamed; Torralba, Antonio [en_US]
dspace.embargo.terms: N [en_US]
dc.identifier.orcid: https://orcid.org/0000-0003-4915-0256
mit.license: OPEN_ACCESS_POLICY [en_US]

