
dc.contributor.author: Zhao, Hang
dc.contributor.author: Torralba, Antonio
dc.contributor.author: Torresani, Lorenzo
dc.contributor.author: Yan, Zhicheng
dc.date.accessioned: 2021-11-05T19:39:46Z
dc.date.available: 2021-11-05T19:39:46Z
dc.date.issued: 2019
dc.identifier.uri: https://hdl.handle.net/1721.1/137602
dc.description.abstract: © 2019 IEEE. This paper presents a new large-scale dataset for recognition and temporal localization of human actions collected from Web videos. We refer to it as HACS (Human Action Clips and Segments). We leverage consensus and disagreement among visual classifiers to automatically mine candidate short clips from unlabeled videos, which are subsequently validated by human annotators. The resulting dataset is dubbed HACS Clips. Through a separate process we also collect annotations defining action segment boundaries. This resulting dataset is called HACS Segments. Overall, HACS Clips consists of 1.5M annotated clips sampled from 504K untrimmed videos, and HACS Segments contains 139K action segments densely annotated in 50K untrimmed videos spanning 200 action categories. HACS Clips contains more labeled examples than any existing video benchmark. This renders our dataset both a large-scale action recognition benchmark and an excellent source for spatiotemporal feature learning. In our transfer learning experiments on three target datasets, HACS Clips outperforms Kinetics-600, Moments-In-Time and Sports1M as a pretraining source. On HACS Segments, we evaluate state-of-the-art methods of action proposal generation and action localization, and highlight the new challenges posed by our dense temporal annotations. (en_US)
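
The mining step described in the abstract (using consensus among visual classifiers to surface likely positives, and disagreement to surface hard examples, before human validation) can be illustrated with a minimal sketch. The function name, thresholds, and the per-classifier predict_proba(features) -> float API below are hypothetical assumptions for illustration, not the authors' actual pipeline:

    # Minimal sketch of consensus/disagreement mining among visual classifiers.
    # All names and thresholds are hypothetical, not the HACS implementation.

    def mine_candidate_clips(clips, classifiers, hi=0.8, lo=0.2, gap=0.5):
        """Split clips into candidate positives and hard candidates for
        human annotation.

        clips:       iterable of (clip_id, features) pairs
        classifiers: models assumed to expose predict_proba(features) -> float,
                     the score for the action category being mined
        """
        positives, hard_candidates = [], []
        for clip_id, feats in clips:
            scores = [clf.predict_proba(feats) for clf in classifiers]
            if all(s >= hi for s in scores):
                # Consensus high score: likely positive for this action.
                positives.append(clip_id)
            elif all(s <= lo for s in scores):
                # Consensus low score: confidently irrelevant, skip it.
                continue
            elif max(scores) - min(scores) > gap:
                # Strong disagreement: a hard example worth annotating.
                hard_candidates.append(clip_id)
        # Both lists are then sent to human annotators for validation.
        return positives, hard_candidates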
dc.language.iso: en
dc.publisher: IEEE (en_US)
dc.relation.isversionof: 10.1109/ICCV.2019.00876 (en_US)
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike (en_US)
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ (en_US)
dc.source: arXiv (en_US)
dc.title: HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization (en_US)
dc.type: Article (en_US)
dc.identifier.citation: Zhao, Hang, Torralba, Antonio, Torresani, Lorenzo and Yan, Zhicheng. 2019. "HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization." Proceedings of the IEEE International Conference on Computer Vision, 2019-October.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.relation.journal: Proceedings of the IEEE International Conference on Computer Vision (en_US)
dc.eprint.version: Original manuscript (en_US)
dc.type.uri: http://purl.org/eprint/type/ConferencePaper (en_US)
eprint.status: http://purl.org/eprint/status/NonPeerReviewed (en_US)
dc.date.updated: 2021-01-28T13:00:59Z
dspace.orderedauthors: Zhao, H; Torralba, A; Torresani, L; Yan, Z (en_US)
dspace.date.submission: 2021-01-28T13:01:03Z
mit.journal.volume: 2019-October (en_US)
mit.license: OPEN_ACCESS_POLICY
mit.metadata.status: Authority Work and Publication Information Needed (en_US)

