
dc.contributor.author: Zhao, Hang
dc.contributor.author: Torralba, Antonio
dc.contributor.author: Torresani, Lorenzo
dc.contributor.author: Yan, Zhicheng
dc.date.accessioned: 2021-11-05T19:39:46Z
dc.date.available: 2021-11-05T19:39:46Z
dc.date.issued: 2019
dc.identifier.uri: https://hdl.handle.net/1721.1/137602
dc.description.abstract: © 2019 IEEE. This paper presents a new large-scale dataset for recognition and temporal localization of human actions collected from Web videos. We refer to it as HACS (Human Action Clips and Segments). We leverage consensus and disagreement among visual classifiers to automatically mine candidate short clips from unlabeled videos, which are subsequently validated by human annotators. The resulting dataset is dubbed HACS Clips. Through a separate process we also collect annotations defining action segment boundaries. This resulting dataset is called HACS Segments. Overall, HACS Clips consists of 1.5M annotated clips sampled from 504K untrimmed videos, and HACS Segments contains 139K action segments densely annotated in 50K untrimmed videos spanning 200 action categories. HACS Clips contains more labeled examples than any existing video benchmark. This renders our dataset both a large-scale action recognition benchmark and an excellent source for spatiotemporal feature learning. In our transfer learning experiments on three target datasets, HACS Clips outperforms Kinetics-600, Moments-In-Time and Sports1M as a pretraining source. On HACS Segments, we evaluate state-of-the-art methods of action proposal generation and action localization, and highlight the new challenges posed by our dense temporal annotations. (en_US)
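
The mining step described in the abstract (using consensus among visual classifiers to surface likely positives, and disagreement to surface hard examples, before human validation) can be illustrated with a minimal sketch. The function name, thresholds, and the per-classifier predict_proba(features) -> float API below are hypothetical assumptions for illustration, not the authors' actual pipeline:

    # Minimal sketch of consensus/disagreement mining among visual classifiers.
    # All names and thresholds are hypothetical, not the HACS implementation.

    def mine_candidate_clips(clips, classifiers, hi=0.8, lo=0.2, gap=0.5):
        """Split clips into candidate positives and hard candidates for
        human annotation.

        clips:       iterable of (clip_id, features) pairs
        classifiers: models assumed to expose predict_proba(features) -> float,
                     the score for the action category being mined
        """
        positives, hard_candidates = [], []
        for clip_id, feats in clips:
            scores = [clf.predict_proba(feats) for clf in classifiers]
            if all(s >= hi for s in scores):
                # Consensus high score: likely positive for this action.
                positives.append(clip_id)
            elif all(s <= lo for s in scores):
                # Consensus low score: confidently irrelevant, skip it.
                continue
            elif max(scores) - min(scores) > gap:
                # Strong disagreement: a hard example worth annotating.
                hard_candidates.append(clip_id)
        # Both lists are then sent to human annotators for validation.
        return positives, hard_candidates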
dc.language.iso: en
dc.publisher: IEEE (en_US)
dc.relation.isversionof: 10.1109/ICCV.2019.00876 (en_US)
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike (en_US)
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ (en_US)
dc.source: arXiv (en_US)
dc.title: HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization (en_US)
dc.type: Article (en_US)
dc.identifier.citation: Zhao, Hang, Torralba, Antonio, Torresani, Lorenzo and Yan, Zhicheng. 2019. "HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization." Proceedings of the IEEE International Conference on Computer Vision, 2019-October.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.relation.journal: Proceedings of the IEEE International Conference on Computer Vision (en_US)
dc.eprint.version: Original manuscript (en_US)
dc.type.uri: http://purl.org/eprint/type/ConferencePaper (en_US)
eprint.status: http://purl.org/eprint/status/NonPeerReviewed (en_US)
dc.date.updated: 2021-01-28T13:00:59Z
dspace.orderedauthors: Zhao, H; Torralba, A; Torresani, L; Yan, Z (en_US)
dspace.date.submission: 2021-01-28T13:01:03Z
mit.journal.volume: 2019-October (en_US)
mit.license: OPEN_ACCESS_POLICY
mit.metadata.status: Authority Work and Publication Information Needed (en_US)

