MIT Libraries logoDSpace@MIT

MIT
View Item 
  • DSpace@MIT Home
  • MIT Open Access Articles
  • MIT Open Access Articles
  • View Item
  • DSpace@MIT Home
  • MIT Open Access Articles
  • MIT Open Access Articles
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

EgoCom: A Multi-person Multi-modal Egocentric Communications Dataset

Author(s)
Northcutt, Curtis George; Zha, Shengxin; Lovegrove, Steven; Newcombe, Richard
Thumbnail
Download09200754.pdf (4.457Mb)
Publisher with Creative Commons License

Publisher with Creative Commons License

Creative Commons Attribution

Terms of use
Creative Commons Attribution 4.0 International license https://creativecommons.org/licenses/by/4.0/
Metadata
Show full item record
Abstract
Multi-modal datasets in artificial intelligence (AI) often capture a third-person perspective, but our embodied human intelligence evolved with sensory input from the egocentric, first-person perspective. Towards embodied AI, we introduce the Egocentric Communications (EgoCom) dataset to advance the state-of-the-art in conversational AI, natural language, audio speech analysis, computer vision, and machine learning. EgoCom is a first-of-its-kind natural conversations dataset containing multi-modal human communication data captured simultaneously from the participants' egocentric perspectives. EgoCom includes 38.5 hours of synchronized embodied stereo audio, egocentric video with 240,000 ground-truth, time-stamped word-level transcriptions and speaker labels from 34 diverse speakers. We study baseline performance on two novel applications that benefit from embodied data: (1) predicting turn-taking in conversations and (2) multi-speaker transcription. For (1), we investigate Bayesian baselines to predict turn-taking within 5% of human performance. For (2), we use simultaneous egocentric capture to combine Google speech-to-text outputs, improving global transcription by 79% relative to a single perspective. Both applications exploit EgoCom's synchronous multi-perspective data to augment performance of embodied AI tasks.
Date issued
2020-09
URI
https://hdl.handle.net/1721.1/130907
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal
IEEE Transactions on Pattern Analysis and Machine Intelligence
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Citation
Northcutt, Curtis G. et al. "EgoCom: A Multi-person Multi-modal Egocentric Communications Dataset." Forthcoming in IEEE Transactions on Pattern Analysis and Machine Intelligence (September 2020): dx.doi.org/10.1109/tpami.2020.3025105.
Version: Final published version
ISSN
0162-8828
2160-9292
1939-3539

Collections
  • MIT Open Access Articles

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

My Account

Login

Statistics

OA StatisticsStatistics by CountryStatistics by Department
MIT Libraries
PrivacyPermissionsAccessibilityContact us
MIT
Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.