
dc.contributor.author: Northcutt, Curtis George
dc.contributor.author: Zha, Shengxin
dc.contributor.author: Lovegrove, Steven
dc.contributor.author: Newcombe, Richard
dc.date.accessioned: 2021-06-07T14:17:15Z
dc.date.available: 2021-06-07T14:17:15Z
dc.date.issued: 2020-09
dc.identifier.issn: 0162-8828
dc.identifier.issn: 2160-9292
dc.identifier.issn: 1939-3539
dc.identifier.uri: https://hdl.handle.net/1721.1/130907
dc.description.abstract: Multi-modal datasets in artificial intelligence (AI) often capture a third-person perspective, but our embodied human intelligence evolved with sensory input from the egocentric, first-person perspective. Towards embodied AI, we introduce the Egocentric Communications (EgoCom) dataset to advance the state-of-the-art in conversational AI, natural language, audio speech analysis, computer vision, and machine learning. EgoCom is a first-of-its-kind natural conversations dataset containing multi-modal human communication data captured simultaneously from the participants' egocentric perspectives. EgoCom includes 38.5 hours of synchronized embodied stereo audio and egocentric video, with 240,000 ground-truth, time-stamped, word-level transcriptions and speaker labels from 34 diverse speakers. We study baseline performance on two novel applications that benefit from embodied data: (1) predicting turn-taking in conversations and (2) multi-speaker transcription. For (1), we investigate Bayesian baselines that predict turn-taking within 5% of human performance. For (2), we use simultaneous egocentric capture to combine Google speech-to-text outputs, improving global transcription by 79% relative to a single perspective. Both applications exploit EgoCom's synchronous multi-perspective data to augment performance of embodied AI tasks.
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.isversionof: http://dx.doi.org/10.1109/tpami.2020.3025105
dc.rights: Creative Commons Attribution 4.0 International license
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.source: Curtis Northcutt
dc.title: EgoCom: A Multi-person Multi-modal Egocentric Communications Dataset
dc.type: Article
dc.identifier.citation: Northcutt, Curtis G. et al. "EgoCom: A Multi-person Multi-modal Egocentric Communications Dataset." Forthcoming in IEEE Transactions on Pattern Analysis and Machine Intelligence (September 2020): dx.doi.org/10.1109/tpami.2020.3025105.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.contributor.approver: Northcutt, Curtis George
dc.relation.journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
dc.eprint.version: Final published version
dc.type.uri: http://purl.org/eprint/type/JournalArticle
eprint.status: http://purl.org/eprint/status/PeerReviewed
dspace.date.submission: 2021-06-05T19:58:03Z
mit.license: PUBLISHER_CC
mit.metadata.status: Complete
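
The second application described in the abstract above, fusing per-wearer speech-to-text outputs into one conversation-level transcript, can be pictured with a minimal sketch. Everything here is an assumption for illustration: the `WordHyp` structure stands in for typical word-level recognizer output, and the greedy highest-confidence merge over time-overlapping hypotheses is a hypothetical fusion rule, not the paper's actual algorithm.

```python
# Hypothetical sketch: fusing word-level transcription hypotheses from
# several globally synchronized egocentric recordings of one conversation.
# The data model (WordHyp with start/end times and a confidence score) is
# an assumption mimicking common speech-to-text output; the greedy
# highest-confidence merge is illustrative, not the paper's method.
from dataclasses import dataclass

@dataclass
class WordHyp:
    word: str
    start: float   # seconds from conversation start (shared clock)
    end: float
    conf: float    # recognizer confidence in [0, 1]
    speaker: str   # wearer whose microphone produced this hypothesis

def _iou(a: WordHyp, b: WordHyp) -> float:
    """Temporal intersection-over-union of two word hypotheses."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0

def merge_transcripts(streams, iou_thresh=0.5):
    """Greedily keep the highest-confidence hypothesis among
    time-overlapping words from the different perspectives."""
    hyps = sorted((h for s in streams for h in s), key=lambda h: -h.conf)
    kept = []
    for h in hyps:
        # Drop h if it substantially overlaps an already-kept,
        # higher-confidence hypothesis for the same time span.
        if all(_iou(h, k) < iou_thresh for k in kept):
            kept.append(h)
    return sorted(kept, key=lambda h: h.start)
```

The intuition this sketch captures is the one the abstract relies on: each wearer's microphone hears their own speech best, so picking, for every stretch of time, the perspective that is most confident yields a better global transcript than any single perspective alone.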

