dc.contributor.author: Oh, Taehyun
dc.contributor.author: Dekel, Tali
dc.contributor.author: Kim, Changil
dc.contributor.author: Mosseri, Inbar
dc.contributor.author: Freeman, William T
dc.contributor.author: Rubinstein, Michael
dc.contributor.author: Matusik, Wojciech
dc.date.accessioned: 2021-03-30T15:11:16Z
dc.date.available: 2021-03-30T15:11:16Z
dc.date.issued: 2020-01
dc.date.submitted: 2019-06
dc.identifier.isbn: 9781728132938
dc.identifier.issn: 2575-7075
dc.identifier.uri: https://hdl.handle.net/1721.1/130277
dc.description.abstract: How much can we infer about a person's looks from the way they speak? In this paper, we study the task of reconstructing a facial image of a person from a short audio recording of that person speaking. We design and train a deep neural network to perform this task using millions of natural Internet/YouTube videos of people speaking. During training, our model learns voice-face correlations that allow it to produce images that capture various physical attributes of the speakers such as age, gender and ethnicity. This is done in a self-supervised manner, by utilizing the natural co-occurrence of faces and speech in Internet videos, without the need to model attributes explicitly. We evaluate and numerically quantify how, and in what manner, our Speech2Face reconstructions, obtained directly from audio, resemble the true face images of the speakers. [en_US]
dc.language.iso: en
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE) [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1109/cvpr.2019.00772 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [en_US]
dc.source: arXiv [en_US]
dc.title: Speech2Face: Learning the Face Behind a Voice [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Oh, Tae-Hyun et al. "Speech2Face: Learning the Face Behind a Voice." 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2019, Long Beach, California, Institute of Electrical and Electronics Engineers, January 2020. © 2019 IEEE [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.relation.journal: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition [en_US]
dc.eprint.version: Original manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dc.date.updated: 2021-02-05T18:37:36Z
dspace.orderedauthors: Oh, T-H; Dekel, T; Kim, C; Mosseri, I; Freeman, WT; Rubinstein, M; Matusik, W [en_US]
dspace.date.submission: 2021-02-05T18:37:42Z
mit.license: OPEN_ACCESS_POLICY
mit.metadata.status: Complete