dc.contributor.author | Oh, Taehyun | |
dc.contributor.author | Dekel, Tali | |
dc.contributor.author | Kim, Changil | |
dc.contributor.author | Mosseri, Inbar | |
dc.contributor.author | Freeman, William T | |
dc.contributor.author | Rubinstein, Michael | |
dc.contributor.author | Matusik, Wojciech | |
dc.date.accessioned | 2021-03-30T15:11:16Z | |
dc.date.available | 2021-03-30T15:11:16Z | |
dc.date.issued | 2020-01 | |
dc.date.submitted | 2019-06 | |
dc.identifier.isbn | 9781728132938 | |
dc.identifier.issn | 2575-7075 | |
dc.identifier.uri | https://hdl.handle.net/1721.1/130277 | |
dc.description.abstract | How much can we infer about a person's looks from the way they speak? In this paper, we study the task of reconstructing a facial image of a person from a short audio recording of that person speaking. We design and train a deep neural network to perform this task using millions of natural Internet/YouTube videos of people speaking. During training, our model learns voice-face correlations that allow it to produce images that capture various physical attributes of the speakers such as age, gender and ethnicity. This is done in a self-supervised manner, by utilizing the natural co-occurrence of faces and speech in Internet videos, without the need to model attributes explicitly. We evaluate and numerically quantify how, and in what manner, our Speech2Face reconstructions, obtained directly from audio, resemble the true face images of the speakers. | en_US |
dc.language.iso | en | |
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) | en_US |
dc.relation.isversionof | http://dx.doi.org/10.1109/cvpr.2019.00772 | en_US |
dc.rights | Creative Commons Attribution-Noncommercial-Share Alike | en_US |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | en_US |
dc.source | arXiv | en_US |
dc.title | Speech2Face: Learning the Face Behind a Voice | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Oh, Tae-Hyun et al. "Speech2Face: Learning the Face Behind a Voice." 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2019, Long Beach, California, Institute of Electrical and Electronics Engineers, January 2020. © 2019 IEEE | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | en_US |
dc.relation.journal | 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition | en_US |
dc.eprint.version | Original manuscript | en_US |
dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US |
eprint.status | http://purl.org/eprint/status/NonPeerReviewed | en_US |
dc.date.updated | 2021-02-05T18:37:36Z | |
dspace.orderedauthors | Oh, T-H; Dekel, T; Kim, C; Mosseri, I; Freeman, WT; Rubinstein, M; Matusik, W | en_US |
dspace.date.submission | 2021-02-05T18:37:42Z | |
mit.license | OPEN_ACCESS_POLICY | |
mit.metadata.status | Complete | |