MIT Libraries logoDSpace@MIT

MIT
View Item 
  • DSpace@MIT Home
  • MIT Open Access Articles
  • MIT Open Access Articles
  • View Item
  • DSpace@MIT Home
  • MIT Open Access Articles
  • MIT Open Access Articles
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Speech2Face: Learning the Face Behind a Voice

Author(s)
Oh, Taehyun; Dekel, Tali; Kim, Changil; Mosseri, Inbar; Freeman, William T; Rubinstein, Michael; Matusik, Wojciech; ... Show more Show less
Thumbnail
DownloadSubmitted version (5.054Mb)
Open Access Policy

Open Access Policy

Creative Commons Attribution-Noncommercial-Share Alike

Terms of use
Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/
Metadata
Show full item record
Abstract
How much can we infer about a person's looks from the way they speak? In this paper, we study the task of reconstructing a facial image of a person from a short audio recording of that person speaking. We design and train a deep neural network to perform this task using millions of natural Internet/Youtube videos of people speaking. During training, our model learns voice-face correlations that allow it to produce images that capture various physical attributes of the speakers such as age, gender and ethnicity. This is done in a self-supervised manner, by utilizing the natural co-occurrence of faces and speech in Internet videos, without the need to model attributes explicitly. We evaluate and numerically quantify how-and in what manner-our Speech2Face reconstructions, obtained directly from audio, resemble the true face images of the speakers.
Date issued
2020-01
URI
https://hdl.handle.net/1721.1/130277
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Citation
Oh, Tae-Hyun et al. "Speech2Face: Learning the Face Behind a Voice." 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2019, Long Beach, California, Institute of Electrical and Electronics Engineers, January 2020. © 2019 IEEE
Version: Original manuscript
ISBN
9781728132938
ISSN
2575-7075

Collections
  • MIT Open Access Articles

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

My Account

Login

Statistics

OA StatisticsStatistics by CountryStatistics by Department
MIT Libraries
PrivacyPermissionsAccessibilityContact us
MIT
Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.