Ambient Sound Provides Supervision for Visual Learning
Author(s)
Owens, Andrew Hale; Wu, Jiajun; McDermott, Joshua H.; Freeman, William T.; Torralba, Antonio
Terms of use
Creative Commons Attribution-Noncommercial-Share Alike (Open Access Policy)
Abstract
The sound of crashing waves, the roar of fast-moving cars – sound conveys important information about the objects in our surroundings. In this work, we show that ambient sounds can be used as a supervisory signal for learning visual models. To demonstrate this, we train a convolutional neural network to predict a statistical summary of the sound associated with a video frame. We show that, through this process, the network learns a representation that conveys information about objects and scenes. We evaluate this representation on several recognition tasks, finding that its performance is comparable to that of other state-of-the-art unsupervised learning methods. Finally, we show through visualizations that the network learns units that are selective to objects that are often associated with characteristic sounds.
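The recipe described in the abstract — summarize each clip's audio statistically, then train an image network to predict that summary from a single frame — can be illustrated with a short sketch. The code below is a minimal, hypothetical PyTorch illustration, not the authors' implementation: it assumes the sound summaries have already been quantized into a fixed number of clusters and treats prediction as classification; `FrameToSoundNet`, `NUM_SOUND_CLUSTERS`, and the ResNet-18 backbone are all illustrative assumptions.

```python
# Illustrative sketch (not the authors' released code): train an image CNN to
# predict a summary of the audio that accompanies each video frame, posed here
# as classification over pre-computed sound-statistic clusters.
import torch
import torch.nn as nn
import torchvision.models as models

NUM_SOUND_CLUSTERS = 30  # hypothetical number of audio clusters


class FrameToSoundNet(nn.Module):
    def __init__(self, num_clusters=NUM_SOUND_CLUSTERS):
        super().__init__()
        backbone = models.resnet18(weights=None)  # any image CNN would do
        backbone.fc = nn.Linear(backbone.fc.in_features, num_clusters)
        self.backbone = backbone

    def forward(self, frames):        # frames: (B, 3, H, W)
        return self.backbone(frames)  # logits over sound clusters


def train_step(model, optimizer, frames, sound_cluster_ids):
    """One optimization step of the frame -> sound-statistic prediction task."""
    model.train()
    logits = model(frames)
    loss = nn.functional.cross_entropy(logits, sound_cluster_ids)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = FrameToSoundNet()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
    # Dummy batch: 8 frames with random cluster labels, just to show the shapes.
    frames = torch.randn(8, 3, 224, 224)
    labels = torch.randint(0, NUM_SOUND_CLUSTERS, (8,))
    print(train_step(model, optimizer, frames, labels))
```

After training on this proxy task, the convolutional layers of `backbone` could be kept and evaluated as a learned visual representation on downstream recognition tasks, which is the role the sound-prediction objective plays in the paper.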
Date issued
2016-09
Department
Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal
Lecture Notes in Computer Science
Publisher
Springer-Verlag
Citation
Owens, Andrew, et al. “Ambient Sound Provides Supervision for Visual Learning.” Lecture Notes in Computer Science 9905 (September 2016): 801–816. © 2016 Springer International Publishing AG
Version: Original manuscript
ISBN
978-3-319-46447-3
978-3-319-46448-0
ISSN
0302-9743
1611-3349