
dc.contributor.author: Owens, Andrew Hale
dc.contributor.author: Wu, Jiajun
dc.contributor.author: McDermott, Joshua H.
dc.contributor.author: Freeman, William T.
dc.contributor.author: Torralba, Antonio
dc.date.accessioned: 2017-09-12T13:32:52Z
dc.date.available: 2017-09-12T13:32:52Z
dc.date.issued: 2016-09
dc.identifier.isbn: 978-3-319-46447-3
dc.identifier.isbn: 978-3-319-46448-0
dc.identifier.issn: 0302-9743
dc.identifier.issn: 1611-3349
dc.identifier.uri: http://hdl.handle.net/1721.1/111172
dc.description.abstract: The sound of crashing waves, the roar of fast-moving cars – sound conveys important information about the objects in our surroundings. In this work, we show that ambient sounds can be used as a supervisory signal for learning visual models. To demonstrate this, we train a convolutional neural network to predict a statistical summary of the sound associated with a video frame. We show that, through this process, the network learns a representation that conveys information about objects and scenes. We evaluate this representation on several recognition tasks, finding that its performance is comparable to that of other state-of-the-art unsupervised learning methods. Finally, we show through visualizations that the network learns units that are selective to objects that are often associated with characteristic sounds. [en_US]
dc.description.sponsorship: National Science Foundation (U.S.) (Grant 1524817) [en_US]
dc.description.sponsorship: National Science Foundation (U.S.) (Grant 1447476) [en_US]
dc.description.sponsorship: National Science Foundation (U.S.) (Grant 1212849) [en_US]
dc.language.iso: en_US
dc.publisher: Springer-Verlag [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1007/978-3-319-46448-0_48 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [en_US]
dc.source: arXiv [en_US]
dc.title: Ambient Sound Provides Supervision for Visual Learning [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Owens, Andrew, et al. “Ambient Sound Provides Supervision for Visual Learning.” Lecture Notes in Computer Science 9905 (September 2016): 801–816. © 2016 Springer International Publishing AG [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.contributor.mitauthor: Owens, Andrew Hale
dc.contributor.mitauthor: Wu, Jiajun
dc.contributor.mitauthor: McDermott, Joshua H.
dc.contributor.mitauthor: Freeman, William T.
dc.contributor.mitauthor: Torralba, Antonio
dc.relation.journal: Lecture Notes in Computer Science [en_US]
dc.eprint.version: Original manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dspace.orderedauthors: Owens, Andrew; Wu, Jiajun; McDermott, Josh H.; Freeman, William T.; Torralba, Antonio [en_US]
dspace.embargo.terms: N [en_US]
dc.identifier.orcid: https://orcid.org/0000-0001-9020-9593
dc.identifier.orcid: https://orcid.org/0000-0002-4176-343X
dc.identifier.orcid: https://orcid.org/0000-0002-3965-2503
dc.identifier.orcid: https://orcid.org/0000-0002-2231-7995
dc.identifier.orcid: https://orcid.org/0000-0003-4915-0256
dspace.mitauthor.error: true
mit.license: OPEN_ACCESS_POLICY [en_US]
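
The abstract above describes a self-supervised setup in which a convolutional network is trained to predict a statistical summary of the ambient sound accompanying a video frame. The sketch below is a minimal, hypothetical illustration of that idea only: it assumes the audio summaries have already been quantized into discrete cluster labels offline, and the architecture, cluster count, and preprocessing are placeholders, not the paper's actual implementation.

```python
# Hypothetical sketch (not the paper's code): train a small CNN to predict
# which precomputed sound cluster a video frame belongs to, so that ambient
# audio supplies the training labels instead of human annotation.
import torch
import torch.nn as nn

NUM_SOUND_CLUSTERS = 30  # assumed value; labels come from k-means over audio summaries

cnn = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(128, NUM_SOUND_CLUSTERS),  # logits over sound clusters
)

optimizer = torch.optim.SGD(cnn.parameters(), lr=1e-2, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

# Dummy batch standing in for (video frame, sound-cluster label) pairs.
frames = torch.randn(8, 3, 224, 224)
sound_labels = torch.randint(0, NUM_SOUND_CLUSTERS, (8,))

logits = cnn(frames)
loss = loss_fn(logits, sound_labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"self-supervised loss: {loss.item():.3f}")
```

Framing the audio target as a discrete label turns the task into ordinary classification: audio is needed only to compute training labels, and at test time the learned convolutional features can be reused for visual recognition, which is how the abstract describes evaluating the representation.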

