MIT Libraries logoDSpace@MIT

MIT
View Item 
  • DSpace@MIT Home
  • MIT Libraries
  • MIT Theses
  • Graduate Theses
  • View Item
  • DSpace@MIT Home
  • MIT Libraries
  • MIT Theses
  • Graduate Theses
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Inferring Shape and Material from Sound

Author(s)
Zhang, Zhoutong
Thumbnail
DownloadThesis PDF (11.30Mb)
Advisor
Freeman, William T.
Terms of use
In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/
Metadata
Show full item record
Abstract
Humans infer rich knowledge of objects from both auditory and visual cues. Building a machine of such competency, however, is very challenging. One possible solution is to rely on supervised learning, which requires a large-scale dataset containing sounds of various objects, with clean labels on their appearances, shape and material. However, it is difficult and expensive to capture such a dataset. Another approach is to tackle the problem in an analysis-by-synthesis framework, where we iterative update current estimates given a generative model. This, however, requires sophisticated generative models, which is too computationally expensive to support iterative inference. Finally, despite the popularity of deep learning methods in auditory perception tasks, most of them are derived from visual recognition tasks, which may not be suitable for processing audios. To address such difficulties, we first present a novel, open-source pipeline that generates audio-visual data, purely from 3D object shapes and their physical properties. Using this generative model, we are able to construct a synthetic audio-visual dataset, namely Sound-20K, for object perception tasks. We further demonstrate that the representation learned on synthetic audio-visual data can transfer to real-world scenarios. In addition, the generative model can be made efficient enough to support iterative inference, where we construct an analysis-by-synthesis framework that infers object’s shape and material by hearing it falling on the ground.
Date issued
2021-06
URI
https://hdl.handle.net/1721.1/139579
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

My Account

Login

Statistics

OA StatisticsStatistics by CountryStatistics by Department
MIT Libraries
PrivacyPermissionsAccessibilityContact us
MIT
Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.