Inferring Shape and Material from Sound

Author(s)

Zhang, Zhoutong

Advisor

Freeman, William T.

Terms of use

In Copyright - Educational Use Permitted. Copyright MIT. http://rightsstatements.org/page/InC-EDU/1.0/

Abstract

Humans infer rich knowledge of objects from both auditory and visual cues. Building a machine with such competency, however, is very challenging. One possible solution is to rely on supervised learning, which requires a large-scale dataset containing sounds of various objects, with clean labels for their appearance, shape, and material. However, such a dataset is difficult and expensive to capture. Another approach is to tackle the problem in an analysis-by-synthesis framework, where we iteratively update the current estimates given a generative model. This, however, requires sophisticated generative models, which are too computationally expensive to support iterative inference. Finally, despite the popularity of deep learning methods in auditory perception tasks, most of them are derived from methods for visual recognition tasks, which may not be suitable for processing audio. To address these difficulties, we first present a novel, open-source pipeline that generates audio-visual data purely from 3D object shapes and their physical properties. Using this generative model, we construct a synthetic audio-visual dataset, Sound-20K, for object perception tasks. We further demonstrate that the representation learned on synthetic audio-visual data can transfer to real-world scenarios. In addition, the generative model can be made efficient enough to support iterative inference, which allows us to construct an analysis-by-synthesis framework that infers an object’s shape and material from the sound of it falling on the ground.

Date issued

2021-06

URI

https://hdl.handle.net/1721.1/139579

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Collections

Graduate Theses