A Computational Model for Combinatorial Generalization in Physical Perception from Sound
Author(s)
Wang, Yunyun; Gan, Chuang; Siegel, Max Harmon; Zhang, Zhoutong; Wu, Jiajun; Tenenbaum, Joshua B
Publisher with Creative Commons License
Creative Commons Attribution
Abstract
Humans possess the unique ability of combinatorial generalization in auditory perception: given novel auditory stimuli, humans perform auditory scene analysis and infer causal physical interactions based on prior knowledge. Could we build a computational model that achieves human-like combinatorial generalization? In this paper, we present a case study on box-shaking: having heard only the sound of a single ball moving in a box, we seek to interpret the sound of two or three balls of different materials. To solve this task, we propose a hybrid model with two components: a neural network for perception, and a physical audio engine for simulation. We use the outcome of the network as an initial guess and perform MCMC sampling with the audio engine to improve the result. Combining neural networks with a physical audio engine, our hybrid model achieves combinatorial generalization efficiently and accurately in auditory scene perception.
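The inference loop described above — a neural network's output used as the initial hypothesis, then refined by MCMC sampling against a physical audio engine — can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the hypothesis space (ball count and material index), the toy `log_likelihood` standing in for the audio-engine comparison, and all function names are assumptions for the sake of the example.

```python
import math
import random

def log_likelihood(hypothesis, observed):
    """Toy stand-in for comparing an audio-engine simulation against the
    observed sound. A hypothesis is (number of balls, material index);
    the real model would synthesize audio and score its match instead."""
    n_balls, material = hypothesis
    simulated = n_balls * 1.0 + material * 0.1  # toy "simulated feature"
    return -(simulated - observed) ** 2

def mcmc_refine(init, observed, steps=1000, seed=0):
    """Metropolis-Hastings over discrete hypotheses, starting from a
    neural-network initial guess (init) rather than a random state."""
    rng = random.Random(seed)
    current = init
    cur_ll = log_likelihood(current, observed)
    best, best_ll = current, cur_ll
    for _ in range(steps):
        # Propose a local change: tweak the ball count or the material.
        n, m = current
        if rng.random() < 0.5:
            proposal = (max(1, n + rng.choice([-1, 1])), m)
        else:
            proposal = (n, (m + rng.choice([-1, 1])) % 3)
        prop_ll = log_likelihood(proposal, observed)
        # Accept with probability min(1, exp(prop_ll - cur_ll)).
        if math.log(rng.random() + 1e-12) < prop_ll - cur_ll:
            current, cur_ll = proposal, prop_ll
            if cur_ll > best_ll:
                best, best_ll = current, cur_ll
    return best

# Suppose the network guesses 1 ball of material 0, while the
# observed sound is better explained by 2 balls.
result = mcmc_refine((1, 0), observed=2.0)
print(result)
```

Starting from the network's guess means the chain typically needs far fewer steps than sampling from scratch, which is the efficiency claim the abstract makes for the hybrid model.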
Date issued
2019
Department
Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences; Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; MIT-IBM Watson AI Lab; Center for Brains, Minds, and Machines
Journal
2019 Conference on Cognitive Computational Neuroscience
Publisher
Cognitive Computational Neuroscience
Citation
Wang, Yunyun, Gan, Chuang, Siegel, Max, Zhang, Zhoutong, Wu, Jiajun et al. 2019. "A Computational Model for Combinatorial Generalization in Physical Perception from Sound." 2019 Conference on Cognitive Computational Neuroscience.
Version: Final published version