Listening with generative models

Cusimano, Maddie

Advisor

McDermott, Josh H.

Terms of use

In Copyright - Educational Use Permitted. Copyright MIT. http://rightsstatements.org/page/InC-EDU/1.0/

Abstract

This thesis extends classic traditions in perception by using contemporary tools to build and apply rich generative models of what we hear. First, I present a hierarchical Bayesian auditory scene synthesis model that addresses the perceptual organization of sound into sources and events. We aimed to bridge classical auditory scene analysis phenomena and everyday sounds, asking whether common generative principles could explain auditory scene analysis in both cases. We tested the model by having it listen to a variety of auditory scene analysis illusions and found that its judgments matched those of human listeners. Applied to everyday sounds, the model infers valid perceptual organizations. Because the model is interpretable, its failures on everyday sounds were also informative: they revealed the need for peripheral representations of periodicity, a more expressive model of spectra, and sources that combine multiple sound-generating processes. The subsequent projects address a different set of scene analysis problems: everyday physical understanding from sound. We developed methods for the ecological sound synthesis of a set of common object interactions: brief impact sounds and sustained scraping and rolling sounds. Our synthesis combines physical simulation, driven by perceptually relevant variables, with a statistical model of material. Listeners perceive our synthesized sounds as realistic and as conveying various physical variables. I discuss future directions for developing inference for these physics-inspired models, learning sound synthesizers, and generating illusions. Given the variety of structured latent-variable generative models investigated in these projects, I conclude by exploring how multiple world models might interact in perception.

Date issued

2022-09

URI

https://hdl.handle.net/1721.1/147570

Department

Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences

Publisher

Massachusetts Institute of Technology

Collections

Doctoral Theses