Show simple item record

dc.contributor.advisor: McDermott, Josh H.
dc.contributor.author: Cusimano, Maddie
dc.date.accessioned: 2023-01-19T19:59:17Z
dc.date.available: 2023-01-19T19:59:17Z
dc.date.issued: 2022-09
dc.date.submitted: 2022-09-28T17:18:19.110Z
dc.identifier.uri: https://hdl.handle.net/1721.1/147570
dc.description.abstract: This thesis extends classic traditions in perception by leveraging contemporary tools to build and apply rich generative models that describe what we hear. First, I present a hierarchical Bayesian auditory scene synthesis model to address the perceptual organization of sound into sources and events. We aimed to bridge between classical auditory scene analysis phenomena and everyday sounds, asking whether common generative principles could explain auditory scene analysis in both cases. We tested the model by having it listen to a variety of auditory scene analysis illusions and found that its judgments matched those of human listeners. Applied to everyday sounds, the model infers valid perceptual organizations. Moreover, because the model is interpretable, its failures with everyday sounds were informative: they revealed the necessity of peripheral representations of periodicity, a more expressive model of spectra, and sources that compose multiple sound-generating processes. The subsequent projects address a complementary scene analysis problem: everyday physical understanding from sound. We developed methods for the ecological sound synthesis of a set of common object interactions: brief impact sounds and sustained scraping and rolling sounds. Our synthesis combines physical simulation driven by perceptually relevant variables with a statistical model of material. Listeners judged our synthesized sounds to be realistic and to convey the relevant physical variables. I discuss future directions for developing inference for these physics-inspired models, learning sound synthesizers, and generating illusions. Given the variety of structured latent-variable generative models investigated through these projects, I conclude by exploring how multiple world models might interact in perception.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright MIT
dc.rights.uri: http://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Listening with generative models
dc.type: Thesis
dc.description.degree: Ph.D.
dc.contributor.department: Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences
dc.identifier.orcid: https://orcid.org/0000-0002-7435-2434
mit.thesis.degree: Doctoral
thesis.degree.name: Doctor of Philosophy

