Show simple item record

dc.contributor.advisor: McDermott, Josh H.
dc.contributor.author: Cusimano, Maddie
dc.date.accessioned: 2023-01-19T19:59:17Z
dc.date.available: 2023-01-19T19:59:17Z
dc.date.issued: 2022-09
dc.date.submitted: 2022-09-28T17:18:19.110Z
dc.identifier.uri: https://hdl.handle.net/1721.1/147570
dc.description.abstract: This thesis extends classic traditions in perception by leveraging contemporary tools to build and apply rich generative models that describe what we hear. First, I present a hierarchical Bayesian auditory scene synthesis model to address the perceptual organization of sound into sources and events. We aimed to bridge between classical auditory scene analysis phenomena and everyday sounds, asking whether common generative principles could explain auditory scene analysis in both cases. We tested the model by having it listen to a variety of auditory scene analysis illusions and found that its judgments matched those of human listeners. Applied to everyday sounds, the model infers valid perceptual organizations. Moreover, because the model is interpretable, its failures with everyday sounds were informative: they revealed the necessity of peripheral representations of periodicity, a more expressive model of spectra, and sources that compose multiple sound-generating processes. The subsequent projects address a complementary scene analysis problem: everyday physical understanding from sound. We developed methods for the ecological sound synthesis of a set of common object interactions: brief impact sounds and sustained scraping and rolling sounds. Our synthesis combines physical simulation driven by perceptually relevant variables with a statistical model of material. Listeners judged our synthesized sounds to be realistic and to convey the relevant physical variables. I discuss future directions for developing inference for these physics-inspired models, learning sound synthesizers, and generating illusions. Given the variety of structured latent-variable generative models investigated through these projects, I conclude by exploring how multiple world models might interact in perception.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright MIT
dc.rights.uri: http://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Listening with generative models
dc.type: Thesis
dc.description.degree: Ph.D.
dc.contributor.department: Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences
dc.identifier.orcid: https://orcid.org/0000-0002-7435-2434
mit.thesis.degree: Doctoral
thesis.degree.name: Doctor of Philosophy

