DSpace@MIT


Listening with generative models

Author(s)
Cusimano, Maddie
Advisor
McDermott, Josh H.
Terms of use
In Copyright - Educational Use Permitted. Copyright MIT. http://rightsstatements.org/page/InC-EDU/1.0/
Abstract
This thesis extends classic traditions in perception by leveraging contemporary tools to build and apply rich generative models that describe what we hear. First, I present a hierarchical Bayesian auditory scene synthesis model to address the perceptual organization of sound into sources and events. We aimed to bridge classical auditory scene analysis phenomena and everyday sounds, asking whether common generative principles could explain auditory scene analysis in both cases. We tested the model by having it listen to a variety of auditory scene analysis illusions and found that its judgments matched those of human listeners. Applied to everyday sounds, the model inferred valid perceptual organizations. Because the model is interpretable, its failures with everyday sounds were also informative: they revealed the need for peripheral representations of periodicity, a more expressive model of spectra, and sources that compose multiple sound-generating processes. The next projects address alternative scene analysis problems of everyday physical understanding from sound. We developed methods for the ecological sound synthesis of a set of common object interactions: brief impact sounds and sustained scraping and rolling sounds. Our synthesis combines physical simulation from perceptually relevant variables with a statistical model of material. Listeners perceived our synthesized sounds as realistic and as conveying various physical variables. I discuss future directions for developing inference for these physics-inspired models, learning sound synthesizers, and generating illusions. Given the variety of structured latent-variable generative models investigated through these projects, I conclude by exploring how multiple world models might interact in perception.
Date issued
2022-09
URI
https://hdl.handle.net/1721.1/147570
Department
Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses
