Optimization under ecological realism reproduces signatures of human speech perception
Author(s)
Magaro, Annika K.
Advisor
McDermott, Josh H.
Abstract
Recent advances in machine learning have made real-world perception tasks feasible for computers, in many cases approaching levels of performance similar to those of humans. In particular, optimizing models for ecologically realistic training datasets has helped to yield more human-like model results. In the field of speech recognition, models trained under realistic conditions with simulated cochlear input reproduce some characteristics of human speech recognition. However, it is unclear how similar the behavior of these models is to that of humans across the many ways in which speech can be manipulated or degraded, since human and model behavior have not been extensively compared. In this paper, we address this question by comprehensively testing a neural network model trained in ecological conditions across a large set of speech manipulations, comparing its behavior to that of humans. We find that training in ecological conditions yields a fairly good overall match to human behavior, with some discrepancies that can be largely resolved by training specifically on these conditions. The results support the idea that the phenotype of human speech recognition can be understood as a consequence of having been optimized for the problem of speech recognition in natural conditions.
Date issued
2024-09
Department
Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences
Publisher
Massachusetts Institute of Technology