Optimization under ecological realism reproduces signatures of human speech perception

Magaro, Annika K.

Author(s)

Magaro, Annika K.

DownloadThesis PDF (1.358Mb)

Advisor

McDermott, Josh H.

Terms of use

In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/

Metadata

Show full item record

Abstract

Recent advances in machine learning have made real-world perception tasks feasible for computers, in many cases approaching levels of performance similar to those of humans. In particular, optimizing models for ecologically realistic training datasets has helped to yield more human-like model results. In the field of speech recognition, models trained under realistic conditions with simulated cochlear input reproduce some characteristics of human speech recognition. However, it is unclear how similar the behavior of these models is to that of humans across the many ways in which speech can be manipulated or degraded, since human and model behavior have not been extensively compared. In this paper, we address this question by comprehensively testing a neural network model trained in ecological conditions across a large set of speech manipulations, comparing its behavior to that of humans. We find that training in ecological conditions yields a fairly good overall match to human behavior, with some discrepancies that can be largely resolved by training specifically on these conditions. The results support the idea that the phenotype of human speech recognition can be understood as a consequence of having been optimized for the problem of speech recognition in natural conditions.

Date issued

2024-09

URI

https://hdl.handle.net/1721.1/157565

Department

Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences

Publisher

Massachusetts Institute of Technology

Collections

Graduate Theses