Phonetic Classification Using Hierarchical, Feed-forward, Spectro-temporal Patch-based Architectures
| dc field | value | language |
|---|---|---|
| dc.contributor.advisor | Tomaso Poggio | |
| dc.contributor.author | Rifkin, Ryan | |
| dc.contributor.author | Bouvrie, Jake | |
| dc.contributor.author | Schutte, Ken | |
| dc.contributor.author | Chikkerur, Sharat | |
| dc.contributor.author | Kouh, Minjoon | |
| dc.contributor.author | Ezzat, Tony | |
| dc.contributor.author | Poggio, Tomaso | |
| dc.contributor.other | Center for Biological and Computational Learning (CBCL) | |
| dc.date.accessioned | 2007-02-01T18:26:47Z | |
| dc.date.available | 2007-02-01T18:26:47Z | |
| dc.date.issued | 2007-02-01 | |
| dc.identifier.other | MIT-CSAIL-TR-2007-007 | |
| dc.identifier.other | CBCL-266 | |
| dc.identifier.uri | http://hdl.handle.net/1721.1/35835 | |
| dc.description.abstract | A preliminary set of experiments is described in which a biologically inspired computer vision system (Serre, Wolf et al. 2005; Serre 2006; Serre, Oliva et al. 2006; Serre, Wolf et al. 2006) designed for visual object recognition was applied to the task of phonetic classification. During learning, the system processed 2-D wideband magnitude spectrograms directly as images, producing a set of 2-D spectro-temporal patch dictionaries at different spectro-temporal positions, orientations, and scales, and of varying complexity. During testing, features were computed by comparing the stored patches with patches from novel spectrograms. Classification was performed using a regularized least squares classifier (Rifkin, Yeo et al. 2003; Rifkin, Schutte et al. 2007) trained on the features computed by the system. On a 20-class TIMIT vowel classification task, the model features achieved a best result of 58.74% error, compared to 48.57% error using state-of-the-art MFCC-based features trained with the same classifier. This suggests that hierarchical, feed-forward, spectro-temporal patch-based architectures may be useful for phonetic analysis. | |
| dc.format.extent | 16 p. | |
| dc.format.extent | 2265616 bytes | |
| dc.format.extent | 383591 bytes | |
| dc.format.mimetype | application/postscript | |
| dc.format.mimetype | application/pdf | |
| dc.language.iso | en_US | |
| dc.relation.ispartofseries | Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory | |
| dc.relation.isreplacedby | http://hdl.handle.net/1721.1/36865 | |
| dc.relation.uri | http://hdl.handle.net/1721.1/36865 | |
| dc.subject | phonetic classification | |
| dc.subject | hierarchical models | |
| dc.subject | regularized least-squares | |
| dc.subject | spectrotemporal patches | |
| dc.title | Phonetic Classification Using Hierarchical, Feed-forward, Spectro-temporal Patch-based Architectures | |
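
The pipeline sketched in the abstract — matching a dictionary of stored spectro-temporal patches against a novel spectrogram, then classifying the resulting feature vector with a regularized least squares (RLS) classifier — can be illustrated with a minimal sketch. This is not the CBCL implementation: the patch size, the Gaussian similarity measure, the max-pooling over positions, and the regularization constant `lam` are all illustrative assumptions.

```python
import numpy as np

def patch_features(spectrogram, patches):
    """For each stored patch, slide a window over the spectrogram and
    keep the best similarity score (max-pooling over positions).
    Similarity is a Gaussian of the squared distance, so an exact
    match scores 1.0."""
    ph, pw = patches[0].shape
    H, W = spectrogram.shape
    feats = np.empty(len(patches))
    for i, p in enumerate(patches):
        best = -np.inf
        for r in range(H - ph + 1):
            for c in range(W - pw + 1):
                win = spectrogram[r:r + ph, c:c + pw]
                best = max(best, np.exp(-np.sum((win - p) ** 2)))
        feats[i] = best
    return feats

def rls_train(K, Y, lam=1e-2):
    """Regularized least squares in the kernel (dual) form:
    solve (K + lam*I) C = Y, with one column of Y per class
    (one-vs-all encoding)."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), Y)

def rls_predict(K_test, C):
    """Score each test point against every class and take the argmax."""
    return (K_test @ C).argmax(axis=1)
```

A usage pass would extract `patch_features` for every training spectrogram, form a kernel matrix `K` over those feature vectors (e.g. a linear kernel `X @ X.T`), call `rls_train`, and classify held-out frames with `rls_predict`. RLS is attractive here because training reduces to a single linear solve and the same kernel machinery handles all 20 vowel classes at once.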
