dc.contributor.advisor: Tomaso Poggio
dc.contributor.author: Rifkin, Ryan
dc.contributor.author: Bouvrie, Jake
dc.contributor.author: Schutte, Ken
dc.contributor.author: Chikkerur, Sharat
dc.contributor.author: Kouh, Minjoon
dc.contributor.author: Ezzat, Tony
dc.contributor.author: Poggio, Tomaso
dc.contributor.other: Center for Biological and Computational Learning (CBCL)
dc.date.accessioned: 2007-02-01T18:26:47Z
dc.date.available: 2007-02-01T18:26:47Z
dc.date.issued: 2007-02-01
dc.identifier.other: MIT-CSAIL-TR-2007-007
dc.identifier.other: CBCL-266
dc.identifier.uri: http://hdl.handle.net/1721.1/35835
dc.description.abstract: A preliminary set of experiments is described in which a biologically-inspired computer vision system (Serre, Wolf et al. 2005; Serre 2006; Serre, Oliva et al. 2006; Serre, Wolf et al. 2006) designed for visual object recognition was applied to the task of phonetic classification. During learning, the system processed 2-D wideband magnitude spectrograms directly as images, producing a set of 2-D spectro-temporal patch dictionaries at different spectro-temporal positions, orientations, scales, and of varying complexity. During testing, features were computed by comparing the stored patches with patches from novel spectrograms. Classification was performed using a regularized least squares classifier (Rifkin, Yeo et al. 2003; Rifkin, Schutte et al. 2007) trained on the features computed by the system. On a 20-class TIMIT vowel classification task, the model features achieved a best result of 58.74% error, compared to 48.57% error using state-of-the-art MFCC-based features trained using the same classifier. This suggests that hierarchical, feed-forward, spectro-temporal patch-based architectures may be useful for phonetic analysis.
dc.format.extent: 16 p.
dc.format.extent: 2265616 bytes
dc.format.extent: 383591 bytes
dc.format.mimetype: application/postscript
dc.format.mimetype: application/pdf
dc.language.iso: en_US
dc.relation.ispartofseries: Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory
dc.relation.isreplacedby: http://hdl.handle.net/1721.1/36865
dc.relation.uri: http://hdl.handle.net/1721.1/36865
dc.subject: phonetic classification
dc.subject: hierarchical models
dc.subject: regularized least-squares
dc.subject: spectrotemporal patches
dc.title: Phonetic Classification Using Hierarchical, Feed-forward, Spectro-temporal Patch-based Architectures

