Phonetic Classification Using Hierarchical, Feed-forward, Spectro-temporal Patch-based Architectures
| dc field | value | language |
|---|---|---|
| dc.contributor.advisor | Tomaso Poggio | |
| dc.contributor.author | Rifkin, Ryan | |
| dc.contributor.author | Bouvrie, Jake | |
| dc.contributor.author | Schutte, Ken | |
| dc.contributor.author | Chikkerur, Sharat | |
| dc.contributor.author | Kouh, Minjoon | |
| dc.contributor.author | Ezzat, Tony | |
| dc.contributor.author | Poggio, Tomaso | |
| dc.contributor.other | Center for Biological and Computational Learning (CBCL) | |
| dc.date.accessioned | 2007-02-01T18:26:47Z | |
| dc.date.available | 2007-02-01T18:26:47Z | |
| dc.date.issued | 2007-02-01 | |
| dc.identifier.other | MIT-CSAIL-TR-2007-007 | |
| dc.identifier.other | CBCL-266 | |
| dc.identifier.uri | http://hdl.handle.net/1721.1/35835 | |
| dc.description.abstract | A preliminary set of experiments is described in which a biologically inspired computer vision system (Serre, Wolf et al. 2005; Serre 2006; Serre, Oliva et al. 2006; Serre, Wolf et al. 2006) designed for visual object recognition was applied to the task of phonetic classification. During learning, the system processed 2-D wideband magnitude spectrograms directly as images, producing a set of 2-D spectro-temporal patch dictionaries at different spectro-temporal positions, orientations, and scales, and of varying complexity. During testing, features were computed by comparing the stored patches with patches from novel spectrograms. Classification was performed using a regularized least squares classifier (Rifkin, Yeo et al. 2003; Rifkin, Schutte et al. 2007) trained on the features computed by the system. On a 20-class TIMIT vowel classification task, the model features achieved a best result of 58.74% error, compared to 48.57% error using state-of-the-art MFCC-based features trained with the same classifier. This suggests that hierarchical, feed-forward, spectro-temporal patch-based architectures may be useful for phonetic analysis. | |
| dc.format.extent | 16 p. | |
| dc.format.extent | 2265616 bytes | |
| dc.format.extent | 383591 bytes | |
| dc.format.mimetype | application/postscript | |
| dc.format.mimetype | application/pdf | |
| dc.language.iso | en_US | |
| dc.relation.ispartofseries | Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory | |
| dc.relation.isreplacedby | http://hdl.handle.net/1721.1/36865 | |
| dc.relation.uri | http://hdl.handle.net/1721.1/36865 | |
| dc.subject | phonetic classification | |
| dc.subject | hierarchical models | |
| dc.subject | regularized least-squares | |
| dc.subject | spectrotemporal patches | |
| dc.title | Phonetic Classification Using Hierarchical, Feed-forward, Spectro-temporal Patch-based Architectures | |
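
The pipeline sketched in the abstract — matching a dictionary of stored spectro-temporal patches against a novel spectrogram, then classifying the resulting feature vector with a regularized least squares (RLS) classifier — can be illustrated with a minimal sketch. This is not the CBCL implementation: the patch size, the Gaussian similarity measure, the max-pooling over positions, and the regularization constant `lam` are all illustrative assumptions.

```python
import numpy as np

def patch_features(spectrogram, patches):
    """For each stored patch, slide a window over the spectrogram and
    keep the best similarity score (max-pooling over positions).
    Similarity is a Gaussian of the squared distance, so an exact
    match scores 1.0."""
    ph, pw = patches[0].shape
    H, W = spectrogram.shape
    feats = np.empty(len(patches))
    for i, p in enumerate(patches):
        best = -np.inf
        for r in range(H - ph + 1):
            for c in range(W - pw + 1):
                win = spectrogram[r:r + ph, c:c + pw]
                best = max(best, np.exp(-np.sum((win - p) ** 2)))
        feats[i] = best
    return feats

def rls_train(K, Y, lam=1e-2):
    """Regularized least squares in the kernel (dual) form:
    solve (K + lam*I) C = Y, with one column of Y per class
    (one-vs-all encoding)."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), Y)

def rls_predict(K_test, C):
    """Score each test point against every class and take the argmax."""
    return (K_test @ C).argmax(axis=1)
```

A usage pass would extract `patch_features` for every training spectrogram, form a kernel matrix `K` over those feature vectors (e.g. a linear kernel `X @ X.T`), call `rls_train`, and classify held-out frames with `rls_predict`. RLS is attractive here because training reduces to a single linear solve and the same kernel machinery handles all 20 vowel classes at once.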
