Phonetic Classification Using Hierarchical, Feed-forward, Spectro-temporal Patch-based Architectures

Rifkin, Ryan; Bouvrie, Jake; Schutte, Ken; Chikkerur, Sharat; Kouh, Minjoon; Ezzat, Tony; Poggio, Tomaso

Download: MIT-CSAIL-TR-2007-007.ps (2212 KB)

Other Contributors

Center for Biological and Computational Learning (CBCL)

Advisor

Tomaso Poggio

Abstract

A preliminary set of experiments is described in which a biologically-inspired computer vision system (Serre, Wolf et al. 2005; Serre 2006; Serre, Oliva et al. 2006; Serre, Wolf et al. 2006) designed for visual object recognition was applied to the task of phonetic classification. During learning, the system processed 2-D wideband magnitude spectrograms directly as images, producing a set of 2-D spectro-temporal patch dictionaries at different spectro-temporal positions, orientations, and scales, and of varying complexity. During testing, features were computed by comparing the stored patches with patches from novel spectrograms. Classification was performed using a regularized least squares classifier (Rifkin, Yeo et al. 2003; Rifkin, Schutte et al. 2007) trained on the features computed by the system. On a 20-class TIMIT vowel classification task, the model features achieved a best result of 58.74% error, compared to 48.57% error using state-of-the-art MFCC-based features trained using the same classifier. This suggests that hierarchical, feed-forward, spectro-temporal patch-based architectures may be useful for phonetic analysis.
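The patch-comparison step described in the abstract can be illustrated with a minimal sketch. It is not the report's implementation: the function names, the Gaussian similarity measure, the max over positions, and the random sampling of patches are all assumptions chosen for illustration, and the sketch covers only a single layer of patch matching rather than the full hierarchy or the classifier.

```python
import math
import random


def extract_patch(spec, top, left, height, width):
    """Return a height x width block of a 2-D spectrogram (a list of rows)."""
    return [row[left:left + width] for row in spec[top:top + height]]


def sample_dictionary(spectrograms, n_patches, height, width, seed=0):
    """Build a patch dictionary by sampling blocks at random positions.

    Assumption: patches are drawn uniformly at random from the training
    spectrograms; the report may select them differently.
    """
    rng = random.Random(seed)
    patches = []
    for _ in range(n_patches):
        spec = rng.choice(spectrograms)
        top = rng.randrange(len(spec) - height + 1)
        left = rng.randrange(len(spec[0]) - width + 1)
        patches.append(extract_patch(spec, top, left, height, width))
    return patches


def patch_response(spec, patch, sigma=1.0):
    """Best Gaussian similarity between a stored patch and any window of spec.

    Assumption: similarity is exp(-||window - patch||^2 / (2 sigma^2)),
    pooled with a max over all window positions.
    """
    height, width = len(patch), len(patch[0])
    best = 0.0
    for top in range(len(spec) - height + 1):
        for left in range(len(spec[0]) - width + 1):
            dist2 = sum(
                (spec[top + i][left + j] - patch[i][j]) ** 2
                for i in range(height)
                for j in range(width)
            )
            best = max(best, math.exp(-dist2 / (2.0 * sigma ** 2)))
    return best


def features(spec, dictionary, sigma=1.0):
    """One feature per stored patch: its best match in the novel spectrogram."""
    return [patch_response(spec, patch, sigma) for patch in dictionary]
```

The resulting feature vector has one entry per dictionary patch and could then be passed to any classifier, such as the regularized least squares classifier the abstract mentions.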

Date issued

2007-02-01

URI

http://hdl.handle.net/1721.1/35835

Other identifiers

MIT-CSAIL-TR-2007-007

CBCL-266

Series/Report no.

Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory

Keywords

phonetic classification, hierarchical models, regularized least-squares, spectrotemporal patches

Collections

CSAIL Technical Reports (July 1, 2003 - present)