dc.contributor.advisor | Tomaso Poggio | |
dc.contributor.author | Rifkin, Ryan | |
dc.contributor.author | Bouvrie, Jake | |
dc.contributor.author | Schutte, Ken | |
dc.contributor.author | Chikkerur, Sharat | |
dc.contributor.author | Kouh, Minjoon | |
dc.contributor.author | Ezzat, Tony | |
dc.contributor.author | Poggio, Tomaso | |
dc.contributor.other | Center for Biological and Computational Learning (CBCL) | |
dc.date.accessioned | 2007-03-22T11:21:47Z | |
dc.date.available | 2007-03-22T11:21:47Z | |
dc.date.issued | 2007-03-21 | |
dc.identifier.other | MIT-CSAIL-TR-2007-019 | |
dc.identifier.other | CBCL-267 | |
dc.identifier.uri | http://hdl.handle.net/1721.1/36865 | |
dc.description.abstract | A preliminary set of experiments is described in which a biologically-inspired computer vision system (Serre, Wolf et al. 2005; Serre 2006; Serre, Oliva et al. 2006; Serre, Wolf et al. 2006) designed for visual object recognition was applied to the task of phonetic classification. During learning, the system processed 2-D wideband magnitude spectrograms directly as images, producing a set of 2-D spectro-temporal patch dictionaries at different spectro-temporal positions, orientations, and scales, and of varying complexity. During testing, features were computed by comparing the stored patches with patches from novel spectrograms. Classification was performed using a regularized least-squares classifier (Rifkin, Yeo et al. 2003; Rifkin, Schutte et al. 2007) trained on the features computed by the system. On a 20-class TIMIT vowel classification task, the model features achieved a best result of 58.74% error, compared to 48.57% error using state-of-the-art MFCC-based features trained using the same classifier. This suggests that hierarchical, feed-forward, spectro-temporal patch-based architectures may be useful for phonetic analysis. | |
dc.format.extent | 17 p. | |
dc.relation.ispartofseries | Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory | |
dc.relation.replaces | http://hdl.handle.net/1721.1/35835 | |
dc.relation.uri | http://hdl.handle.net/1721.1/35835 | |
dc.subject | phonetic classification | |
dc.subject | hierarchical models | |
dc.subject | regularized least-squares | |
dc.subject | spectrotemporal patches | |
dc.title | Phonetic Classification Using Hierarchical, Feed-forward, Spectro-temporal Patch-based Architectures | |
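As a rough illustration of the regularized least-squares (RLS) classification step named in the abstract and subject keywords, the following is a minimal sketch of a linear one-vs-all RLS classifier. The feature matrix, label encoding, and regularization value are illustrative assumptions, not the report's actual configuration or code.

```python
# Minimal one-vs-all regularized least-squares classifier (sketch only).
# Assumes rows of X are precomputed patch-comparison feature vectors;
# the lambda value and data shapes below are hypothetical.
import numpy as np

def rls_train(X, Y, lam=1e-3):
    """Fit linear RLS weights: W = (X^T X + lam*I)^{-1} X^T Y.

    X : (n_samples, n_features) feature matrix
    Y : (n_samples, n_classes) +1/-1 one-vs-all label matrix
    """
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    return np.linalg.solve(A, X.T @ Y)

def rls_predict(X, W):
    """Return the class index with the largest regression output."""
    return np.argmax(X @ W, axis=1)

# Usage with random stand-in data (20 classes, mirroring the TIMIT vowel task):
rng = np.random.default_rng(0)
X_train = rng.standard_normal((200, 64))
y_train = rng.integers(0, 20, size=200)
Y_train = -np.ones((200, 20))
Y_train[np.arange(200), y_train] = 1.0

W = rls_train(X_train, Y_train)
predictions = rls_predict(X_train, W)
```

This linear form is only one instance of RLS; the cited work (Rifkin, Yeo et al. 2003) also covers kernel variants, where the same regularized linear system is solved in the space of kernel coefficients rather than feature weights.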