MIT Libraries homeMIT Libraries logoDSpace@MIT

MIT
View Item 
  • DSpace@MIT Home
  • Computer Science and Artificial Intelligence Lab (CSAIL)
  • CSAIL Digital Archive
  • CSAIL Technical Reports (July 1, 2003 - present)
  • View Item
  • DSpace@MIT Home
  • Computer Science and Artificial Intelligence Lab (CSAIL)
  • CSAIL Digital Archive
  • CSAIL Technical Reports (July 1, 2003 - present)
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Phonetic Classification Using Hierarchical, Feed-forward, Spectro-temporal Patch-based Architectures

Author(s)
Rifkin, Ryan; Bouvrie, Jake; Schutte, Ken; Chikkerur, Sharat; Kouh, Minjoon; Ezzat, Tony; Poggio, Tomaso; ... Show more Show less
Thumbnail
DownloadMIT-CSAIL-TR-2007-007.ps (2212.Kb)
Additional downloads
Other Contributors
Center for Biological and Computational Learning (CBCL)
Advisor
Tomaso Poggio
Metadata
Show full item record
Abstract
A preliminary set of experiments are described in which a biologically-inspired computer vision system (Serre, Wolf et al. 2005; Serre 2006; Serre, Oliva et al. 2006; Serre, Wolf et al. 2006) designed for visual object recognition was applied to the task of phonetic classification. During learning, the systemprocessed 2-D wideband magnitude spectrograms directly as images, producing a set of 2-D spectrotemporal patch dictionaries at different spectro-temporal positions, orientations, scales, and of varying complexity. During testing, features were computed by comparing the stored patches with patches fromnovel spectrograms. Classification was performed using a regularized least squares classifier (Rifkin, Yeo et al. 2003; Rifkin, Schutte et al. 2007) trained on the features computed by the system. On a 20-class TIMIT vowel classification task, the model features achieved a best result of 58.74% error, compared to 48.57% error using state-of-the-art MFCC-based features trained using the same classifier. This suggests that hierarchical, feed-forward, spectro-temporal patch-based architectures may be useful for phoneticanalysis.
Date issued
2007-02-01
URI
http://hdl.handle.net/1721.1/35835
Other identifiers
MIT-CSAIL-TR-2007-007
CBCL-266
Series/Report no.
Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory
Keywords
phonetic classification, hierarchical models, regularized least-squares, spectrotemporal patches

Collections
  • CSAIL Technical Reports (July 1, 2003 - present)

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

My Account

Login

Statistics

OA StatisticsStatistics by CountryStatistics by Department
MIT Libraries homeMIT Libraries logo

Find us on

Twitter Facebook Instagram YouTube RSS

MIT Libraries navigation

SearchHours & locationsBorrow & requestResearch supportAbout us
PrivacyPermissionsAccessibility
MIT
Massachusetts Institute of Technology
Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.