Neighborhood analysis methods in acoustic modeling for automatic speech recognition
Author(s): Singh-Miller, Natasha, 1981-
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Michael J. Collins.
This thesis investigates the problem of using nearest-neighbor based non-parametric methods for performing multi-class class-conditional probability estimation. The methods developed are applied to the problem of acoustic modeling for speech recognition. Neighborhood components analysis (NCA) (Goldberger et al. ) serves as the departure point for this study. NCA is a non-parametric method that can be seen as providing two things: (1) low-dimensional linear projections of the feature space that allow nearest-neighbor algorithms to perform well, and (2) nearest-neighbor based class-conditional probability estimates. First, NCA is used to perform dimensionality reduction on acoustic vectors, a commonly addressed problem in speech recognition. NCA is shown to perform competitively with another commonly employed dimensionality reduction technique in speech known as heteroscedastic linear discriminant analysis (HLDA) (Kumar ). Second, a nearest-neighbor based model related to NCA is created to provide a class-conditional estimate that is sensitive to the possible underlying relationship between the acoustic-phonetic labels. An embedding of the labels is learned that can be used to estimate the similarity or confusability between labels. This embedding is related to the concept of error-correcting output codes (ECOC), and therefore the proposed model is referred to as NCA-ECOC. The estimates provided by this method, along with nearest-neighbor information, are shown to improve speech recognition performance (2.5% relative reduction in word error rate). Third, a model for calculating class-conditional probability estimates is proposed that generalizes GMM, NCA, and kernel density approaches. This model, called locally-adaptive neighborhood components analysis (LA-NCA), learns different low-dimensional projections for different parts of the space. The model exploits the fact that in different parts of the space different directions may be important for discrimination between the classes. This model is computationally intensive and prone to over-fitting, so methods for sub-selecting the neighbors used for providing the class-conditional estimates are explored. The estimates provided by LA-NCA are shown to give significant gains in speech recognition performance (7-8% relative reduction in word error rate) as well as in phonetic classification.
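As a rough illustration of the two roles the abstract attributes to NCA, the sketch below projects points with a linear map and then forms soft nearest-neighbor class estimates from distances in the projected space. It follows the standard NCA formulation (a softmax over negative squared distances) and is not taken from the thesis itself; the function name `nca_class_probabilities`, the matrix `A`, and the toy data are assumptions made for this example, and `A` is simply given here rather than learned.

```python
import numpy as np

def nca_class_probabilities(A, X_train, y_train, x, num_classes):
    """Soft nearest-neighbor class estimates for one query point x.

    A is a (d_low, d) projection matrix, assumed to be already learned.
    y_train holds integer class labels in the range [0, num_classes).
    """
    # Project the training points and the query into the low-dimensional space.
    Z = X_train @ A.T                      # shape (n, d_low)
    z = A @ x                              # shape (d_low,)
    # Squared Euclidean distance from the query to every training point.
    sq_dist = np.sum((Z - z) ** 2, axis=1)
    # Softmax over negative distances; shifting by the minimum avoids underflow.
    weights = np.exp(-(sq_dist - sq_dist.min()))
    weights /= weights.sum()
    # Sum the neighbor weights belonging to each class.
    return np.bincount(y_train, weights=weights, minlength=num_classes)

# Toy usage: 3-dimensional points projected onto their first two coordinates.
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
X_train = np.array([[0.0, 0.0, 0.0],
                    [0.1, 0.0, 5.0],
                    [10.0, 10.0, 0.0]])
y_train = np.array([0, 0, 1])
probs = nca_class_probabilities(A, X_train, y_train, np.array([0.0, 0.0, 1.0]), 2)
print(probs)
```

In this toy case the projection discards the third coordinate, so the query sits close to both class-0 points and nearly all of the probability mass goes to class 0. In NCA proper, `A` would be optimized so that such estimates are accurate on the training data.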
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 121-134).
Department: Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Massachusetts Institute of Technology
Electrical Engineering and Computer Science.