Applications of missing feature theory to speaker recognition
Author(s)Padilla, Michael Thomas, 1974-
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Thomas F. Quatieri.
MetadataShow full item record
An important problem in speaker recognition is the degradation that occurs when speaker models trained with speech from one type of channel are used to score speech from another type of channel, known as channel mismatch. This thesis investigates various channel compensation techniques and approaches from missing feature theory for improving Gaussian mixture model (GMM)-based speaker verification under this mismatch condition. Experiments are performed using a speech corpus consisting of "clean" training speech and "dirty" test speech equal to the clean speech corrupted by additive Gaussian noise. Channel compensation methods studied are cepstral mean subtraction, RASTA, and spectral subtraction. Approaches to missing feature theory include missing feature compensation, which removes corrupted features, and missing feature restoration which predicts such features from neighboring features in both frequency and time. These methods are investigated both individually and in combination. In particular, missing feature compensation combined with spectral subtraction in the discrete Fourier transform domain significantly improves GMM speaker verification accuracy and outperforms all other methods examined in this thesis, reducing the equal error rate by about 10% more than other methods over a SNR range of 5-25 dB. Moreover, this considerably outperforms a state-of-the-art GMM recognizer for the mismatch application that combines missing feature theory with spectral subtraction developed in a mel-filter energy domain. Finally, the concept of missing restoration is explored. A novel linear minimum mean-squared-error missing feature estimator is derived and applied to pure vowels as well as a clean/dirty verification trial. While it does not improve performance in the verification trial, a large SNR improvement for features estimated for the pure vowel case indicate promise in the application of this method.
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2000.Includes bibliographical references (p. 100-101).
DepartmentMassachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology
Electrical Engineering and Computer Science.