Classification of vocal fold vibration as regular or irregular in normal, voiced speech
Author(s)
Surana, Kushan Krishna
Other Contributors
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Advisor
Janet Slifka.
Abstract
Irregular phonation serves an important communicative function in human speech and occurs allophonically in American English. This thesis uses cues from both the temporal and frequency domains to classify segments of regular and irregular phonation in normal, continuous speech. The cues are fundamental frequency, normalized RMS amplitude, smoothed-energy-difference amplitude (a measure of abruptness in energy variations), and shift-difference amplitude (a measure of periodicity). Support Vector Machines (SVMs) are used to classify each token as an example of either regular or irregular phonation. The tokens are extracted from the TIMIT database and come from 151 different speakers. Both genders are well represented, and the tokens occur in various contexts within the utterance. The training set uses 114 speakers, and the test set uses the remaining 37 speakers. A total of 292 of 320 irregular tokens (recognition rate of 91.25%, false alarm rate of 4.98%) and 4105 of 4320 regular tokens (recognition rate of 95.02%, false alarm rate of 8.75%) are correctly identified.

The high recognition rates indicate that the set of acoustic cues is robust in identifying a token as regular or irregular, even when one or two cues show unexpected values. In addition, an analysis of the 1331 irregular tokens in the training set shows that 78% occur at word boundaries and 5% occur at syllable boundaries. Of the irregular tokens at syllable boundaries, 72% occur either at the junction of a compound word (e.g., "outcast") or at the junction of a base word and a suffix. Of the irregular tokens that occur at neither word nor syllable boundaries, 70% are adjacent to voiceless consonants, mostly in utterance-final position.
These observations support the role of irregular phonation as a cue to syntactic boundaries in connected speech. Combined with the robust classification results separating regular from irregular phonation, they could be used to improve speech recognition and lexical access models.
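As an arithmetic check on the reported test-set figures, the sketch below recomputes each rate from the raw counts. The abstract does not define "false alarm rate"; the numbers are consistent with reading the false alarm rate of one class as the share of the other class's test tokens that were mislabeled as that class (for example, 215 of 4320 regular tokens labeled irregular gives 4.98%). This interpretation is inferred, and the helper `detection_rates` is hypothetical, not part of the thesis.

```python
def detection_rates(correct_irr, total_irr, correct_reg, total_reg):
    """Hypothetical helper: per-class recognition and false alarm rates (percent).

    Assumes a two-class problem in which every token that is not correctly
    identified was assigned to the other class.
    """
    # Recognition rate: share of each class's tokens labeled correctly.
    irr_recognition = 100 * correct_irr / total_irr
    reg_recognition = 100 * correct_reg / total_reg
    # False alarms for "irregular": regular tokens labeled irregular.
    irr_false_alarm = 100 * (total_reg - correct_reg) / total_reg
    # False alarms for "regular": irregular tokens labeled regular.
    reg_false_alarm = 100 * (total_irr - correct_irr) / total_irr
    return irr_recognition, irr_false_alarm, reg_recognition, reg_false_alarm


if __name__ == "__main__":
    # Test-set counts reported in the abstract.
    rates = detection_rates(292, 320, 4105, 4320)
    print([round(r, 2) for r in rates])  # [91.25, 4.98, 95.02, 8.75]
```

Under this reading, each class's false alarm rate is simply the other class's miss rate, and overall accuracy on the 4640 test tokens is (292 + 4105) / 4640, or about 94.76%.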
Description
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 91-97).
Date issued
2006
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.