Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods

Title: Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods
Author: Kondacs, Attila, 1972-
Other Contributors: Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Advisor: Gerald J. Sussman.
Department: Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Publisher: Massachusetts Institute of Technology
Issue Date: 2005
Abstract: In this thesis I will be concerned with linking the observed speech signal to the configuration of articulators. Due to the potentially rapid motion of the articulators, the speech signal can be highly non-stationary. The typical linear analysis techniques that assume quasi-stationarity may not have sufficient time-frequency resolution to determine the place of articulation. I argue that the traditional low- and high-level primitives of speech processing, frequency and phonemes, are inadequate and should be replaced by a representation with three layers: 1. short pitch period resonances and other spatio-temporal patterns; 2. articulator configuration trajectories; 3. syllables. The patterns indicate articulator configuration trajectories (how the tongue, jaws, etc. are moving), which are interpreted as syllables and words. My patterns are an alternative to frequency. I use short time-domain features of the sound waveform, which can be extracted from each vowel pitch period pattern, to identify the positions of the articulators with high reliability. These features are important because, by capitalizing on detailed measurements within a single pitch period, the rapid articulator movements can be tracked. No linear signal processing approach can achieve the combination of sensitivity to short-term changes and measurement accuracy resulting from these nonlinear techniques. The measurements I use are neurophysiologically plausible: the auditory system could be using similar methods. I have demonstrated this approach by constructing a robust technique for categorizing the English voiced stops as the consonants B, D, or G based on the vocalic portions of their releases. The classification recognizes 93.5%, 81.8% and 86.1% of the b, d and g to ae transitions, with false positive rates of 2.9%, 8.7% and 2.6% respectively.
Description: Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 99-104).
URI: http://hdl.handle.net/1721.1/27867
Keywords: Electrical Engineering and Computer Science.

Files in this item

Preview, non-printable (open to all): 3.085Mb, PDF
Full printable version (MIT only): 3.113Mb, PDF
