DSpace@MIT
Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods

Author(s)
Kondacs, Attila
Download: MIT-CSAIL-TR-2005-005.ps (81.70 MB)
Abstract
In this thesis I will be concerned with linking the observed speech signal to the configuration of articulators.

Due to the potentially rapid motion of the articulators, the speech signal can be highly non-stationary. The typical linear analysis techniques that assume quasi-stationarity may not have sufficient time-frequency resolution to determine the place of articulation.

I argue that the traditional low- and high-level primitives of speech processing, frequency and phonemes, are inadequate and should be replaced by a representation with three layers: (1) short pitch-period resonances and other spatio-temporal patterns; (2) articulator configuration trajectories; (3) syllables. The patterns indicate articulator configuration trajectories (how the tongue, jaws, etc. are moving), which are interpreted as syllables and words.

My patterns are an alternative to frequency. I use short time-domain features of the sound waveform, which can be extracted from each vowel pitch period pattern, to identify the positions of the articulators with high reliability. These features are important because, by capitalizing on detailed measurements within a single pitch period, the rapid articulator movements can be tracked. No linear signal processing approach can achieve the combination of sensitivity to short-term changes and measurement accuracy resulting from these nonlinear techniques.

The measurements I use are neurophysiologically plausible: the auditory system could be using similar methods.

I have demonstrated this approach by constructing a robust technique for categorizing the English voiced stops as the consonants B, D, or G based on the vocalic portions of their releases. The classification recognizes 93.5%, 81.8%, and 86.1% of the b, d, and g to ae transitions, with false positive rates of 2.9%, 8.7%, and 2.6%, respectively.
Date issued
2005-01-28
URI
http://hdl.handle.net/1721.1/30518
Other identifiers
MIT-CSAIL-TR-2005-005
AITR-2005-001
Series/Report no.
Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory
Keywords
AI, speech processing, stop consonants, pitch period, spatio-temporal patterns

Collections
  • CSAIL Technical Reports (July 1, 2003 - present)

Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.