Characterizing phonetic transformations and fine-grained acoustic differences across dialects
Author(s)
Chen, Nancy Fang-Yih
Other Contributors
Harvard University--MIT Division of Health Sciences and Technology.
Advisor
Joseph P. Campbell.
Abstract
This thesis is motivated by the gaps between speech science and speech technology in analyzing dialects. In speech science, investigating phonetic rules is usually laborious and time-consuming manual work, which limits the amount of data analyzed. Without sufficient data, the analysis could overlook or over-specify certain phonetic rules. In speech technology, on the other hand, applications such as automatic dialect recognition rarely model phonetic rules explicitly. While many applications do not require such knowledge to obtain good performance, explicitly modeling pronunciation patterns is beneficial in certain applications. For example, users of language learning software can benefit from explicit and intuitive feedback from the computer to alter their pronunciation; in forensic phonetics, it is important that the results of automated systems are justifiable on phonetic grounds. In this work, we propose a mathematical framework to analyze dialects in terms of (1) phonetic transformations and (2) acoustic differences. The proposed Phonetic-based Pronunciation Model (PPM) uses a hidden Markov model to characterize when and how often substitutions, insertions, and deletions occur. In particular, clustering methods are compared to better model deletion transformations. In addition, an acoustic counterpart of PPM, the Acoustic-based Pronunciation Model (APM), is proposed to characterize and locate fine-grained acoustic differences such as formant transitions and nasalization across dialects. We used three data sets to empirically compare the proposed models in Arabic and English dialects. Results in automatic dialect recognition demonstrate that the proposed models complement standard baseline systems. Results in pronunciation generation and rule retrieval experiments indicate that the proposed models learn underlying phonetic rules across dialects.
Our proposed system postulates pronunciation rules to a phonetician who interprets and refines them to discover new rules or quantify known rules. This can be done on large corpora to develop rules of greater statistical significance than has previously been possible. Potential applications of this work include speaker characterization and recognition, automatic dialect recognition, automatic speech recognition and synthesis, forensic phonetics, language learning or accent training education, and assistive diagnosis tools for speech and voice disorders.
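The three transformation types named in the abstract (substitution, insertion, deletion) can be illustrated with a toy example. The sketch below is not the thesis's PPM and does not use a hidden Markov model; it swaps in a plain minimum-edit-distance alignment between a reference phone sequence and an observed dialect pronunciation, then tallies the resulting operations. The function names `align_ops` and `tally`, the phone labels, and the three word pairs are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def align_ops(ref, obs):
    """Align two phone sequences by minimum edit distance.

    Returns a list of (op, ref_phone, obs_phone) tuples, where op is one of
    "match", "sub", "del", or "ins". This is an illustrative stand-in, not
    the HMM-based model described in the abstract.
    """
    n, m = len(ref), len(obs)
    # cost[i][j] = edit distance between ref[:i] and obs[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != obs[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the end to recover one optimal alignment.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != obs[j - 1])):
            op = "match" if ref[i - 1] == obs[j - 1] else "sub"
            ops.append((op, ref[i - 1], obs[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("del", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, obs[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def tally(pairs):
    """Count non-match operations over (reference, observed) sequence pairs."""
    counts = Counter()
    for ref, obs in pairs:
        for op, r, o in align_ops(ref, obs):
            if op != "match":
                counts[(op, r, o)] += 1
    return counts


# Hypothetical toy data: (reference phones, observed dialect phones).
toy_pairs = [
    (["p", "aa", "r", "k"], ["p", "aa", "k"]),             # /r/ deleted
    (["b", "ae", "th"], ["b", "ae", "f"]),                 # /th/ -> /f/
    (["f", "ih", "l", "m"], ["f", "ih", "l", "ah", "m"]),  # vowel inserted
]

if __name__ == "__main__":
    for (op, r, o), count in sorted(tally(toy_pairs).items(), key=str):
        print(op, r, o, count)
```

Counts of this kind, gathered over a large corpus, are the sort of raw evidence a phonetician could inspect when refining candidate rules; the model in the thesis additionally conditions on context through its hidden Markov structure, which this sketch omits.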
Description
Thesis (Ph. D.)--Harvard-MIT Division of Health Sciences and Technology, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 169-175).
Date issued
2011
Department
Harvard University--MIT Division of Health Sciences and Technology
Publisher
Massachusetts Institute of Technology
Keywords
Harvard University--MIT Division of Health Sciences and Technology.