
dc.contributor.advisor: Stephanie Seneff. [en_US]
dc.contributor.author: Peabody, Mitchell A. (Mitchell Aaron) [en_US]
dc.contributor.other: Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. [en_US]
dc.date.accessioned: 2012-01-12T19:31:54Z
dc.date.available: 2012-01-12T19:31:54Z
dc.date.copyright: 2011 [en_US]
dc.date.issued: 2011 [en_US]
dc.identifier.uri: http://hdl.handle.net/1721.1/68491
dc.description: Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. [en_US]
dc.description: Cataloged from PDF version of thesis. [en_US]
dc.description: Includes bibliographical references (p. 149-176). [en_US]
dc.description.abstract: Learning a foreign language is a challenging endeavor that entails acquiring a wide range of new knowledge, including words, grammar, gestures, and sounds. Mastering these skills requires extensive practice by the learner, and opportunities to practice may not always be available. Computer Aided Language Learning (CALL) systems provide non-threatening environments where foreign language skills can be practiced wherever and whenever a student desires. These systems often use several technologies to identify the different types of errors made by a student. This thesis focuses on the problem of identifying mispronunciations made by a foreign language student using a CALL system. We make several assumptions about the nature of the learning activity: it takes place using a dialogue system, it is a task- or game-oriented activity, the student should not be interrupted by the pronunciation feedback system, and the goal of the feedback system is to identify severe mispronunciations with high reliability. Detecting mispronunciations requires a corpus of speech with human judgements of pronunciation quality. Typical approaches to collecting such a corpus use an expert phonetician to both phonetically transcribe and assign judgements of quality to each phone in a corpus. This is time-consuming and expensive. It also places an extra burden on the transcriber. We describe a novel method for obtaining phone-level judgements of pronunciation quality by utilizing non-expert, crowd-sourced, word-level judgements of pronunciation. Foreign language learners typically exhibit high variation and pronunciation shapes distinct from those of native speakers, which makes analysis for mispronunciation difficult. We detail a simple but effective method for transforming the vowel space of non-native speakers to make mispronunciation detection more robust and accurate. We show that this transformation not only enhances performance on a simple classification task, but also results in distributions that can be better exploited for mispronunciation detection. This transformation of the vowel space is exploited to train a mispronunciation detector using a variety of features derived from acoustic model scores and vowel class distributions. We confirm that the transformation technique results in a more robust and accurate identification of mispronunciations than traditional acoustic models. [en_US]
dc.description.statementofresponsibility: by Mitchell A. Peabody. [en_US]
dc.format.extent: 176 p. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Massachusetts Institute of Technology [en_US]
dc.rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. [en_US]
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 [en_US]
dc.subject: Electrical Engineering and Computer Science. [en_US]
dc.title: Methods for pronunciation assessment in computer aided language learning [en_US]
dc.type: Thesis [en_US]
dc.description.degree: Ph.D. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc: 770405260 [en_US]

