Show simple item record

dc.contributor.advisorAdam Albright.en_US
dc.contributor.authorMorita, Takashi, Ph. D. Massachusetts Institute of Technologyen_US
dc.contributor.otherMassachusetts Institute of Technology. Department of Linguistics and Philosophy.en_US
dc.date.accessioned2019-03-01T19:34:06Z
dc.date.available2019-03-01T19:34:06Z
dc.date.copyright2018en_US
dc.date.issued2018en_US
dc.identifier.urihttp://hdl.handle.net/1721.1/120612
dc.descriptionThesis: Ph. D. in Linguistics, Massachusetts Institute of Technology, Department of Linguistics and Philosophy, 2018.en_US
dc.descriptionThis electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.en_US
dc.descriptionCataloged from student-submitted PDF version of thesis.en_US
dc.descriptionIncludes bibliographical references (pages 203-215).en_US
dc.description.abstractLanguages are constantly borrowing words from one another. Since the donor and recipient languages typically differ in their phonology and phonotactics, the native words and the loanwords of the borrower language can also exhibit dierent phonology/ phonotactics. Accordingly, it has been proposed that the phonotactics of languages such as Japanese is better explained if words are classified into etymologically defined sublexica. However, this sublexical analysis is challenged by a learnability problem: the sublexical membership of words is not directly observable. This study applies a state-of-the-art clustering method (a Dirichlet process mixture model) to a substantial number of Japanese and English words extracted from corpora. It turns out that the predicted clusters largely correspond to the etymologically defined sublexica. Since the clustering method is domain-general and not specialized to sublexicon identication, the results can be taken as statistical evidence for the heterogeneous lexica of the two languages. Moreover, the unsupervised nature of the clustering method demonstrates the learnability of sublexica from naturalistic data. The learned sublexica also replicate linguistic characterizations of actual sublexica proposed in previous literature, such as the biased distribution of (certain substrings of) segments to particular sublexica. In addition, the learned sublexica make informative predictions based on previous experimental studies. These results suggest that the predicted sublexica are linguistically sound. Finally, the predicted sublexica reveal hitherto unnoticed phonotactic properties. These discoveries can be used for further investigation of native speakers' knowledge.en_US
dc.description.statementofresponsibilityby Takashi Morita.en_US
dc.format.extent215 pagesen_US
dc.language.isoengen_US
dc.publisherMassachusetts Institute of Technologyen_US
dc.rightsMIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission.en_US
dc.rights.urihttp://dspace.mit.edu/handle/1721.1/7582en_US
dc.subjectLinguistics and Philosophy.en_US
dc.titleUnsupervised learning of lexical subclasses from phonotacticsen_US
dc.typeThesisen_US
dc.description.degreePh. D. in Linguisticsen_US
dc.contributor.departmentMassachusetts Institute of Technology. Department of Linguistics and Philosophy.en_US
dc.identifier.oclc1088558202en_US


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record