Recovery of adjective hierarchy through unsupervised learning
Name
1192539711-MIT.pdf
Size
254.99 KB
Format
Adobe PDF
Checksum (MD5)
6f39384910470e475bc7d34e2628d845
Author(s)
Chen, Run, M. Eng. Massachusetts Institute of Technology.
Advisor(s)
Robert C. Berwick.
Date Issued
2020
Publisher
Massachusetts Institute of Technology
Abstract
To understand the cognitive processes for natural language acquisition, we must differentiate between prior and acquired knowledge of language. We take steps towards identifying some of this prior knowledge by applying a computational approach to the Cartographic Hypothesis, a linguistic hypothesis that postulates a universal hierarchical syntactic structure for adverb and adjective sequences such that we prefer "little black (purse)" (169/169) over "black little (purse)" (0/169). Specifically, the adjectives are clustered and ordered. We consider English adjective bigrams in the Google Books Ngram corpus and attempt to recover the clusters, or syntactic groups of adjectives, based on relative order frequencies through unsupervised learning models. Low accuracy in the clustering results (0.45) strongly implies the information in the corpus is insufficient for speakers to acquire the linguistic intuition, and that the mechanisms needed to learn these syntactic structures may be prenatal as opposed to gleaned from the statistical regularity of the adjectives themselves.
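The "relative order frequencies" mentioned in the abstract can be illustrated with a minimal sketch. It assumes the figures 169/169 and 0/169 denote how many of the observed occurrences of the pair appear in each order; the function name `order_preference` and the in-memory `counts` table are hypothetical and are not taken from the thesis, which works on the Google Books Ngram corpus and may define its features differently.

```python
from collections import Counter


def order_preference(counts, a, b):
    """Return the fraction of occurrences of the pair {a, b} in which a precedes b.

    `counts` maps (first_adjective, second_adjective) to a bigram count.
    Returns None when the pair was never observed in either order.
    """
    ab = counts[(a, b)]
    ba = counts[(b, a)]
    total = ab + ba
    if total == 0:
        return None
    return ab / total


# Illustrative counts based on the abstract's example (assumed interpretation).
counts = Counter({("little", "black"): 169, ("black", "little"): 0})

print(order_preference(counts, "little", "black"))
print(order_preference(counts, "black", "little"))
print(order_preference(counts, "big", "red"))
```

A table of such pairwise scores is one plausible input to an unsupervised clustering step; which features and models the thesis actually uses is not stated in this record.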
Description
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, May, 2020
Cataloged from the official PDF of thesis.
Includes bibliographical references (pages 29-30).
Subjects
Electrical Engineering and Computer Science.
MIT Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Terms of Use
MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided.
Persistent DSpace Link