Recovery of adjective hierarchy through unsupervised learning
Author(s)
Chen, Run, M. Eng., Massachusetts Institute of Technology.
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Robert C. Berwick.
Abstract
To understand the cognitive processes underlying natural language acquisition, we must distinguish between prior and acquired knowledge of language. We take steps toward identifying some of this prior knowledge by applying a computational approach to the Cartographic Hypothesis. This linguistic hypothesis postulates a universal hierarchical syntactic structure for adverb and adjective sequences, which is why speakers prefer "little black (purse)" (169/169) over "black little (purse)" (0/169). Under this hypothesis, adjectives fall into clusters, and those clusters occur in a fixed relative order. We consider English adjective bigrams in the Google Books Ngram corpus and attempt to recover these clusters, or syntactic groups of adjectives, from relative order frequencies using unsupervised learning models. The low accuracy of the clustering results (0.45) strongly implies that the information in the corpus is insufficient for speakers to acquire this linguistic intuition, and that the mechanisms needed to learn these syntactic structures may be prenatal rather than gleaned from the statistical regularity of the adjectives themselves.
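The abstract does not name the specific unsupervised models or features. As a rough illustration of what "recovering clusters from relative order frequencies" can mean, here is a minimal sketch under several assumptions. It assumes each adjective is summarized by a single order score: the fraction of its bigram occurrences in which it appears first. It also assumes plain 1-D k-means as the clustering step, and it ranks clusters by mean score so that clusters that tend to come first are treated as outermost. None of this is claimed to be the thesis's method. The function names (`order_scores`, `kmeans_1d`, `recover_hierarchy`) are hypothetical. `TOY_BIGRAMS` holds made-up counts chosen only for illustration; they are not taken from the Google Books Ngram corpus.

```python
from collections import defaultdict

# Hypothetical toy counts: (first adjective, second adjective) -> occurrences.
# Made up for illustration; NOT real Google Books Ngram figures.
TOY_BIGRAMS = {
    ("little", "black"): 9, ("black", "little"): 1,
    ("big", "red"): 8,
    ("little", "wooden"): 5,
    ("big", "plastic"): 6,
    ("black", "wooden"): 7, ("wooden", "black"): 1,
    ("red", "plastic"): 4,
}


def order_scores(bigrams):
    """Fraction of each adjective's bigram occurrences in which it comes first."""
    first = defaultdict(int)
    total = defaultdict(int)
    for (a, b), n in bigrams.items():
        first[a] += n
        total[a] += n
        total[b] += n
    return {w: first[w] / total[w] for w in total}


def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on scalars, with evenly spaced deterministic initial centers."""
    lo, hi = min(values), max(values)
    if k == 1:
        centers = [sum(values) / len(values)]
    else:
        centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the mean of its members (kept if empty).
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def recover_hierarchy(bigrams, k):
    """Cluster adjectives by order score, then list clusters from earliest to latest position."""
    scores = order_scores(bigrams)
    words = sorted(scores)
    labels, centers = kmeans_1d([scores[w] for w in words], k)
    # Higher mean score = tends to precede other adjectives = earlier in the sequence.
    rank = sorted(range(k), key=lambda c: -centers[c])
    groups = [[w for w, lab in zip(words, labels) if lab == c] for c in rank]
    return [g for g in groups if g]


print(recover_hierarchy(TOY_BIGRAMS, k=3))
# -> [['big', 'little'], ['black', 'red'], ['plastic', 'wooden']]
```

Collapsing each adjective to one scalar discards which partners it was ordered against. A richer sketch would cluster on a vector of pairwise preferences instead. On sparse or noisy real bigram counts, the groups recovered by either approach can disagree substantially with a linguist's hierarchy.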
Description
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, May 2020. Cataloged from the official PDF of the thesis. Includes bibliographical references (pages 29-30).
Date issued
2020
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.