Recovery of adjective hierarchy through unsupervised learning
Author(s): Chen, Run, M. Eng. Massachusetts Institute of Technology.
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Robert C. Berwick.
To understand the cognitive processes for natural language acquisition, we must differentiate between prior and acquired knowledge of language. We take steps towards identifying some of this prior knowledge by applying a computational approach to the Cartographic Hypothesis, a linguistic hypothesis that postulates a universal hierarchical syntactic structure for adverb and adjective sequences such that we prefer "little black (purse)" (169/169) over "black little (purse)" (0/169). Specifically, the adjectives are clustered and ordered. We consider English adjective bigrams in the Google Books Ngram corpus and attempt to recover the clusters, or syntactic groups of adjectives, based on relative order frequencies through unsupervised learning models. Low accuracy in the clustering results (0.45) strongly implies the information in the corpus is insufficient for speakers to acquire the linguistic intuition, and that the mechanisms needed to learn these syntactic structures may be prenatal as opposed to gleaned from the statistical regularity of the adjectives themselves.
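The relative order frequency mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes bigram counts are held in a dictionary keyed by (first adjective, second adjective), and it reads the abstract's "169/169" as 169 of 169 observed occurrences of the pair in the order "little black". The function name order_preference and the sample counts are hypothetical.

```python
from collections import Counter


def order_preference(bigram_counts, a, b):
    """Return the fraction of (a, b) pair occurrences in which a precedes b.

    Returns None when the pair never occurs in either order, since no
    preference can be estimated from zero observations.
    """
    ab = bigram_counts.get((a, b), 0)  # occurrences of "a b"
    ba = bigram_counts.get((b, a), 0)  # occurrences of "b a"
    total = ab + ba
    return ab / total if total else None


# Hypothetical counts mirroring the abstract's example (assumed reading of 169/169).
counts = Counter({("little", "black"): 169, ("black", "little"): 0})

preference = order_preference(counts, "little", "black")
print(preference)
```

A value near 1 suggests the first adjective consistently precedes the second, a value near 0 suggests the reverse, and values near 0.5 carry little ordering signal. Pairwise scores of this kind are one plausible input to an unsupervised clustering step.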
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, May, 2020
Cataloged from the official PDF of thesis.
Includes bibliographical references (pages 29-30).
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology
Electrical Engineering and Computer Science.