DSpace@MIT

Unsupervised syntactic category learning from child-directed speech

Author(s)
Wichrowska, Olga N
Download: Full printable version (3.534 Mb)
Other Contributors
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Advisor
Robert C. Berwick.
Terms of use
M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582
Abstract
The goal of this research was to discover what kinds of syntactic categories can be learned by distributional analysis of the linear context of words, specifically in child-directed speech. The motivation is that the categories children use may differ from adult categories. There is some evidence that distributional analysis contributes to some aspects of language acquisition, although there are strong arguments that it is not sufficient for acquiring grammar. These experiments help identify what can be learned from linear context and statistics alone. This thesis reports the results of three established automatic syntactic category learning algorithms on a small, edited input set of child-directed speech from the CHILDES database. Hierarchical clustering, k-means clustering, and an implementation of a substitution algorithm are each used to assign syntactic categories to words based on their linear distributional context. Overall, open classes (nouns, verbs, adjectives) were reliably categorized, and some methods were also able to distinguish prepositions, adverbs, subjects vs. objects, and verbs by subcategorization frame. The main barrier between these methods and human-like categorization is their inability to handle the ambiguity that pervades natural language; this remains an important problem for future models of syntactic category acquisition.
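As a rough illustration of clustering words by their linear distributional context, here is a minimal Python sketch; it is not the thesis's implementation. Each word is represented by counts of its immediately preceding (`L:`) and following (`R:`) words, the count vectors are scaled to unit length, and a plain k-means loop groups them. The one-word context window, the `<s>`/`</s>` utterance-boundary markers, the deterministic farthest-point initialisation, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def context_vectors(utterances):
    """Map each word to counts of its immediate left (L:) and right (R:) neighbours.

    Assumption: a one-word window on each side, with <s> and </s> marking
    utterance boundaries.
    """
    vecs = defaultdict(Counter)
    for utt in utterances:
        tokens = ["<s>"] + utt.lower().split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            word = tokens[i]
            vecs[word]["L:" + tokens[i - 1]] += 1
            vecs[word]["R:" + tokens[i + 1]] += 1
    return vecs


def normalize(vec):
    """Scale a sparse count vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {f: v / norm for f, v in vec.items()}


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors stored as dicts."""
    keys = set(a) | set(b)
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys)


def mean(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = defaultdict(float)
    for vec in vectors:
        for f, v in vec.items():
            total[f] += v
    return {f: v / len(vectors) for f, v in total.items()}


def kmeans_categories(utterances, k, max_iter=20):
    """Assign each word one of k cluster ids from its linear context (k-means sketch)."""
    points = {w: normalize(v) for w, v in context_vectors(utterances).items()}
    words = sorted(points)
    # Assumed deterministic farthest-point initialisation: start from the first
    # word alphabetically, then repeatedly add the word farthest from all chosen centroids.
    centroids = [points[words[0]]]
    while len(centroids) < k:
        far = max(words, key=lambda w: min(sq_dist(points[w], c) for c in centroids))
        centroids.append(points[far])
    assignment = {}
    for _ in range(max_iter):
        # Assignment step: each word goes to its nearest centroid.
        new_assignment = {
            w: min(range(k), key=lambda j: sq_dist(points[w], centroids[j]))
            for w in words
        }
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its member vectors.
        for j in range(k):
            members = [points[w] for w in words if assignment[w] == j]
            if members:
                centroids[j] = mean(members)
    return assignment


utts = ["the dog runs", "the cat runs", "a dog sleeps", "a cat sleeps"]
print(kmeans_categories(utts, k=3))
```

On this toy input with `k=3`, the words separate into determiners (`the`, `a`), nouns (`dog`, `cat`), and verbs (`runs`, `sleeps`), because the words in each pair occur in identical contexts. Real child-directed speech is far noisier, and this sketch gives every word exactly one cluster, so a word used in several categories cannot be represented correctly; this mirrors the ambiguity problem described in the abstract.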
Description
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010.
 
Cataloged from PDF version of thesis.
 
Includes bibliographical references (p. 57-59).
 
Date issued
2010
URI
http://hdl.handle.net/1721.1/62756
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.

Collections
  • Graduate Theses

Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.