
dc.contributor.advisor: Dorothy Curtis and Slav Petrov. [en_US]
dc.contributor.author: Lin, Yuri, M. Eng. Massachusetts Institute of Technology [en_US]
dc.contributor.other: Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. [en_US]
dc.date.accessioned: 2013-03-01T15:12:48Z
dc.date.available: 2013-03-01T15:12:48Z
dc.date.copyright: 2012 [en_US]
dc.date.issued: 2012 [en_US]
dc.identifier.uri: http://hdl.handle.net/1721.1/77501
dc.description: Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. [en_US]
dc.description: Cataloged from PDF version of thesis. [en_US]
dc.description: Includes bibliographical references (p. 101-102). [en_US]
dc.description.abstract: In this thesis, we present a new edition of the Google Books Ngram Corpus, describing how often words and phrases were used over a period of five centuries, in eight languages; it aggregates data from 6% of all books ever published. This new edition introduces syntactic annotations: words are tagged with their part-of-speech, and head-modifier dependency relationships are recorded. We generate these annotations automatically from the Google Books text, using statistical models that are specifically adapted to the historical text found in these books. The new edition will facilitate the study of linguistic trends, especially those related to the evolution of syntax. We present our initial findings from the annotated Ngrams in the new edition, including studies of the change in various words' primary parts of speech over time and of the words most closely related to a given set of topics. [en_US]
dc.description.statementofresponsibility: by Yuri Lin. [en_US]
dc.format.extent: 102 p. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Massachusetts Institute of Technology [en_US]
dc.rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. [en_US]
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 [en_US]
dc.subject: Electrical Engineering and Computer Science. [en_US]
dc.title: Syntactically annotated Ngrams for Google Books [en_US]
dc.type: Thesis [en_US]
dc.description.degree: M.Eng. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc: 827733492 [en_US]

