dc.contributor.advisor | Dorothy Curtis and Slav Petrov. | en_US |
dc.contributor.author | Lin, Yuri, M. Eng. Massachusetts Institute of Technology | en_US |
dc.contributor.other | Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. | en_US |
dc.date.accessioned | 2013-03-01T15:12:48Z | |
dc.date.available | 2013-03-01T15:12:48Z | |
dc.date.copyright | 2012 | en_US |
dc.date.issued | 2012 | en_US |
dc.identifier.uri | http://hdl.handle.net/1721.1/77501 | |
dc.description | Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. | en_US |
dc.description | Cataloged from PDF version of thesis. | en_US |
dc.description | Includes bibliographical references (p. 101-102). | en_US |
dc.description.abstract | In this thesis, we present a new edition of the Google Books Ngram Corpus, describing how often words and phrases were used over a period of five centuries, in eight languages; it aggregates data from 6% of all books ever published. This new edition introduces syntactic annotations: words are tagged with their part-of-speech, and head-modifier dependency relationships are recorded. We generate these annotations automatically from the Google Books text, using statistical models that are specifically adapted to the historical text found in these books. The new edition will facilitate the study of linguistic trends, especially those related to the evolution of syntax. We present our initial findings from the annotated Ngrams in the new edition, including studies of how various words' primary parts of speech change over time and of which words are most closely related to a given set of topics. | en_US |
dc.description.statementofresponsibility | by Yuri Lin. | en_US |
dc.format.extent | 102 p. | en_US |
dc.language.iso | eng | en_US |
dc.publisher | Massachusetts Institute of Technology | en_US |
dc.rights | M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. | en_US |
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US |
dc.subject | Electrical Engineering and Computer Science. | en_US |
dc.title | Syntactically annotated Ngrams for Google Books | en_US |
dc.type | Thesis | en_US |
dc.description.degree | M.Eng. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
dc.identifier.oclc | 827733492 | en_US |