A language-based approach to categorical analysis
Author(s)
Marlow, Cameron Alexander, 1977-
Other Contributors
Massachusetts Institute of Technology. Dept. of Architecture. Program In Media Arts and Sciences.
Advisor
Walter Bender.
Abstract
With the digitization of media, computers can be employed to help us with the process of classification, both by learning from our behavior to perform the task for us and by exposing new ways for us to think about our information. Given that most of our media comes in the form of electronic text, research in this area focuses on building automatic text classification systems. The standard representation employed by these systems, known as the bag-of-words approach to information retrieval, represents documents as collections of words. As a byproduct of this model, automatic classifiers have difficulty distinguishing between different meanings of a single word. This research presents a new computational model of electronic text, called a synchronic imprint, which uses structural information to contextualize the meaning of words. Every concept in the body of a text is described by its relationships with other concepts in the same text, allowing classification systems to distinguish between alternative meanings of the same word. This representation is applied both to the standard problem of text classification and to the task of enabling people to better identify large bodies of text. The latter is achieved through the development of a visualization tool named flux that models synchronic imprints as a spring network.
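The limitation described in the abstract can be illustrated with a minimal sketch. This is not the thesis's synchronic imprint construction, which the abstract does not specify. It assumes, purely for illustration, that a word's "relationships" are the other words sharing a sentence with it. The function names, the stop-word list, and the two sample documents are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list, kept tiny for the example.
STOP_WORDS = {"the", "we", "from", "of"}

def tokenize(text):
    """Lowercase the text and return its alphabetic tokens, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def bag_of_words(text):
    """Bag-of-words view: word counts only, with no context."""
    return Counter(tokenize(text))

def relation_map(text):
    """Assumed stand-in for structural context: map each word to a count of
    the other words that appear in the same sentence."""
    relations = defaultdict(Counter)
    for sentence in re.split(r"[.!?]+", text):
        words = set(tokenize(sentence))
        for word in words:
            for other in words - {word}:
                relations[word][other] += 1
    return relations

# Two made-up documents that use "bank" in different senses.
doc_a = "The bank raised interest rates. The bank approved the loan."
doc_b = "The river bank flooded. We fished from the bank of the river."

bow_a, bow_b = bag_of_words(doc_a), bag_of_words(doc_b)
rel_a, rel_b = relation_map(doc_a), relation_map(doc_b)

# The bags of words give "bank" the same count in both documents.
print(bow_a["bank"], bow_b["bank"])

# The related words differ, which separates the two senses.
print(sorted(rel_a["bank"]))
print(sorted(rel_b["bank"]))
```

In this toy case the word counts for "bank" are identical across the two documents, while the sets of related words do not overlap. That is the kind of distinction the abstract attributes to describing a concept by its relationships with other concepts in the same text.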
Description
Thesis (S.M.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 2001. Includes bibliographical references (p. 79-81).
Date issued
2001
Department
Program in Media Arts and Sciences (Massachusetts Institute of Technology)
Publisher
Massachusetts Institute of Technology
Keywords
Architecture. Program In Media Arts and Sciences.