dc.contributor.advisor | Tommi Jaakkola. | en_US |
dc.contributor.author | Corduneanu, Adrian (Adrian Dumitru), 1977- | en_US |
dc.contributor.other | Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. | en_US |
dc.date.accessioned | 2007-07-18T13:10:42Z | |
dc.date.available | 2007-07-18T13:10:42Z | |
dc.date.copyright | 2006 | en_US |
dc.date.issued | 2006 | en_US |
dc.identifier.uri | http://hdl.handle.net/1721.1/37917 | |
dc.description | Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. | en_US |
dc.description | Includes bibliographical references (p. 147-154). | en_US |
dc.description.abstract | In recent years, the study of classification has shifted to algorithms for training the classifier from data that may be missing the class label. While traditional supervised classifiers already have the ability to cope with some incomplete data, the new type of classifier does not view unlabeled data as an anomaly, and can learn from data sets in which the large majority of training points are unlabeled. Classification with labeled and unlabeled data, or semi-supervised classification, has important practical significance, as training sets with a mix of labeled and unlabeled data are commonplace. In many domains, such as categorization of web pages, it is easier to collect unlabeled data than to annotate the training points with labels. This thesis is a study of the information regularization method for semi-supervised classification, a unified framework that encompasses many of the common approaches to semi-supervised learning, including parametric models of incomplete data, harmonic graph regularization, redundancy of sufficient features (co-training), and combinations of these principles in a single algorithm. | en_US |
dc.description.abstract | We discuss the framework in both parametric and non-parametric settings, as a transductive or inductive classifier, considered as a stand-alone classifier, or applied as post-processing to standard supervised classifiers. We study theoretical properties of the framework and illustrate it on categorization of web pages and named-entity recognition. | en_US |
dc.description.statementofresponsibility | by Adrian Corduneanu. | en_US |
dc.format.extent | 154 p. | en_US |
dc.language.iso | eng | en_US |
dc.publisher | Massachusetts Institute of Technology | en_US |
dc.rights | M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. | en_US |
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | |
dc.subject | Electrical Engineering and Computer Science. | en_US |
dc.title | The information regularization framework for semi-supervised learning | en_US |
dc.type | Thesis | en_US |
dc.description.degree | Ph.D. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
dc.identifier.oclc | 135235565 | en_US |