dc.contributor.advisor | Tomaso A. Poggio and Lorenzo A. Rosasco. | en_US |
dc.contributor.author | Paskov, Hristo Spassimirov | en_US |
dc.contributor.other | Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. | en_US |
dc.date.accessioned | 2011-02-23T14:24:51Z | |
dc.date.available | 2011-02-23T14:24:51Z | |
dc.date.copyright | 2010 | en_US |
dc.date.issued | 2010 | en_US |
dc.identifier.uri | http://hdl.handle.net/1721.1/61177 | |
dc.description | Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. | en_US |
dc.description | Cataloged from PDF version of thesis. | en_US |
dc.description | Includes bibliographical references (p. 81-83). | en_US |
dc.description.abstract | We consider the problem of building a viable multiclass classification system that minimizes training data, is robust to noisy, imbalanced samples, and outputs confidence scores along with its predictions. These goals address critical steps along the entire classification pipeline that pertain to collecting data, training, and classifying. To this end, we investigate the merits of a classification framework that uses a robust algorithm known as Regularized Least Squares (RLS) as its basic classifier. We extend RLS to account for data imbalances, perform efficient active learning, and output confidence scores. Each of these extensions is a new result that combines with our other findings to give an altogether novel and effective classification system. Our first set of results investigates various ways to handle multiclass data imbalances and ultimately leads to a derivation of a weighted version of RLS with and without an offset term. Weighting RLS provides an effective countermeasure to imbalanced data and facilitates the automatic selection of a regularization parameter through exact and efficient calculation of the Leave-One-Out error. Next, we present two methods that estimate multiclass confidence from an asymptotic analysis of RLS and another method that stems from a Bayesian interpretation of the classifier. We show that while the third method incorporates more information in its estimate, the asymptotic methods are more accurate and resilient to imperfect kernel and regularization parameter choices. Finally, we present an active learning extension of RLS (ARLS) that uses our weighting methods to overcome imbalanced data. ARLS is particularly adept at this task because of its intelligent selection scheme. | en_US |
dc.description.statementofresponsibility | by Hristo Spassimirov Paskov. | en_US |
dc.format.extent | 83 p. | en_US |
dc.language.iso | eng | en_US |
dc.publisher | Massachusetts Institute of Technology | en_US |
dc.rights | M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. | en_US |
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US |
dc.subject | Electrical Engineering and Computer Science. | en_US |
dc.title | A regularization framework for active learning from imbalanced data | en_US |
dc.title.alternative | Multiclass extensions of Regularized Least Squares | en_US |
dc.type | Thesis | en_US |
dc.description.degree | M.Eng. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
dc.identifier.oclc | 699803074 | en_US |