Intelligible models for learning categorical data via generalized Fourier spectrum

Zhang, Xuhong, Ph. D. Massachusetts Institute of Technology.

dc.contributor.advisor	Gregory W. Wornell.	en_US
dc.contributor.author	Zhang, Xuhong, Ph. D. Massachusetts Institute of Technology.	en_US
dc.contributor.other	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.	en_US
dc.date.accessioned	2019-07-17T20:58:03Z
dc.date.available	2019-07-17T20:58:03Z
dc.date.copyright	2019	en_US
dc.date.issued	2019	en_US
dc.identifier.uri	https://hdl.handle.net/1721.1/121725
dc.description	Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019	en_US
dc.description	Cataloged from PDF version of thesis.	en_US
dc.description	Includes bibliographical references (pages 167-170).	en_US
dc.description.abstract	Machine learning techniques have found ubiquitous applications in recent years, and sophisticated models such as neural networks and ensemble methods have achieved impressive predictive performance. However, these models are hard to interpret and are usually used as black boxes. In applications where an explanation is required in addition to a prediction, linear models (e.g., linear regression or logistic regression) remain mainstream tools due to their simplicity and good interpretability. This thesis considers learning problems on categorical data and proposes methods that retain the good interpretability of linear models but significantly improve their predictive performance. In particular, we provide ways to automatically generate and efficiently select new features based on the raw data, and then train a linear model in the new feature space. The proposed methods are inspired by the Boolean function analysis literature, which studies the Fourier spectrum of Boolean functions and in turn provides spectrum-based learning algorithms. Such algorithms are important tools in computational learning theory, but are not considered practically useful due to the unrealistic assumption of a uniform input distribution. This work generalizes the idea of the Fourier spectrum of Boolean functions to allow arbitrary input distributions. The generalized Fourier spectrum is also of theoretical interest: it carries over and meaningfully generalizes many important results about the Fourier spectrum. Moreover, it offers a framework to explore how the input distribution and the target function jointly affect the difficulty of a learning problem, and provides the right language for discussing data-dependent, algorithm-independent complexity of Boolean functions.	en_US
dc.description.statementofresponsibility	by Xuhong Zhang.	en_US
dc.format.extent	170 pages	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582	en_US
dc.subject	Electrical Engineering and Computer Science.	en_US
dc.title	Intelligible models for learning categorical data via generalized Fourier spectrum	en_US
dc.type	Thesis	en_US
dc.description.degree	Ph. D.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science	en_US
dc.identifier.oclc	1102048968	en_US
dc.description.collection	Ph.D. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science	en_US
dspace.imported	2019-07-17T20:58:00Z	en_US
mit.thesis.degree	Doctoral	en_US
mit.thesis.department	EECS	en_US

Files in this item

Name: 1102048968-MIT.pdf
Size: 15.35Mb
Format: PDF

This item appears in the following Collection(s)

Doctoral Theses