Statistical learning for decision making : interpretability, uncertainty, and inference

Letham, Benjamin

dc.contributor.advisor	Cynthia Rudin.	en_US
dc.contributor.author	Letham, Benjamin	en_US
dc.contributor.other	Massachusetts Institute of Technology. Operations Research Center.	en_US
dc.date.accessioned	2015-09-17T17:43:07Z
dc.date.available	2015-09-17T17:43:07Z
dc.date.copyright	2015	en_US
dc.date.issued	2015	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/98569
dc.description	Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2015.	en_US
dc.description	This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.	en_US
dc.description	Cataloged from student-submitted PDF version of thesis.	en_US
dc.description	Includes bibliographical references (pages 183-196).	en_US
dc.description.abstract	Data and predictive modeling are an increasingly important part of decision making. Here we present advances in several areas of statistical learning that are important for gaining insight from large amounts of data, and ultimately using predictive models to make better decisions. The first part of the thesis develops methods and theory for constructing interpretable models from association rules. Interpretability is important for decision makers to understand why a prediction is made. First we show how linear mixtures of rules can be used to make sequential predictions. Then we develop Bayesian Rule Lists, a method for learning small, ordered lists of rules. We apply Bayesian Rule Lists to a large database of patient medical histories and produce a simple, interpretable model that solves an important problem in healthcare, with little sacrifice to accuracy. Finally, we prove a uniform generalization bound for decision lists. In the second part of the thesis we focus on decision making from sales transaction data. We develop models and inference procedures for using transaction data to estimate quantities such as willingness-to-pay and lost sales due to stock unavailability. We develop a copula estimation procedure for making optimal bundle pricing decisions. We then develop a Bayesian hierarchical model for inferring demand and substitution behaviors from transaction data with stockouts. We show how posterior sampling can be used to directly incorporate model uncertainty into the decisions that will be made using the model. In the third part of the thesis we propose a method for aggregating relevant information from across the Internet to facilitate informed decision making. Our contributions here include an important theoretical result for Bayesian Sets, a popular method for identifying data that are similar to seed examples. We provide a generalization bound that holds for any data distribution, and moreover is independent of the dimensionality of the feature space. This result justifies the use of Bayesian Sets on high-dimensional problems, and also explains its good performance in settings where its underlying independence assumption does not hold.	en_US
dc.description.statementofresponsibility	by Benjamin Letham.	en_US
dc.format.extent	196 pages	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582	en_US
dc.subject	Operations Research Center.	en_US
dc.title	Statistical learning for decision making : interpretability, uncertainty, and inference	en_US
dc.type	Thesis	en_US
dc.description.degree	Ph. D.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Operations Research Center
dc.contributor.department	Sloan School of Management
dc.identifier.oclc	920866974	en_US

Files in this item

Name: 920866974-MIT.pdf
Size: 2.113Mb
Format: PDF
Description: Full printable version

This item appears in the following Collection(s)

Doctoral Theses
