Uplift modeling with multiple treatments

Zhao, Yan, Ph. D. Massachusetts Institute of Technology

dc.contributor.advisor	David Simchi-Levi.	en_US
dc.contributor.author	Zhao, Yan, Ph. D. Massachusetts Institute of Technology	en_US
dc.contributor.other	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.	en_US
dc.date.accessioned	2018-03-02T22:21:30Z
dc.date.available	2018-03-02T22:21:30Z
dc.date.copyright	2017	en_US
dc.date.issued	2017	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/113979
dc.description	Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.	en_US
dc.description	Cataloged from PDF version of thesis.	en_US
dc.description	Includes bibliographical references (pages 73-74).	en_US
dc.description.abstract	Randomized experiments have been used to assist decision-making in many areas. They help people select the optimal treatment for the test population with certain statistical guarantees. However, subjects can show significant heterogeneity in their responses to treatments. The problem of customizing treatment assignment based on subject characteristics is known as uplift modeling in the literature. A key feature of uplift modeling is that the data is unlabeled: it is impossible to know whether the chosen treatment is optimal for an individual subject, because responses under alternative treatments are unobserved. This presents a unique challenge to both the training and the evaluation of uplift models. In this thesis, we present critical developments in the theory and tools of uplift modeling that greatly extend its scope of application. We describe how to obtain an unbiased estimate of the key performance metric of an uplift model, the expected response. Our method applies to an arbitrary number of treatments and to general response types. It works even when the treatments are not evenly distributed, which is often the case in practice. This new method is not only the first to give an unbiased estimate of uplift model performance, but it also opens up uplift modeling to a much wider range of application scenarios. Based on the new evaluation method, we derive an uplift algorithm named Contextual Treatment Selection (CTS). CTS is a tree-based ensemble algorithm. The trees are built with a splitting criterion designed to directly optimize their uplift performance as measured on the training data. This idea is in line with the machine learning philosophy of loss minimization on the training set. As far as we are aware, CTS is the first uplift algorithm that can handle multiple treatments and continuous responses.
Experiments on both synthetic data and industry-provided data show that our algorithm yields significant performance improvements over other applicable methods. When growing a tree, CTS conducts exhaustive searches to find the best splitting points. One drawback of this mechanism is its susceptibility to outliers. Splits are likely to be placed adjacent to extreme values, and successive splits tend to group similar extreme values together, which introduces bias into the estimation of leaf-node responses. To solve this problem, we propose a modified version of the CTS algorithm called Unbiased Contextual Treatment Selection (UCTS). The key difference is the separation between the partitioning of the feature space and the estimation of leaf responses. Before growing a tree, UCTS randomly splits the training data into two subsets, one for selecting tree splits and the other for estimating the treatment-wise expected responses in the leaf nodes. Experiments show that UCTS is less affected by outliers and achieves performance comparable to, if not better than, that of CTS. The two-sample approach of UCTS also leads to a more tractable theoretical analysis. We prove that, under mild regularity conditions, UCTS can achieve mean-square consistency by properly tuning the leaf size. To the best of our knowledge, UCTS is the first uplift algorithm with provable consistency. The analysis provides helpful insights into the trade-off between approximation error and estimation error, and into potential ways to further improve uplift algorithms.	en_US
dc.description.statementofresponsibility	by Yan Zhao.	en_US
dc.format.extent	74 pages	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582	en_US
dc.subject	Electrical Engineering and Computer Science.	en_US
dc.title	Uplift modeling with multiple treatments	en_US
dc.type	Thesis	en_US
dc.description.degree	Ph. D.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc	1023628313	en_US
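The abstract describes an unbiased estimate of an uplift model's expected response that works for any number of treatments and for uneven treatment assignment. As an illustration only, and not the thesis's code, the sketch below uses the standard inverse-probability-weighting idea, assuming data from a randomized experiment whose assignment probabilities p_t are known. Unbiasedness follows because, with the treatment T assigned independently of the features X, E[Y·1{h(X)=T}/p_T] = Σ_t E[Y(t)·1{h(X)=t}] = E[Y(h(X))]. The function name `estimate_expected_response` and its parameters are hypothetical.

```python
from typing import Callable, Dict, Sequence


def estimate_expected_response(
    features: Sequence[Sequence[float]],
    treatments: Sequence[int],
    responses: Sequence[float],
    propensities: Dict[int, float],
    policy: Callable[[Sequence[float]], int],
) -> float:
    """IPW estimate of the mean response if every subject received policy(x).

    Assumes (hypothetically) a randomized experiment in which treatment t was
    assigned with known probability propensities[t], not necessarily uniform.
    """
    if not responses:
        raise ValueError("need at least one subject")
    total = 0.0
    for x, t, y in zip(features, treatments, responses):
        # Only subjects whose observed treatment matches the policy's choice
        # contribute; dividing by propensities[t] reweights each treatment arm
        # so the estimate stays unbiased under uneven assignment.
        if policy(x) == t:
            total += y / propensities[t]
    return total / len(responses)
```

For example, with four subjects split evenly between treatments 0 and 1, a policy that always picks treatment 1 is scored using only the two subjects who actually received it, each weighted by 1/0.5.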
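The abstract states that CTS grows its trees with a splitting criterion that directly optimizes uplift performance on the training data, found by exhaustive search over split points. The following is a simplified, hypothetical sketch of that idea for a single numeric feature and a single split, not the thesis's ensemble algorithm: each child node is scored by its best per-treatment mean response, children are weighted by their share of samples, and every threshold between distinct feature values is tried. The names `node_value`, `best_split`, and `min_per_treatment` are assumptions, and the thesis's actual criterion may differ in detail (for example, in how it handles small samples).

```python
from collections import defaultdict
from typing import Optional, Sequence, Set, Tuple


def node_value(
    treatments: Sequence[int],
    responses: Sequence[float],
    arms: Set[int],
    min_per_treatment: int,
) -> Optional[float]:
    """Best per-treatment mean response in a node, or None if any arm is undersampled."""
    sums: dict = defaultdict(float)
    counts: dict = defaultdict(int)
    for t, y in zip(treatments, responses):
        sums[t] += y
        counts[t] += 1
    # Require at least one sample per arm so every mean is defined.
    if any(counts[a] < max(1, min_per_treatment) for a in arms):
        return None
    return max(sums[a] / counts[a] for a in arms)


def best_split(
    x: Sequence[float],
    treatments: Sequence[int],
    responses: Sequence[float],
    min_per_treatment: int = 1,
) -> Tuple[Optional[float], float]:
    """Exhaustively search thresholds on one feature; return (threshold, score)."""
    arms = set(treatments)
    order = sorted(range(len(x)), key=lambda i: x[i])
    n = len(order)
    best_threshold, best_score = None, float("-inf")
    for k in range(1, n):
        left, right = order[:k], order[k:]
        if x[left[-1]] == x[right[0]]:
            continue  # no threshold separates tied feature values
        v_left = node_value([treatments[i] for i in left],
                            [responses[i] for i in left], arms, min_per_treatment)
        v_right = node_value([treatments[i] for i in right],
                             [responses[i] for i in right], arms, min_per_treatment)
        if v_left is None or v_right is None:
            continue
        # Expected response if each child receives its own best treatment,
        # weighting children by their share of the node's samples.
        score = (k * v_left + (n - k) * v_right) / n
        if score > best_score:
            best_threshold = (x[left[-1]] + x[right[0]]) / 2
            best_score = score
    return best_threshold, best_score
```

Because every threshold is tried and each child takes the maximum over treatment means, a split can latch onto a few extreme responses, which is the outlier sensitivity the abstract attributes to CTS.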
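UCTS, as the abstract describes it, randomly divides the training data into one subset for selecting tree splits and another for estimating treatment-wise expected responses in the leaves. A minimal sketch of that two-sample step, assuming the tree partition (`leaf_of`) has already been chosen using the first subset; `honest_split` and `leaf_estimates` are hypothetical helpers, not the thesis's implementation.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Tuple


def honest_split(n: int, frac: float = 0.5, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly partition sample indices into a split-selection set and an estimation set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * frac)
    return idx[:cut], idx[cut:]


def leaf_estimates(
    leaf_of: Callable[[float], Hashable],
    x: Sequence[float],
    treatments: Sequence[int],
    responses: Sequence[float],
    est_idx: Sequence[int],
) -> Dict[Tuple[Hashable, int], float]:
    """Per-(leaf, treatment) mean response, computed only on the estimation subset."""
    sums: Dict[Tuple[Hashable, int], float] = defaultdict(float)
    counts: Dict[Tuple[Hashable, int], int] = defaultdict(int)
    for i in est_idx:
        key = (leaf_of(x[i]), treatments[i])
        sums[key] += responses[i]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

The intent is that leaf responses come from samples that played no role in choosing the splits, so an extreme value that attracted a split does not also inflate the estimate in the leaf it landed in.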

Files in this item

Name: 1023628313-MIT.pdf
Size: 6.692Mb
Format: PDF
Description: Full printable version

This item appears in the following Collection(s)

Doctoral Theses