dc.contributor.advisor | David Simchi-Levi. | en_US
dc.contributor.author | Zhao, Yan, Ph. D. Massachusetts Institute of Technology | en_US
dc.contributor.other | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. | en_US
dc.date.accessioned | 2018-03-02T22:21:30Z
dc.date.available | 2018-03-02T22:21:30Z
dc.date.copyright | 2017 | en_US
dc.date.issued | 2017 | en_US
dc.identifier.uri | http://hdl.handle.net/1721.1/113979
dc.description | Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. | en_US
dc.description | Cataloged from PDF version of thesis. | en_US
dc.description | Includes bibliographical references (pages 73-74). | en_US
dc.description.abstract | Randomized experiments have been used to assist decision-making in many areas. They help people select the optimal treatment for the test population with certain statistical guarantees. However, subjects can show significant heterogeneity in response to treatments. The problem of customizing treatment assignment based on subject characteristics is known as Uplift Modeling in the literature. A key feature of uplift modeling is that the data is unlabeled. It is impossible to know whether the chosen treatment is optimal for an individual subject because responses under alternative treatments are unobserved. This presents a unique challenge to both the training and the evaluation of uplift models. In this thesis, we present critical developments in the theory and tools of Uplift Modeling that greatly extend its scope of application. We describe how to obtain an unbiased estimate of the key performance metric of an uplift model, the expected response. Our method applies to an arbitrary number of treatments and general response types. It works even when the treatments are not evenly distributed, which is often the case in practice. This new method is not only the first to give an unbiased estimate of uplift model performance, but also opens up uplift modeling to a much wider range of application scenarios. Based on the new evaluation method, we derive an uplift algorithm named Contextual Treatment Selection (CTS). CTS is a tree-based ensemble algorithm. The trees are built with a splitting criterion designed to directly optimize their uplift performance as measured on the training data. This idea is in line with the machine learning philosophy of loss minimization on the training set. As far as we are aware, CTS is the first uplift algorithm that can handle multiple treatments and continuous responses.
Experiments on both synthetic data and industry-provided data show that our algorithm leads to significant performance improvement over other applicable methods. When growing a tree, CTS conducts exhaustive searches to find the best splitting points. One drawback of this mechanism is its susceptibility to outliers. Splits are likely to be placed adjacent to extreme values, and successive splits tend to group together similar extreme values, which introduces bias into the estimation of leaf node responses. To solve this problem, we propose a modified version of the CTS algorithm called Unbiased Contextual Treatment Selection (UCTS). The key difference is the separation between the partition of the feature space and the estimation of leaf responses. Before growing a tree, UCTS randomly splits the training data into two subsets, one for selecting tree splits and the other for estimating the treatment-wise expected response in the leaf nodes. Experiments show that UCTS is less affected by outliers and achieves performance comparable to, if not better than, that of CTS. The two-sample approach of UCTS also leads to more tractable theoretical analysis. We prove that, under mild regularity conditions, UCTS can achieve mean-square consistency by properly tuning the leaf size. To the best of our knowledge, UCTS is the first uplift algorithm with provable consistency. The analysis provides helpful insights into the trade-off between approximation error and estimation error and into potential ways to further improve uplift algorithms. | en_US
dc.description.statementofresponsibility | by Yan Zhao. | en_US
dc.format.extent | 74 pages | en_US
dc.language.iso | eng | en_US
dc.publisher | Massachusetts Institute of Technology | en_US
dc.rights | MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. | en_US
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US
dc.subject | Electrical Engineering and Computer Science. | en_US
dc.title | Uplift modeling with multiple treatments | en_US
dc.type | Thesis | en_US
dc.description.degree | Ph. D. | en_US
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc | 1023628313 | en_US
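
The abstract in this record describes an unbiased estimate of an uplift model's expected response that still works when treatments are unevenly assigned. The record does not give the thesis's exact estimator, so the following is only a minimal sketch of one standard way to build such an estimate, inverse-probability weighting. It assumes treatments were randomized independently of the features with known probabilities. The names estimate_expected_response, assign_prob, threshold_policy, and the toy records are all hypothetical and do not come from the thesis.

```python
from typing import Callable, Hashable, Mapping, Sequence, Tuple

# One logged observation from a randomized experiment:
# (features, treatment actually assigned, observed response).
Record = Tuple[Sequence[float], Hashable, float]


def estimate_expected_response(
    records: Sequence[Record],
    policy: Callable[[Sequence[float]], Hashable],
    assign_prob: Mapping[Hashable, float],
) -> float:
    """Inverse-probability-weighted estimate of a policy's expected response.

    Assumption (not from the record): treatment t was assigned at random,
    independently of the features, with known probability assign_prob[t].
    """
    if not records:
        raise ValueError("need at least one record")
    total = 0.0
    for features, treatment, response in records:
        # Only subjects whose logged treatment matches the policy's choice
        # contribute. Dividing by the assignment probability corrects for
        # treatments that were assigned more or less often than others.
        if policy(features) == treatment:
            total += response / assign_prob[treatment]
    return total / len(records)


def threshold_policy(features: Sequence[float]) -> str:
    """Toy policy: pick treatment "A" for small feature values, else "B"."""
    return "A" if features[0] < 0.5 else "B"


# Toy data, made up purely for illustration.
records = [
    ([0.2], "A", 1.0),
    ([0.8], "B", 3.0),
    ([0.9], "A", 0.0),
    ([0.1], "B", 2.0),
]
assign_prob = {"A": 0.5, "B": 0.5}

print(estimate_expected_response(records, threshold_policy, assign_prob))  # prints 2.0
```

Subjects whose logged treatment disagrees with the policy contribute zero. Their response under the policy's choice was never observed, which is the "unlabeled data" difficulty the abstract describes.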

