
dc.contributor.advisor: Una-May O'Reilly and Roy Welsch. [en_US]
dc.contributor.author: Paredes, Miguel (Miguel Andres) [en_US]
dc.contributor.other: Massachusetts Institute of Technology. Department of Urban Studies and Planning. [en_US]
dc.date.accessioned: 2019-02-05T15:59:44Z
dc.date.available: 2019-02-05T15:59:44Z
dc.date.copyright: 2018 [en_US]
dc.date.issued: 2018 [en_US]
dc.identifier.uri: http://hdl.handle.net/1721.1/120232
dc.description: Thesis: Ph. D. in Data Science, Massachusetts Institute of Technology, Department of Urban Studies and Planning, 2018. [en_US]
dc.description: Cataloged from PDF version of thesis. [en_US]
dc.description: Includes bibliographical references (pages 78-83). [en_US]
dc.description.abstract: Fundamental problems in society, such as medical decision support, urban planning and customer management, can be addressed by data-driven modeling. Frequently, the only data available are observational rather than experimental. This precludes causal inference, though it supports quasi-causal inference (or causal approximation) and prediction. With three different studies that are driven by observational data, this thesis compares machine learning and econometric modeling in terms of their purposes, insights, and uses. It proposes a data science methodology that combines both types of modeling to enable experimental designs which would otherwise be impossible to carry out. In the first two studies, we address problems through both a prediction and a quasi-causation approach (i.e. machine learning and econometrics), exploring their similarities, differences, benefits, and limitations. These two initial studies serve to demonstrate the need for an end-to-end methodology that combines prediction and causation. Our proposed data science methodology is presented in the third study, in which an enterprise seeks to address its customer churn. First, it uses observational data and econometrics to approximate the causal determinants of churn (quasi-causal insights). Second, it uses machine learning to predict churn likelihoods of clients, and selects a study group with likelihoods above a threshold of interest. Third, the quasi-causal insights are used to design a stratified randomized controlled trial (i.e. A/B test) where study subjects are randomly assigned to one of three experimental groups. Finally, thanks to the rigorously designed experiment, the causal effects of the interventions are determined, and the cost-effectiveness of the treatments relative to the control group is established. [en_US]
dc.description.statementofresponsibility: by Miguel Paredes. [en_US]
dc.format.extent: 83 pages [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Massachusetts Institute of Technology [en_US]
dc.rights: MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. [en_US]
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 [en_US]
dc.subject: Urban Studies and Planning. [en_US]
dc.title: Data science and advanced analytics : an integrated framework for creating value from data [en_US]
dc.title.alternative: Integrated framework for creating value from data [en_US]
dc.type: Thesis [en_US]
dc.description.degree: Ph. D. in Data Science [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Urban Studies and Planning
dc.identifier.oclc: 1083122584 [en_US]

