Data science and advanced analytics : an integrated framework for creating value from data

Author(s)

Paredes, Miguel (Miguel Andres)

Alternative title

Integrated framework for creating value from data

Other Contributors

Massachusetts Institute of Technology. Department of Urban Studies and Planning.

Advisor

Una-May O'Reilly and Roy Welsch.

Terms of use

MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582

Abstract

Fundamental problems in society, such as medical decision support, urban planning and customer management, can be addressed by data-driven modeling. Frequently, the only data available are observational rather than experimental. This precludes causal inference, though it supports quasi-causal inference (or causal approximation) and prediction. Through three studies driven by observational data, this thesis compares machine learning and econometric modeling in terms of their purposes, insights, and uses. It proposes a data science methodology that combines both types of modeling to enable experimental designs which would otherwise be impossible to carry out. In the first two studies, we address problems through both a prediction approach and a quasi-causal approach (i.e., machine learning and econometrics, respectively), exploring their similarities, differences, benefits, and limitations. These two initial studies serve to demonstrate the need for an end-to-end methodology that combines prediction and causation. Our proposed data science methodology is presented in the third study, in which an enterprise seeks to address its customer churn. First, it uses observational data and econometrics to approximate the causal determinants of churn (quasi-causal insights). Second, it uses machine learning to predict the churn likelihoods of clients, and selects a study group with likelihoods above a threshold of interest. Third, the quasi-causal insights are used to design a stratified randomized controlled trial (i.e., an A/B test) in which study subjects are randomly assigned to one of three experimental groups. Finally, thanks to the rigorously designed experiment, the causal effects of the interventions are determined, and the cost-effectiveness of the treatments relative to the control group is established.
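
The second and third steps described in the abstract (selecting a study group above a churn-likelihood threshold, then randomly assigning subjects to three groups within strata) can be illustrated with a minimal sketch. This is not the thesis's implementation: the field names `id`, `segment` and `churn_prob`, the arm labels, the threshold value, and the use of a single stratification variable are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical arm labels; the abstract only states that there are three groups.
ARMS = ["control", "treatment_1", "treatment_2"]


def select_study_group(clients, threshold):
    """Keep clients whose predicted churn likelihood exceeds the threshold."""
    return [c for c in clients if c["churn_prob"] > threshold]


def stratified_assign(subjects, stratum_key, arms=ARMS, seed=0):
    """Randomly assign subjects to arms, balancing arm sizes within each stratum.

    Subjects are grouped by the value of `stratum_key`, shuffled within each
    group, and then dealt to the arms in rotation, so that arm sizes inside a
    stratum differ by at most one.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    strata = defaultdict(list)
    for subject in subjects:
        strata[subject[stratum_key]].append(subject)

    assignment = {}
    for key in sorted(strata):
        members = list(strata[key])
        rng.shuffle(members)
        for i, subject in enumerate(members):
            assignment[subject["id"]] = arms[i % len(arms)]
    return assignment


# Made-up example data: 12 clients in two hypothetical segments.
clients = [
    {"id": i, "segment": "A" if i % 2 == 0 else "B", "churn_prob": i / 12}
    for i in range(12)
]
study_group = select_study_group(clients, threshold=0.4)
assignment = stratified_assign(study_group, stratum_key="segment")
```

In this sketch the stratification variable stands in for whatever the quasi-causal analysis identifies as a determinant of churn; in practice several such variables might be combined to define the strata.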

Description

Thesis: Ph. D. in Data Science, Massachusetts Institute of Technology, Department of Urban Studies and Planning, 2018.

Cataloged from PDF version of thesis.

Includes bibliographical references (pages 78-83).

Date issued

2018

URI

http://hdl.handle.net/1721.1/120232

Department

Massachusetts Institute of Technology. Department of Urban Studies and Planning

Publisher

Massachusetts Institute of Technology

Keywords

Urban Studies and Planning.

Collections

Doctoral Theses