Data science and advanced analytics : an integrated framework for creating value from data

Paredes, Miguel (Miguel Andres)

dc.contributor.advisor	Una-May O'Reilly and Roy Welsch.	en_US
dc.contributor.author	Paredes, Miguel (Miguel Andres)	en_US
dc.contributor.other	Massachusetts Institute of Technology. Department of Urban Studies and Planning.	en_US
dc.date.accessioned	2019-02-05T15:59:44Z
dc.date.available	2019-02-05T15:59:44Z
dc.date.copyright	2018	en_US
dc.date.issued	2018	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/120232
dc.description	Thesis: Ph. D. in Data Science, Massachusetts Institute of Technology, Department of Urban Studies and Planning, 2018.	en_US
dc.description	Cataloged from PDF version of thesis.	en_US
dc.description	Includes bibliographical references (pages 78-83).	en_US
dc.description.abstract	Fundamental problems in society, such as medical decision support, urban planning and customer management, can be addressed by data-driven modeling. Frequently, the only data available are observational rather than experimental. This precludes causal inference, though it supports quasi-causal inference (or causal approximation) and prediction. With three different studies that are driven by observational data, this thesis compares machine learning and econometric modeling in terms of their purposes, insights, and uses. It proposes a data science methodology that combines both types of modeling to enable experimental designs which would otherwise be impossible to carry out. In the first two studies, we address problems through both a prediction and a quasi-causation approach (i.e. machine learning and econometrics), exploring their similarities, differences, benefits, and limitations. These two initial studies serve to demonstrate the need for an end-to-end methodology that combines prediction and causation. Our proposed data science methodology is presented in the third study, in which an enterprise seeks to address its customer churn. First, it uses observational data and econometrics to approximate the causal determinants of churn (quasi-causal insights). Second, it uses machine learning to predict churn likelihoods of clients, and selects a study group with likelihoods above a threshold of interest. Third, the quasi-causal insights are used to design a stratified randomized controlled trial (i.e. A/B test) where study subjects are randomly assigned to one of three experimental groups. Finally, thanks to the rigorously designed experiment, the causal effects of the interventions are determined, and the cost-effectiveness of the treatments relative to the control group is established.	en_US
dc.description.statementofresponsibility	by Miguel Paredes.	en_US
dc.format.extent	83 pages	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582	en_US
dc.subject	Urban Studies and Planning.	en_US
dc.title	Data science and advanced analytics : an integrated framework for creating value from data	en_US
dc.title.alternative	Integrated framework for creating value from data	en_US
dc.type	Thesis	en_US
dc.description.degree	Ph. D. in Data Science	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Urban Studies and Planning
dc.identifier.oclc	1083122584	en_US
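
The second and third steps described in the abstract (selecting clients whose predicted churn likelihood exceeds a threshold, then randomly assigning them within strata to one of three experimental groups) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the record keys (`id`, `churn_prob`, `stratum`), the 0.5 threshold, the stratum labels, and the group names are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical group names; the abstract only says there are three groups.
GROUPS = ("control", "treatment_a", "treatment_b")


def stratified_assignment(clients, threshold, seed=0):
    """Assign high-risk clients to experimental groups, stratum by stratum.

    `clients` is a list of dicts with the assumed keys 'id', 'churn_prob'
    (a predicted churn likelihood) and 'stratum' (a stratification label).
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible

    # Keep only clients whose predicted likelihood exceeds the threshold,
    # grouped by stratum.
    strata = defaultdict(list)
    for client in clients:
        if client["churn_prob"] > threshold:
            strata[client["stratum"]].append(client["id"])

    # Shuffle within each stratum, then deal clients out to the groups in
    # turn so group sizes differ by at most one inside every stratum.
    assignment = {}
    for stratum in sorted(strata):
        ids = strata[stratum]
        rng.shuffle(ids)
        for position, client_id in enumerate(ids):
            assignment[client_id] = GROUPS[position % len(GROUPS)]
    return assignment


# Toy data: 20 clients with made-up likelihoods and two made-up strata.
clients = [
    {"id": i, "churn_prob": i / 20, "stratum": "high_value" if i % 2 else "low_value"}
    for i in range(20)
]
assignment = stratified_assignment(clients, threshold=0.5)
```

Dealing clients out in turn after a within-stratum shuffle is one common way to stratify and is used here for brevity; the thesis may define its strata and assignment procedure differently.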

Files in this item

Name: 1083122584-MIT.pdf
Size: 7.104Mb
Format: PDF
Description: Full printable version

This item appears in the following Collection(s)

Doctoral Theses