DSpace@MIT

Data science and advanced analytics : an integrated framework for creating value from data

Author(s)
Paredes, Miguel (Miguel Andres)
Alternative title
Integrated framework for creating value from data
Other Contributors
Massachusetts Institute of Technology. Department of Urban Studies and Planning.
Advisor
Una-May O'Reilly and Roy Welsch.
Terms of use
MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582
Abstract
Fundamental problems in society, such as medical decision support, urban planning, and customer management, can be addressed by data-driven modeling. Frequently, the only data available are observational rather than experimental. This precludes causal inference, though it still supports quasi-causal inference (or causal approximation) and prediction. Using three studies driven by observational data, this thesis compares machine learning and econometric modeling in terms of their purposes, insights, and uses. It proposes a data science methodology that combines both types of modeling to enable experimental designs that would otherwise be impossible to carry out. In the first two studies, we address problems through both a prediction approach and a quasi-causal approach (i.e., machine learning and econometrics, respectively), exploring their similarities, differences, benefits, and limitations. These two initial studies demonstrate the need for an end-to-end methodology that combines prediction and causation.

Our proposed data science methodology is presented in the third study, in which an enterprise seeks to address customer churn. First, it uses observational data and econometrics to approximate the causal determinants of churn (quasi-causal insights). Second, it uses machine learning to predict each client's likelihood of churning, and it selects a study group whose likelihoods are above a threshold of interest. Third, the quasi-causal insights are used to design a stratified randomized controlled trial (i.e., an A/B test) in which study subjects are randomly assigned to one of three experimental groups. Finally, the rigorously designed experiment allows the causal effects of the interventions to be determined and the cost-effectiveness of the treatments relative to the control group to be established.
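The sketch below illustrates the experimental half of the third study's pipeline. It covers threshold-based selection of likely churners, stratified random assignment to three groups, and a comparison of each treatment with the control group. It is a minimal illustration under stated assumptions and not the thesis's implementation. The churn probabilities are assumed to come from an already-fitted classifier. Strata are assumed to be supplied as one label per client, for example a label built from the quasi-causal determinants. Outcomes are assumed to be binary (1 = churned). All function and parameter names are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean


def select_study_group(churn_probs, threshold):
    """Return IDs of clients whose predicted churn likelihood is above the threshold.

    churn_probs: hypothetical dict mapping client ID -> predicted probability,
    assumed to come from an already-fitted machine learning classifier.
    """
    return [cid for cid, p in churn_probs.items() if p > threshold]


def stratified_assign(client_ids, stratum_of, arms, seed=0):
    """Randomly assign clients to arms, separately within each stratum.

    stratum_of: hypothetical dict mapping client ID -> stratum label, e.g. a
    label built from the quasi-causal determinants of churn.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for cid in client_ids:
        strata[stratum_of[cid]].append(cid)

    assignment = {}
    for label in sorted(strata):
        members = strata[label]
        rng.shuffle(members)
        order = list(arms)
        rng.shuffle(order)  # avoid always giving leftover slots to the same arm
        # Round-robin over the shuffled members: arm sizes within a stratum
        # differ by at most one.
        for i, cid in enumerate(members):
            assignment[cid] = order[i % len(order)]
    return assignment


def effect_vs_control(outcome, assignment, control):
    """Difference in mean outcome (1 = churned) between each treatment arm and control."""
    by_arm = defaultdict(list)
    for cid, arm in assignment.items():
        by_arm[arm].append(outcome[cid])
    base = mean(by_arm[control])
    return {arm: mean(vals) - base for arm, vals in by_arm.items() if arm != control}


def cost_per_churn_avoided(effect, cost_per_client):
    """Treatment cost per client divided by the reduction in churn rate.

    A negative effect means the treatment lowered churn; if churn did not
    fall, no churn was avoided and the ratio is reported as infinity.
    """
    reduction = -effect
    return cost_per_client / reduction if reduction > 0 else float("inf")


if __name__ == "__main__":
    # Toy inputs for illustration only; these are not data from the thesis.
    probs = {"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.8, "e": 0.6, "f": 0.95}
    tenure = {"a": "new", "b": "new", "c": "new", "d": "long", "e": "long", "f": "long"}
    group = select_study_group(probs, threshold=0.5)  # ["a", "c", "d", "e", "f"]
    print(stratified_assign(group, tenure, ["control", "treatment_1", "treatment_2"]))
```

Within each stratum, the arm order is shuffled before the round-robin fill. As a result, every selected client has the same chance of landing in each arm. That is why a simple difference in means is used here as the effect estimate. With unequal allocation across strata, a stratum-weighted estimator would be the safer choice.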
Description
Thesis: Ph. D. in Data Science, Massachusetts Institute of Technology, Department of Urban Studies and Planning, 2018.
 
Cataloged from PDF version of thesis.
 
Includes bibliographical references (pages 78-83).
 
Date issued
2018
URI
http://hdl.handle.net/1721.1/120232
Department
Massachusetts Institute of Technology. Department of Urban Studies and Planning
Publisher
Massachusetts Institute of Technology
Keywords
Urban Studies and Planning.

Collections
  • Doctoral Theses

Content created by the MIT Libraries, CC BY-NC unless otherwise noted.