Artificial intelligence-assisted data analysis with BayesDB
Author(s)Curlette, Christina M
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Vikash K. Mansinghka.
MetadataShow full item record
When applying machine learning and statistics techniques to real-world datasets, problems often arise due to missing data or errors from black-box predictive models that are difficult to understand or explain in terms of the model's inputs. This thesis explores the applicability of BayesDB, a probabilistic programming platform for data analysis, to three common problems in data analysis: (i) modeling patterns of missing data, (ii) imputing missing values in datasets, and (iii) characterizing the error behavior of predictive models. Experiments show that CrossCat, the default model discovery mechanism used by BayesDB, can address all three problems effectively. Examples are drawn from the American National Election Studies and the Gapminder database of global macroeconomic and public health indicators.
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Cataloged from student-submitted PDF version of thesis.Includes bibliographical references (pages 67-68).
DepartmentMassachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Massachusetts Institute of Technology
Electrical Engineering and Computer Science.