Artificial intelligence-assisted data analysis with BayesDB
Author(s)
Curlette, Christina M.
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Vikash K. Mansinghka.
Abstract
When applying machine learning and statistics techniques to real-world datasets, problems often arise due to missing data or errors from black-box predictive models that are difficult to understand or explain in terms of the model's inputs. This thesis explores the applicability of BayesDB, a probabilistic programming platform for data analysis, to three common problems in data analysis: (i) modeling patterns of missing data, (ii) imputing missing values in datasets, and (iii) characterizing the error behavior of predictive models. Experiments show that CrossCat, the default model discovery mechanism used by BayesDB, can address all three problems effectively. Examples are drawn from the American National Election Studies and the Gapminder database of global macroeconomic and public health indicators.
Description
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 67-68).
Date issued
2017
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.