Implementation of a cross-platform automated Bayesian data modeling system
Author(s)
Charchut, Nicholas George.
Download
1227274665-MIT.pdf (2.563Mb)
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Vikash K. Mansinghka.
Abstract
Understanding the underlying structure of a high-dimensional dataset is a central task in data science and multivariate statistics. The CrossCat model class provides an automated solution, using Bayesian nonparametric processes to identify dependencies within the data without requiring any user input. This thesis provides an implementation of the CrossCat model class in the functional programming language Clojure. This implementation, called ClojureCat, was designed to be part of a probabilistic programming platform and to cross-compile into JavaScript, allowing complex inference procedures to be run in any JavaScript-supporting web browser. The implementation is thoroughly tested and benchmarked against existing CrossCat implementations and other baselines, showing that ClojureCat is not only performant but also accurate in its implementation of CrossCat inference procedures. ClojureCat also includes several implementations of few-shot learning: for several real-world datasets, we use extremely sparse label sets together with CrossCat's learned structure of the data to draw meaningful conclusions, make predictions, and further analyze the high-dimensional data.
Description
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, September, 2020. Cataloged from student-submitted PDF of thesis. Includes bibliographical references (page 83).
Date issued
2020
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.