SDV : an open source library for synthetic data generation
Author(s)
Montanez, Andrew, M. Eng. Massachusetts Institute of Technology.
Alternative title
Synthetic Data Vault
Open source library for synthetic data generation
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Kalyan Veeramachaneni.
Abstract
In this thesis, I designed three open source Python libraries with the intention of creating a robust system that can accurately generate synthetic data. The goal of this thesis was to separate the different components of synthetic data generation into their own libraries. We identified these components as a way to transform the data, a way to model the data, and a way to recursively traverse the data set to model the relationships between the tables as well as the data set itself. Once the libraries were implemented and functioning, we designed a program to run the synthetic data generation process in parallel on subsets of the original data. The goal of this program was to see whether the overall modeling time could be reduced by modeling subsets in parallel and then averaging the parameters. Finally, we tested how close these averaged parameters are to the parameters learned from the original data to see whether this is a valid modeling technique.
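The idea of modeling subsets in parallel and averaging the parameters can be illustrated with a minimal sketch. This is not the thesis's implementation and does not use the SDV API: it assumes a single numeric column modeled as a univariate Gaussian, and the names `fit_gaussian` and `fit_by_averaging` are hypothetical. A thread pool stands in for the parallel workers; a process pool would likely be needed for a real speedup on CPU-bound fitting.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def fit_gaussian(values):
    """Fit a univariate Gaussian: return (mean, population std)."""
    return statistics.fmean(values), statistics.pstdev(values)


def fit_by_averaging(values, n_subsets=4):
    """Fit each subset concurrently, then average the per-subset parameters."""
    # Interleaved slicing gives subsets of (nearly) equal size.
    subsets = [values[i::n_subsets] for i in range(n_subsets)]
    with ThreadPoolExecutor(max_workers=n_subsets) as pool:
        params = list(pool.map(fit_gaussian, subsets))
    means, stds = zip(*params)
    return statistics.fmean(means), statistics.fmean(stds)


# Illustrative data only: a seeded random column, not data from the thesis.
rng = random.Random(0)
column = [rng.gauss(50.0, 10.0) for _ in range(20_000)]

full_mean, full_std = fit_gaussian(column)
avg_mean, avg_std = fit_by_averaging(column)

# How far the averaged parameters are from the full-data fit.
mean_gap = abs(full_mean - avg_mean)
std_gap = abs(full_std - avg_std)
print(f"full fit:     mean={full_mean:.3f} std={full_std:.3f}")
print(f"averaged fit: mean={avg_mean:.3f} std={avg_std:.3f}")
```

With equal-sized subsets the averaged mean matches the full-data mean up to floating-point error, while the averaged standard deviation is only an approximation, which is the kind of gap the comparison described in the abstract is meant to measure.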
Description
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (page 105).
Date issued
2018
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.