SDV : an open source library for synthetic data generation
Author(s): Montanez, Andrew, M. Eng. Massachusetts Institute of Technology.
Synthetic Data Vault
Open source library for synthetic data generation
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
In this thesis, I designed three open source Python libraries with the intention of creating a robust system that can accurately generate synthetic data. The goal of this thesis was to separate the different components of synthetic data generation into their own libraries. We identified these components as a way to transform the data, a way to model the data, and a way to recursively traverse the data set to model the relationships between the tables as well as the data set itself. Once the libraries were implemented and functioning, we designed a program to run the synthetic data generation process in parallel on subsets of the original data. The goal of this program was to see if the overall modeling time could be reduced by modeling subsets in parallel and then averaging the parameters. Finally, we tested how close these averaged parameters were to those obtained from the original data, to see whether this is a valid modeling technique.
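The subset-averaging idea in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation and does not use the SDV API. It assumes a single numeric column modeled as a univariate Gaussian, and the helper names (`fit_gaussian`, `split`, `fit_averaged`) are hypothetical. It fits each subset concurrently, averages the fitted parameters, and measures how far they are from a fit on the full data.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def fit_gaussian(values):
    """Fit a univariate Gaussian; returns (mean, population variance)."""
    return statistics.fmean(values), statistics.pvariance(values)


def split(values, n_subsets):
    """Partition values into n_subsets interleaved, near-equal chunks."""
    return [values[i::n_subsets] for i in range(n_subsets)]


def fit_averaged(values, n_subsets=4):
    """Fit each subset concurrently, then average the fitted parameters."""
    subsets = split(values, n_subsets)
    # Threads keep the sketch portable; a process pool would be the
    # natural choice for CPU-bound model fitting (assumption, not from
    # the thesis).
    with ThreadPoolExecutor(max_workers=n_subsets) as pool:
        fits = list(pool.map(fit_gaussian, subsets))
    mean = statistics.fmean([m for m, _ in fits])
    variance = statistics.fmean([v for _, v in fits])
    return mean, variance


# Illustrative data only: 10,000 draws from a Gaussian with a fixed seed.
rng = random.Random(0)
data = [rng.gauss(10.0, 2.0) for _ in range(10_000)]

full_mean, full_var = fit_gaussian(data)
avg_mean, avg_var = fit_averaged(data, n_subsets=4)

# Gaps between the averaged parameters and the full-data parameters.
mean_gap = abs(full_mean - avg_mean)
var_gap = abs(full_var - avg_var)
print(f"mean gap: {mean_gap:.6f}, variance gap: {var_gap:.6f}")
```

With equal-sized subsets the averaged mean matches the full-data mean up to floating-point error, while the averaged variance is slightly smaller because it omits the variation between subset means. Whether similar behavior holds for the richer models described in the thesis is the question the abstract says was tested.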
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (page 105).
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology
Electrical Engineering and Computer Science.