dc.contributor.advisor | Kalyan Veeramachaneni. | en_US |
dc.contributor.author | Montanez, Andrew, M. Eng. Massachusetts Institute of Technology. | en_US |
dc.contributor.other | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. | en_US |
dc.date.accessioned | 2019-07-15T20:29:31Z | |
dc.date.available | 2019-07-15T20:29:31Z | |
dc.date.copyright | 2018 | en_US |
dc.date.issued | 2018 | en_US |
dc.identifier.uri | https://hdl.handle.net/1721.1/121631 | |
dc.description | This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. | en_US |
dc.description | Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018 | en_US |
dc.description | Cataloged from student-submitted PDF version of thesis. | en_US |
dc.description | Includes bibliographical references (page 105). | en_US |
dc.description.abstract | In this thesis, I designed three open source Python libraries with the intention of creating a robust system that can accurately generate synthetic data. The goal of this thesis was to separate the different components of synthetic data generation into their own libraries. We identified these components as a way to transform the data, a way to model the data, and a way to recursively traverse the data set to model the relationships between the tables as well as the data set itself. Once the libraries were implemented and functioning, we designed a program to run the synthetic data generation process in parallel on subsets of the original data. The goal of this program was to see whether the overall modeling time could be reduced by modeling subsets in parallel and then averaging the parameters. In the end, we tested how close these averaged parameters are to the original parameters to see whether this is a valid modeling technique. | en_US |
dc.description.statementofresponsibility | by Andrew Montanez. | en_US |
dc.format.extent | 105 pages | en_US |
dc.language.iso | eng | en_US |
dc.publisher | Massachusetts Institute of Technology | en_US |
dc.rights | MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. | en_US |
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US |
dc.subject | Electrical Engineering and Computer Science. | en_US |
dc.title | SDV : an open source library for synthetic data generation | en_US |
dc.title.alternative | Synthetic Data Vault | en_US |
dc.title.alternative | Open source library for synthetic data generation | en_US |
dc.type | Thesis | en_US |
dc.description.degree | M. Eng. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | en_US |
dc.identifier.oclc | 1098174866 | en_US |
dc.description.collection | M.Eng. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science | en_US |
dspace.imported | 2019-07-15T20:29:27Z | en_US |
mit.thesis.degree | Master | en_US |
mit.thesis.department | EECS | en_US |