Towards Creating Synthetic Data Testbeds for Research

Oufattole, Nassim

dc.contributor.advisor	Veeramachaneni, Kalyan
dc.contributor.author	Oufattole, Nassim
dc.date.accessioned	2023-07-31T19:36:02Z
dc.date.available	2023-07-31T19:36:02Z
dc.date.issued	2023-06
dc.date.submitted	2023-07-13T14:26:14.333Z
dc.identifier.uri	https://hdl.handle.net/1721.1/151389
dc.description.abstract	Insurance datasets are generally kept private to protect user information, which makes it difficult for the ML research community to access and experiment with this data. To broaden access to private insurance data and encourage innovation on it, we compile and share publicly available insurance datasets, analyze the challenges inherent in these datasets, and propose, motivate, and evaluate a synthetic data sharing framework called the Synthetic Insurance Data (SID) Testbed. The framework can be used to improve ML performance on tabular datasets by allowing collaborators to generate synthetic data for data augmentation. Beyond this framework, we recognize that tabular data augmentation is not well understood, and we run controlled experiments to better understand how and when data augmentation improves machine learning performance on tabular data.
dc.publisher	Massachusetts Institute of Technology
dc.rights	In Copyright - Educational Use Permitted
dc.rights	Copyright retained by author(s)
dc.rights.uri	https://rightsstatements.org/page/InC-EDU/1.0/
dc.title	Towards Creating Synthetic Data Testbeds for Research
dc.type	Thesis
dc.description.degree	S.M.
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree	Master
thesis.degree.name	Master of Science in Electrical Engineering and Computer Science

Files in this item

Name: oufattole-nassim-sm-eecs-2023- ...
Size: 5.183Mb
Format: PDF
Description: Thesis PDF

This item appears in the following Collection(s)

Graduate Theses
