Differentially Private Synthetic Data Generation for Relational Databases
Author(s)
Alimohammadi, Kaveh
DownloadThesis PDF (1.203Mb)
Advisor
Azizan, Navid
Terms of use
Metadata
Show full item recordAbstract
Existing differentially private (DP) synthetic data generation mechanisms typically assume a single-source table. In practice, data is often distributed across multiple tables with relationships across tables. This study presents the first-of-its-kind algorithm that can be combined with \emph{any} existing DP mechanisms to generate synthetic relational databases. The algorithm iteratively refines the relationship between individual synthetic tables to minimize their approximation errors in terms of low-order marginal distributions while maintaining referential integrity; consequently eliminates the need to flatten a relational database into a master table (saving space), operates efficiently (saving time), and scales effectively to high-dimensional data. We provide both DP and theoretical utility guarantees for our algorithm. Through numerical experiments on real-world datasets, we demonstrate the effectiveness of our method in preserving fidelity to the original data.
Date issued
2025-05Department
Massachusetts Institute of Technology. Institute for Data, Systems, and SocietyPublisher
Massachusetts Institute of Technology