Show simple item record

dc.contributor.author: Ganev, Georgi
dc.contributor.author: Xu, Kai
dc.contributor.author: De Cristofaro, Emiliano
dc.date.accessioned: 2025-01-28T14:49:18Z
dc.date.available: 2025-01-28T14:49:18Z
dc.date.issued: 2024-12-02
dc.identifier.isbn: 979-8-4007-0636-3
dc.identifier.uri: https://hdl.handle.net/1721.1/158085
dc.description: CCS ’24, October 14–18, 2024, Salt Lake City, UT, USA. (en_US)
dc.description.abstract: Generative models trained with Differential Privacy (DP) can produce synthetic data while reducing privacy risks. However, navigating their privacy-utility tradeoffs makes it challenging to find the best models for specific settings and tasks. This paper bridges this gap by profiling how DP generative models for tabular data distribute privacy budgets across rows and columns, one of the primary sources of utility degradation. We compare graphical and deep generative models, focusing on the key factors that determine how privacy budgets are spent: the underlying modeling techniques, DP mechanisms, and data dimensionality. Through our measurement study, we shed light on the characteristics that make different models suitable for various settings and tasks. For instance, we find that graphical models distribute privacy budgets horizontally and thus cannot handle relatively wide datasets within a fixed training time; moreover, their performance on the task they were optimized for increases monotonically with more data, though they can also overfit. Deep generative models spend their budgets per iteration, so their behavior is less predictable with varying dataset dimensions, but they are more flexible, as they can perform better when trained on more features. Moreover, low levels of privacy (ε ≥ 100) can help some models generalize, achieving better results than without applying DP. We believe our work will aid the deployment of DP synthetic data techniques by guiding the choice of candidate models vis-à-vis dataset features, desired privacy levels, and downstream tasks. (en_US)
dc.publisher: ACM | Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security (en_US)
dc.relation.isversionof: https://doi.org/10.1145/3658644.3690215 (en_US)
dc.rights: Creative Commons Attribution (en_US)
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ (en_US)
dc.source: Association for Computing Machinery (en_US)
dc.title: Graphical vs. Deep Generative Models: Measuring the Impact of Differentially Private Mechanisms and Budgets on Utility (en_US)
dc.type: Article (en_US)
dc.identifier.citation: Ganev, Georgi, Xu, Kai and De Cristofaro, Emiliano. 2024. "Graphical vs. Deep Generative Models: Measuring the Impact of Differentially Private Mechanisms and Budgets on Utility."
dc.contributor.department: MIT-IBM Watson AI Lab (en_US)
dc.identifier.mitlicense: PUBLISHER_CC
dc.eprint.version: Final published version (en_US)
dc.type.uri: http://purl.org/eprint/type/ConferencePaper (en_US)
eprint.status: http://purl.org/eprint/status/NonPeerReviewed (en_US)
dc.date.updated: 2025-01-01T08:49:21Z
dc.language.rfc3066: en
dc.rights.holder: The author(s)
dspace.date.submission: 2025-01-01T08:49:21Z
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed (en_US)

