Improving the adaptability of differential privacy
Author(s)
Mugunthan, Vaikkunth.
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Lalana Kagal.
Abstract
Differential privacy is a mathematical technique that provides strong theoretical privacy guarantees by ensuring the statistical indistinguishability of individuals in a dataset. It has become the de facto framework for privacy-preserving analysis of statistical datasets and has garnered significant attention from researchers and privacy experts due to its strong privacy guarantees. However, the lack of flexibility caused by the dearth of configurable parameters in existing mechanisms, the accuracy loss caused by the added noise, and the difficulty of choosing a suitable value of the privacy parameter, ε, have prevented its widespread adoption in industry. In this thesis, I address these issues. In differential privacy, the standard approach is to add Laplacian noise to the output of queries. I propose new probability distributions and noise-adding mechanisms that preserve ε-differential privacy and (ε, δ)-differential privacy. The distributions can be viewed as an asymmetric Laplacian distribution and a generalized truncated Laplacian distribution. I show that the proposed mechanisms add optimal noise in a global context, conditional upon technical lemmas. In addition, I show that the proposed mechanisms are more adaptable than the Laplacian mechanism, as they provide more than one parameter to adjust. I then demonstrate that the generalized truncated Laplacian mechanism performs better than the optimal Gaussian mechanism. The presented mechanisms are highly useful because they enable data controllers to fine-tune the perturbation needed to protect privacy to the distortion requirements of a specific use case. The second issue addressed in this thesis is identifying an optimal value of ε and specifying bounds on it. ε quantifies the privacy risk posed by revealing statistics calculated on private and sensitive data. Although it has an intuitive theoretical interpretation, choosing an appropriate value is non-trivial. I present a systematic and methodical way to calculate ε once the necessary constraints are given. To derive context-specific optimal values and an upper bound on ε, I use the confidence probability approach, Chebyshev's inequality, and McDiarmid's inequality.
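To make the baseline referenced above concrete, the following is a minimal sketch of the standard Laplace mechanism, which adds noise with scale sensitivity/ε to a query result. It illustrates only the textbook baseline, not the asymmetric or generalized truncated mechanisms proposed in the thesis, and the function and parameter names are illustrative assumptions rather than identifiers from the thesis.

```python
import numpy as np

def laplace_mechanism(query_result, l1_sensitivity, epsilon, rng=None):
    """Standard ε-differentially private Laplace mechanism (baseline sketch).

    Noise is drawn from Laplace(0, l1_sensitivity / epsilon), the usual
    calibration for a query with the given L1 sensitivity. This is not the
    thesis's proposed asymmetric or generalized truncated Laplacian variant.
    """
    rng = np.random.default_rng() if rng is None else rng
    scale = l1_sensitivity / epsilon
    return query_result + rng.laplace(loc=0.0, scale=scale)

# Example: release a count query (L1 sensitivity 1) with epsilon = 0.5.
noisy_count = laplace_mechanism(query_result=1234, l1_sensitivity=1.0, epsilon=0.5)
print(noisy_count)
```

Note that a smaller ε yields a larger noise scale and hence stronger privacy at the cost of accuracy, which is the flexibility/accuracy trade-off the abstract describes.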
Description
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019. Cataloged from the PDF version of the thesis. Includes bibliographical references (pages 55-56).
Date issued
2019
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.