Improving the adaptability of differential privacy
Author(s)
Mugunthan, Vaikkunth.
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Lalana Kagal.
Abstract
Differential privacy is a mathematical technique that provides strong theoretical privacy guarantees by ensuring the statistical indistinguishability of individuals in a dataset. It has become the de facto framework for privacy-preserving analysis of statistical datasets and has garnered significant attention from researchers and privacy experts due to its strong privacy guarantees. However, the lack of flexibility caused by the dearth of configurable parameters in existing mechanisms, the accuracy loss caused by the added noise, and the difficulty of choosing a suitable value of the privacy parameter, ε, have prevented its widespread adoption in industry. In this thesis, I address these issues. In differential privacy, the standard approach is to add Laplacian noise to the output of queries. I propose new probability distributions and noise-adding mechanisms that preserve ε-differential privacy and (ε, δ)-differential privacy. The distributions can be viewed as an asymmetric Laplacian distribution and a generalized truncated Laplacian distribution. I show that the proposed mechanisms add optimal noise in a global context, conditional upon technical lemmas. In addition, I show that the proposed mechanisms are more adaptable than the Laplacian mechanism, as they provide more than one parameter to adjust. I then demonstrate that the generalized truncated Laplacian mechanism performs better than the optimal Gaussian mechanism. The presented mechanisms are highly useful because they enable data controllers to fine-tune the perturbation needed to protect privacy to the distortion requirements of a specific use case. The second issue addressed in this thesis is identifying an optimal value of ε and specifying bounds on it. ε quantifies the privacy risk posed by revealing statistics calculated on private and sensitive data. Although it has an intuitive theoretical interpretation, choosing an appropriate value is non-trivial. I present a systematic and methodical way to calculate ε once the necessary constraints are given. To derive context-specific optimal values and an upper bound on ε, I use the confidence probability approach, Chebyshev's inequality, and McDiarmid's inequality.
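To make the baseline referenced above concrete, the following is a minimal sketch of the standard Laplace mechanism, which adds noise with scale sensitivity/ε to a query result. It illustrates only the textbook baseline, not the asymmetric or generalized truncated mechanisms proposed in the thesis, and the function and parameter names are illustrative assumptions rather than identifiers from the thesis.

```python
import numpy as np

def laplace_mechanism(query_result, l1_sensitivity, epsilon, rng=None):
    """Standard ε-differentially private Laplace mechanism (baseline sketch).

    Noise is drawn from Laplace(0, l1_sensitivity / epsilon), the usual
    calibration for a query with the given L1 sensitivity. This is not the
    thesis's proposed asymmetric or generalized truncated Laplacian variant.
    """
    rng = np.random.default_rng() if rng is None else rng
    scale = l1_sensitivity / epsilon
    return query_result + rng.laplace(loc=0.0, scale=scale)

# Example: release a count query (L1 sensitivity 1) with epsilon = 0.5.
noisy_count = laplace_mechanism(query_result=1234, l1_sensitivity=1.0, epsilon=0.5)
print(noisy_count)
```

Note that a smaller ε yields a larger noise scale and hence stronger privacy at the cost of accuracy, which is the flexibility/accuracy trade-off the abstract describes.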
Description
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019. Cataloged from the PDF version of the thesis. Includes bibliographical references (pages 55-56).
Date issued
2019
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.