## Information-theoretic metrics for security and privacy

##### Author(s)

Calmon, Flavio du Pin
DownloadFull printable version (11.15Mb)

##### Other Contributors

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.

##### Terms of use

##### Metadata

Show full item record##### Abstract

In this thesis, we study problems in cryptography, privacy and estimation through the information-theoretic lens. We introduce information-theoretic metrics and associated results that shed light on the fundamental limits of what can be learned from noisy data. These metrics and results, in turn, are used to evaluate and design both symmetric-key encryption schemes and privacy-assuring mappings with provable information-theoretic security guarantees. We start by studying information-theoretic properties of symmetric-key encryption in the "small key" regime (i.e. when the key rate is smaller than the entropy rate of the message source). It is well known that security against computationally unbounded adversaries in such settings can only be achieved when the communicating parties share a key that is at least as long as the secret message (i.e. plaintext) being communicated, which is infeasible in practice. Nevertheless, even with short keys, we show that a certain level of security can be guaranteed, albeit not perfect secrecy. In order to quantify exactly how much security can be provided with short keys, we propose a new security metric, called symbol secrecy, that measures how much an adversary that observes only the encrypted message learns about individual symbols of the plaintext. Unlike most traditional rate-based information-theoretic metrics for security, symbol secrecy is non-asymptotic. Furthermore, we demonstrate how fundamental symbol secrecy performance bounds can be achieved through standard code constructions (e.g. Reed-Solomon codes). While much of information-theoretic security has considered the hiding of the plaintext, cryptographic metrics of security seek to hide functions thereof. Consequently, we extend the definition of symbol secrecy to quantify the information leaked about certain classes of functions of the plaintext. This analysis leads to a more general question: can security claims based on information metrics be translated into guarantees on what an adversary can reliably infer from the output of a security system? On the one hand, information metrics usually quantify how far the probability distribution between the secret and the disclosed information is from the ideal case where independence is achieved. On the other hand, estimation guarantees seek to assure that an adversary cannot significantly improve his estimate of the secret given the information disclosed by the system. We answer this question in the positive, and present formulations based on rate-distortion theory that allow security bounds given in terms of information metrics to be transformed into bounds on how well an adversary can estimate functions of secret variable. We do this by solving a convex program that minimizes the average estimation error over all possible distributions that satisfy the bound on the information metric. Using this approach, we are able to derive a set of general sharp bounds on how well certain classes of functions of a hidden variable can(not) be estimated from a noisy observation in terms of different information metrics. These bounds provide converse (negative) results: If an information metric is small, then any non-trivial function of the hidden variable cannot be estimated with probability of error or mean-squared error smaller than a certain threshold. The main tool used to derive the converse bounds is a set of statistics known as the Principal Inertia Components (PICs). The PICs provide a fine-grained decomposition of the dependence between two random variables. Since there are well-studied statistical methods for estimating the PICs, we can then determine the (im)possibility of estimating large classes of functions by using the bounds derived in this thesis and standard statistical tests. The PICs are of independent interest, and are applicable to problems in information theory, statistics, learning theory, and beyond. In the security and privacy setting, the PICs fulfill the dual goal of providing (i) a measure of (in)dependence between the secret and disclosed information of a security system, and (ii) a complete characterization of the functions of the secret information that can or cannot be reliably inferred given the disclosed information. We study the information-theoretic properties of the PICs, and show how they characterize the fundamental limits of perfect privacy. The results presented in this thesis are applicable to estimation, security and privacy. For estimation and statistical learning theory, they shed light on the fundamental limits of learning from noisy data, and can help guide the design of practical learning algorithms. Furthermore, as illustrated in this thesis, the proposed converse bounds are particularly useful for creating security and privacy metrics, and characterize the inherent trade-off between privacy and utility in statistical data disclosure problems. The study of security systems through the information-theoretic lens adds a new dimension for understanding and quantifying security against very powerful adversaries. Furthermore, the framework and metrics discussed here provide practical insight on how to design and improve security systems using well-known coding and optimization techniques. We conclude the thesis by presenting several promising future research directions.

##### Description

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015. Cataloged from PDF version of thesis. Includes bibliographical references (pages 143-150).

##### Date issued

2015##### Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.##### Publisher

Massachusetts Institute of Technology

##### Keywords

Electrical Engineering and Computer Science.