
dc.contributor.author: Goldfeld, Ziv
dc.contributor.author: Greenewald, Kristjan
dc.contributor.author: Niles-Weed, Jonathan
dc.contributor.author: Polyanskiy, Yury
dc.date.accessioned: 2021-10-27T20:30:15Z
dc.date.available: 2021-10-27T20:30:15Z
dc.date.issued: 2020
dc.identifier.uri: https://hdl.handle.net/1721.1/135991
dc.description.abstract: © 1963-2012 IEEE. This paper studies convergence of empirical measures smoothed by a Gaussian kernel. Specifically, consider approximating P ∗ N_σ, for N_σ ≜ N(0, σ²I_d), by P̂_n ∗ N_σ under different statistical distances, where P̂_n is the empirical measure. We examine the convergence in terms of the Wasserstein distance, total variation (TV), Kullback-Leibler (KL) divergence, and χ²-divergence. We show that the approximation error under the TV distance and 1-Wasserstein distance (W_1) converges at the rate e^{O(d)} n^{-1/2}, in remarkable contrast to a (typical) n^{-1/d} rate for unsmoothed W_1 (and d ≥ 3). Similarly, for the KL divergence, squared 2-Wasserstein distance (W_2²), and χ²-divergence, the convergence rate is e^{O(d)} n^{-1}, but only if P achieves finite input-output χ² mutual information across the additive white Gaussian noise (AWGN) channel. If the latter condition is not met, the rate changes to ω(n^{-1}) for the KL divergence and W_2², while the χ²-divergence becomes infinite, a curious dichotomy. As an application, we consider estimating the differential entropy h(S+Z), where S ∼ P and Z ∼ N_σ are independent d-dimensional random variables. The distribution P is unknown and belongs to some nonparametric class, but n independent and identically distributed (i.i.d.) samples from it are available. Despite the regularizing effect of noise, we first show that any good estimator (within an additive gap) for this problem must have a sample complexity that is exponential in d. We then leverage the above empirical approximation results to show that the absolute-error risk of the plug-in estimator converges as e^{O(d)} n^{-1/2}, thus attaining the parametric rate in n. This establishes the plug-in estimator as minimax rate-optimal for the considered problem, with sharp dependence of the convergence rate on both n and d. We provide numerical results comparing the performance of the plug-in estimator to that of general-purpose (unstructured) differential entropy estimators (based on kernel density estimation (KDE) or k nearest neighbors (kNN) techniques) applied to samples of S+Z. These results reveal a significant empirical superiority of the plug-in estimator over state-of-the-art KDE and kNN methods. As a motivating application of the plug-in approach, we estimate information flows in deep neural networks and discuss Tishby's Information Bottleneck and the compression conjecture, among others.
dc.language.iso: en
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.isversionof: 10.1109/TIT.2020.2975480
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.source: arXiv
dc.title: Convergence of Smoothed Empirical Measures with Applications to Entropy Estimation
dc.type: Article
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.contributor.department: MIT-IBM Watson AI Lab
dc.relation.journal: IEEE Transactions on Information Theory
dc.eprint.version: Original manuscript
dc.type.uri: http://purl.org/eprint/type/JournalArticle
eprint.status: http://purl.org/eprint/status/NonPeerReviewed
dc.date.updated: 2021-03-09T20:09:08Z
dspace.orderedauthors: Goldfeld, Z; Greenewald, K; Niles-Weed, J; Polyanskiy, Y
dspace.date.submission: 2021-03-09T20:09:09Z
mit.journal.volume: 66
mit.journal.issue: 7
mit.license: OPEN_ACCESS_POLICY
mit.metadata.status: Authority Work and Publication Information Needed
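
Note on the plug-in estimator described in the abstract: it evaluates h(P̂_n ∗ N_σ), the differential entropy of the Gaussian mixture obtained by convolving the empirical measure of the n samples with the noise kernel N(0, σ²I_d). The Python sketch below is an illustrative Monte Carlo implementation of that idea, not the authors' code; the function name plug_in_entropy and the parameter n_mc are made up here, and NumPy/SciPy are assumed.

import numpy as np
from scipy.special import logsumexp

def plug_in_entropy(samples, sigma, n_mc=10_000, rng=None):
    """Monte Carlo estimate of h(P_hat_n * N_sigma) in nats (illustrative sketch).

    samples: (n, d) array of i.i.d. draws from the unknown distribution P.
    sigma:   standard deviation of the isotropic noise N(0, sigma^2 I_d).
    n_mc:    number of Monte Carlo points used to estimate the entropy.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = samples.shape

    # Draw Monte Carlo points from the mixture P_hat_n * N_sigma:
    # pick a sample center uniformly at random, then add Gaussian noise.
    centers = samples[rng.integers(0, n, size=n_mc)]
    x = centers + sigma * rng.standard_normal((n_mc, d))

    # Log-density of the n-component Gaussian mixture at each point:
    # log q(x) = logsumexp_i[-||x - S_i||^2 / (2 sigma^2)] - log n - (d/2) log(2 pi sigma^2)
    sq_dists = ((x[:, None, :] - samples[None, :, :]) ** 2).sum(axis=2)  # (n_mc, n)
    log_q = (logsumexp(-sq_dists / (2 * sigma**2), axis=1)
             - np.log(n) - 0.5 * d * np.log(2 * np.pi * sigma**2))

    # h(P_hat_n * N_sigma) ≈ -E[log q(X)] with X drawn from the mixture.
    return -log_q.mean()

As a sanity check under this sketch, if the samples are drawn from a standard Gaussian P and σ = 1, the returned value should approach h(N(0, 2I_d)) = (d/2) log(4πe) as n and n_mc grow.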

