Learning Sums of Independent Integer Random Variables

Daskalakis, Constantinos; Diakonikolas, Ilias; O'Donnell, Ryan; Servedio, Rocco A.; Tan, Li-Yang

Author(s)

Diakonikolas, Ilias; O'Donnell, Ryan; Servedio, Rocco A.; Tan, Li-Yang; Daskalakis, Konstantinos

Terms of use

Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/

Abstract

Let S = X_1 + ··· + X_n be a sum of n independent integer random variables X_i, where each X_i is supported on {0, 1, ..., k-1} but otherwise may have an arbitrary distribution (in particular, the X_i's need not be identically distributed). How many samples are required to learn the distribution of S to high accuracy? In this paper we show that the answer is completely independent of n, and moreover we give a computationally efficient algorithm which achieves this low sample complexity. More precisely, our algorithm learns any such S to ε-accuracy (with respect to the total variation distance between distributions) using poly(k, 1/ε) samples, independent of n. Its running time is poly(k, 1/ε) in the standard word RAM model. Thus we give a broad generalization of the main result of [DDS12b], which gave a similar learning result for the special case k = 2 (when the distribution of S is a Poisson Binomial Distribution). Prior to this work, no nontrivial results were known for learning these distributions even in the case k = 3. A key difficulty is that, in contrast to the case k = 2, sums of independent {0, 1, 2}-valued random variables may behave very differently from (discretized) normal distributions, and in fact may be rather complicated: they are not log-concave, they can be Θ(n)-modal, there is no relationship between Kolmogorov distance and total variation distance for the class, etc. Nevertheless, the heart of our learning result is a new limit theorem which characterizes what the sum of an arbitrary number of arbitrary independent {0, 1, ..., k-1}-valued random variables may look like. Previous limit theorems in this setting made strong assumptions on the “shift invariance” of the random variables X_i in order to force a discretized normal limit. We believe that our new limit theorem, as the first result for truly arbitrary sums of independent {0, 1, ..., k-1}-valued random variables, is of independent interest.
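The structural claims in the abstract about the case k = 3 can be checked numerically on a small example. The sketch below is only an illustration and is not the paper's learning algorithm: it computes exact distributions by convolution, and every function name and the choice n = 20 are assumptions made for the example. It takes each X_i uniform on {0, 2}, so that S is supported on even integers, and compares it with a k = 2 sum of fair coin flips and with a copy of S smoothed by one extra {0, 1}-valued variable.

```python
def convolve(p, q):
    """Exact pmf of the sum of two independent integer variables with pmfs p, q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def sum_pmf(pmfs):
    """Exact pmf of S = X_1 + ... + X_n given the pmf of each X_i on {0, ..., k-1}."""
    total = [1.0]  # pmf of the constant 0
    for p in pmfs:
        total = convolve(total, p)
    return total


def pad(p, m):
    """Extend a pmf with zeros so that it has length m."""
    return p + [0.0] * (m - len(p))


def total_variation(p, q):
    """Total variation distance: half the L1 distance between the pmfs."""
    m = max(len(p), len(q))
    return 0.5 * sum(abs(a - b) for a, b in zip(pad(p, m), pad(q, m)))


def kolmogorov(p, q):
    """Kolmogorov distance: largest gap between the two CDFs."""
    m = max(len(p), len(q))
    gap = cp = cq = 0.0
    for a, b in zip(pad(p, m), pad(q, m)):
        cp += a
        cq += b
        gap = max(gap, abs(cp - cq))
    return gap


def count_modes(p):
    """Number of strict local maxima of the pmf (mass outside the support is 0)."""
    padded = [0.0] + p + [0.0]
    return sum(1 for i in range(1, len(padded) - 1)
               if padded[i] > padded[i - 1] and padded[i] > padded[i + 1])


n = 20  # illustrative choice

# k = 2: n fair coin flips, a Poisson Binomial Distribution.
coin_sum = sum_pmf([[0.5, 0.5]] * n)

# k = 3: each X_i is uniform on {0, 2}, so S only takes even values.
even_sum = sum_pmf([[0.5, 0.0, 0.5]] * n)

# Same sum plus one extra variable uniform on {0, 1}; still a k = 3 sum.
smoothed_sum = sum_pmf([[0.5, 0.0, 0.5]] * n + [[0.5, 0.5, 0.0]])

modes_coin = count_modes(coin_sum)   # unimodal
modes_even = count_modes(even_sum)   # one mode per even support point: n + 1
tv = total_variation(even_sum, smoothed_sum)
kol = kolmogorov(even_sum, smoothed_sum)

print(modes_coin, modes_even, round(tv, 4), round(kol, 4))
```

In this example the k = 3 sum has n + 1 modes while the k = 2 sum has one. The total variation distance between the two k = 3 sums is 1/2 for every n, because half of the smoothed sum's mass sits on odd integers, whereas their Kolmogorov distance is half the largest point mass of the original sum and shrinks as n grows.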

Date issued

2013-10

URI

http://hdl.handle.net/1721.1/99970

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Journal

Proceedings of the 2013 IEEE 54th Annual Symposium on Foundations of Computer Science

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Citation

Daskalakis, Constantinos, Ilias Diakonikolas, Ryan O'Donnell, Rocco A. Servedio, and Li-Yang Tan. “Learning Sums of Independent Integer Random Variables.” 2013 IEEE 54th Annual Symposium on Foundations of Computer Science (October 2013).

Version: Author's final manuscript

ISBN

978-0-7695-5135-7

ISSN

0272-5428

Collections

MIT Open Access Articles

DSpace@MIT