Learning Sums of Independent Integer Random Variables
Author(s)
Diakonikolas, Ilias; O'Donnell, Ryan; Servedio, Rocco A.; Tan, Li-Yang; Daskalakis, Konstantinos
Terms of use
Open Access Policy: Creative Commons Attribution-Noncommercial-Share Alike
Abstract
Let S = X_1 + ··· + X_n be a sum of n independent integer random variables X_i, where each X_i is supported on {0, 1, ..., k-1} but otherwise may have an arbitrary distribution (in particular the X_i's need not be identically distributed). How many samples are required to learn the distribution S to high accuracy? In this paper we show that the answer is completely independent of n, and moreover we give a computationally efficient algorithm which achieves this low sample complexity. More precisely, our algorithm learns any such S to ε-accuracy (with respect to the total variation distance between distributions) using poly(k, 1/ε) samples, independent of n. Its running time is poly(k, 1/ε) in the standard word RAM model. Thus we give a broad generalization of the main result of [DDS12b] which gave a similar learning result for the special case k = 2 (when the distribution S is a Poisson Binomial Distribution). Prior to this work, no nontrivial results were known for learning these distributions even in the case k = 3. A key difficulty is that, in contrast to the case of k = 2, sums of independent {0, 1, 2}-valued random variables may behave very differently from (discretized) normal distributions, and in fact may be rather complicated - they are not log-concave, they can be Θ(n)-modal, there is no relationship between Kolmogorov distance and total variation distance for the class, etc. Nevertheless, the heart of our learning result is a new limit theorem which characterizes what the sum of an arbitrary number of arbitrary independent {0, 1, ..., k-1}-valued random variables may look like. Previous limit theorems in this setting made strong assumptions on the “shift invariance” of the random variables X_i in order to force a discretized normal limit. We believe that our new limit theorem, as the first result for truly arbitrary sums of independent {0, 1, ..., k-1}-valued random variables, is of independent interest.
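The contrast the abstract draws between k = 2 and k = 3 can be seen in a small sketch. It is not the paper's algorithm. It computes the exact distribution of S by convolution, counts local maxima, and compares a naive empirical histogram against the true distribution in total variation distance. All function names, the choice n = 20, and the example distributions (each X_i uniform on {0, 2}, and each X_i a fair coin) are assumptions made for illustration.

```python
import random
from collections import Counter

def sum_pmf(pmfs):
    """Exact pmf of S = X_1 + ... + X_n by repeated convolution.
    Each entry p of pmfs is a list with p[j] = Pr[X_i = j]."""
    dist = [1.0]
    for p in pmfs:
        new = [0.0] * (len(dist) + len(p) - 1)
        for a, pa in enumerate(dist):
            for b, pb in enumerate(p):
                new[a + b] += pa * pb
        dist = new
    return dist

def count_modes(pmf):
    """Number of strict local maxima; points outside the support count as 0."""
    padded = [0.0] + list(pmf) + [0.0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )

def total_variation(p, q):
    """Total variation distance between two pmfs on {0, 1, 2, ...}."""
    m = max(len(p), len(q))
    p = list(p) + [0.0] * (m - len(p))
    q = list(q) + [0.0] * (m - len(q))
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def empirical_pmf(pmfs, num_samples, rng):
    """Naive histogram estimate of the pmf of S (a baseline, not the paper's learner)."""
    size = sum(len(p) - 1 for p in pmfs) + 1
    draws = [
        sum(rng.choices(range(len(p)), weights=p)[0] for p in pmfs)
        for _ in range(num_samples)
    ]
    counts = Counter(draws)
    return [counts[v] / num_samples for v in range(size)]

n = 20
coin_sum = sum_pmf([[0.5, 0.5]] * n)        # k = 2: a binomial, one mode
even_sum = sum_pmf([[0.5, 0.0, 0.5]] * n)   # k = 3: mass only on even integers

print("modes, k = 2 example:", count_modes(coin_sum))
print("modes, k = 3 example:", count_modes(even_sum))

rng = random.Random(0)
estimate = empirical_pmf([[0.5, 0.0, 0.5]] * n, 2000, rng)
print("TV distance of naive histogram:", total_variation(estimate, even_sum))
```

In the k = 3 example S equals twice a Binomial(n, 1/2) variable, so every even point of the support is a local maximum and the number of modes grows linearly in n, while the k = 2 example stays unimodal.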
Date issued
2013-10
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal
Proceedings of the 2013 IEEE 54th Annual Symposium on Foundations of Computer Science
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Citation
Daskalakis, Constantinos, Ilias Diakonikolas, Ryan O'Donnell, Rocco A. Servedio, and Li-Yang Tan. “Learning Sums of Independent Integer Random Variables.” 2013 IEEE 54th Annual Symposium on Foundations of Computer Science (October 2013).
Version: Author's final manuscript
ISBN
978-0-7695-5135-7
ISSN
0272-5428