Approximating and testing k-histogram distributions in sub-linear time
Author(s)
Indyk, Piotr; Levi, Reut; Rubinfeld, Ronitt
Open Access Policy
Creative Commons Attribution-Noncommercial-Share Alike
Terms of use
Abstract
A discrete distribution p over [n] is a k-histogram if its probability distribution function can be represented as a piecewise constant function with k pieces. Such a function is represented by a list of k intervals and k corresponding values. We consider the following problem: given a collection of samples from a distribution p, find a k-histogram that (approximately) minimizes the ℓ₂ distance to the distribution p. We give time- and sample-efficient algorithms for this problem.
We further provide algorithms that distinguish distributions that have the property of being a k-histogram from distributions that are ε-far from any k-histogram in the ℓ₁ distance and the ℓ₂ distance, respectively.
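To make the definitions in the abstract concrete, the following is a minimal illustrative sketch of how a k-histogram could be stored as k intervals with k values, and how its ℓ₁ and ℓ₂ distances to an explicitly known distribution could be computed. It is not the paper's algorithm, which works from samples in sub-linear time. The function names (`histogram_pmf`, `l1_distance`, `l2_distance`) and the choice to store intervals by their right endpoints are assumptions made only for this example.

```python
from math import sqrt


def histogram_pmf(right_endpoints, values):
    """Expand a k-histogram into an explicit pmf over [n] = {1, ..., n}.

    right_endpoints: increasing right endpoints of the k intervals
                     (hypothetical encoding; the last endpoint is n).
    values: the constant per-element probability on each interval.
    """
    pmf = []
    left = 1
    for right, value in zip(right_endpoints, values):
        # Every element of the interval [left, right] gets the same probability.
        pmf.extend([value] * (right - left + 1))
        left = right + 1
    return pmf


def l1_distance(p, q):
    """Sum of absolute differences between two pmfs of equal length."""
    return sum(abs(a - b) for a, b in zip(p, q))


def l2_distance(p, q):
    """Euclidean distance between two pmfs of equal length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# A 2-histogram over [6]: intervals [1, 2] and [3, 6].
q = histogram_pmf([2, 6], [0.3, 0.1])

# An explicit distribution that is close to q but not itself a 2-histogram.
p = [0.25, 0.35, 0.1, 0.1, 0.1, 0.1]

print("l1 distance:", l1_distance(p, q))
print("l2 distance:", l2_distance(p, q))
```

Expanding the histogram into a full list takes time linear in n, so this sketch only illustrates the objects involved. The algorithms described in the abstract access p through samples only.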
Date issued
2012-05
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal
Proceedings of the 31st symposium on Principles of Database Systems (PODS '12)
Publisher
Association for Computing Machinery (ACM)
Citation
Piotr Indyk, Reut Levi, and Ronitt Rubinfeld. 2012. Approximating and testing k-histogram distributions in sub-linear time. In Proceedings of the 31st symposium on Principles of Database Systems (PODS '12), Markus Krötzsch (Ed.). ACM, New York, NY, USA, 15-22.
Version: Author's final manuscript
ISBN
9781450312486