Show simple item record

dc.contributor.author: Berinde, Radu
dc.contributor.author: Indyk, Piotr
dc.contributor.author: Cormode, Graham
dc.contributor.author: Strauss, Martin J.
dc.date.accessioned: 2012-09-17T18:08:06Z
dc.date.available: 2012-09-17T18:08:06Z
dc.date.issued: 2009-06
dc.identifier.isbn: 978-1-60558-553-6
dc.identifier.uri: http://hdl.handle.net/1721.1/73015
dc.description.abstract: The problem of finding heavy hitters and approximating the frequencies of items is at the heart of many problems in data stream analysis. It has been observed that several proposed solutions to this problem can outperform their worst-case guarantees on real data. This leads to the question of whether some stronger bounds can be guaranteed. We answer this in the positive by showing that a class of "counter-based algorithms" (including the popular and very space-efficient FREQUENT and SPACESAVING algorithms) provide much stronger approximation guarantees than previously known. Specifically, we show that errors in the approximation of individual elements do not depend on the frequencies of the most frequent elements, but only on the frequency of the remaining "tail." This shows that counter-based methods are the most space-efficient (in fact, space-optimal) algorithms having this strong error bound. This tail guarantee allows these algorithms to solve the "sparse recovery" problem. Here, the goal is to recover a faithful representation of the vector of frequencies, $f$. We prove that using space $O(k)$, the algorithms construct an approximation $f^*$ to the frequency vector $f$ so that the $L_1$ error $\|f - f^*\|_1$ is close to the best possible error $\min_{f'} \|f' - f\|_1$, where $f'$ ranges over all vectors with at most $k$ non-zero entries. This improves the previously best known space bound of about $O(k \log n)$ for streams without element deletions (where $n$ is the size of the domain from which stream elements are drawn). Other consequences of the tail guarantees are results for skewed (Zipfian) data, and guarantees for accuracy of merging multiple summarized streams. [en_US]
dc.description.sponsorship: David & Lucile Packard Foundation (Fellowship) [en_US]
dc.description.sponsorship: Center for Massive Data Algorithmics (MADALGO) [en_US]
dc.description.sponsorship: National Science Foundation (U.S.). (Grant number CCF-0728645) [en_US]
dc.language.iso: en_US
dc.publisher: Association for Computing Machinery (ACM) [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1145/1559795.1559819 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike 3.0 [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/3.0/ [en_US]
dc.source: MIT web domain [en_US]
dc.title: Space-optimal Heavy Hitters with Strong Error Bounds [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Radu Berinde, Graham Cormode, Piotr Indyk, and Martin J. Strauss. 2009. Space-optimal heavy hitters with strong error bounds. In Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (PODS '09). ACM, New York, NY, USA, 157-166. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.contributor.approver: Indyk, Piotr
dc.contributor.mitauthor: Berinde, Radu
dc.contributor.mitauthor: Indyk, Piotr
dc.relation.journal: Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (PODS '09) [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
dspace.orderedauthors: Berinde, Radu; Cormode, Graham; Indyk, Piotr; Strauss, Martin J. [en]
dc.identifier.orcid: https://orcid.org/0000-0002-7983-9524
mit.license: OPEN_ACCESS_POLICY [en_US]
mit.metadata.status: Complete
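
The abstract names SPACESAVING as one of the counter-based algorithms it analyzes. As an illustration only, the following is a minimal textbook-style sketch of that algorithm in Python; it is not taken from the paper. The function name `space_saving`, the parameter `k` (number of counters), and the sample stream are assumptions made for this example, and the linear scan for the smallest counter is chosen for brevity rather than speed.

```python
from collections import Counter


def space_saving(stream, k):
    """Summarize a stream with at most k counters (illustrative sketch).

    Returns a dict mapping each monitored item to its estimated count.
    Estimates never fall below the true count of a monitored item.
    """
    counters = {}
    for item in stream:
        if item in counters:
            # Item is already monitored: count this occurrence.
            counters[item] += 1
        elif len(counters) < k:
            # A free counter is available: start monitoring the item.
            counters[item] = 1
        else:
            # All counters are in use: evict an item with the smallest
            # count; the newcomer inherits that count plus one.
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return counters


# Hypothetical example stream with a few frequent items and a small tail.
stream = list("aaaaabbbbcccdde") + list("aaab")
summary = space_saving(stream, k=3)
true_counts = Counter(stream)

for item, estimate in sorted(summary.items()):
    print(item, "estimate:", estimate, "true:", true_counts[item])
```

In this sketch the counters always sum to the stream length, so the smallest counter is at most the stream length divided by `k`, which bounds the overestimate of any monitored item. The tail guarantee described in the abstract is a stronger statement about the same kind of summary; the code does not attempt to demonstrate it.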

