Sketching and streaming high-dimensional vectors
Author(s)
Nelson, Jelani (Jelani Osei)
DownloadFull printable version (945.2Kb)
Alternative title
Sketching and streaming algorithms
Other Contributors
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Advisor
Erik D. Demaine and Piotr Indyk.
Terms of use
Metadata
Show full item recordAbstract
A sketch of a dataset is a small-space data structure supporting some prespecified set of queries (and possibly updates) while consuming space substantially sublinear in the space required to actually store all the data. Furthermore, it is often desirable, or required by the application, that the sketch itself be computable by a small-space algorithm given just one pass over the data, a so-called streaming algorithm. Sketching and streaming have found numerous applications in network traffic monitoring, data mining, trend detection, sensor networks, and databases. In this thesis, I describe several new contributions in the area of sketching and streaming algorithms. The first space-optimal streaming algorithm for the distinct elements problem. Our algorithm also achieves 0(1) update and reporting times. A streaming algorithm for Hamming norm estimation in the turnstile model which achieves the best known space complexity. The first space-optimal algorithm for pth moment estimation in turnstile streams for 0 < p < 2, with matching lower bounds, and another space-optimal algorithm which also has a fast O(log²(1/[epsilon]) log log(1[epsilon])) update time for (1+/-[epsilon])- approximation. A general reduction from empirical entropy estimation in turnstile streams to moment estimation, providing the only known near-optimal space-complexity upper bound for this problem. A proof of the Johnson-Lindenstrauss lemma where every matrix in the support of the embedding distribution is much sparser than previous known constructions. In particular, to achieve distortion (1+/-[epsilon]) with probability 1-[delta], we embed into optimal dimension 0([epsilon]-²log(1/[delta])) and such that every matrix in the support of the distribution has 0([epsilon]-¹ log(1/[delta])) non-zero entries per column.
Description
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student submitted PDF version of thesis. Includes bibliographical references (p. 136-145).
Date issued
2011Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer SciencePublisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.