dc.contributor.author: Andoni, Alexandr
dc.contributor.author: Indyk, Piotr
dc.contributor.author: Onak, Krzysztof
dc.contributor.author: Rubinfeld, Ronitt
dc.date.accessioned: 2012-10-11T18:22:51Z
dc.date.available: 2012-10-11T18:22:51Z
dc.date.issued: 2009-07
dc.date.submitted: 2009-07
dc.identifier.isbn: 978-3-642-02926-4
dc.identifier.issn: 0302-9743
dc.identifier.issn: 1611-3349
dc.identifier.uri: http://hdl.handle.net/1721.1/73886
dc.description: 36th International Colloquium, ICALP 2009, Rhodes, Greece, July 5-12, 2009, Proceedings, Part I [en_US]
dc.description.abstract: We initiate the study of sublinear-time algorithms in the external memory model [1]. In this model, the data is stored in blocks of a certain size B, and the algorithm is charged a unit cost for each block access. This model is well-studied, since it reflects the computational issues occurring when the (massive) input is stored on a disk. Since each block access operates on B data elements in parallel, many problems have external memory algorithms whose number of block accesses is only a small fraction (e.g. 1/B) of their main memory complexity. However, to the best of our knowledge, no such reduction in complexity is known for any sublinear-time algorithm. One plausible explanation is that the vast majority of sublinear-time algorithms use random sampling and thus exhibit no locality of reference. This state of affairs is quite unfortunate, since both sublinear-time algorithms and the external memory model are important approaches to dealing with massive data sets, and ideally they should be combined to achieve best performance. In this paper we show that such combination is indeed possible. In particular, we consider three well-studied problems: testing of distinctness, uniformity and identity of an empirical distribution induced by data. For these problems we show random-sampling-based algorithms whose number of block accesses is up to a factor of 1/√B smaller than the main memory complexity of those problems. We also show that this improvement is optimal for those problems. Since these problems are natural primitives for a number of sampling-based algorithms for other problems, our tools improve the external memory complexity of other problems as well. [en_US]
dc.description.sponsorship: David & Lucile Packard Foundation (Fellowship) [en_US]
dc.description.sponsorship: Center for Massive Data Algorithmics (MADALGO) [en_US]
dc.description.sponsorship: Marie Curie (International Reintegration Grant 231077) [en_US]
dc.description.sponsorship: National Science Foundation (U.S.) (Grant 0514771) [en_US]
dc.description.sponsorship: National Science Foundation (U.S.) (Grant 0728645) [en_US]
dc.description.sponsorship: National Science Foundation (U.S.) (Grant 0732334) [en_US]
dc.description.sponsorship: Symantec Research Labs (Research Fellowship) [en_US]
dc.language.iso: en_US
dc.publisher: Springer Berlin / Heidelberg [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1007/978-3-642-02927-1_9 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike 3.0 [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/3.0/ [en_US]
dc.source: MIT web domain [en_US]
dc.title: External Sampling [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Andoni, Alexandr et al. “External Sampling.” Automata, Languages and Programming. Ed. Susanne Albers et al. LNCS Vol. 5555. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009. 83–94. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.contributor.mitauthor: Andoni, Alexandr
dc.contributor.mitauthor: Indyk, Piotr
dc.contributor.mitauthor: Onak, Krzysztof
dc.contributor.mitauthor: Rubinfeld, Ronitt
dc.relation.journal: Automata, Languages and Programming [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dspace.orderedauthors: Andoni, Alexandr; Indyk, Piotr; Onak, Krzysztof; Rubinfeld, Ronitt [en]
dc.identifier.orcid: https://orcid.org/0000-0002-4353-7639
dc.identifier.orcid: https://orcid.org/0000-0002-7983-9524
mit.license: OPEN_ACCESS_POLICY [en_US]
mit.metadata.status: Complete
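
The cost model described in the abstract (unit cost per access to a block of size B) can be illustrated with a minimal sketch. The function names and the parameter values below are hypothetical, chosen only for illustration; the code does not implement the paper's algorithms. It only counts block accesses for a sequential scan and for uniformly random element sampling, which shows why sampling with no locality of reference gains little from blocking: nearly every sampled element tends to land in a different block.

```python
import math
import random


def blocks_touched(indices, block_size):
    """Count the distinct blocks that contain the given element indices."""
    return len({i // block_size for i in indices})


def scan_cost(n, block_size):
    """Block accesses needed to read all n elements sequentially."""
    return math.ceil(n / block_size)


def random_sample_cost(n, k, block_size, rng):
    """Block accesses used when k element positions are drawn uniformly at random."""
    indices = [rng.randrange(n) for _ in range(k)]
    return blocks_touched(indices, block_size)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs agree
    n, B, k = 1_000_000, 1_000, 500  # illustrative values, not from the paper
    # A full scan reads n elements with only n / B block accesses.
    print("full scan:", scan_cost(n, B))
    # k random samples can cost up to k block accesses, one per sample.
    print("random samples:", random_sample_cost(n, k, B, rng))
```

In this sketch the scan pays about 1/B block accesses per element read, while the random sampler pays close to one block access per element, which is the gap the abstract refers to.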

