dc.contributor.author: Kipf, Andreas
dc.contributor.author: Chromejko, Damian
dc.contributor.author: Hall, Alexander
dc.contributor.author: Boncz, Peter
dc.contributor.author: Andersen, David G.
dc.date.accessioned: 2021-10-28T13:55:08Z
dc.date.available: 2021-10-28T13:55:08Z
dc.date.issued: 2020-09
dc.identifier.issn: 2150-8097
dc.identifier.uri: https://hdl.handle.net/1721.1/136700
dc.description.abstract: In modern data warehousing, data skipping is essential for high query performance. While index structures such as B-trees or hash tables allow for precise pruning, their large storage requirements make them impractical for indexing secondary columns. Therefore, many systems rely on approximate indexes such as min/max sketches (ZoneMaps) or Bloom filters for cost-effective data pruning. For example, Google PowerDrill skips more than 90% of data on average using such indexes. In this paper, we introduce Cuckoo Index (CI), an approximate secondary index structure that represents the many-to-many relationship between keys and data partitions in a highly space-efficient way. At its core, CI associates variable-sized fingerprints in a Cuckoo filter with compressed bitmaps indicating qualifying partitions. With our approach, we target equality predicates in a read-only (immutable) setting and optimize for space efficiency under the premise of practical build and lookup performance. In contrast to per-partition (Bloom) filters, CI produces correct results for lookups with keys that occur in the data. CI allows to control the ratio of false positive partitions for lookups with non-occurring keys. Our experiments with real-world and synthetic data show that CI consumes significantly less space than per-partition filters for the same pruning power for low-to-medium cardinality columns. For high cardinality columns, CI is on par with its baselines. [en_US]
dc.publisher: VLDB Endowment [en_US]
dc.relation.isversionof: 10.14778/3424573.3424577 [en_US]
dc.rights: Creative Commons Attribution-NonCommercial-NoDerivs License [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [en_US]
dc.source: VLDB Endowment [en_US]
dc.title: Cuckoo index [en_US]
dc.title.alternative: a lightweight secondary index structure [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Kipf, Andreas, Chromejko, Damian, Hall, Alexander, Boncz, Peter and Andersen, David G. 2020. "Cuckoo index." 13 (13).
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dspace.date.submission: 2021-09-24T13:31:58Z
mit.journal.volume: 13 [en_US]
mit.journal.issue: 13 [en_US]
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed [en_US]

