dc.contributor.author | Kipf, Andreas | |
dc.contributor.author | Chromejko, Damian | |
dc.contributor.author | Hall, Alexander | |
dc.contributor.author | Boncz, Peter | |
dc.contributor.author | Andersen, David G. | |
dc.date.accessioned | 2021-10-28T13:55:08Z | |
dc.date.available | 2021-10-28T13:55:08Z | |
dc.date.issued | 2020-09 | |
dc.identifier.issn | 2150-8097 | |
dc.identifier.uri | https://hdl.handle.net/1721.1/136700 | |
dc.description.abstract | <jats:p>In modern data warehousing, data skipping is essential for high query performance. While index structures such as B-trees or hash tables allow for precise pruning, their large storage requirements make them impractical for indexing secondary columns. Therefore, many systems rely on approximate indexes such as min/max sketches (ZoneMaps) or Bloom filters for cost-effective data pruning. For example, Google PowerDrill skips more than 90% of data on average using such indexes.</jats:p>
<jats:p>In this paper, we introduce Cuckoo Index (CI), an approximate secondary index structure that represents the many-to-many relationship between keys and data partitions in a highly space-efficient way. At its core, CI associates variable-sized fingerprints in a Cuckoo filter with compressed bitmaps indicating qualifying partitions. With our approach, we target equality predicates in a read-only (immutable) setting and optimize for space efficiency under the premise of practical build and lookup performance.</jats:p>
<jats:p>In contrast to per-partition (Bloom) filters, CI produces correct results for lookups with keys that occur in the data. CI allows controlling the ratio of false-positive partitions for lookups with non-occurring keys. Our experiments with real-world and synthetic data show that CI consumes significantly less space than per-partition filters for the same pruning power for low-to-medium cardinality columns. For high cardinality columns, CI is on par with its baselines.</jats:p> | en_US |
dc.publisher | VLDB Endowment | en_US |
dc.relation.isversionof | 10.14778/3424573.3424577 | en_US |
dc.rights | Creative Commons Attribution-NonCommercial-NoDerivs License | en_US |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | en_US |
dc.source | VLDB Endowment | en_US |
dc.title | Cuckoo index | en_US |
dc.title.alternative | a lightweight secondary index structure | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Kipf, Andreas, Chromejko, Damian, Hall, Alexander, Boncz, Peter and Andersen, David G. 2020. "Cuckoo index." Proceedings of the VLDB Endowment, 13 (13). | |
dc.contributor.department | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory | |
dc.eprint.version | Final published version | en_US |
dc.type.uri | http://purl.org/eprint/type/JournalArticle | en_US |
eprint.status | http://purl.org/eprint/status/PeerReviewed | en_US |
dspace.date.submission | 2021-09-24T13:31:58Z | |
mit.journal.volume | 13 | en_US |
mit.journal.issue | 13 | en_US |
mit.license | PUBLISHER_CC | |
mit.metadata.status | Authority Work and Publication Information Needed | en_US |