DSpace@MIT

Cuckoo index

Author(s)
Kipf, Andreas; Chromejko, Damian; Hall, Alexander; Boncz, Peter; Andersen, David G.
Publisher with Creative Commons License

Creative Commons Attribution

Alternative title
a lightweight secondary index structure
Terms of use
Creative Commons Attribution-NonCommercial-NoDerivs License http://creativecommons.org/licenses/by-nc-nd/4.0/
Metadata
Abstract
In modern data warehousing, data skipping is essential for high query performance. While index structures such as B-trees or hash tables allow for precise pruning, their large storage requirements make them impractical for indexing secondary columns. Therefore, many systems rely on approximate indexes such as min/max sketches (ZoneMaps) or Bloom filters for cost-effective data pruning. For example, Google PowerDrill skips more than 90% of data on average using such indexes.

In this paper, we introduce Cuckoo Index (CI), an approximate secondary index structure that represents the many-to-many relationship between keys and data partitions in a highly space-efficient way. At its core, CI associates variable-sized fingerprints in a Cuckoo filter with compressed bitmaps indicating qualifying partitions. With our approach, we target equality predicates in a read-only (immutable) setting and optimize for space efficiency under the premise of practical build and lookup performance.

In contrast to per-partition (Bloom) filters, CI produces correct results for lookups with keys that occur in the data. CI also makes it possible to control the ratio of false positive partitions for lookups with non-occurring keys. Our experiments with real-world and synthetic data show that CI consumes significantly less space than per-partition filters for the same pruning power for low-to-medium cardinality columns. For high cardinality columns, CI is on par with its baselines.
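The core idea in the abstract, mapping key fingerprints to bitmaps of qualifying partitions, can be illustrated with a small toy sketch. This is not the authors' implementation: a plain Python dict stands in for the Cuckoo filter, the bitmaps are uncompressed Python integers, and a single global fingerprint length replaces the paper's variable-sized fingerprints. The names `ToyFingerprintIndex`, `_hash64`, and `min_bits` are hypothetical and chosen only for illustration.

```python
import hashlib


def _hash64(key: str) -> int:
    """Deterministic 64-bit hash of a key (hypothetical helper)."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class ToyFingerprintIndex:
    """Toy fingerprint -> partition-bitmap index (assumption: NOT the real CI).

    A dict replaces the Cuckoo filter and bitmaps are left uncompressed.
    """

    def __init__(self, partitions, min_bits=4):
        # Build one bitmap per key: bit p is set if the key occurs in partition p.
        key_bitmaps = {}
        for pid, keys in enumerate(partitions):
            for key in keys:
                key_bitmaps[key] = key_bitmaps.get(key, 0) | (1 << pid)
        self.num_partitions = len(partitions)

        # Grow the fingerprint until no two stored keys share one, so that
        # lookups with occurring keys return exactly their partitions.
        for bits in range(min_bits, 65):
            mask = (1 << bits) - 1
            table = {}
            for key, bitmap in key_bitmaps.items():
                fp = _hash64(key) & mask
                if fp in table:
                    break  # collision between stored keys: try more bits
                table[fp] = bitmap
            else:
                self.bits, self.table = bits, table
                return
        raise ValueError("64-bit hash collision between distinct keys")

    def lookup(self, key):
        """Return the partitions that may contain `key` (equality predicate)."""
        mask = (1 << self.bits) - 1
        bitmap = self.table.get(_hash64(key) & mask, 0)
        return [p for p in range(self.num_partitions) if (bitmap >> p) & 1]


# Usage: three partitions of a hypothetical "country" column.
partitions = [{"de", "us"}, {"us", "fr"}, {"jp"}]
index = ToyFingerprintIndex(partitions)
print(index.lookup("us"))  # partitions that hold "us"
print(index.lookup("zz"))  # non-occurring key: empty, or false positive partitions
```

In this sketch, a key that occurs in the data always maps to exactly its partitions, because stored fingerprints are kept collision-free. A key that does not occur can still match a stored fingerprint by chance, in which case the partitions returned are false positives; longer fingerprints make that less likely at the cost of more space.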
Date issued
2020-09
URI
https://hdl.handle.net/1721.1/136700
Publisher
VLDB Endowment
Citation
Kipf, Andreas, Chromejko, Damian, Hall, Alexander, Boncz, Peter and Andersen, David G. 2020. "Cuckoo index." Proceedings of the VLDB Endowment 13 (13).
Version: Final published version
ISSN
2150-8097

Collections
  • MIT Open Access Articles
