Tsunami: a learned multi-dimensional index for correlated data and skewed workloads

Ding, Jialin; Nathan, Vikram; Alizadeh, Mohammad; Kraska, Tim

Publisher with Creative Commons License

Terms of use

Creative Commons Attribution-NonCommercial-NoDerivs License http://creativecommons.org/licenses/by-nc-nd/4.0/

Abstract

© 2020, VLDB Endowment. All rights reserved. Filtering data based on predicates is one of the most fundamental operations for any modern data warehouse. Techniques to accelerate the execution of filter expressions include clustered indexes, specialized sort orders (e.g., Z-order), multi-dimensional indexes, and, for high-selectivity queries, secondary indexes. However, these schemes are hard to tune and their performance is inconsistent. Recent work on learned multi-dimensional indexes has introduced the idea of automatically optimizing an index for a particular dataset and workload. However, the performance of that work suffers in the presence of correlated data and skewed query workloads, both of which are common in real applications. In this paper, we introduce Tsunami, which addresses these limitations to achieve up to 6× faster query performance and up to 8× smaller index size than existing learned multi-dimensional indexes, in addition to up to 11× faster query performance and 170× smaller index size than optimally tuned traditional indexes.

Date issued

2020

URI

https://hdl.handle.net/1721.1/132295

Department

Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Journal

Proceedings of the VLDB Endowment

Publisher

VLDB Endowment

Collections

MIT Open Access Articles