Parallel Batch-Dynamic 𝑘d-trees
Author(s)
Yesantharao, Rahul
Advisor
Shun, Julian
Abstract
𝑘d-trees are widely used in parallel databases to support efficient neighborhood and similarity queries, so supporting parallel updates to 𝑘d-trees is an important problem. In this paper, we present BDL-tree, a parallel, batch-dynamic implementation of a 𝑘d-tree that allows for efficient parallel 𝑘-NN queries over dynamically changing point sets. BDL-trees consist of a log-structured set of 𝑘d-trees that can be used to efficiently insert or delete batches of points in parallel with polylogarithmic depth. Specifically, given a BDL-tree with 𝑛 points, each batch of 𝐵 updates takes 𝑂(𝐵 log²(𝑛 + 𝐵)) amortized work and 𝑂(log(𝑛 + 𝐵) log log(𝑛 + 𝐵)) depth (parallel time). We provide an optimized multicore implementation of BDL-trees. Our optimizations include parallel cache-oblivious 𝑘d-tree construction and parallel Bloom filter construction.
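The abstract describes the log-structured design only at a high level. As a rough illustration of what a "log-structured set of 𝑘d-trees" can look like, here is a minimal sequential Python sketch assuming the classic logarithmic method (Bentley-Saxe): level 𝑖 is either empty or holds a static 𝑘d-tree over exactly 2^𝑖 points. A batch insertion keeps every level whose bit is set in both 𝑛 and 𝑛 + 𝐵, and rebuilds the remaining occupied levels from the batch plus the points of the levels it dissolves. A 𝑘-NN query searches every level with one shared bounded max-heap. The names `LogStructuredKdForest`, `build`, and `knn_search` are hypothetical. This is not the thesis's implementation: it omits deletions, parallelism, cache-oblivious layout, and Bloom filters.

```python
import heapq
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class Node:
    """One node of a static kd-tree: a splitting point and its split axis."""
    __slots__ = ("point", "axis", "left", "right")

    def __init__(self, point: Point, axis: int,
                 left: Optional["Node"], right: Optional["Node"]) -> None:
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Build a balanced kd-tree by splitting at the median, cycling axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build(ordered[:mid], depth + 1),
                build(ordered[mid + 1:], depth + 1))


def knn_search(node: Optional[Node], q: Point, k: int,
               heap: List[Tuple[float, Point]]) -> None:
    """Update `heap`, a max-heap of (-squared_distance, point) with size <= k."""
    if node is None:
        return
    d2 = sum((a - b) ** 2 for a, b in zip(node.point, q))
    if len(heap) < k:
        heapq.heappush(heap, (-d2, node.point))
    elif d2 < -heap[0][0]:
        heapq.heapreplace(heap, (-d2, node.point))
    diff = q[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    knn_search(near, q, k, heap)
    # Visit the far side only if the splitting plane is closer than the
    # current k-th best distance (or fewer than k candidates are known).
    if len(heap) < k or diff * diff < -heap[0][0]:
        knn_search(far, q, k, heap)


class LogStructuredKdForest:
    """Level i is empty or holds a static kd-tree over exactly 2**i points."""

    def __init__(self) -> None:
        self.levels: List[List[Point]] = []
        self.trees: List[Optional[Node]] = []

    def insert_batch(self, batch: Sequence[Point]) -> None:
        new_total = sum(len(lv) for lv in self.levels) + len(batch)
        while len(self.levels) < new_total.bit_length():
            self.levels.append([])
            self.trees.append(None)
        pool: List[Point] = list(batch)
        targets: List[int] = []
        for i, level in enumerate(self.levels):
            want = (new_total >> i) & 1
            if want and level:
                continue  # bit set before and after: keep this tree unchanged
            pool.extend(level)  # dissolve the level into the rebuild pool
            self.levels[i], self.trees[i] = [], None
            if want:
                targets.append(i)
        # The pool now holds exactly sum(2**i for i in targets) points.
        for i in targets:
            chunk, pool = pool[: 1 << i], pool[1 << i:]
            self.levels[i], self.trees[i] = chunk, build(chunk)

    def knn(self, q: Point, k: int) -> List[Point]:
        """Return the k nearest stored points to q, closest first."""
        if k <= 0:
            return []
        heap: List[Tuple[float, Point]] = []
        for tree in self.trees:
            knn_search(tree, q, k, heap)
        return [p for _, p in sorted(heap, key=lambda e: -e[0])]


if __name__ == "__main__":
    forest = LogStructuredKdForest()
    forest.insert_batch([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)])
    forest.insert_batch([(5.0, 5.0), (0.5, 0.0)])
    print([len(lv) for lv in forest.levels])  # [1, 0, 4]
    print(forest.knn((0.0, 0.1), 2))          # [(0.0, 0.0), (0.5, 0.0)]
```

In the example, the second batch brings the total to 5 (binary 101), so the one-point level survives while the two-point level is dissolved and merged with the batch into a new four-point tree. Each update thus rebuilds only a few levels instead of the whole tree, at the cost of searching up to ⌈log₂(𝑛 + 1)⌉ trees per query.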
Our experiments on a 36-core machine with two-way hyper-threading using a variety of synthetic and real-world datasets show that our implementation of BDL-tree achieves a self-relative speedup of up to 34.8× (28.4× on average) for batch insertions, up to 35.5× (27.2× on average) for batch deletions, and up to 46.1× (40.0× on average) for 𝑘-nearest neighbor queries. In addition, it achieves throughputs of up to 14.5 million updates/second for batch-parallel updates and 6.7 million queries/second for 𝑘-NN queries. We compare against two baseline 𝑘d-tree implementations and demonstrate that BDL-trees achieve a good tradeoff between these two baseline options for implementing batch updates.
Date issued
2022-02
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology