| dc.contributor.author | Wang, Yiqiu | |
| dc.contributor.author | Yu, Shangdi | |
| dc.contributor.author | Gu, Yan | |
| dc.contributor.author | Shun, Julian | |
| dc.date.accessioned | 2022-10-24T17:13:00Z | |
| dc.date.available | 2022-07-20T15:09:25Z | |
| dc.date.available | 2022-10-24T17:13:00Z | |
| dc.date.issued | 2021 | |
| dc.identifier.uri | https://hdl.handle.net/1721.1/143884.2 | |
| dc.description.abstract | This paper presents new parallel algorithms for generating Euclidean minimum spanning trees and spatial clustering hierarchies (known as HDBSCAN$^*$). Our approach is based on generating a well-separated pair decomposition followed by using Kruskal's minimum spanning tree algorithm and bichromatic closest pair computations. We introduce a new notion of well-separation to reduce the work and space of our algorithm for HDBSCAN$^*$. We also present a parallel approximate algorithm for OPTICS based on a recent sequential algorithm by Gan and Tao. Finally, we give a new parallel divide-and-conquer algorithm for computing the dendrogram and reachability plots, which are used in visualizing clusters of different scale that arise for both EMST and HDBSCAN$^*$. We show that our algorithms are theoretically efficient: they have work (number of operations) matching their sequential counterparts, and polylogarithmic depth (parallel time). We implement our algorithms and propose a memory optimization that requires only a subset of well-separated pairs to be computed and materialized, leading to savings in both space (up to 10x) and time (up to 8x). Our experiments on large real-world and synthetic data sets using a 48-core machine show that our fastest algorithms outperform the best serial algorithms for the problems by 11.13--55.89x, and existing parallel algorithms by at least an order of magnitude. | en_US |
| dc.language.iso | en | |
| dc.publisher | Association for Computing Machinery (ACM) | en_US |
| dc.relation.isversionof | 10.1145/3448016.3457296 | en_US |
| dc.rights | Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. | en_US |
| dc.source | ACM | en_US |
| dc.title | Fast Parallel Algorithms for Euclidean Minimum Spanning Tree and Hierarchical Spatial Clustering | en_US |
| dc.type | Article | en_US |
| dc.identifier.citation | Wang, Yiqiu, Yu, Shangdi, Gu, Yan and Shun, Julian. 2021. "Fast Parallel Algorithms for Euclidean Minimum Spanning Tree and Hierarchical Spatial Clustering." Proceedings of the 2021 International Conference on Management of Data. | en_US |
| dc.contributor.department | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory | |
| dc.relation.journal | Proceedings of the 2021 International Conference on Management of Data | en_US |
| dc.eprint.version | Final published version | en_US |
| dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US |
| eprint.status | http://purl.org/eprint/status/NonPeerReviewed | en_US |
| dc.date.updated | 2022-07-20T15:01:02Z | |
| dspace.orderedauthors | Wang, Y; Yu, S; Gu, Y; Shun, J | en_US |
| dspace.date.submission | 2022-07-20T15:01:03Z | |
| mit.license | PUBLISHER_POLICY | |
| mit.metadata.status | Authority Work and Publication Information Needed | en_US |