Fast Parallel Algorithms and Library for Spatial Clustering and Computational Geometry
Author(s)
Wang, Yiqiu
Advisor
Shun, Julian
Abstract
This thesis presents novel parallel shared-memory multi-core algorithms, implementations, and frameworks for efficiently solving large-scale spatial clustering and computational geometry problems. The primary focus is on designing theoretically efficient and practical algorithms that meet the growing demand for fast processing of large spatial data sets.
In the first part of the thesis, we introduce new parallel algorithms and a framework for spatial clustering. We design new parallel algorithms for exact and approximate DBSCAN that match the work complexity of the best sequential algorithms while maintaining low depth. Extensive experiments demonstrate that our algorithms achieve massive speedups over existing algorithms and can efficiently process large-scale data sets. We also present new parallel algorithms for hierarchical DBSCAN (HDBSCAN) and Euclidean minimum spanning tree (EMST), including several theoretical results and practical optimizations. Furthermore, we propose a method to generate a dendrogram from the minimum spanning tree (MST) of the HDBSCAN or EMST problem. The EMST also solves single-linkage clustering. Lastly, we design a framework for implementing parallel grid-based clustering algorithms.
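To illustrate the link between an MST and a single-linkage dendrogram mentioned above, here is a minimal sequential sketch. It is not the parallel method proposed in the thesis; it is the textbook Kruskal-style construction with a union-find structure. The function name `dendrogram_from_mst` and the output tuple layout are assumptions made for this example.

```python
from typing import List, Tuple

Edge = Tuple[float, int, int]        # (weight, u, v)
Merge = Tuple[int, int, float, int]  # (cluster_a, cluster_b, height, new_size)


def dendrogram_from_mst(n: int, mst_edges: List[Edge]) -> List[Merge]:
    """Sequential sketch (hypothetical helper): single-linkage dendrogram
    from the n-1 edges of a spanning tree over points 0..n-1."""
    parent = list(range(n))      # union-find parent pointers
    size = [1] * n               # number of points under each root
    cluster_id = list(range(n))  # dendrogram node id held by each root

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    merges: List[Merge] = []
    # Processing edges by increasing weight merges the closest clusters first.
    for w, u, v in sorted(mst_edges):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # only reachable if the input is not a tree
        if size[ru] < size[rv]:
            ru, rv = rv, ru  # union by size: attach smaller under larger
        new_size = size[ru] + size[rv]
        merges.append((cluster_id[ru], cluster_id[rv], w, new_size))
        parent[rv] = ru
        size[ru] = new_size
        cluster_id[ru] = n + len(merges) - 1  # internal nodes get ids n, n+1, ...
    return merges
```

Each output tuple records the two dendrogram nodes merged, the edge weight at which they merge, and the size of the new cluster. Leaves carry ids 0 to n-1 and internal nodes are numbered from n upward in merge order.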
The second part of the thesis introduces our contributions to parallel algorithms and a library for computational geometry. We contribute to three problems in computational geometry: a new parallel reservation-based algorithm that can express both randomized incremental convex hull and quickhull algorithms; a sampling-based algorithm to reduce work for the smallest enclosing ball problem; and a parallel batch-dynamic data structure for the dynamic closest pair problem. We also introduce ParGeo, a library for parallel computational geometry that provides various parallel geometric algorithms, data structures, and graph generators. Our experimental evaluations show significant speedups achieved by our proposed algorithms across different problems.
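For readers unfamiliar with quickhull, one of the convex hull algorithms named above, the following is a minimal sequential 2D sketch. It is the classic recursive formulation, not the reservation-based parallel algorithm of the thesis and not ParGeo's API; the names `cross`, `_hull_side`, and `quickhull` are assumptions made for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Twice the signed area of triangle (o, a, b); > 0 if b lies left of o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _hull_side(a: Point, b: Point, pts: List[Point]) -> List[Point]:
    """Hull vertices among pts (all strictly left of a->b), ordered from a to b."""
    if not pts:
        return []
    far = max(pts, key=lambda p: cross(a, b, p))  # farthest point from line ab
    # Points inside triangle (a, far, b) cannot be on the hull and are dropped.
    outside_a_far = [p for p in pts if cross(a, far, p) > 0]
    outside_far_b = [p for p in pts if cross(far, b, p) > 0]
    return _hull_side(a, far, outside_a_far) + [far] + _hull_side(far, b, outside_far_b)


def quickhull(points: List[Point]) -> List[Point]:
    """Convex hull in clockwise order, starting at the leftmost point."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    lo, hi = pts[0], pts[-1]  # leftmost and rightmost points are always on the hull
    upper = [p for p in pts if cross(lo, hi, p) > 0]
    lower = [p for p in pts if cross(hi, lo, p) > 0]
    return [lo] + _hull_side(lo, hi, upper) + [hi] + _hull_side(hi, lo, lower)
```

The strict inequalities discard points that are collinear with a hull edge, so only corner vertices are returned.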
Overall, this thesis demonstrates that parallel shared-memory multi-core algorithms, implementations, and frameworks can efficiently solve large-scale spatial clustering and computational geometry problems both in theory and practice.
Date issued
2023-06
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology