ScaleGPS: Scalable Graph Parallel Sampling via Data-centric Performance Engineering
Author(s)
Cai, Miranda J.
Advisor
Chen, Xuhao
Abstract
Graph sampling extracts representative samples of a graph, so that approximate graph algorithms can be used in place of expensive, exact algorithms while still achieving high-quality results. Thus, graph sampling plays an important role in many modern graph-based applications, such as graph machine learning and graph data mining. However, because of the unstructured sparsity in graph data and the randomness in sampling algorithms, graph sampling is often the computational bottleneck. To accelerate it, parallel graph sampling methods exist for both multicore CPUs and GPUs, but each side has limitations. CPU implementations are much slower than GPU ones due to lower throughput, while limited GPU memory capacity restricts GPU implementations to small input graphs. We present the idea behind ScaleGPS, a scalable graph sampling framework that supports high-performance graph sampling on huge graphs on a single machine with a CPU and a GPU. The key idea is to cooperatively employ data caching and compression to reduce memory footprint and data movement overhead, and thus achieve high performance and scalability. The challenge in applying caching and compression to graph sampling is twofold. First, the randomness in sampling leads to redundant computation and memory accesses, and thus low work efficiency. Second, real-world graphs often exhibit skewed degree distributions, where a fixed strategy cannot optimally handle all cases. We propose a hybrid and adaptive strategy to address this challenge. First, we split the vertices in the graph into two groups based on their degrees. For each group, we store the neighbor lists in a different format, to make full use of the scarce GPU memory resources. Based on this hybrid compression method, we use the GPU memory as a cache for the CPU memory, and adaptively cache hot data to minimize the data movement overhead between the CPU and GPU. We implement our strategy in ScaleGPS and evaluate it on a single machine with a 48-core CPU and an A100 GPU. Our experimental results on various sampling algorithms show that ScaleGPS can handle billion-edge graphs (up to 84 billion edges) on a single machine. While the performance benefits on these large graphs are still undetermined, ScaleGPS achieves an average speedup of 33.4× (up to 93×) on smaller graphs over state-of-the-art parallel CPU implementations.
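The abstract does not include implementation details, but the degree-based vertex split it describes can be sketched roughly as follows. This is a minimal illustrative sketch, not ScaleGPS's actual code: the CSR layout, the threshold, and all identifiers (CSRGraph, split_by_degree, degree_threshold) are assumptions made for this example; the real system would additionally choose a storage format per group (e.g. compressed versus uncompressed neighbor lists) and manage GPU-resident caching, which is omitted here.

```cpp
// Hypothetical sketch of a degree-based vertex split (not ScaleGPS's real API).
#include <cstdint>
#include <cstdio>
#include <vector>

struct CSRGraph {
    std::vector<int64_t> row_ptr;  // offsets into col_idx, size |V|+1
    std::vector<int32_t> col_idx;  // concatenated neighbor lists
    int64_t degree(int32_t v) const { return row_ptr[v + 1] - row_ptr[v]; }
    int64_t num_vertices() const { return (int64_t)row_ptr.size() - 1; }
};

// Partition vertices into a high-degree and a low-degree group so that each
// group's neighbor lists can later be stored in a different format (e.g.
// keeping hot, high-degree lists resident in GPU memory and compressing the rest).
void split_by_degree(const CSRGraph& g, int64_t degree_threshold,
                     std::vector<int32_t>& high_degree,
                     std::vector<int32_t>& low_degree) {
    for (int32_t v = 0; v < g.num_vertices(); ++v) {
        if (g.degree(v) >= degree_threshold)
            high_degree.push_back(v);
        else
            low_degree.push_back(v);
    }
}

int main() {
    // Tiny example graph: vertex 0 has 3 neighbors, vertices 1-3 have 1 each.
    CSRGraph g{{0, 3, 4, 5, 6}, {1, 2, 3, 0, 0, 0}};
    std::vector<int32_t> hi, lo;
    split_by_degree(g, /*degree_threshold=*/2, hi, lo);
    std::printf("high-degree vertices: %zu, low-degree vertices: %zu\n",
                hi.size(), lo.size());  // prints 1 and 3
    return 0;
}
```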
Date issued
2024-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology