DSpace@MIT


Fast and Scalable Subgraph Learning

Author(s)
Liang, Derrick
Advisor
Leiserson, Charles E.
Kaler, Tim
Terms of use
In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Graph Neural Networks (GNNs) are a powerful framework for learning over structured data, enabling predictive modeling across domains such as bioinformatics, recommendation systems, and financial fraud detection. While scalable systems like SALIENT++ have advanced the training of node-level GNN tasks at industrial scale, they do not support an emerging class of workloads: subgraph classification, which is increasingly common in real-world applications. Prior implementations address this gap by modifying both the data pipeline and the model architecture—but at the cost of composability, creating tightly coupled systems that slow further development. This thesis introduces MOSAIC, a lightweight data transformation that reframes subgraph classification as nodewise prediction by augmenting the graph with representative nodes. This approach enables direct compatibility with SALIENT++ and other nodewise systems while decoupling workload format, dataloader design, and model architecture. I demonstrate that MOSAIC enables modular reuse of architectures like GraphSAGE and subgraph-aware components from GLASS, while preserving SALIENT++’s system-level scalability. On the large-scale Elliptic2 dataset, this integration reduces training memory usage by 2.8× and epoch runtime from over 90 minutes to 0.4 seconds—while improving classification performance. I implement MOSAIC as a succinct (<100-line), reusable preprocessing script, enabling integration of the GLASS architecture into SALIENT++ in <10 lines of code, compared to Wang et al.’s tightly coupled 500+ line design. These results highlight the feasibility of scalable, composable experimentation for subgraph learning tasks in high-performance GNN systems.
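The core transformation the abstract describes can be sketched briefly: for each labeled subgraph, add one fresh "representative" node connected to the subgraph's members, so that the subgraph's label becomes an ordinary node label and any nodewise GNN pipeline can consume it. The function below is a minimal illustrative sketch of that idea on a plain edge list; the name `augment_with_representatives` and the data layout are assumptions for illustration, not the thesis's actual MOSAIC preprocessing script.

```python
def augment_with_representatives(num_nodes, edges, subgraphs):
    """Sketch of a MOSAIC-style augmentation (illustrative, not the real script).

    num_nodes : number of nodes in the original graph
    edges     : list of (u, v) pairs
    subgraphs : list of (member_node_list, label) pairs

    Returns the augmented edge list and a map from each new
    representative node id to its subgraph's label.
    """
    new_edges = list(edges)
    rep_labels = {}
    for i, (members, label) in enumerate(subgraphs):
        rep = num_nodes + i          # fresh node id for this subgraph
        for m in members:            # link the representative to every member
            new_edges.append((rep, m))
        rep_labels[rep] = label      # subgraph label becomes a node label
    return new_edges, rep_labels
```

Because the output is just a larger graph with extra labeled nodes, a nodewise system needs no architectural changes: training on the representative nodes is ordinary node classification.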
Date issued
2025-05
URI
https://hdl.handle.net/1721.1/162735
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
