Show simple item record

dc.contributor.advisor: Leiserson, Charles E.
dc.contributor.advisor: Kaler, Tim
dc.contributor.author: Liang, Derrick
dc.date.accessioned: 2025-09-18T14:29:46Z
dc.date.available: 2025-09-18T14:29:46Z
dc.date.issued: 2025-05
dc.date.submitted: 2025-06-23T14:02:51.733Z
dc.identifier.uri: https://hdl.handle.net/1721.1/162735
dc.description.abstract: Graph Neural Networks (GNNs) are a powerful framework for learning over structured data, enabling predictive modeling across domains such as bioinformatics, recommendation systems, and financial fraud detection. While scalable systems like SALIENT++ have advanced the training of node-level GNN tasks at industrial scale, they do not support an emerging class of workloads: subgraph classification, which is increasingly common in real-world applications. Prior implementations address this gap by modifying both the data pipeline and the model architecture—but at the cost of composability, creating tightly coupled systems that slow further development. This thesis introduces MOSAIC, a lightweight data transformation that reframes subgraph classification as nodewise prediction by augmenting the graph with representative nodes. This approach enables direct compatibility with SALIENT++ and other nodewise systems while decoupling workload format, dataloader design, and model architecture. I demonstrate that MOSAIC enables modular reuse of architectures like GraphSAGE and subgraph-aware components from GLASS, while preserving SALIENT++’s system-level scalability. On the large-scale Elliptic2 dataset, this integration reduces training memory usage by 2.8× and epoch runtime from over 90 minutes to 0.4 seconds—while improving classification performance. I implement MOSAIC as a succinct (<100-line), reusable preprocessing script, enabling integration of the GLASS architecture into SALIENT++ in <10 lines of code, compared to Wang et al.’s tightly coupled 500+ line design. These results highlight the feasibility of scalable, composable experimentation for subgraph learning tasks in high-performance GNN systems.
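The record does not include the transformation itself; as a rough illustration of the idea the abstract describes — reframing subgraph classification as nodewise prediction by adding one representative node per labeled subgraph, connected to that subgraph's member nodes and carrying its label — a minimal sketch might look like the following. The function name, edge-list representation, and input format are hypothetical, not taken from the thesis:

```python
def mosaic_transform(num_nodes, edges, labeled_subgraphs):
    """Hypothetical sketch of a MOSAIC-style augmentation.

    num_nodes: number of nodes in the base graph (ids 0..num_nodes-1)
    edges: list of (u, v) directed edges
    labeled_subgraphs: list of (member_node_ids, label) pairs

    Returns the augmented node count, augmented edge list, and a map
    from each new representative node id to its subgraph's label, so a
    nodewise GNN system can train on the representative nodes directly.
    """
    aug_edges = list(edges)
    rep_labels = {}
    next_id = num_nodes
    for members, label in labeled_subgraphs:
        rep = next_id  # fresh node id for this subgraph's representative
        next_id += 1
        for v in members:
            # connect the representative to every member, both directions,
            # so message passing aggregates the subgraph into the new node
            aug_edges.append((rep, v))
            aug_edges.append((v, rep))
        rep_labels[rep] = label  # subgraph label becomes a node label
    return next_id, aug_edges, rep_labels
```

After this preprocessing step, the representative nodes form an ordinary node-classification target set, which is what lets an unmodified nodewise pipeline consume the subgraph workload.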
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Fast and Scalable Subgraph Learning
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Electrical Engineering and Computer Science

