DSpace@MIT


Fast and Scalable Subgraph Learning

Author(s)
Liang, Derrick
Advisor
Leiserson, Charles E.
Kaler, Tim
Terms of use
In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Graph Neural Networks (GNNs) are a powerful framework for learning over structured data, enabling predictive modeling across domains such as bioinformatics, recommendation systems, and financial fraud detection. While scalable systems like SALIENT++ have advanced the training of node-level GNN tasks at industrial scale, they do not support an emerging class of workloads: subgraph classification, which is increasingly common in real-world applications. Prior implementations address this gap by modifying both the data pipeline and the model architecture—but at the cost of composability, creating tightly coupled systems that slow further development. This thesis introduces MOSAIC, a lightweight data transformation that reframes subgraph classification as nodewise prediction by augmenting the graph with representative nodes. This approach enables direct compatibility with SALIENT++ and other nodewise systems while decoupling workload format, dataloader design, and model architecture. I demonstrate that MOSAIC enables modular reuse of architectures like GraphSAGE and subgraph-aware components from GLASS, while preserving SALIENT++’s system-level scalability. On the large-scale Elliptic2 dataset, this integration reduces training memory usage by 2.8× and epoch runtime from over 90 minutes to 0.4 seconds—while improving classification performance. I implement MOSAIC as a succinct (<100-line), reusable preprocessing script, enabling integration of the GLASS architecture into SALIENT++ in <10 lines of code, compared to Wang et al.’s tightly coupled 500+ line design. These results highlight the feasibility of scalable, composable experimentation for subgraph learning tasks in high-performance GNN systems.
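The core transformation the abstract describes can be sketched briefly: for each labeled subgraph, add one fresh "representative" node connected to the subgraph's members, so that the subgraph's label becomes an ordinary node label and any nodewise GNN pipeline can consume it. The function below is a minimal illustrative sketch of that idea on a plain edge list; the name `augment_with_representatives` and the data layout are assumptions for illustration, not the thesis's actual MOSAIC preprocessing script.

```python
def augment_with_representatives(num_nodes, edges, subgraphs):
    """Sketch of a MOSAIC-style augmentation (illustrative, not the real script).

    num_nodes : number of nodes in the original graph
    edges     : list of (u, v) pairs
    subgraphs : list of (member_node_list, label) pairs

    Returns the augmented edge list and a map from each new
    representative node id to its subgraph's label.
    """
    new_edges = list(edges)
    rep_labels = {}
    for i, (members, label) in enumerate(subgraphs):
        rep = num_nodes + i          # fresh node id for this subgraph
        for m in members:            # link the representative to every member
            new_edges.append((rep, m))
        rep_labels[rep] = label      # subgraph label becomes a node label
    return new_edges, rep_labels
```

Because the output is just a larger graph with extra labeled nodes, a nodewise system needs no architectural changes: training on the representative nodes is ordinary node classification.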
Date issued
2025-05
URI
https://hdl.handle.net/1721.1/162735
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
