Show simple item record

dc.contributor.advisor: Leiserson, Charles E.
dc.contributor.advisor: Kaler, Tim
dc.contributor.author: Liang, Derrick
dc.date.accessioned: 2025-09-18T14:29:46Z
dc.date.available: 2025-09-18T14:29:46Z
dc.date.issued: 2025-05
dc.date.submitted: 2025-06-23T14:02:51.733Z
dc.identifier.uri: https://hdl.handle.net/1721.1/162735
dc.description.abstract: Graph Neural Networks (GNNs) are a powerful framework for learning over structured data, enabling predictive modeling across domains such as bioinformatics, recommendation systems, and financial fraud detection. While scalable systems like SALIENT++ have advanced the training of node-level GNN tasks at industrial scale, they do not support an emerging class of workloads: subgraph classification, which is increasingly common in real-world applications. Prior implementations address this gap by modifying both the data pipeline and the model architecture—but at the cost of composability, creating tightly coupled systems that slow further development. This thesis introduces MOSAIC, a lightweight data transformation that reframes subgraph classification as nodewise prediction by augmenting the graph with representative nodes. This approach enables direct compatibility with SALIENT++ and other nodewise systems while decoupling workload format, dataloader design, and model architecture. I demonstrate that MOSAIC enables modular reuse of architectures like GraphSAGE and subgraph-aware components from GLASS, while preserving SALIENT++’s system-level scalability. On the large-scale Elliptic2 dataset, this integration reduces training memory usage by 2.8× and epoch runtime from over 90 minutes to 0.4 seconds—while improving classification performance. I implement MOSAIC as a succinct (<100-line), reusable preprocessing script, enabling integration of the GLASS architecture into SALIENT++ in <10 lines of code, compared to Wang et al.’s tightly coupled 500+ line design. These results highlight the feasibility of scalable, composable experimentation for subgraph learning tasks in high-performance GNN systems.
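The record does not include the transformation itself; as a rough illustration of the idea the abstract describes — reframing subgraph classification as nodewise prediction by adding one representative node per labeled subgraph, connected to that subgraph's member nodes and carrying its label — a minimal sketch might look like the following. The function name, edge-list representation, and input format are hypothetical, not taken from the thesis:

```python
def mosaic_transform(num_nodes, edges, labeled_subgraphs):
    """Hypothetical sketch of a MOSAIC-style augmentation.

    num_nodes: number of nodes in the base graph (ids 0..num_nodes-1)
    edges: list of (u, v) directed edges
    labeled_subgraphs: list of (member_node_ids, label) pairs

    Returns the augmented node count, augmented edge list, and a map
    from each new representative node id to its subgraph's label, so a
    nodewise GNN system can train on the representative nodes directly.
    """
    aug_edges = list(edges)
    rep_labels = {}
    next_id = num_nodes
    for members, label in labeled_subgraphs:
        rep = next_id  # fresh node id for this subgraph's representative
        next_id += 1
        for v in members:
            # connect the representative to every member, both directions,
            # so message passing aggregates the subgraph into the new node
            aug_edges.append((rep, v))
            aug_edges.append((v, rep))
        rep_labels[rep] = label  # subgraph label becomes a node label
    return next_id, aug_edges, rep_labels
```

After this preprocessing step, the representative nodes form an ordinary node-classification target set, which is what lets an unmodified nodewise pipeline consume the subgraph workload.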
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Fast and Scalable Subgraph Learning
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Electrical Engineering and Computer Science

