Benchmarking Graph Transformers Toward Scalability for Large Graphs

Author(s)
Lim, Katherine S.
Thesis PDF (1.859 MB)
Advisor
Arvind
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Graph transformers (GTs) have gained popularity as an alternative to graph neural networks (GNNs) for deep learning on graph-structured data. In particular, the self-attention mechanism of GTs mitigates the fundamental limitations of over-squashing, over-smoothing, and limited expressiveness that GNNs face. Furthermore, like transformers used for natural language processing and computer vision, GTs have the potential to become foundation models that can be used for various downstream tasks. However, current GTs do not scale well to large graphs due to their computational cost. Here, we formulated a GT architecture as part of a larger scheme for building a GT made scalable through hierarchical attention and graph coarsening. Specifically, our goal was to optimize the GT building block of the scalable GT. By adding GraphGPS-inspired message-passing neural network (MPNN) layers to a modified version of the Spectral Attention Network (SAN) and performing hyperparameter tuning, we built a GT architecture that performs comparably to GraphGPS on the node classification task on the Cora and CiteSeer datasets. Compared to the modified version of SAN that we started with, our architecture is faster to train and evaluate, and it also obtains higher node classification accuracies on both datasets. Our results demonstrate how message passing can effectively complement self-attention in GTs such as SAN to improve node classification performance. With further architectural improvement, we expect our model to serve as an effective building block for scalable GTs. Such scalable GTs may be used for node classification on large graphs, a common task in industrial applications.
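
The core design described in the abstract, running a local message-passing branch alongside a global self-attention branch within each layer, follows the GraphGPS hybrid pattern. Below is a minimal PyTorch sketch of such a layer; it is not the thesis's implementation. The name HybridGTLayer, the mean-style neighborhood aggregation, the use of standard multi-head attention in place of SAN's spectral attention, and all hyperparameters are illustrative assumptions.

import torch
import torch.nn as nn

class HybridGTLayer(nn.Module):
    """Illustrative GraphGPS-style layer: local MPNN + global attention.

    A sketch under stated assumptions, not the thesis's architecture;
    the thesis builds on SAN's spectral attention, whereas this uses
    plain multi-head attention for simplicity.
    """
    def __init__(self, dim: int, num_heads: int = 4, dropout: float = 0.1):
        super().__init__()
        self.mpnn_lin = nn.Linear(dim, dim)          # local branch weights
        self.attn = nn.MultiheadAttention(dim, num_heads,
                                          dropout=dropout, batch_first=True)
        self.norm_local = nn.LayerNorm(dim)
        self.norm_global = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 2 * dim), nn.ReLU(),
                                 nn.Linear(2 * dim, dim))
        self.norm_out = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   (num_nodes, dim) node features
        # adj: (num_nodes, num_nodes) row-normalized adjacency matrix
        local = self.norm_local(x + self.mpnn_lin(adj @ x))   # message passing
        h = x.unsqueeze(0)                                    # add batch dim
        attn_out, _ = self.attn(h, h, h)                      # full self-attention
        global_ = self.norm_global(x + attn_out.squeeze(0))
        combined = local + global_      # sum the two views, as in GraphGPS
        return self.norm_out(combined + self.ffn(combined))

# Toy usage; a real run would use normalized adjacencies from Cora/CiteSeer.
x = torch.randn(5, 64)
adj = torch.full((5, 5), 0.2)  # placeholder uniform row-normalized adjacency
print(HybridGTLayer(64)(x, adj).shape)  # torch.Size([5, 64])

Note that the full self-attention branch costs O(N^2) in the number of nodes, which is the scaling bottleneck that motivates the hierarchical attention and graph coarsening scheme the abstract describes.
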
Date issued
2024-05
URI
https://hdl.handle.net/1721.1/156988
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
