Benchmarking Graph Transformers Toward Scalability for Large Graphs

Author(s)
Lim, Katherine S.
Thesis PDF (1.859 MB)
Advisor
Arvind
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Graph transformers (GTs) have gained popularity as an alternative to graph neural networks (GNNs) for deep learning on graph-structured data. In particular, the self-attention mechanism of GTs mitigates the fundamental limitations of over-squashing, over-smoothing, and limited expressiveness that GNNs face. Furthermore, like transformers used for natural language processing and computer vision, GTs have the potential to become foundation models that can be used for various downstream tasks. However, current GTs do not scale well to large graphs due to their computational cost. Here, we formulated a GT architecture as part of a larger scheme for building a GT made scalable through hierarchical attention and graph coarsening. Specifically, our goal was to optimize the GT building block of the scalable GT. By adding GraphGPS-inspired message-passing neural network (MPNN) layers to a modified version of the Spectral Attention Network (SAN) and performing hyperparameter tuning, we built a GT architecture that performs comparably to GraphGPS on the node classification task on the Cora and CiteSeer datasets. Compared to the modified version of SAN that we started with, our architecture is faster to train and evaluate, and it also obtains higher node classification accuracies on both datasets. Our results demonstrate how message passing can effectively complement self-attention in GTs such as SAN to improve node classification performance. With further architectural improvement, we expect our model to serve as an effective building block for scalable GTs. Such scalable GTs may be used for node classification on large graphs, a common task in industrial applications.
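
The core design described in the abstract, running a local message-passing branch alongside a global self-attention branch within each layer, follows the GraphGPS hybrid pattern. Below is a minimal PyTorch sketch of such a layer; it is not the thesis's implementation. The name HybridGTLayer, the mean-style neighborhood aggregation, the use of standard multi-head attention in place of SAN's spectral attention, and all hyperparameters are illustrative assumptions.

import torch
import torch.nn as nn

class HybridGTLayer(nn.Module):
    """Illustrative GraphGPS-style layer: local MPNN + global attention.

    A sketch under stated assumptions, not the thesis's architecture;
    the thesis builds on SAN's spectral attention, whereas this uses
    plain multi-head attention for simplicity.
    """
    def __init__(self, dim: int, num_heads: int = 4, dropout: float = 0.1):
        super().__init__()
        self.mpnn_lin = nn.Linear(dim, dim)          # local branch weights
        self.attn = nn.MultiheadAttention(dim, num_heads,
                                          dropout=dropout, batch_first=True)
        self.norm_local = nn.LayerNorm(dim)
        self.norm_global = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 2 * dim), nn.ReLU(),
                                 nn.Linear(2 * dim, dim))
        self.norm_out = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   (num_nodes, dim) node features
        # adj: (num_nodes, num_nodes) row-normalized adjacency matrix
        local = self.norm_local(x + self.mpnn_lin(adj @ x))   # message passing
        h = x.unsqueeze(0)                                    # add batch dim
        attn_out, _ = self.attn(h, h, h)                      # full self-attention
        global_ = self.norm_global(x + attn_out.squeeze(0))
        combined = local + global_      # sum the two views, as in GraphGPS
        return self.norm_out(combined + self.ffn(combined))

# Toy usage; a real run would use normalized adjacencies from Cora/CiteSeer.
x = torch.randn(5, 64)
adj = torch.full((5, 5), 0.2)  # placeholder uniform row-normalized adjacency
print(HybridGTLayer(64)(x, adj).shape)  # torch.Size([5, 64])

Note that the full self-attention branch costs O(N^2) in the number of nodes, which is the scaling bottleneck that motivates the hierarchical attention and graph coarsening scheme the abstract describes.
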
Date issued
2024-05
URI
https://hdl.handle.net/1721.1/156988
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
