DSpace@MIT
Modeling Intelligence via Graph Neural Networks

Author(s)
Xu, Keyulu
Download: Thesis PDF (11.44 MB)
Advisor
Jegelka, Stefanie
Terms of use
In Copyright - Educational Use Permitted. Copyright MIT. http://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Artificial intelligence can be more powerful than human intelligence. Many problems that are challenging from a human perspective involve seeking statistical patterns in complex, structured objects such as drug molecules and the global financial system. Advances in deep learning have shown that the key to solving such tasks is to learn a good representation. Given representations of the world, the second aspect of intelligence is reasoning. Learning to reason means learning to implement a correct reasoning process, both within and outside the training distribution. In this thesis, we address the fundamental problem of modeling intelligence that can learn to represent and reason about the world. We study both questions through the lens of graph neural networks, a class of neural networks acting on graphs. First, we can abstract many objects in the world as graphs and learn their representations with graph neural networks. Second, we shall see how graph neural networks exploit the algorithmic structure in reasoning processes to improve generalization. This thesis consists of four parts, each studying one aspect of the theoretical landscape of learning: representation power, generalization, extrapolation, and optimization. In Part I, we characterize the expressive power of graph neural networks for representing graphs, and build maximally powerful graph neural networks. In Part II, we analyze generalization and show implications for what reasoning a neural network can sample-efficiently learn. Our analysis takes into account the training algorithm, the network structure, and the task structure. In Part III, we study how neural networks extrapolate and under what conditions they learn the correct reasoning outside the training distribution. In Part IV, we prove global convergence rates and develop normalization methods that accelerate the training of graph neural networks. Our techniques and insights go beyond graph neural networks, and extend broadly to deep learning models.
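To make the message-passing computation mentioned in the abstract concrete, the sketch below implements a GIN-style layer (sum aggregation over neighbors followed by an MLP update), the kind of maximally expressive architecture Part I analyzes. This is an illustrative sketch only, not code from the thesis; the class name, feature dimensions, and toy graph are assumptions made for the example.

import torch
import torch.nn as nn


class GINLayer(nn.Module):
    """One round of sum-aggregation message passing with an MLP update:
    h_v' = MLP((1 + eps) * h_v + sum_{u in N(v)} h_u)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))  # learnable epsilon weighting the center node
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, out_dim),
            nn.ReLU(),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h:   (num_nodes, in_dim) node features
        # adj: (num_nodes, num_nodes) dense adjacency matrix
        neighbor_sum = adj @ h  # sum features over each node's neighbors
        return self.mlp((1 + self.eps) * h + neighbor_sum)


# Toy usage: a 4-node path graph with random node features.
adj = torch.tensor([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=torch.float32)
h = torch.randn(4, 8)
layer = GINLayer(in_dim=8, out_dim=16)
print(layer(h, adj).shape)  # torch.Size([4, 16])

Sum aggregation (rather than mean or max) is what lets such a layer distinguish multisets of neighbor features, which is the intuition behind the expressive-power results summarized in Part I.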
Date issued
2021-06
URI
https://hdl.handle.net/1721.1/139331
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses