Towards Effective Theories for Deep Learning and Beyond

Author(s)

Zhai, Xiyu

Advisor

Rakhlin, Alexander (Sasha)

Terms of use

Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0). Copyright retained by author(s). https://creativecommons.org/licenses/by-nc-nd/4.0/

Abstract

Deep learning has been remarkably successful over the past decade, with major progress across domains such as computer vision, natural language processing, and reinforcement learning. However, our theoretical understanding of this success remains limited, and the behavior of deep networks often defies classical machine learning theory. We present our work on deep learning theory, including techniques we developed for studying the optimization and generalization behavior of wide neural networks, which help narrow the gap between theory and practice. Motivated by the recent success of transformer-based systems such as ChatGPT and Sora, we also present results on the expressive power of vision transformers. In addition, we have been developing new AI theories and algorithms that go beyond deep learning, together with a new AI programming system designed to make implementing these ideas possible. A complete demonstration of this new approach to AI lies beyond the scope of a PhD thesis, but we aim to provide strong evidence of its feasibility.

Date issued

2024-05

URI

https://hdl.handle.net/1721.1/156548

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Collections

Doctoral Theses