Towards Effective Theories for Deep Learning and Beyond
Author(s)
Zhai, Xiyu
Advisor
Rakhlin, Alexander (Sasha)
Abstract
Deep learning has been highly successful over the past decade, with rapid progress across domains such as computer vision, natural language processing, and reinforcement learning. However, theoretical understanding of this success remains limited, and the behavior of deep networks repeatedly defies traditional machine learning theory. We present our work on deep learning theory, including techniques we developed for studying the optimization and generalization behavior of wide neural networks, which help narrow the gap between theory and practice. Motivated by the recent success of transformer-based systems such as ChatGPT and Sora, we also present our work on the expressive power of vision transformers. In addition, we have been working on new AI theories and algorithms that go beyond deep learning, together with a new AI programming system intended to make implementing these ideas possible. A full demonstration of this new AI is beyond the scope of a PhD, but we aim to show strong evidence of its feasibility.
Date issued
2024-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology