
dc.contributor.advisor: Rakhlin, Alexander (Sasha)
dc.contributor.author: Zhai, Xiyu
dc.date.accessioned: 2024-09-03T21:06:28Z
dc.date.available: 2024-09-03T21:06:28Z
dc.date.issued: 2024-05
dc.date.submitted: 2024-07-10T13:02:27.164Z
dc.identifier.uri: https://hdl.handle.net/1721.1/156548
dc.description.abstract: Deep learning has been remarkably successful over the past decade, with striking progress across multiple domains such as computer vision, natural language processing, and reinforcement learning. However, the theoretical understanding of this success is limited, and the behavior of deep networks constantly defies our traditional theoretical understanding of machine learning. We present our work on deep learning theory, developing powerful techniques for studying the optimization and generalization behavior of wide neural networks that help narrow the gap between theory and practice. Motivated by the recent success of transformer-based systems such as ChatGPT and Sora, we also present our work on the expressive power of vision transformers. In addition, we have been working on new AI theories and algorithms that go beyond deep learning, together with a new AI programming system that aims to make the implementation of these ideas possible. A complete demonstration of this new AI lies beyond the scope of a PhD, but we aim to show strong evidence of its feasibility.
dc.publisher: Massachusetts Institute of Technology
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: Towards Effective Theories for Deep Learning and Beyond
dc.type: Thesis
dc.description.degree: Ph.D.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Doctoral
thesis.degree.name: Doctor of Philosophy

