Advancing Deep Learning Efficiency: From Specialized Co-Design to Automated Generation
Author(s)
Lin, Yujun
Advisor
Han, Song
Abstract
The explosive growth of artificial intelligence (AI) technologies, particularly large-scale deep learning models such as large language models and diffusion models, has intensified the demand for efficient full-stack inference solutions that effectively balance performance and cost. This thesis presents a comprehensive exploration of algorithm-system co-optimization together with hardware design specialization and automation for scalable AI deployment. We begin with algorithmic optimization for large-scale models, including large language models and diffusion models, developing inference libraries that leverage quantization to boost the performance of generative AI on existing GPU platforms. Next, we design specialized hardware accelerators for domain-specific applications, specifically point cloud understanding, emphasizing efficiency improvements through the exploitation of data sparsity. Third, we open up the hardware design space beyond template-based sizing and progress to automated, learning-based co-design of neural network and hardware architectures, maximizing their synergy through full-stack joint optimization. Finally, we introduce an automated framework for spatial accelerator generation that transforms high-level mappings into custom hardware designs supporting scalable deployment. Together, these contributions advance AI inference efficiency by bridging the gaps between advanced computational requirements and hardware capabilities, between theoretical potential and practical solutions, and between design cost and effectiveness.
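The quantization the abstract refers to can be illustrated with a minimal sketch. This is not the thesis's actual inference library; it assumes simple per-tensor symmetric INT8 weight quantization in NumPy, and all names are illustrative rather than drawn from the work itself.

    import numpy as np

    def quantize_int8(weights):
        # Per-tensor symmetric quantization: one float scale maps the
        # float32 tensor onto the int8 range [-127, 127].
        # Assumes the tensor is not all zeros.
        scale = np.abs(weights).max() / 127.0
        q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        # Recover approximate float weights; the gap to the originals is
        # the quantization error an inference library must tolerate.
        return q.astype(np.float32) * scale

    # Quantize a small weight matrix and check the reconstruction error.
    w = np.random.randn(4, 4).astype(np.float32)
    q, s = quantize_int8(w)
    print("max abs error:", np.abs(w - dequantize(q, s)).max())

Storing weights in INT8 rather than FP32 cuts memory traffic roughly fourfold, which is why quantization is a natural lever for speeding up inference on existing GPUs.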
Date issued
2025-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology