dc.description.abstract | Deep learning for visual perception on edge devices has become increasingly critical, driven by emerging applications in autonomous driving and AR/VR. In particular, sparse convolution on 3D point clouds and Visual Language Models (VLMs) for image processing are two important methods for visual understanding and reasoning. However, the limited compute resources and memory on edge devices pose significant challenges, necessitating specialized system support for deep learning models. The efficiency challenges for edge visual perception are twofold: first, the sparsity and inherent irregularity of point cloud data introduce substantial complexity for parallel processing; second, the colossal model sizes and computational demands of LLMs and VLMs make edge deployment particularly challenging. In this thesis, we address the efficiency issues of on-device deep learning through system-algorithm co-design. We first introduce TorchSparse++, a high-performance inference engine for sparse convolution on GPUs. Unlike existing sparse convolution systems, TorchSparse++ balances efficiency and implementation simplicity, achieving the best performance across different application scenarios. Specifically, we create a highly efficient Sparse Kernel Generator that produces performant sparse convolution kernels at less than one-tenth of the engineering cost of the current state-of-the-art system. On top of this, we design the Sparse Autotuner, which extends the design space of existing sparse convolution libraries and searches for the best dataflow configurations for training and inference workloads. Consequently, TorchSparse++ achieves 2.9×, 3.3×, 2.2× and 1.7× measured end-to-end inference speedup on an NVIDIA A100 GPU over the state-of-the-art MinkowskiEngine, SpConv 1.2, TorchSparse and SpConv v2, and is 1.2-1.3× faster than SpConv v2 in mixed-precision training across seven representative autonomous driving benchmarks. It also seamlessly supports graph convolutions, achieving 2.6-7.6× faster inference than state-of-the-art graph deep learning libraries. Furthermore, to democratize the power of large foundation models in edge AI, we propose AWQ and TinyChat, a hardware-friendly full-stack solution for efficient on-device LLM and VLM deployment. AWQ is a novel quantization method built on the insight that not all weights in an LLM are equally important: protecting only 1% of the salient weights can greatly reduce quantization error. Specifically, AWQ applies an equivalent transformation that scales up the salient weight channels to reduce weight quantization error, with the scales determined offline from activation statistics. Alongside AWQ, we further introduce TinyChat, an efficient and flexible inference framework tailored for 4-bit on-device LLMs and VLMs. With on-the-fly dequantization, extensive kernel fusion and platform-aware weight packing, TinyChat offers a 2.7-3.7× speedup over the Huggingface FP16 implementation on both desktop and mobile GPUs. It also enables the deployment of the 70B Llama-2 model on mobile GPUs. Together, these techniques significantly reduce the computational and memory costs of deploying deep learning models on edge devices, increasing the accessibility of deep learning for practical applications. We hope that this thesis will inspire future research on efficient edge AI across diverse modalities. | |