Efficient Systems and Algorithms for Deep Learning on Point Clouds

Tang, Haotian

dc.contributor.advisor	Han, Song
dc.contributor.author	Tang, Haotian
dc.date.accessioned	2022-08-29T16:10:36Z
dc.date.available	2022-08-29T16:10:36Z
dc.date.issued	2022-05
dc.date.submitted	2022-06-21T19:25:51.065Z
dc.identifier.uri	https://hdl.handle.net/1721.1/144771
dc.description.abstract	Deep learning on point clouds has received increased attention thanks to its wide applications in AR/VR and autonomous driving. These applications require low latency and high accuracy to provide real-time user experience and ensure user safety. Unlike conventional dense workloads, the sparse and irregular nature of point clouds poses severe challenges to running sparse CNNs efficiently on general-purpose hardware. Furthermore, existing sparse acceleration techniques for 2D images do not translate to 3D point clouds due to poor system support. Therefore, in this thesis, we tackle the challenging problem of accelerating deep learning on point clouds via system-algorithm co-design. We first introduce TorchSparse, a high-performance point cloud inference engine that accelerates the sparse convolution computation on GPUs. TorchSparse directly optimizes the two bottlenecks of sparse convolution: irregular computation and data movement. It applies adaptive matrix multiplication grouping to trade computation for better regularity, achieving 1.4-1.5× speedup for matrix multiplication. It also optimizes the data movement by adopting vectorized, quantized and fused locality-aware memory access, reducing the memory movement cost by 2.7×. Evaluated on seven representative models across three benchmark datasets, TorchSparse achieves 1.6× and 1.5× measured end-to-end speedup over the state-of-the-art MinkowskiEngine and SpConv, respectively. We further notice that the dominant module in state-of-the-art point cloud networks, Sparse Convolution, falls short in accurately modeling small objects in large-scale outdoor scenes. As such, we further propose Sparse Point-Voxel Convolution (SPVConv), a lightweight 3D module that equips the vanilla Sparse Convolution with a high-resolution point-based branch. With negligible overhead, this point-based branch is able to preserve fine details even in large outdoor scenes. To explore the spectrum of efficient 3D models, we first define a flexible architecture design space based on SPVConv, and we then present 3D Neural Architecture Search (3D-NAS) to search for the optimal network architecture over this diverse design space efficiently and effectively. Experimental results validate that the resulting SPVNAS model is fast and accurate: it outperforms the state-of-the-art MinkowskiNet by 3.3%, ranking 1st on the competitive SemanticKITTI leaderboard upon publication. It also achieves 8× computation reduction and 3× measured speedup over MinkowskiNet while still achieving higher accuracy. SPVNAS is also the 1st-place winner of the semantic segmentation challenge of the 6th AI Driving Olympics and the 2nd-place holder in the nuScenes panoptic segmentation challenge in 2021.
dc.publisher	Massachusetts Institute of Technology
dc.rights	In Copyright - Educational Use Permitted
dc.rights	Copyright MIT
dc.rights.uri	http://rightsstatements.org/page/InC-EDU/1.0/
dc.title	Efficient Systems and Algorithms for Deep Learning on Point Clouds
dc.type	Thesis
dc.description.degree	S.M.
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree	Master
thesis.degree.name	Master of Science in Electrical Engineering and Computer Science

Files in this item

Name: Tang-kentang-SM-EECS-2022-thes ...
Size: 11.01Mb
Format: PDF
Description: Thesis PDF

This item appears in the following Collection(s)

Graduate Theses
