Show simple item record

dc.contributor.advisor: Han, Song
dc.contributor.author: Tang, Haotian
dc.date.accessioned: 2022-08-29T16:10:36Z
dc.date.available: 2022-08-29T16:10:36Z
dc.date.issued: 2022-05
dc.date.submitted: 2022-06-21T19:25:51.065Z
dc.identifier.uri: https://hdl.handle.net/1721.1/144771
dc.description.abstract: Deep learning on point clouds has received increased attention thanks to its wide applications in AR/VR and autonomous driving. These applications require low latency and high accuracy to provide a real-time user experience and ensure user safety. Unlike conventional dense workloads, the sparse and irregular nature of point clouds poses severe challenges to running sparse CNNs efficiently on general-purpose hardware. Furthermore, existing sparse acceleration techniques for 2D images do not translate to 3D point clouds due to poor system support. Therefore, in this thesis, we tackle the challenging problem of accelerating deep learning on point clouds via system-algorithm co-design. We first introduce TorchSparse, a high-performance point cloud inference engine that accelerates the sparse convolution computation on GPUs. TorchSparse directly optimizes the two bottlenecks of sparse convolution: irregular computation and data movement. It applies adaptive matrix multiplication grouping to trade computation for better regularity, achieving 1.4-1.5× speedup for matrix multiplication. It also optimizes the data movement by adopting vectorized, quantized and fused locality-aware memory access, reducing the memory movement cost by 2.7×. Evaluated on seven representative models across three benchmark datasets, TorchSparse achieves 1.6× and 1.5× measured end-to-end speedup over the state-of-the-art MinkowskiEngine and SpConv, respectively. We further notice that the dominant module in state-of-the-art point cloud networks, Sparse Convolution, falls short in accurately modeling small objects in large-scale outdoor scenes. As such, we further propose Sparse Point-Voxel Convolution (SPVConv), a lightweight 3D module that equips the vanilla Sparse Convolution with a high-resolution point-based branch. With negligible overhead, this point-based branch is able to preserve the fine details even from large outdoor scenes.
To explore the spectrum of efficient 3D models, we first define a flexible architecture design space based on SPVConv, and we then present 3D Neural Architecture Search (3D-NAS) to search for the optimal network architecture over this diverse design space efficiently and effectively. Experimental results validate that the resulting SPVNAS model is fast and accurate: it outperforms the state-of-the-art MinkowskiNet by 3.3%, ranking 1st on the competitive SemanticKITTI leaderboard upon publication. It also achieves an 8× computation reduction and a 3× measured speedup over MinkowskiNet while still delivering higher accuracy. SPVNAS is also the 1st-place winner at the semantic segmentation challenge of the 6th AI Driving Olympics and the 2nd-place holder at the nuScenes panoptic segmentation challenge in 2021.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright MIT
dc.rights.uri: http://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Efficient Systems and Algorithms for Deep Learning on Point Clouds
dc.type: Thesis
dc.description.degree: S.M.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Science in Electrical Engineering and Computer Science

