Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution

Tang, Haotian; Liu, Zhijian; Zhao, Shengyu; Lin, Yujun; Lin, Ji; Wang, Hanrui; Han, Song

dc.contributor.author	Tang, Haotian
dc.contributor.author	Liu, Zhijian
dc.contributor.author	Zhao, Shengyu
dc.contributor.author	Lin, Yujun
dc.contributor.author	Lin, Ji
dc.contributor.author	Wang, Hanrui
dc.contributor.author	Han, Song
dc.date.accessioned	2022-07-12T13:12:22Z
dc.date.available	2022-07-12T13:12:22Z
dc.date.issued	2020
dc.identifier.uri	https://hdl.handle.net/1721.1/143668
dc.description.abstract	© 2020, Springer Nature Switzerland AG. Self-driving cars need to understand 3D scenes efficiently and accurately in order to drive safely. Given the limited hardware resources, existing 3D perception models are not able to recognize small instances (e.g., pedestrians, cyclists) very well due to the low-resolution voxelization and aggressive downsampling. To this end, we propose Sparse Point-Voxel Convolution (SPVConv), a lightweight 3D module that equips the vanilla Sparse Convolution with the high-resolution point-based branch. With negligible overhead, this point-based branch is able to preserve the fine details even from large outdoor scenes. To explore the spectrum of efficient 3D models, we first define a flexible architecture design space based on SPVConv, and we then present 3D Neural Architecture Search (3D-NAS) to search the optimal network architecture over this diverse design space efficiently and effectively. Experimental results validate that the resulting SPVNAS model is fast and accurate: it outperforms the state-of-the-art MinkowskiNet by 3.3%, ranking 1st on the competitive SemanticKITTI leaderboard. It also achieves 8–23× computation reduction and 3× measured speedup over MinkowskiNet and KPConv with higher accuracy. Finally, we transfer our method to 3D object detection, and it achieves consistent improvements over the one-stage detection baseline on KITTI.	en_US
dc.language.iso	en
dc.publisher	Springer International Publishing	en_US
dc.relation.isversionof	10.1007/978-3-030-58604-1_41	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	en_US
dc.source	arXiv	en_US
dc.title	Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution	en_US
dc.type	Article	en_US
dc.identifier.citation	Tang, Haotian, Liu, Zhijian, Zhao, Shengyu, Lin, Yujun, Lin, Ji et al. 2020. "Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution." Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 12373.
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.relation.journal	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dc.date.updated	2022-07-12T13:07:18Z
dspace.orderedauthors	Tang, H; Liu, Z; Zhao, S; Lin, Y; Lin, J; Wang, H; Han, S	en_US
dspace.date.submission	2022-07-12T13:07:22Z
mit.journal.volume	12373	en_US
mit.license	OPEN_ACCESS_POLICY
mit.metadata.status	Authority Work and Publication Information Needed	en_US
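
The abstract describes SPVConv as a sparse voxel convolution paired with a high-resolution point-based branch. The following is a minimal NumPy sketch of that point-voxel pattern, for illustration only and not the authors' implementation. The function names (voxelize, submanifold_conv, spvconv) are hypothetical, and several details are assumptions: point features are averaged per voxel, the voxel branch is a single 3×3×3 submanifold sparse convolution with ReLU, each point reads back the output of its own voxel (a simplification of interpolation-based devoxelization), the point branch is one linear layer with ReLU, and the two branches are fused by addition.

```python
import numpy as np

# All 27 neighbour offsets of a 3x3x3 kernel (index 13 is the centre voxel).
OFFSETS = [(dx, dy, dz) for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)]


def voxelize(coords, feats, voxel_size):
    """Assumption: average the features of all points that share a voxel."""
    vox = np.floor(coords / voxel_size).astype(np.int64)         # (N, 3) voxel indices
    keys, inverse = np.unique(vox, axis=0, return_inverse=True)  # occupied voxels
    inverse = inverse.reshape(-1)                                # point -> voxel row
    sums = np.zeros((len(keys), feats.shape[1]))
    np.add.at(sums, inverse, feats)                              # scatter-add per voxel
    counts = np.bincount(inverse, minlength=len(keys))[:, None]
    return keys, sums / counts, inverse


def submanifold_conv(keys, vfeats, weights):
    """3x3x3 sparse convolution evaluated only at occupied voxels.

    weights has shape (27, C_in, C_out): one matrix per kernel offset.
    """
    index = {tuple(k): i for i, k in enumerate(keys.tolist())}
    out = np.zeros((len(keys), weights.shape[2]))
    for i, (x, y, z) in enumerate(keys.tolist()):
        for k, (dx, dy, dz) in enumerate(OFFSETS):
            j = index.get((x + dx, y + dy, z + dz))
            if j is not None:                                    # skip empty voxels
                out[i] += vfeats[j] @ weights[k]
    return np.maximum(out, 0.0)                                  # ReLU


def spvconv(coords, feats, voxel_size, conv_w, mlp_w):
    """Hypothetical point-voxel block: sparse voxel branch + per-point branch."""
    keys, vfeats, inverse = voxelize(coords, feats, voxel_size)
    vout = submanifold_conv(keys, vfeats, conv_w)
    devox = vout[inverse]                    # simplification: point reads its own voxel
    pout = np.maximum(feats @ mlp_w, 0.0)    # high-resolution branch: one linear + ReLU
    return devox + pout                      # fuse the two branches by addition


rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 2.0, size=(100, 3))
feats = rng.normal(size=(100, 4))
out = spvconv(coords, feats, 0.5, rng.normal(size=(27, 4, 8)), rng.normal(size=(4, 8)))
print(out.shape)  # (100, 8)
```

Because the point branch operates on the raw points rather than on voxels, two points that fall into the same voxel can still receive different outputs; in this sketch that is how the fine detail lost by voxelization is partly retained.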

Files in this item

Name: 2007.16100.pdf
Size: 4.735 MB
Format: PDF
Description: Accepted version

This item appears in the following Collection(s)

MIT Open Access Articles