Show simple item record

dc.contributor.author	Tang, Haotian
dc.contributor.author	Yang, Shang
dc.contributor.author	Liu, Zhijian
dc.contributor.author	Hong, Ke
dc.contributor.author	Yu, Zhongming
dc.contributor.author	Li, Xiuyu
dc.contributor.author	Dai, Guohao
dc.contributor.author	Wang, Yu
dc.contributor.author	Han, Song
dc.date.accessioned	2024-01-02T19:51:01Z
dc.date.available	2024-01-02T19:51:01Z
dc.date.issued	2023-10-28
dc.identifier.isbn	979-8-4007-0329-4
dc.identifier.uri	https://hdl.handle.net/1721.1/153260
dc.description.abstract	Sparse convolution plays a pivotal role in emerging workloads, including point cloud processing in AR/VR, autonomous driving, and graph understanding in recommendation systems. Since the computation pattern is sparse and irregular, specialized high-performance kernels are required. Existing GPU libraries offer two dataflow types for sparse convolution. The gather-GEMM-scatter dataflow is easy to implement but not optimal in performance, while the dataflows with overlapped computation and memory access (e.g., implicit GEMM) are highly performant but have very high engineering costs. In this paper, we introduce TorchSparse++, a new GPU library that achieves the best of both worlds. We create a highly efficient Sparse Kernel Generator that generates performant sparse convolution kernels at less than one-tenth of the engineering cost of the current state-of-the-art system. On top of this, we design the Sparse Autotuner, which extends the design space of existing sparse convolution libraries and searches for the best dataflow configurations for training and inference workloads. Consequently, TorchSparse++ achieves 2.9×, 3.3×, 2.2×, and 1.7× measured end-to-end speedup on an NVIDIA A100 GPU over state-of-the-art MinkowskiEngine, SpConv 1.2, TorchSparse, and SpConv v2 in inference; and is 1.2-1.3× faster than SpConv v2 in mixed precision training across seven representative autonomous driving benchmarks. It also seamlessly supports graph convolutions, achieving 2.6-7.6× faster inference speed compared with state-of-the-art graph deep learning libraries. Our code is publicly released at https://github.com/mit-han-lab/torchsparse.	en_US
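The gather-GEMM-scatter dataflow that the abstract contrasts with implicit GEMM can be illustrated with a minimal NumPy sketch. This is a hypothetical toy, not the TorchSparse++ API: the function name, argument names, and index maps are invented for illustration, and it shows only one weight-offset step of a sparse convolution.

```python
import numpy as np

def gather_gemm_scatter(features, weight, in_idx, out_idx, n_out):
    """One weight-offset step of sparse convolution (illustrative sketch).

    features: (N_in, C_in) features of the active (non-empty) input points
    weight:   (C_in, C_out) kernel weights for one spatial offset
    in_idx / out_idx: input-to-output index map for this offset
    """
    gathered = features[in_idx]           # gather: collect matched input rows
    partial = gathered @ weight           # GEMM: one dense matrix multiply
    out = np.zeros((n_out, weight.shape[1]))
    np.add.at(out, out_idx, partial)      # scatter: accumulate into outputs
    return out

# Toy example: 4 active input points, 3 output points, C_in = C_out = 2.
feats = np.arange(8, dtype=float).reshape(4, 2)
w = np.eye(2)
out = gather_gemm_scatter(feats, w,
                          in_idx=np.array([0, 2, 3]),
                          out_idx=np.array([1, 1, 2]),
                          n_out=3)
```

The simplicity the abstract mentions is visible here: each of the three phases is a separate pass over memory, which is easy to write but leaves the gather and scatter steps memory-bound, whereas implicit-GEMM-style kernels fuse these phases to overlap computation with memory access.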
dc.publisher	ACM|56th Annual IEEE/ACM International Symposium on Microarchitecture	en_US
dc.relation.isversionof	https://doi.org/10.1145/3613424.3614303	en_US
dc.rights	Creative Commons Attribution	en_US
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	en_US
dc.title	TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs	en_US
dc.type	Article	en_US
dc.identifier.citation	Tang, Haotian, Yang, Shang, Liu, Zhijian, Hong, Ke, Yu, Zhongming et al. 2023. "TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs."
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.mitlicense	PUBLISHER_CC
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dc.date.updated	2024-01-01T08:47:54Z
dc.language.rfc3066	en
dc.rights.holder	The author(s)
dspace.date.submission	2024-01-01T08:47:55Z
mit.license	PUBLISHER_CC
mit.metadata.status	Authority Work and Publication Information Needed	en_US


Files in this item


This item appears in the following Collection(s)
