Efficient Multi-Sensor Fusion for 3D Perception
Author(s)
Shao, Kevin
Advisor
Han, Song
Abstract
As a critical component of realizing widespread autonomous driving, 3D perception systems have been heavily studied in the community. However, many solutions focus solely on achieving the highest accuracy, overlooking practical considerations such as speed and cost. In this thesis, I develop two multi-sensor fusion models for 3D perception: BEVFusion, a camera-LiDAR fusion model, and BEVFusion-R, a camera-radar fusion model. BEVFusion seeks to balance accuracy and speed. By fusing features from each input modality in a shared bird’s eye view space, it captures both the semantic and the geometric information of its inputs. Its simple design allows it to achieve both state-of-the-art accuracy and a 24% speedup over competing works. BEVFusion-R further incorporates cost and hardware deployment into the design considerations. By carefully designing the entire model with both performance and hardware acceleration in mind, BEVFusion-R achieves a 2.1% NDS improvement on nuScenes over the previous state of the art with a 4.5× measured speedup. Additionally, it achieves real-time latency on edge GPUs. The code will be publicly released at https://github.com/mit-han-lab/bevfusion.
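To illustrate the core idea of fusing modality features on a shared bird’s eye view grid, the following is a minimal PyTorch sketch. It assumes the camera and LiDAR features have already been projected onto the same BEV grid; the module and variable names (SimpleBEVFuser, cam_bev, lidar_bev) are illustrative and do not correspond to the released BEVFusion code.

import torch
import torch.nn as nn

class SimpleBEVFuser(nn.Module):
    """Illustrative BEV-space fusion: concatenate modality features, then mix with a conv block."""
    def __init__(self, cam_channels: int, lidar_channels: int, out_channels: int):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(cam_channels + lidar_channels, out_channels,
                      kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, cam_bev: torch.Tensor, lidar_bev: torch.Tensor) -> torch.Tensor:
        # cam_bev, lidar_bev: (batch, channels, H, W) on the same BEV grid
        return self.fuse(torch.cat([cam_bev, lidar_bev], dim=1))

# Example with dummy BEV feature maps on a 180x180 grid (sizes are hypothetical)
fuser = SimpleBEVFuser(cam_channels=80, lidar_channels=256, out_channels=256)
fused = fuser(torch.randn(1, 80, 180, 180), torch.randn(1, 256, 180, 180))
print(fused.shape)  # torch.Size([1, 256, 180, 180])

Because both inputs live on the same spatial grid, the fused map retains the camera features’ semantic cues and the LiDAR features’ geometric cues, which downstream detection or segmentation heads can consume directly.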
Date issued
2023-09
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology