Efficient Multi-Sensor Fusion for 3D Perception
Author(s)
Shao, Kevin
Advisor
Han, Song
Abstract
As a critical component of realizing widespread autonomous driving, 3D perception systems have been heavily studied in the community. However, many solutions focus solely on achieving the highest accuracy, overlooking practical considerations such as speed and cost. In this thesis, I develop two multi-sensor fusion models for 3D perception: BEVFusion, a camera-LiDAR fusion model, and BEVFusion-R, a camera-radar fusion model. BEVFusion seeks to balance accuracy and speed. By fusing features from each input modality in a shared bird’s eye view space, it captures both the semantic and the geometric information of its inputs. Its simple design allows it to achieve both state-of-the-art accuracy and a 24% speedup over competing works. BEVFusion-R further incorporates cost and hardware deployment into the design considerations. By carefully designing the entire model with both performance and hardware acceleration in mind, BEVFusion-R achieves a 2.1% NDS improvement on nuScenes over the previous state of the art with a 4.5× measured speedup. Additionally, it achieves real-time latency on edge GPUs. The code will be publicly released at https://github.com/mit-han-lab/bevfusion.
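To illustrate the core idea of fusing modality features on a shared bird’s eye view grid, the following is a minimal PyTorch sketch. It assumes the camera and LiDAR features have already been projected onto the same BEV grid; the module and variable names (SimpleBEVFuser, cam_bev, lidar_bev) are illustrative and do not correspond to the released BEVFusion code.

import torch
import torch.nn as nn

class SimpleBEVFuser(nn.Module):
    """Illustrative BEV-space fusion: concatenate modality features, then mix with a conv block."""
    def __init__(self, cam_channels: int, lidar_channels: int, out_channels: int):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(cam_channels + lidar_channels, out_channels,
                      kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, cam_bev: torch.Tensor, lidar_bev: torch.Tensor) -> torch.Tensor:
        # cam_bev, lidar_bev: (batch, channels, H, W) on the same BEV grid
        return self.fuse(torch.cat([cam_bev, lidar_bev], dim=1))

# Example with dummy BEV feature maps on a 180x180 grid (sizes are hypothetical)
fuser = SimpleBEVFuser(cam_channels=80, lidar_channels=256, out_channels=256)
fused = fuser(torch.randn(1, 80, 180, 180), torch.randn(1, 256, 180, 180))
print(fused.shape)  # torch.Size([1, 256, 180, 180])

Because both inputs live on the same spatial grid, the fused map retains the camera features’ semantic cues and the LiDAR features’ geometric cues, which downstream detection or segmentation heads can consume directly.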
Date issued
2023-09
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology