Consistent Depth Estimation in Data-Driven Simulation for Autonomous Driving
In this work we propose consistent depth estimation for viewpoint reconstruction in data-driven simulation, combining aspects of learning-based monocular depth prediction and structure-from-motion to improve the temporal accuracy of video depth. We demonstrate its efficacy in VISTA, an end-to-end autonomous vehicle simulation engine capable of training robust control policies directly applicable to the real world. Using geometrically consistent depth map estimates, we observe an improvement of several orders of magnitude in whole-frame depth accuracy, averaged over the input traces, relative to VISTA's current depth method, and a 39% reduction in intra-frame depth variance relative to current state-of-the-art methods (i.e., Monodepth2) while maintaining similar error. Better depth enables more accurate viewpoint reconstruction, which improves the training of reinforcement learning (RL) control policies in simulation and makes RL-based control more practical. We train several end-to-end policy gradient models in different versions of VISTA, each using a different depth method, and find that models trained in the consistent-depth version of VISTA deviate least from the human-driven center line.
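The abstract does not specify how geometric consistency between frames is measured or enforced. One common way to quantify it, shown here only as an illustrative sketch and not as the thesis's method, assumes a pinhole camera with intrinsics `K` and a relative pose (`R_ij`, `t_ij`) between two frames, such as the poses structure-from-motion recovers. Each pixel of frame i is back-projected to 3D using its estimated depth, moved into frame j's camera, and re-projected; the depth this implies is then compared with frame j's own depth estimate. The function names `backproject` and `consistency_error` are hypothetical.

```python
import numpy as np


def backproject(depth, K):
    """Lift every pixel of a depth map (H, W) to 3D camera coordinates (H, W, 3).

    Assumes a pinhole camera with 3x3 intrinsics K and strictly positive depths.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)  # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T  # K^{-1} [u, v, 1]^T for every pixel
    return rays * depth[..., None]   # scale each ray by its depth


def consistency_error(depth_i, depth_j, K, R_ij, t_ij):
    """Mean relative depth disagreement after warping frame i into frame j.

    Hypothetical helper. R_ij, t_ij map camera-i points into camera j
    (X_j = R_ij @ X_i + t_ij), e.g. relative poses from structure-from-motion.
    Returns NaN if no warped pixel lands inside frame j.
    """
    h, w = depth_i.shape
    pts_i = backproject(depth_i, K).reshape(-1, 3)
    pts_j = pts_i @ R_ij.T + t_ij  # rigid transform into camera j
    z_j = pts_j[:, 2]
    valid = z_j > 1e-6              # keep points in front of camera j
    proj = pts_j[valid] @ K.T       # project with the same intrinsics
    u = np.round(proj[:, 0] / z_j[valid]).astype(int)
    v = np.round(proj[:, 1] / z_j[valid]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    if not np.any(inside):
        return float("nan")
    predicted = z_j[valid][inside]            # depth implied by frame i's estimate
    observed = depth_j[v[inside], u[inside]]  # depth estimated directly in frame j
    return float(np.mean(np.abs(predicted - observed) / observed))
```

For example, with a flat scene 10 units away and camera j moved 2 units forward (`t_ij = [0, 0, -2]`), a consistent pair of estimates (10 in frame i, 8 in frame j) yields an error near 0, while an inconsistent frame-j estimate of 10 yields about 0.2. Nearest-pixel rounding keeps the sketch short; bilinear sampling and occlusion handling would be natural refinements.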
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology