dc.contributor.author | Greene, W. Nicholas (William Nicholas) | |
dc.contributor.author | Roy, Nicholas | |
dc.date.accessioned | 2021-11-03T20:03:07Z | |
dc.date.available | 2021-11-03T20:03:07Z | |
dc.date.issued | 2020-09 | |
dc.identifier.uri | https://hdl.handle.net/1721.1/137312 | |
dc.description.abstract | © 2020 IEEE. We propose an efficient method for monocular simultaneous localization and mapping (SLAM) that is capable of estimating metrically-scaled motion without additional sensors or hardware acceleration by integrating metric depth predictions from a neural network into a geometric SLAM factor graph. Unlike learned end-to-end SLAM systems, ours does not ignore the relative geometry directly observable in the images. Unlike existing learned depth estimation approaches, ours leverages the insight that when used to estimate scale, learned depth predictions need only be coarse in image space. This allows us to shrink our network to the point that performing inference on a standard CPU becomes computationally tractable. We make several improvements to our network architecture and training procedure to address the lack of depth observability when using coarse images, which allows us to estimate spatially coarse, but depth-accurate predictions in only 30 ms per frame without GPU acceleration. At runtime we incorporate the learned metric data as unary scale factors in a Sim(3) pose graph. Our method is able to generate accurate, scaled poses without additional sensors, hardware accelerators, or special maneuvers and does not ignore or corrupt the observable epipolar geometry. We show compelling results on the KITTI benchmark dataset in addition to real-world experiments with a handheld camera. | en_US
dc.description.sponsorship | NSF (Grant 1122374) | en_US |
dc.description.sponsorship | Army Research Laboratory (Contract W911NF-17-2-0181) | en_US |
dc.language.iso | en | |
dc.publisher | IEEE | en_US |
dc.relation.isversionof | http://dx.doi.org/10.1109/ICRA40945.2020.9196900 | en_US |
dc.rights | Creative Commons Attribution-Noncommercial-Share Alike | en_US |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | en_US |
dc.source | MIT web domain | en_US |
dc.title | Metrically-Scaled Monocular SLAM using Learned Scale Factors | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Greene, W. Nicholas (William Nicholas) and Roy, Nicholas. 2020. "Metrically-Scaled Monocular SLAM using Learned Scale Factors." Proceedings - IEEE International Conference on Robotics and Automation. | |
dc.contributor.department | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory | en_US |
dc.relation.journal | Proceedings - IEEE International Conference on Robotics and Automation | en_US |
dc.eprint.version | Author's final manuscript | en_US |
dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US |
eprint.status | http://purl.org/eprint/status/NonPeerReviewed | en_US |
dc.date.updated | 2021-05-03T18:45:15Z | |
dspace.orderedauthors | Greene, WN; Roy, N | en_US |
dspace.date.submission | 2021-05-03T18:45:16Z | |
mit.license | OPEN_ACCESS_POLICY | |
mit.metadata.status | Publication Information Needed | en_US |