Leveraging prior information for real-time monocular simultaneous localization and mapping

Author(s)
Greene, W. Nicholas (William Nicholas)
Download: 1251896691-MIT.pdf (15.58 MB)
Other Contributors
Massachusetts Institute of Technology. Department of Aeronautics and Astronautics.
Advisor
Nicholas Roy.
Terms of use
MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. http://dspace.mit.edu/handle/1721.1/7582
Abstract
Monocular cameras are powerful sensors for a variety of computer vision tasks since they are small and inexpensive and provide dense perceptual information about the surrounding environment. Efficiently estimating the pose of a moving monocular camera and the 3D structure of the observed scene from the images alone is a fundamental problem in computer vision commonly referred to as monocular simultaneous localization and mapping (SLAM). Given the importance of egomotion estimation and environmental mapping to many applications in robotics and augmented reality, the last twenty years have seen dramatic advances in the state of the art in monocular SLAM. Despite the rapid progress, however, several limitations remain that prevent monocular SLAM systems from transitioning out of the research laboratory and into large, uncontrolled environments on small, resource-constrained computing platforms. This thesis presents research that attempts to address existing problems in monocular SLAM by leveraging different sources of prior information along with targeted applications of machine learning. First, we exploit the piecewise planar structure common in many environments to represent the scene using compact triangular meshes that allow for faster reconstruction and regularization. Second, we leverage the semantic information encoded in large datasets of images to constrain the unobservable scale of motion of the monocular solution to the true, metric scale without additional sensors. Lastly, we compensate for known viewpoint changes when associating pixels between images in order to allow for robust, learning-based depth estimation across disparate views.
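As a rough illustration of the first idea, the sketch below (plain NumPy/SciPy, not code from the thesis) builds a Delaunay triangulation over a handful of pixels with known depth and fills in the remaining pixels by barycentric interpolation of inverse depth. Linearly interpolating inverse depth across an image-space triangle corresponds to a planar surface patch under a pinhole camera model, which is what makes compact triangular meshes a natural fit for piecewise-planar scenes. The function name interpolate_inverse_depth and the inverse-depth parameterization are illustrative assumptions, not details taken from the thesis.

import numpy as np
from scipy.spatial import Delaunay

def interpolate_inverse_depth(pixels, inv_depths, queries):
    """Piecewise-planar (per-triangle) interpolation of sparse inverse depths.

    pixels:     (N, 2) pixel coordinates with known inverse depth
    inv_depths: (N,)   inverse-depth estimates at those pixels
    queries:    (M, 2) pixel coordinates to fill in
    Returns an (M,) array of interpolated inverse depths (NaN outside the mesh).
    """
    pixels = np.asarray(pixels, dtype=float)
    inv_depths = np.asarray(inv_depths, dtype=float)
    queries = np.asarray(queries, dtype=float)

    tri = Delaunay(pixels)                   # triangulate the sparse support pixels
    simplex = tri.find_simplex(queries)      # triangle index per query (-1 if outside mesh)
    valid = simplex >= 0

    out = np.full(len(queries), np.nan)
    if not np.any(valid):
        return out

    # Barycentric coordinates of each valid query inside its containing triangle.
    trans = tri.transform[simplex[valid]]             # (K, 3, 2) affine maps to barycentric space
    delta = queries[valid] - trans[:, 2]               # offset from the reference vertex
    bary2 = np.einsum('kij,kj->ki', trans[:, :2], delta)
    bary = np.c_[bary2, 1.0 - bary2.sum(axis=1)]       # (K, 3) barycentric weights

    # Linear (planar) blend of the vertex inverse depths within each triangle.
    verts = tri.simplices[simplex[valid]]               # (K, 3) vertex indices
    out[valid] = np.einsum('ki,ki->k', bary, inv_depths[verts])
    return out

# Example: densify depth over a 640x480 image from a handful of tracked features.
# feats = np.array([[50, 60], [600, 40], [320, 440], [100, 400]], dtype=float)
# idepth = 1.0 / np.array([2.0, 3.5, 1.8, 2.2])       # metres -> inverse depth
# us, vs = np.meshgrid(np.arange(640), np.arange(480))
# dense = interpolate_inverse_depth(feats, idepth, np.c_[us.ravel(), vs.ravel()])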
Description
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Aeronautics and Astronautics, February 2021
Cataloged from the official PDF of the thesis.
Includes bibliographical references (pages 135-151).
Date issued
2021
URI
https://hdl.handle.net/1721.1/130747
Department
Massachusetts Institute of Technology. Department of Aeronautics and Astronautics
Publisher
Massachusetts Institute of Technology
Keywords
Aeronautics and Astronautics.

Collections
  • Doctoral Theses
