Algorithms for single-view depth image estimation
Author(s)
Ma, Fangchang.
Other Contributors
Massachusetts Institute of Technology. Department of Aeronautics and Astronautics.
Advisor
Sertac Karaman.
Abstract
Depth sensing is fundamental to autonomous navigation, localization, and mapping. However, existing depth sensors have many shortcomings, especially low effective spatial resolution. To attain higher resolution with existing hardware, this dissertation studies the single-view depth estimation problem: the goal is to reconstruct the dense and complete 3D structure of the scene, given only sparse depth measurements. To this end, this thesis proposes three algorithms for depth estimation. The first contribution is an algorithm for efficient reconstruction of 3D planar surfaces. This algorithm assumes that the 3D structure is piecewise-planar, and thus that the second-order derivatives of the depth image are sparse. We formulate a linear program for recovering the 3D surfaces under these assumptions, and provide conditions under which the reconstruction is exact. This method requires no learning, but still outperforms deep learning-based methods under certain conditions. The second contribution is a deep regression network and a self-supervised learning framework. We formulate depth completion as a pixel-level regression problem and solve it by training a neural network. Additionally, to address the difficulty of gathering ground truth annotations for depth data, we develop a self-supervised framework that trains the regression network by enforcing temporal photometric consistency, using only raw RGB and sparse depth data. The supervised method achieves state-of-the-art accuracy, and the self-supervised approach attains a lower but comparable accuracy. Our third contribution is a two-stage algorithm for a broad class of inverse problems (e.g., depth completion and image inpainting). We assume that the target image is the output of a generative neural network, and that only a subset of the output pixels is observed. The goal is to reconstruct the unseen pixels from the partial samples. Our proposed algorithm first recovers the corresponding low-dimensional input latent vector using simple gradient descent, and then reconstructs the entire output with a single forward pass. We provide conditions under which the proposed algorithm achieves exact reconstruction, and empirically demonstrate the effectiveness of such algorithms on real data.
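The two-stage idea in the third contribution can be illustrated with a minimal sketch. This is not the thesis's implementation: as a loudly stated assumption, the generative neural network is replaced by a toy linear map x = W z, and all names (generator, recover_latent, reconstruct, W, z_true) and the problem sizes are hypothetical choices made for illustration. Stage one runs plain gradient descent on the squared error over the observed pixels to recover the latent vector; stage two applies one forward pass to fill in every pixel.

```python
import random

def generator(W, z):
    """Toy linear stand-in for a generative network: x = W z (an assumption)."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in W]

def recover_latent(W, observed_idx, observed_vals, steps=3000):
    """Stage 1: gradient descent on 0.5 * sum_i (G(z)_i - y_i)^2 over observed pixels."""
    A = [W[i] for i in observed_idx]  # generator rows at the observed pixels
    k = len(W[0])
    # ||A||_F^2 upper-bounds the largest eigenvalue of A^T A,
    # so 1 / ||A||_F^2 is a safe (conservative) step size.
    step = 1.0 / sum(a * a for row in A for a in row)
    z = [0.0] * k
    for _ in range(steps):
        residual = [p - y for p, y in zip(generator(A, z), observed_vals)]
        grad = [sum(A[i][j] * residual[i] for i in range(len(A))) for j in range(k)]
        z = [zj - step * gj for zj, gj in zip(z, grad)]
    return z

def reconstruct(W, observed_idx, observed_vals):
    """Stage 2: a single forward pass from the recovered latent vector."""
    z_hat = recover_latent(W, observed_idx, observed_vals)
    return generator(W, z_hat)

# Hypothetical toy problem: 50 "pixels", 4 latent dimensions, 20 observed pixels.
random.seed(0)
n_pixels, latent_dim, n_observed = 50, 4, 20
W = [[random.gauss(0, 1) for _ in range(latent_dim)] for _ in range(n_pixels)]
z_true = [random.gauss(0, 1) for _ in range(latent_dim)]
x_true = generator(W, z_true)

observed_idx = random.sample(range(n_pixels), n_observed)
observed_vals = [x_true[i] for i in observed_idx]

x_hat = reconstruct(W, observed_idx, observed_vals)
max_error = max(abs(a - b) for a, b in zip(x_hat, x_true))
print("max reconstruction error:", max_error)
```

In this linear toy case the objective is convex, so gradient descent recovers the latent vector whenever the observed rows of W have full column rank. With a real, nonlinear generator the problem is non-convex, and the conditions for exact recovery are the ones the thesis itself establishes; this sketch does not reproduce them.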
Description
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Thesis: Ph. D. in Autonomous Systems, Massachusetts Institute of Technology, Department of Aeronautics and Astronautics, 2019. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 143-158).
Date issued
2019
Department
Massachusetts Institute of Technology. Department of Aeronautics and Astronautics
Publisher
Massachusetts Institute of Technology
Keywords
Aeronautics and Astronautics.