Algorithms for single-view depth image estimation

Ma, Fangchang.

Author(s)

Ma, Fangchang.

Other Contributors

Massachusetts Institute of Technology. Department of Aeronautics and Astronautics.

Advisor

Sertac Karaman.

Terms of use

MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582

Abstract

Depth sensing is fundamental in autonomous navigation, localization, and mapping. However, existing depth sensors have many shortcomings, especially low effective spatial resolution. To attain higher resolution with existing hardware, this dissertation studies the single-view depth estimation problem: the goal is to reconstruct the dense and complete 3D structure of the scene, given only sparse depth measurements. To this end, this thesis proposes three different algorithms for depth estimation.

The first contribution is an algorithm for efficient reconstruction of 3D planar surfaces. This algorithm assumes that the 3D structure is piecewise-planar, and thus that the second-order derivatives of the depth image are sparse. We formulate a linear program for recovering the 3D surfaces under these assumptions, and provide conditions under which the reconstruction is exact. This method requires no learning, but still outperforms deep learning-based methods under certain conditions.

The second contribution is a deep regression network and a self-supervised learning framework. We formulate the depth completion problem as a pixel-level regression problem and solve it by training a neural network. Additionally, to address the difficulty of gathering ground truth annotations for depth data, we develop a self-supervised framework that trains the regression network by enforcing temporal photometric consistency, using only raw RGB and sparse depth data. The supervised method achieves state-of-the-art accuracy, and the self-supervised approach attains a lower but comparable accuracy.

Our third contribution is a two-stage algorithm for a broad class of inverse problems (e.g., depth completion and image inpainting). We assume that the target image is the output of a generative neural network, and that only a subset of the output pixels is observed. The goal is to reconstruct the unseen pixels from these partial samples. Our proposed algorithm first recovers the corresponding low-dimensional input latent vector using simple gradient descent, and then reconstructs the entire output with a single forward pass. We provide conditions under which the proposed algorithm achieves exact reconstruction, and empirically demonstrate the effectiveness of such algorithms on real data.
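The two-stage idea in the third contribution can be illustrated with a minimal sketch. It is not the thesis implementation: the generator here is assumed to be a single linear layer, G(z) = W z, chosen only so that gradient descent is easy to follow and provably converges, whereas the thesis considers generative neural networks. The names generator, recover_latent, and reconstruct, along with the problem sizes and step-size rule, are hypothetical.

```python
import random

def generator(weights, z):
    """Hypothetical stand-in generator: a single linear layer, G(z) = W z."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in weights]

def recover_latent(weights, observed_idx, observed_vals, steps=2000):
    """Stage 1: gradient descent on 0.5 * ||G(z)[observed] - observed_vals||^2."""
    rows = [weights[i] for i in observed_idx]  # generator rows for observed pixels
    # The squared Frobenius norm upper-bounds the gradient's Lipschitz constant,
    # so its reciprocal is a conservative but safe step size (an assumption of
    # this sketch, valid for the linear generator used here).
    step = 1.0 / sum(w * w for row in rows for w in row)
    z = [0.0] * len(weights[0])
    for _ in range(steps):
        residuals = [p - y for p, y in zip(generator(rows, z), observed_vals)]
        grad = [sum(r * row[j] for r, row in zip(residuals, rows))
                for j in range(len(z))]
        z = [zj - step * g for zj, g in zip(z, grad)]
    return z

def reconstruct(weights, observed_idx, observed_vals):
    """Stage 2: one forward pass through the generator at the recovered latent."""
    z_hat = recover_latent(weights, observed_idx, observed_vals)
    return generator(weights, z_hat)

# Synthetic example: a 50-pixel "image" generated from a 5-dimensional latent,
# with only 30 of the pixels observed.
random.seed(0)
n_pixels, latent_dim, n_observed = 50, 5, 30
weights = [[random.gauss(0, 1) for _ in range(latent_dim)] for _ in range(n_pixels)]
z_true = [random.gauss(0, 1) for _ in range(latent_dim)]
image = generator(weights, z_true)
observed_idx = random.sample(range(n_pixels), n_observed)
observed_vals = [image[i] for i in observed_idx]

estimate = reconstruct(weights, observed_idx, observed_vals)
max_error = max(abs(e - t) for e, t in zip(estimate, image))
print(f"max reconstruction error: {max_error:.2e}")
```

In this linear setting the observed pixels determine the latent vector whenever the observed rows of W have full column rank, so the unseen pixels are recovered up to numerical precision. With a nonlinear generator the objective is no longer convex, and exact recovery depends on conditions such as those the thesis establishes.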

Description

This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.

Thesis: Ph. D. in Autonomous Systems, Massachusetts Institute of Technology, Department of Aeronautics and Astronautics, 2019

Cataloged from student-submitted PDF version of thesis.

Includes bibliographical references (pages 143-158).

Date issued

2019

URI

https://hdl.handle.net/1721.1/122371

Department

Massachusetts Institute of Technology. Department of Aeronautics and Astronautics

Publisher

Massachusetts Institute of Technology

Keywords

Aeronautics and Astronautics.

Collections

Doctoral Theses