dc.contributor.advisor: Solomon, Justin M.
dc.contributor.author: Wang, Yue
dc.date.accessioned: 2022-08-29T16:03:18Z
dc.date.available: 2022-08-29T16:03:18Z
dc.date.issued: 2022-05
dc.date.submitted: 2022-06-21T19:15:24.826Z
dc.identifier.uri: https://hdl.handle.net/1721.1/144666
dc.description.abstract: Deep learning has achieved tremendous progress in processing images and natural language. Deep models enable human-level perception, photorealistic image generation, and conversational language understanding. Despite this progress, existing deep models still fail to meet the demands of robotics, for several reasons. First, existing computer vision algorithms primarily target 2D images: they are extremely good at recognizing objects in an image, but they fail to reason about 3D geometry. Second, the current success in the 2D domain is largely due to advances in convolutional neural networks (CNNs), yet CNNs do not generalize to arbitrary data modalities such as point clouds. Finally, 3D annotations are scarce and hard to obtain; annotating 3D data usually requires more human effort, which hinders supervised learning from 3D data. Learning 3D representations from data therefore remains challenging and demands further study. This thesis investigates how to learn representations from 3D data efficiently and effectively, aiming to design 3D learning algorithms that understand geometry with minimal supervision. First, we propose a general point cloud network, termed Dynamic Graph Convolutional Neural Network (DGCNN), that learns a latent structure from sensory inputs; the induced structure improves feature learning from point clouds. Unlike prior work that focuses on global features, DGCNN views local geometry as the key to point cloud feature learning. Second, we study using DGCNN to enable high-level semantic reasoning tasks such as shape segmentation and 3D object detection. To that end, we propose a multi-view object detection model that learns complementary features by projecting point clouds to virtual views. In addition, our follow-up work, Object DGCNN, leverages DGCNN to model object relations and enables a post-processing-free object detection pipeline with state-of-the-art performance on multiple benchmarks. Third, we generalize these point cloud models to tackle low-level motion estimation problems such as point cloud registration: the proposed Deep Closest Point architecture combines a traditional optimization pipeline with deep learning, and in subsequent work the Partial Registration Network (PRNet) uses shape registration as a proxy task to enable self-supervised learning from point clouds. Finally, this thesis enables a critical application: scene understanding for autonomous driving. These studies collectively facilitate 3D deep learning in a broad range of scenarios in visual computing.
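
To illustrate the edge-convolution idea behind DGCNN, here is a minimal Python sketch (assuming PyTorch; the function names, shapes, and toy MLP are hypothetical illustrations, not the thesis code): each point gathers its k nearest neighbors, forms edge features from the center point and the neighbor offsets, applies a shared MLP, and max-pools over the neighborhood. Recomputing the k-NN graph in feature space at every layer is what makes the graph "dynamic".

    import torch

    def knn_indices(x, k):
        # x: (num_points, feat_dim); return indices of the k nearest neighbors
        dists = torch.cdist(x, x)                               # pairwise distances
        return dists.topk(k + 1, largest=False).indices[:, 1:]  # drop self-match

    def edge_conv(x, k, mlp):
        # Build edge features [x_i, x_j - x_i] over each point's k-NN graph,
        # apply a shared MLP, then max-pool over the neighborhood.
        idx = knn_indices(x, k)                     # (N, k)
        neighbors = x[idx]                          # (N, k, F)
        center = x.unsqueeze(1).expand_as(neighbors)
        edges = torch.cat([center, neighbors - center], dim=-1)  # (N, k, 2F)
        return mlp(edges).max(dim=1).values         # (N, F_out)

    # Usage: one edge-convolution layer over a toy point cloud.
    points = torch.randn(1024, 3)
    mlp = torch.nn.Sequential(torch.nn.Linear(6, 64), torch.nn.ReLU())
    features = edge_conv(points, k=20, mlp=mlp)     # (1024, 64)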
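
Similarly, the closed-form rigid-alignment (Procrustes/SVD) step that Deep Closest Point attaches to its learned matching can be sketched in NumPy. This is a simplifying illustration that assumes point correspondences are already given; in DCP they come from learned features, and the SVD is differentiated through during training.

    import numpy as np

    def rigid_align(src, dst):
        # Rotation R and translation t minimizing ||R @ src_i + t - dst_i||.
        src_c, dst_c = src.mean(0), dst.mean(0)     # centroids
        H = (src - src_c).T @ (dst - dst_c)         # 3x3 cross-covariance
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:                    # avoid reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = dst_c - R @ src_c
        return R, t

    # Usage: recover a known rigid motion from corresponding point sets.
    rng = np.random.default_rng(0)
    src = rng.normal(size=(100, 3))
    angle = np.pi / 6
    R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                       [np.sin(angle),  np.cos(angle), 0.0],
                       [0.0, 0.0, 1.0]])
    dst = src @ R_true.T + np.array([0.5, -0.2, 0.1])
    R, t = rigid_align(src, dst)                    # R ~ R_true, t ~ translation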
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright MIT
dc.rights.uri: http://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Learning 3D Representations from Data
dc.type: Thesis
dc.description.degree: Ph.D.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.orcid: https://orcid.org/0000-0002-2751-8200
mit.thesis.degree: Doctoral
thesis.degree.name: Doctor of Philosophy

