dc.contributor.advisor: Solomon, Justin M.
dc.contributor.author: Wang, Yue
dc.date.accessioned: 2022-08-29T16:03:18Z
dc.date.available: 2022-08-29T16:03:18Z
dc.date.issued: 2022-05
dc.date.submitted: 2022-06-21T19:15:24.826Z
dc.identifier.uri: https://hdl.handle.net/1721.1/144666
dc.description.abstract: Deep learning has achieved tremendous progress in processing images and natural language. Deep models enable human-level perception, photorealistic image generation, and conversational language understanding. Despite this progress, existing deep models still fail to meet the demands of robotics, for several reasons. First, existing computer vision algorithms primarily target 2D images: they are extremely good at recognizing objects in an image, but they fail to reason about 3D geometry. Second, the current success in the 2D domain is largely due to advances in convolutional neural networks (CNNs), yet CNNs do not generalize to arbitrary data modalities such as point clouds. Finally, 3D annotations are scarce and hard to obtain; annotating 3D data usually requires more human effort, which hinders supervised learning from 3D data. Learning 3D representations from data therefore remains challenging and demands further study. This thesis investigates how to learn representations from 3D data efficiently and effectively, aiming to design 3D learning algorithms that understand geometry with minimal supervision. First, we propose a general point cloud network, termed Dynamic Graph Convolutional Neural Network (DGCNN), that learns a latent structure from sensory inputs; the induced structure improves feature learning from point clouds. Unlike prior work that focuses on global features, DGCNN views local geometry as the key to point cloud feature learning. Second, we study using DGCNN to enable high-level semantic reasoning tasks such as shape segmentation and 3D object detection. To that end, we propose a multi-view object detection model that learns complementary features by projecting point clouds to virtual views. In addition, our follow-up work, Object DGCNN, leverages DGCNN to model object relations and enables a post-processing-free object detection pipeline with state-of-the-art performance on multiple benchmarks. Third, we generalize these point cloud models to tackle low-level motion estimation problems such as point cloud registration: the proposed Deep Closest Point architecture combines a traditional optimization pipeline with deep learning, and in subsequent work the Partial Registration Network (PRNet) uses shape registration as a proxy task to enable self-supervised learning from point clouds. Finally, this thesis enables a critical application: scene understanding for autonomous driving. These studies collectively facilitate 3D deep learning in a broad range of scenarios in visual computing.
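
To illustrate the edge-convolution idea behind DGCNN, here is a minimal Python sketch (assuming PyTorch; the function names, shapes, and toy MLP are hypothetical illustrations, not the thesis code): each point gathers its k nearest neighbors, forms edge features from the center point and the neighbor offsets, applies a shared MLP, and max-pools over the neighborhood. Recomputing the k-NN graph in feature space at every layer is what makes the graph "dynamic".

    import torch

    def knn_indices(x, k):
        # x: (num_points, feat_dim); return indices of the k nearest neighbors
        dists = torch.cdist(x, x)                               # pairwise distances
        return dists.topk(k + 1, largest=False).indices[:, 1:]  # drop self-match

    def edge_conv(x, k, mlp):
        # Build edge features [x_i, x_j - x_i] over each point's k-NN graph,
        # apply a shared MLP, then max-pool over the neighborhood.
        idx = knn_indices(x, k)                     # (N, k)
        neighbors = x[idx]                          # (N, k, F)
        center = x.unsqueeze(1).expand_as(neighbors)
        edges = torch.cat([center, neighbors - center], dim=-1)  # (N, k, 2F)
        return mlp(edges).max(dim=1).values         # (N, F_out)

    # Usage: one edge-convolution layer over a toy point cloud.
    points = torch.randn(1024, 3)
    mlp = torch.nn.Sequential(torch.nn.Linear(6, 64), torch.nn.ReLU())
    features = edge_conv(points, k=20, mlp=mlp)     # (1024, 64)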
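
Similarly, the closed-form rigid-alignment (Procrustes/SVD) step that Deep Closest Point attaches to its learned matching can be sketched in NumPy. This is a simplifying illustration that assumes point correspondences are already given; in DCP they come from learned features, and the SVD is differentiated through during training.

    import numpy as np

    def rigid_align(src, dst):
        # Rotation R and translation t minimizing ||R @ src_i + t - dst_i||.
        src_c, dst_c = src.mean(0), dst.mean(0)     # centroids
        H = (src - src_c).T @ (dst - dst_c)         # 3x3 cross-covariance
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:                    # avoid reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = dst_c - R @ src_c
        return R, t

    # Usage: recover a known rigid motion from corresponding point sets.
    rng = np.random.default_rng(0)
    src = rng.normal(size=(100, 3))
    angle = np.pi / 6
    R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                       [np.sin(angle),  np.cos(angle), 0.0],
                       [0.0, 0.0, 1.0]])
    dst = src @ R_true.T + np.array([0.5, -0.2, 0.1])
    R, t = rigid_align(src, dst)                    # R ~ R_true, t ~ translation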
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright MIT
dc.rights.uri: http://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Learning 3D Representations from Data
dc.type: Thesis
dc.description.degree: Ph.D.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.orcid: https://orcid.org/0000-0002-2751-8200
mit.thesis.degree: Doctoral
thesis.degree.name: Doctor of Philosophy

