Toward visual understanding of everyday object
Author(s): Lim, Joseph J. (Joseph Jaewhan)
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
The computer vision community has made impressive progress on object recognition using large scale data. However, for any visual system to interact with objects, it needs to understand much more than simply recognizing where the objects are. The goal of my research is to explore and solve object understanding tasks for interaction - finding an object's pose in 3D, understanding its various states and transformations, and interpreting its physical interactions. In this thesis, I will focus on two specific aspects of this agenda: 3D object pose estimation and object state understanding. Precise pose estimation is a challenging problem. One reason is that an object's appearance inside an image can vary a lot based on different conditions (e.g. location, occlusions, and lighting). I address these issues by utilizing 3D models directly. The goal is to develop a method that can exploit all possible views provided by a 3D model - a single 3D model represents infinitely many 2D views of the same object. I have developed a method that uses the 3D geometry of an object for pose estimation. The method can then also learn additional real-world statistics, such as which poses appear more frequently, which area is more likely to contain an object, and which parts are commonly occluded and discriminative. These methods allow us to localize and estimate the exact pose of objects in natural images. Finally, I will also describe the work on learning and inferring different states and transformations an object class can undergo. Objects in visual scenes come in a rich variety of transformed states. A few classes of transformation have been heavily studied in computer vision: mostly simple, parametric changes in color and geometry. However, transformations in the physical world occur in many more flavors, and they come with semantic meaning: e.g., bending, folding, aging, etc. 
Hence, the goal is to learn about an object class, in terms of its states and transformations, using a collection of images from an image search engine.
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015. Cataloged from PDF version of thesis. Includes bibliographical references (pages 83-92).
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Massachusetts Institute of Technology
Electrical Engineering and Computer Science.