
dc.contributor.advisor: William T. Freeman and Joshua B. Tenenbaum. (en_US)
dc.contributor.author: Wu, Jiajun, Ph.D., Massachusetts Institute of Technology. (en_US)
dc.contributor.other: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. (en_US)
dc.date.accessioned: 2020-11-03T20:31:21Z
dc.date.available: 2020-11-03T20:31:21Z
dc.date.copyright: 2020 (en_US)
dc.date.issued: 2020 (en_US)
dc.identifier.uri: https://hdl.handle.net/1721.1/128332
dc.description: Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2020 (en_US)
dc.description: Cataloged from PDF of thesis. (en_US)
dc.description: Includes bibliographical references (pages 271-303). (en_US)
dc.description.abstract: Human intelligence goes beyond pattern recognition. From a single image, we are able to explain what we see, reconstruct the scene in 3D, predict what is going to happen, and plan our actions accordingly. Despite its phenomenal development over the past decade, artificial intelligence, in particular deep learning, still falls short of human intelligence in several key respects: such systems generally tackle specific problems, require large amounts of training data, and break easily when generalizing to new tasks or environments. In this dissertation, we study the problem of physical scene understanding: building versatile, data-efficient, and generalizable machines that learn to see, reason about, and interact with the physical world. The core idea is to exploit the generic, causal structure behind the world, including knowledge from computer graphics, physics, and language, in the form of approximate simulation engines, and to integrate them with deep learning. (en_US)
dc.description.abstract: Here, learning plays a multifaceted role: models may learn to invert simulation engines for efficient inference, and they may also learn to approximate or augment simulation engines for more powerful forward simulation. This dissertation consists of three parts, in which we investigate the use of such hybrid models for perception, dynamics modeling, and cognitive reasoning, respectively. In Part I, we use learning in conjunction with graphics engines to build an object-centered scene representation capturing object shape, pose, and texture. In Part II, we pair learning with physics engines, in addition to graphics engines, to simultaneously infer physical object properties. We also explore learning approximate simulation engines for greater flexibility and expressiveness. In Part III, we leverage and extend the models introduced in Parts I and II for concept discovery and cognitive reasoning by looping in a program execution engine. (en_US)
dc.description.abstract: The enhanced models discover program-like structures in objects and scenes and, in turn, exploit them for downstream tasks such as visual question answering and scene manipulation. (en_US)
dc.description.statementofresponsibility: by Jiajun Wu. (en_US)
dc.format.extent: xviii, 303 pages (en_US)
dc.language.iso: eng (en_US)
dc.publisher: Massachusetts Institute of Technology (en_US)
dc.rights: MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. (en_US)
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 (en_US)
dc.subject: Electrical Engineering and Computer Science. (en_US)
dc.title: Learning to see the physical world (en_US)
dc.type: Thesis (en_US)
dc.description.degree: Ph. D. (en_US)
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (en_US)
dc.identifier.oclc: 1201541074 (en_US)
dc.description.collection: Ph.D. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science (en_US)
dspace.imported: 2020-11-03T20:31:18Z (en_US)
mit.thesis.degree: Doctoral (en_US)
mit.thesis.department: EECS (en_US)

