
dc.contributor.advisor: Torralba, Antonio
dc.contributor.advisor: Tedrake, Russ
dc.contributor.author: Li, Yunzhu
dc.date.accessioned: 2023-01-19T18:49:36Z
dc.date.available: 2023-01-19T18:49:36Z
dc.date.issued: 2022-09
dc.date.submitted: 2022-10-19T19:08:56.090Z
dc.identifier.uri: https://hdl.handle.net/1721.1/147384
dc.description.abstract: Humans have a strong intuitive understanding of the physical world. We observe and interact with the environment through multiple sensory modalities and build a mental model that predicts how the world would change if we applied a specific action (i.e., intuitive physics). This dissertation presents my research that draws on insights from humans and develops model-based reinforcement learning (RL) agents. The agents learn from their interactions and build predictive models of the environment that generalize widely across a range of objects made with different materials. The core idea behind my research is to introduce novel representations and integrate structural priors into the learning systems to model the dynamics at different levels of abstraction. I will discuss how we can make structural inferences about the underlying environment. I will also show how such structures can make model-based planning algorithms more effective and help robots to accomplish complicated manipulation tasks (e.g., manipulating an object pile, pouring a cup of water, and shaping deformable foam into a target configuration). Beyond visual perception, touch also plays a vital role in how humans perform physical interactions. I will discuss how we bridge the sensing gap between humans and robots by building multi-modal sensing platforms with dense tactile sensors in various forms (e.g., gloves, socks, vests, and robot sleeves) and how they can lead to more structured and physically grounded models of the world. This dissertation consists of three parts. In Part I, we show how we can learn world models at different levels of abstraction and how the learned models allow model-based planning to accomplish challenging robotic manipulation tasks both in simulation and in the real world. Part II investigates the use of a learned structured world model for physical inference, which infers the causal relationships between different components within the environment and performs state and parameter estimation. Part III goes beyond the previous two parts, which assume only vision as input, by considering touch as an additional sensory modality. I will discuss the novel tactile sensors we developed and how they can be used to understand hand-object and human-environment physical interactions.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright MIT
dc.rights.uri: http://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Learning Structured World Models From and For Physical Interactions
dc.type: Thesis
dc.description.degree: Ph.D.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.orcid: https://orcid.org/0000-0002-1111-2150
mit.thesis.degree: Doctoral
thesis.degree.name: Doctor of Philosophy
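
Note: the abstract above describes, at a high level, the paradigm of learning a predictive dynamics model of the environment from interaction data and then using that model for planning. The following is a minimal illustrative sketch of that generic paradigm only, not code or an algorithm from the thesis; the toy point-mass environment, the linear least-squares model, and the random-shooting planner are all assumptions chosen for brevity.

# Minimal sketch (illustrative only, not from the thesis): learn a dynamics
# model from interaction data, then plan with it (random-shooting MPC).
import numpy as np

rng = np.random.default_rng(0)

def true_dynamics(state, action):
    # Hypothetical point-mass environment: state = [position, velocity], action = force.
    pos, vel = state
    vel = vel + 0.1 * action
    pos = pos + 0.1 * vel
    return np.array([pos, vel])

# 1. Collect interaction data and fit a simple linear dynamics model to it.
states, actions, next_states = [], [], []
s = np.zeros(2)
for _ in range(500):
    a = rng.uniform(-1.0, 1.0)
    s_next = true_dynamics(s, a)
    states.append(s)
    actions.append([a])
    next_states.append(s_next)
    s = s_next if abs(s_next[0]) < 5.0 else np.zeros(2)

X = np.hstack([np.array(states), np.array(actions)])   # (N, 3): state-action pairs
Y = np.array(next_states)                               # (N, 2): observed next states
W, *_ = np.linalg.lstsq(X, Y, rcond=None)               # learned model: s' ~= [s, a] @ W

def learned_dynamics(state, action):
    return np.concatenate([state, [action]]) @ W

# 2. Model-based planning: random-shooting MPC using the learned model.
def plan(state, goal_pos, horizon=10, n_samples=256):
    # Sample candidate action sequences, roll each out in the learned model,
    # and return the first action of the lowest-cost sequence.
    candidates = rng.uniform(-1.0, 1.0, size=(n_samples, horizon))
    best_cost, best_first_action = np.inf, 0.0
    for seq in candidates:
        sim_state, cost = np.array(state, dtype=float), 0.0
        for a in seq:
            sim_state = learned_dynamics(sim_state, a)
            cost += (sim_state[0] - goal_pos) ** 2
        if cost < best_cost:
            best_cost, best_first_action = cost, seq[0]
    return best_first_action

# 3. Closed-loop control: replan at every step and act in the real environment.
s, goal = np.zeros(2), 2.0
for _ in range(50):
    s = true_dynamics(s, plan(s, goal))
print(f"final position: {s[0]:.3f} (goal: {goal})")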

