Efficient reinforcement learning via singular value decomposition, end-to-end model-based methods and reward shaping

Gehring, Clement

dc.contributor.advisor	Kaelbling, Leslie Pack
dc.contributor.advisor	Lozano-Pérez, Tomás
dc.contributor.author	Gehring, Clement
dc.date.accessioned	2022-08-29T15:56:03Z
dc.date.available	2022-08-29T15:56:03Z
dc.date.issued	2022-05
dc.date.submitted	2022-06-21T19:15:50.565Z
dc.identifier.uri	https://hdl.handle.net/1721.1/144562
dc.description.abstract	Reinforcement learning (RL) provides a general framework for data-driven decision making. However, the very same generality that makes this approach applicable to a wide range of problems is also responsible for its well-known inefficiencies. In this thesis, we consider different properties, shared by interesting classes of decision-making problems, that can be leveraged to design learning algorithms that are both computationally and data efficient. Specifically, this work examines the low-rank structure found in various aspects of decision-making problems and the sparsity of effects of classical deterministic planning, as well as the properties that end-to-end model-based methods depend on to perform well. We start by showing how low-rank structure in the successor representation enables the design of an efficient on-line learning algorithm. Similarly, we show how this same structure can be found in the Bellman operator, which we use to formulate an efficient variant of the least-squares temporal difference learning algorithm. We further explore low-rank structure in state features to learn efficient transition models which allow for efficient planning entirely in a low-dimensional space. We then take a closer look at end-to-end model-based methods in order to better understand their properties. We do this by examining this type of approach through the lens of constrained optimization and implicit differentiation. Through the implicit perspective, we derive properties of these methods which allow us to identify conditions under which they perform well. We conclude this thesis by exploring how the sparsity of effects of classical planning problems can be used to define general domain-independent heuristics, which can in turn be used to greatly accelerate the learning of domain-dependent heuristics through the use of potential-based reward shaping and lifted function approximation.
dc.publisher	Massachusetts Institute of Technology
dc.rights	In Copyright - Educational Use Permitted
dc.rights	Copyright MIT
dc.rights.uri	http://rightsstatements.org/page/InC-EDU/1.0/
dc.title	Efficient reinforcement learning via singular value decomposition, end-to-end model-based methods and reward shaping
dc.type	Thesis
dc.description.degree	Ph.D.
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree	Doctoral
thesis.degree.name	Doctor of Philosophy

Files in this item

Name: Gehring-gehring-PhD-EECS-2022- ...
Size: 5.585Mb
Format: PDF
Description: Thesis PDF

This item appears in the following Collection(s)

Doctoral Theses
