Representation Discovery for Kernel-Based Reinforcement Learning

Zewdie, Dawit H.; Konidaris, George

Download

MIT-CSAIL-TR-2015-032.pdf (1.869Mb)

Other Contributors

Learning and Intelligent Systems

Advisor

Leslie Kaelbling

Terms of use

Creative Commons Attribution-ShareAlike 4.0 International http://creativecommons.org/licenses/by-sa/4.0/

Abstract

Recent years have seen increased interest in non-parametric reinforcement learning. There are now practical kernel-based algorithms for approximating value functions; however, kernel regression requires that the underlying function being approximated be smooth on its domain. Few problems of interest satisfy this requirement in their natural representation. In this paper we define the Value-Consistent Pseudometric (VCPM), the distance function corresponding to a transformation of the domain into a space where the target function is maximally smooth and thus well-approximated by kernel regression. We then present DKBRL, an iterative batch RL algorithm that interleaves steps of Kernel-Based Reinforcement Learning with distance metric adjustment. We evaluate its performance on Acrobot and PinBall, continuous-space reinforcement learning domains with discontinuous value functions.
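
The smoothness requirement mentioned in the abstract can be illustrated with a toy example. The sketch below is not the report's VCPM or DKBRL; it is a minimal Nadaraya-Watson kernel regression on a one-dimensional step function, and every name in it (`kernel_regress`, `step`, `phi`, `warped`, the bandwidth of 0.1) is an assumption made for illustration. It compares the estimate near the discontinuity under plain Euclidean distance with the estimate under a hand-picked distance that pushes the two sides of the step far apart.

```python
import math


def kernel_regress(query, xs, ys, dist, bandwidth=0.1):
    """Nadaraya-Watson estimate at `query` using a Gaussian kernel over `dist`."""
    weights = [math.exp(-((dist(query, x) / bandwidth) ** 2)) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total


def step(x):
    """Toy target with a discontinuity at x = 0.5."""
    return 0.0 if x < 0.5 else 1.0


# Sample points on [0, 1] and their target values.
xs = [i / 100 for i in range(101)]
ys = [step(x) for x in xs]


def euclidean(a, b):
    """Distance in the natural representation."""
    return abs(a - b)


def phi(x):
    """Hand-picked (hypothetical) transformation that separates the two sides of the step."""
    return x if x < 0.5 else x + 10.0


def warped(a, b):
    """Distance measured after applying phi to both points."""
    return abs(phi(a) - phi(b))


query = 0.49  # just left of the discontinuity; the true value is 0.0
raw_estimate = kernel_regress(query, xs, ys, euclidean)
warped_estimate = kernel_regress(query, xs, ys, warped)

print("Euclidean distance estimate:", raw_estimate)
print("Warped distance estimate:   ", warped_estimate)
```

Under Euclidean distance the kernel averages samples from both sides of the step, so the estimate at 0.49 is pulled well away from the true value of 0. Under the warped distance the samples from the far side receive negligible weight and the estimate stays at the true value. Here the transformation is chosen by hand with knowledge of the target; the report's contribution is to discover a suitable distance from data.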

Date issued

2015-11-24

URI

http://hdl.handle.net/1721.1/100053

Series/Report no.

MIT-CSAIL-TR-2015-032

Keywords

Metric learning

Collections

CSAIL Technical Reports (July 1, 2003 - present)