Exploration in Gradient-Based Reinforcement Learning
Author(s)
Meuleau, Nicolas; Peshkin, Leonid; Kim, Kee-Eung
Abstract
Gradient-based policy search is an alternative to value-function-based methods for reinforcement learning in non-Markovian domains. One apparent drawback of policy search is its requirement that all actions be 'on-policy'; that is, that there be no explicit exploration. In this paper, we provide a method for using importance sampling to allow any well-behaved directed exploration policy during learning. We show both theoretically and experimentally that using this method can achieve dramatic performance improvements.
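As a rough illustration of the idea in the abstract, the sketch below estimates a policy gradient from actions drawn by a separate exploration policy, correcting each sample with an importance weight. It is not the algorithm from the report: the three-armed bandit, the softmax parameterization, the uniform behavior policy, and every function name here are assumptions chosen to keep the example small and checkable against a closed-form gradient.

```python
import math
import random


def softmax(theta):
    """Action probabilities of a softmax policy with one parameter per action."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def exact_gradient(theta, rewards):
    """Closed-form gradient of the expected reward under the softmax policy.

    d/d theta_k sum_a pi(a) r(a) = pi(k) * (r(k) - expected reward).
    """
    pi = softmax(theta)
    expected = sum(p * r for p, r in zip(pi, rewards))
    return [pi[k] * (rewards[k] - expected) for k in range(len(theta))]


def is_gradient_estimate(theta, rewards, behavior, n_samples, rng):
    """Importance-sampled gradient estimate (hypothetical helper).

    Actions are drawn from `behavior` (the exploration policy), and each
    sample is reweighted by pi(a) / behavior(a) so that the average targets
    the gradient of the policy being learned.
    """
    pi = softmax(theta)
    actions = list(range(len(theta)))
    grad = [0.0] * len(theta)
    for _ in range(n_samples):
        a = rng.choices(actions, weights=behavior)[0]
        weight = pi[a] / behavior[a]  # importance weight
        for k in actions:
            # d/d theta_k log pi(a) = 1[k == a] - pi(k)
            score = (1.0 if k == a else 0.0) - pi[k]
            grad[k] += weight * rewards[a] * score
    return [g / n_samples for g in grad]


if __name__ == "__main__":
    theta = [0.0, 1.0, -0.5]              # assumed policy parameters
    rewards = [1.0, 0.0, 2.0]             # assumed deterministic rewards
    behavior = [1.0 / 3.0] * 3            # assumed uniform exploration policy
    rng = random.Random(0)
    print("estimate:", is_gradient_estimate(theta, rewards, behavior, 50000, rng))
    print("exact:   ", exact_gradient(theta, rewards))
```

The behavior policy must give nonzero probability to every action the learned policy can take, otherwise the importance weight is undefined; this is one reading of the "well-behaved" condition in the abstract, stated here as an assumption.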
Date issued
2001-04-03
Other identifiers
AIM-2001-003
Series/Report no.
AIM-2001-003