dc.contributor.advisor | Leslie Kaelbling | |
dc.contributor.author | Marthi, Bhaskara | |
dc.contributor.other | Learning and Intelligent Systems | |
dc.date.accessioned | 2007-02-13T19:01:57Z | |
dc.date.available | 2007-02-13T19:01:57Z | |
dc.date.issued | 2007-02-13 | |
dc.identifier.other | MIT-CSAIL-TR-2007-010 | |
dc.identifier.uri | http://hdl.handle.net/1721.1/35890 | |
dc.description.abstract | This paper investigates the problem of automatically learning how to restructure the reward function of a Markov decision process so as to speed up reinforcement learning. We begin by describing a method that learns a shaped reward function given a set of state and temporal abstractions. Next, we consider decomposition of the per-timestep reward in multieffector problems, in which the overall agent can be decomposed into multiple units that are concurrently carrying out various tasks. We show by example that to find a good reward decomposition, it is often necessary to first shape the rewards appropriately. We then give a function approximation algorithm for solving both problems together. Standard reinforcement learning algorithms can be augmented with our methods, and we show experimentally that in each case, significantly faster learning results. | |
dc.format.extent | 8 p. | |
dc.relation.ispartofseries | Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory | |
dc.title | Automatic shaping and decomposition of reward functions | |
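For readers unfamiliar with the reward shaping the abstract refers to, the sketch below shows potential-based shaping applied to tabular Q-learning on a toy chain MDP. It is a minimal illustration of the general technique, not the learning method described in the report; the chain environment, the potential function `phi`, and all hyperparameters are illustrative assumptions.

```python
import random

# Potential-based reward shaping on a toy 1-D chain MDP (illustrative sketch).
# States 0..N-1; reaching state N-1 yields reward 1 and ends the episode.
N = 10
GAMMA = 0.95     # discount factor
ALPHA = 0.1      # learning rate
EPSILON = 0.1    # exploration rate
ACTIONS = [-1, +1]  # step left / right

def step(s, a):
    s2 = max(0, min(N - 1, s + a))
    r = 1.0 if s2 == N - 1 else 0.0
    return s2, r, s2 == N - 1

def phi(s):
    # Heuristic potential: normalized closeness to the goal state.
    return s / (N - 1)

def shaped(r, s, s2):
    # Shaping term F(s, s') = gamma * phi(s') - phi(s); adding it to the
    # reward leaves the optimal policy of the original MDP unchanged.
    return r + GAMMA * phi(s2) - phi(s)

Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

for episode in range(200):
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection over the tabular Q-values.
        a = random.choice(ACTIONS) if random.random() < EPSILON \
            else max(ACTIONS, key=lambda b: Q[(s, b)])
        s2, r, done = step(s, a)
        target = shaped(r, s, s2) + (0.0 if done else GAMMA * max(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2
```

Because the shaping term telescopes along any trajectory, it changes the intermediate learning signal without changing which policy is optimal, which is why shaping can accelerate learning without altering the task.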