Show simple item record

dc.contributor.advisor  Leslie Kaelbling
dc.contributor.author  Marthi, Bhaskara
dc.contributor.other  Learning and Intelligent Systems
dc.date.accessioned  2007-02-13T19:01:57Z
dc.date.available  2007-02-13T19:01:57Z
dc.date.issued  2007-02-13
dc.identifier.other  MIT-CSAIL-TR-2007-010
dc.identifier.uri  http://hdl.handle.net/1721.1/35890
dc.description.abstract  This paper investigates the problem of automatically learning how to restructure the reward function of a Markov decision process so as to speed up reinforcement learning. We begin by describing a method that learns a shaped reward function given a set of state and temporal abstractions. Next, we consider decomposition of the per-timestep reward in multieffector problems, in which the overall agent can be decomposed into multiple units that are concurrently carrying out various tasks. We show by example that to find a good reward decomposition, it is often necessary to first shape the rewards appropriately. We then give a function approximation algorithm for solving both problems together. Standard reinforcement learning algorithms can be augmented with our methods, and we show experimentally that in each case, significantly faster learning results.
dc.format.extent  8 p.
dc.relation.ispartofseries  Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory
dc.title  Automatic shaping and decomposition of reward functions
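
For readers skimming the record, a minimal sketch of the classical potential-based shaping device (Ng, Harada & Russell) that this line of work builds on may be a useful reference point. The snippet below is an illustration only, not the report's algorithm: the report learns the shaping function automatically from state and temporal abstractions, whereas here the potential phi is hand-specified, and all names (chain_step, phi, q_learn, the chain-MDP setup) are hypothetical choices for the sake of a self-contained example.

    import random

    # Toy chain MDP: states 0..N-1, reward only on reaching the right end.
    # Potential-based shaping adds F(s, s') = gamma*phi(s') - phi(s) to each
    # reward, which is known to preserve the set of optimal policies.
    N = 20
    GAMMA = 0.99

    def chain_step(s, a):
        """a=0 moves left, a=1 moves right; episode ends at state N-1."""
        s2 = max(0, s - 1) if a == 0 else min(N - 1, s + 1)
        r = 1.0 if s2 == N - 1 else 0.0
        return s2, r, s2 == N - 1

    def phi(s):
        """Hand-specified potential (progress toward the goal). The report's
        contribution is to learn a function like this rather than assume it."""
        return s / (N - 1)

    def q_learn(shaped, episodes=200, alpha=0.5, eps=0.1):
        """Tabular Q-learning with epsilon-greedy exploration; returns the
        number of steps taken in each episode."""
        q = [[0.0, 0.0] for _ in range(N)]
        steps = []
        for _ in range(episodes):
            s, t, done = 0, 0, False
            while not done and t < 500:
                if random.random() < eps:
                    a = random.randrange(2)
                else:
                    a = max((0, 1), key=lambda x: q[s][x])
                s2, r, done = chain_step(s, a)
                if shaped:
                    r += GAMMA * phi(s2) - phi(s)  # potential-based bonus
                target = r + (0.0 if done else GAMMA * max(q[s2]))
                q[s][a] += alpha * (target - q[s][a])
                s, t = s2, t + 1
            steps.append(t)
        return steps

    random.seed(0)
    print("mean steps per episode, unshaped:", sum(q_learn(False)) / 200)
    print("mean steps per episode, shaped:  ", sum(q_learn(True)) / 200)

On this toy chain the shaped run typically reaches the goal in far fewer steps per episode, which is the kind of speed-up the abstract refers to; the report goes further by learning phi and by decomposing the per-timestep reward across concurrently acting units.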

