Basis Function Adaptation Methods for Cost Approximation in MDP
Author(s): Yu, Huizhen; Bertsekas, Dimitri P.
We generalize a basis adaptation method for cost approximation in Markov decision processes (MDP), extending earlier work of Menache, Mannor, and Shimkin. In our context, basis functions are parametrized, and their parameters are tuned by minimizing an objective function involving the cost function approximation obtained when a temporal-difference (TD) or other method is used. The adaptation scheme involves only low-order calculations and can be implemented in a way analogous to policy gradient methods. Within the generalized basis adaptation framework, we provide extensions to TD methods for nonlinear optimal stopping problems and to alternative cost approximations beyond those based on TD.
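The following sketch illustrates the general idea described in the abstract, not the authors' actual algorithm: basis functions carry a tunable parameter, and that parameter is adjusted by gradient descent on the squared error of the resulting cost approximation. The Gaussian basis, the plain least-squares fit (a stand-in for a TD-based solver), and the finite-difference gradient are all simplifying assumptions made for illustration.

```python
import numpy as np

def basis(states, centers, theta):
    # Gaussian basis functions with a common, tunable width parameter theta
    return np.exp(-((states[:, None] - centers[None, :]) ** 2) / (2.0 * theta ** 2))

def approx_error(theta, states, costs, centers):
    # Fit the cost function in the span of the basis (least-squares stand-in
    # for a TD-based approximation) and return the squared residual objective.
    Phi = basis(states, centers, theta)
    w, *_ = np.linalg.lstsq(Phi, costs, rcond=None)
    resid = Phi @ w - costs
    return 0.5 * resid @ resid

def adapt_basis(states, costs, centers, theta=1.0, lr=0.5, steps=30, eps=1e-4):
    # Tune theta by descent on the approximation-error objective, using a
    # central finite-difference gradient and backtracking on the step size
    # so the objective never increases.
    for _ in range(steps):
        e0 = approx_error(theta, states, costs, centers)
        g = (approx_error(theta + eps, states, costs, centers)
             - approx_error(theta - eps, states, costs, centers)) / (2.0 * eps)
        step = lr
        while step > 1e-10 and approx_error(theta - step * g, states, costs, centers) >= e0:
            step *= 0.5
        if step <= 1e-10:
            break  # no descent direction found; stop
        theta -= step * g
    return theta

if __name__ == "__main__":
    states = np.linspace(0.0, 1.0, 50)
    costs = np.sin(3.0 * states)          # stand-in for a policy's cost function
    centers = np.array([0.2, 0.5, 0.8])   # fixed basis centers (assumed)
    theta = adapt_basis(states, costs, centers, theta=1.0)
    print("error before:", approx_error(1.0, states, costs, centers))
    print("error after: ", approx_error(theta, states, costs, centers))
```

The backtracking step mirrors the abstract's point that only low-order calculations are needed: each iteration solves one small least-squares problem per objective evaluation, with cost governed by the number of basis functions rather than the number of states.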
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Laboratory for Information and Decision Systems
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2009. ADPRL '09
Institute of Electrical and Electronics Engineers
Yu, Huizhen, and Dimitri P. Bertsekas. "Basis Function Adaptation Methods for Cost Approximation in MDP." IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL '09), 2009, pp. 74-81. © 2009 Institute of Electrical and Electronics Engineers.
Final published version
INSPEC Accession Number: 10647014