Stick-breaking policy learning in Dec-POMDPs
Author(s)
Amato, Christopher; Liao, Xuejun; Carin, Lawrence; Liu, Miao; How, Jonathan P
Open Access Policy
Creative Commons Attribution-Noncommercial-Share Alike
Terms of use
Metadata
Abstract
Expectation maximization (EM) has recently been shown to be an efficient algorithm for learning finite-state controllers (FSCs) in large decentralized POMDPs (Dec-POMDPs). However, current methods use fixed-size FSCs and often converge to maxima that are far from the optimal value. This paper represents the local policy of each agent using variable-sized FSCs that are constructed using a stick-breaking prior, leading to a new framework called decentralized stick-breaking policy representation (Dec-SBPR). This approach learns the controller parameters with a variational Bayesian algorithm without having to assume that the Dec-POMDP model is available. The performance of Dec-SBPR is demonstrated on several benchmark problems, showing that the algorithm scales to large problems while outperforming other state-of-the-art methods.
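The stick-breaking prior mentioned in the abstract is the standard Bayesian-nonparametric construction: each weight is a fraction broken off the remainder of a unit-length stick, so most prior mass concentrates on the first few components (here, FSC nodes). The sketch below is a generic truncated stick-breaking sampler for illustration only; the function name and parameters are ours, not from the paper's implementation.

```python
import random

def stick_breaking_weights(alpha, num_sticks, rng=None):
    """Sample truncated stick-breaking (GEM) weights.

    For k = 1..num_sticks:
        v_k ~ Beta(1, alpha)
        w_k = v_k * prod_{j<k} (1 - v_j)
    Smaller alpha puts more mass on the first few sticks, which
    is what lets a stick-breaking prior favor small controllers
    while leaving room for more nodes when the data demand them.
    """
    rng = rng or random.Random(0)
    weights = []
    remaining = 1.0  # length of stick left to break
    for _ in range(num_sticks):
        v = rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights

weights = stick_breaking_weights(alpha=1.0, num_sticks=10)
print(weights)
```

The truncated weights sum to slightly less than one; the leftover mass corresponds to all nodes beyond the truncation level.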
Date issued
2015-07
Department
Massachusetts Institute of Technology. Department of Aeronautics and Astronautics; Massachusetts Institute of Technology. Laboratory for Information and Decision Systems
Journal
International Joint Conference on Artificial Intelligence
Publisher
International Joint Conferences on Artificial Intelligence, Inc.
Citation
Liu, Miao et al. "Stick-Breaking Policy Learning in Dec-POMDPs." International Joint Conference on Artificial Intelligence, July 25-31, 2015, Buenos Aires, Argentina.
Version: Author's final manuscript