Stick-breaking policy learning in Dec-POMDPs

Liu, Miao; Amato, Christopher; Liao, Xuejun; Carin, Lawrence; How, Jonathan P.

Terms of use

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International, http://creativecommons.org/licenses/by-nc-sa/4.0/

Abstract

Expectation maximization (EM) has recently been shown to be an efficient algorithm for learning finite-state controllers (FSCs) in large decentralized POMDPs (Dec-POMDPs). However, current methods use fixed-size FSCs and often converge to local maxima that are far from the optimal value. This paper represents the local policy of each agent using variable-sized FSCs constructed with a stick-breaking prior, leading to a new framework called decentralized stick-breaking policy representation (Dec-SBPR). This approach learns the controller parameters with a variational Bayesian algorithm without assuming that the Dec-POMDP model is available. The performance of Dec-SBPR is demonstrated on several benchmark problems, showing that the algorithm scales to large problems while outperforming other state-of-the-art methods.
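The abstract's central idea is that a stick-breaking prior lets each agent's controller use as many nodes as the data support, rather than fixing the FSC size in advance. The sketch below illustrates a generic truncated stick-breaking construction applied to the node transitions of one agent's stochastic FSC. It is not the Dec-SBPR algorithm: it only samples from a prior of this kind and performs no variational Bayesian learning. The helper names (`stick_breaking_weights`, `sample_fsc`, `effective_size`), the truncation level, the uniform Dirichlet prior on actions, and the 1e-2 mass threshold are all hypothetical choices made for illustration.

```python
import numpy as np


def stick_breaking_weights(alpha: float, truncation: int, rng: np.random.Generator) -> np.ndarray:
    """Truncated stick-breaking weights over `truncation` controller nodes.

    v_k ~ Beta(1, alpha) and pi_k = v_k * prod_{j<k} (1 - v_j).
    The last stick fraction is set to 1 so the weights sum exactly to 1.
    """
    v = rng.beta(1.0, alpha, size=truncation)
    v[-1] = 1.0  # truncation: give all remaining stick mass to the last node
    # Length of stick left before each break: [1, (1-v0), (1-v0)(1-v1), ...]
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def sample_fsc(num_actions: int, num_obs: int, alpha: float, truncation: int, seed: int = 0):
    """Sample a stochastic FSC for one agent (hypothetical helper).

    Returns:
      action_probs[z, a]   : probability of choosing action a in node z
      next_node[z, o, z']  : probability of moving from node z to z' after observation o
    """
    rng = np.random.default_rng(seed)
    # Action distribution per node; uniform Dirichlet prior is an assumption.
    action_probs = rng.dirichlet(np.ones(num_actions), size=truncation)
    next_node = np.empty((truncation, num_obs, truncation))
    for z in range(truncation):
        for o in range(num_obs):
            # Each (node, observation) pair gets its own stick-breaking draw,
            # so low-index nodes tend to receive most of the transition mass.
            next_node[z, o] = stick_breaking_weights(alpha, truncation, rng)
    return action_probs, next_node


def effective_size(next_node: np.ndarray, threshold: float = 1e-2) -> int:
    """Count nodes that receive more than `threshold` mass from some transition."""
    mass = next_node.max(axis=(0, 1))
    return int((mass > threshold).sum())


if __name__ == "__main__":
    for alpha in (0.5, 5.0):
        _, transitions = sample_fsc(num_actions=3, num_obs=2, alpha=alpha, truncation=20, seed=1)
        print(f"alpha={alpha}: {effective_size(transitions)} of 20 nodes in use")
```

Because each stick fraction has mean 1/(1 + alpha), a smaller `alpha` tends to concentrate transition mass on the first few nodes, so fewer nodes are effectively used; a larger `alpha` spreads mass over more nodes. The truncation level only caps the maximum controller size.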

Date issued

2015-07

URI

http://hdl.handle.net/1721.1/104918

Department

Massachusetts Institute of Technology. Department of Aeronautics and Astronautics; Massachusetts Institute of Technology. Laboratory for Information and Decision Systems

Journal

International Joint Conference on Artificial Intelligence

Publisher

International Joint Conferences on Artificial Intelligence, Inc.

Citation

Liu, Miao et al. "Stick-Breaking Policy Learning in Dec-POMDPs." International Joint Conference on Artificial Intelligence, July 25-31, 2015, Buenos Aires, Argentina.

Version: Author's final manuscript

Collections

MIT Open Access Articles
