Stick-breaking policy learning in Dec-POMDPs

Liu, Miao; Amato, Christopher; Liao, Xuejun; Carin, Lawrence; How, Jonathan P.

dc.contributor.author	Amato, Christopher
dc.contributor.author	Liao, Xuejun
dc.contributor.author	Carin, Lawrence
dc.contributor.author	Liu, Miao
dc.contributor.author	How, Jonathan P
dc.date.accessioned	2016-10-21T19:07:30Z
dc.date.available	2016-10-21T19:07:30Z
dc.date.issued	2015-07
dc.identifier.uri	http://hdl.handle.net/1721.1/104918
dc.description.abstract	Expectation maximization (EM) has recently been shown to be an efficient algorithm for learning finite-state controllers (FSCs) in large decentralized POMDPs (Dec-POMDPs). However, current methods use fixed-size FSCs and often converge to maxima that are far from the optimal value. This paper represents the local policy of each agent using variable-sized FSCs that are constructed using a stick-breaking prior, leading to a new framework called decentralized stick-breaking policy representation (Dec-SBPR). This approach learns the controller parameters with a variational Bayesian algorithm without having to assume that the Dec-POMDP model is available. The performance of Dec-SBPR is demonstrated on several benchmark problems, showing that the algorithm scales to large problems while outperforming other state-of-the-art methods.	en_US
dc.description.sponsorship	United States. Office of Naval Research. Multidisciplinary University Research Initiative (Award N000141110688)	en_US
dc.description.sponsorship	National Science Foundation (U.S.) (Award 1463945)	en_US
dc.language.iso	en_US
dc.publisher	International Joint Conferences on Artificial Intelligence, Inc.	en_US
dc.relation.isversionof	http://ijcai-15.org/index.php/accepted-papers	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	MIT web domain	en_US
dc.title	Stick-breaking policy learning in Dec-POMDPs	en_US
dc.type	Article	en_US
dc.identifier.citation	Liu, Miao et al. "Stick-Breaking Policy Learning in Dec-POMDPs." International Joint Conference on Artificial Intelligence, July 25-31, 2015, Buenos Aires, Argentina.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Aeronautics and Astronautics	en_US
dc.contributor.department	Massachusetts Institute of Technology. Laboratory for Information and Decision Systems	en_US
dc.contributor.mitauthor	Liu, Miao
dc.contributor.mitauthor	How, Jonathan P
dc.relation.journal	International Joint Conference on Artificial Intelligence	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dspace.orderedauthors	Liu, Miao; Amato, Christopher; Liao, Xuejun; Carin, Lawrence; How, Jonathan P.	en_US
dspace.embargo.terms	N	en_US
dc.identifier.orcid	https://orcid.org/0000-0002-1648-8325
dc.identifier.orcid	https://orcid.org/0000-0001-8576-1930
mit.license	OPEN_ACCESS_POLICY	en_US
mit.metadata.status	Complete

Files in this item

Name: How_Stick-breaking.pdf
Size: 588.8Kb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
