dc.contributor.author | Amato, Christopher | |
dc.contributor.author | Liu, Miao | |
dc.contributor.author | Sivakumar, Kavinayan P | |
dc.contributor.author | Omidshafiei, Shayegan | |
dc.contributor.author | How, Jonathan P | |
dc.date.accessioned | 2018-04-13T22:28:08Z | |
dc.date.available | 2018-04-13T22:28:08Z | |
dc.date.issued | 2017-12 | |
dc.date.submitted | 2017-09 | |
dc.identifier.isbn | 978-1-5386-2682-5 | |
dc.identifier.isbn | 978-1-5386-2681-8 | |
dc.identifier.isbn | 978-1-5386-2683-2 | |
dc.identifier.issn | 2153-0866 | |
dc.identifier.uri | http://hdl.handle.net/1721.1/114739 | |
dc.description.abstract | This paper presents a data-driven approach for multi-robot coordination in partially observable domains based on Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) and macro-actions (MAs). Dec-POMDPs provide a general framework for cooperative sequential decision making under uncertainty, and MAs allow temporally extended and asynchronous action execution. To date, most methods assume the underlying Dec-POMDP model is known a priori or that a full simulator is available during planning time. Previous methods which aim to address these issues suffer from local optimality and sensitivity to initial conditions. Additionally, few hardware demonstrations involving a large team of heterogeneous robots and long planning horizons exist. This work addresses these gaps by proposing an iterative sampling-based Expectation-Maximization algorithm (iSEM) to learn policies using only trajectory data containing observations, MAs, and rewards. Our experiments show the algorithm is able to achieve better solution quality than state-of-the-art learning-based methods. We implement two variants of multi-robot Search and Rescue (SAR) domains (with and without obstacles) on hardware to demonstrate that the learned policies can effectively control a team of distributed robots to cooperate in a partially observable stochastic environment. | en_US |
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) | en_US |
dc.relation.isversionof | http://dx.doi.org/10.1109/IROS.2017.8206001 | en_US |
dc.rights | Creative Commons Attribution-Noncommercial-Share Alike | en_US |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | en_US |
dc.source | arXiv | en_US |
dc.title | Learning for multi-robot cooperation in partially observable stochastic environments with macro-actions | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Liu, Miao, Kavinayan Sivakumar, Shayegan Omidshafiei, Christopher Amato, and Jonathan P. How. “Learning for Multi-Robot Cooperation in Partially Observable Stochastic Environments with Macro-Actions.” 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, September 2017. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Aeronautics and Astronautics | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Mechanical Engineering | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Laboratory for Information and Decision Systems | en_US |
dc.contributor.mitauthor | Liu, Miao | |
dc.contributor.mitauthor | Sivakumar, Kavinayan P | |
dc.contributor.mitauthor | Omidshafiei, Shayegan | |
dc.contributor.mitauthor | How, Jonathan P | |
dc.relation.journal | 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) | en_US |
dc.eprint.version | Original manuscript | en_US |
dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US |
eprint.status | http://purl.org/eprint/status/NonPeerReviewed | en_US |
dc.date.updated | 2018-03-21T16:14:11Z | |
dspace.orderedauthors | Liu, Miao; Sivakumar, Kavinayan; Omidshafiei, Shayegan; Amato, Christopher; How, Jonathan P. | en_US |
dspace.embargo.terms | N | en_US |
dc.identifier.orcid | https://orcid.org/0000-0002-1648-8325 | |
dc.identifier.orcid | https://orcid.org/0000-0003-0903-0137 | |
dc.identifier.orcid | https://orcid.org/0000-0001-8576-1930 | |
mit.license | OPEN_ACCESS_POLICY | en_US |