Show simple item record

dc.contributor.advisor	Jonathan P. How	en_US
dc.contributor.author	Kim, Dong Ki (Aeronautics and astronautics scientist), Massachusetts Institute of Technology	en_US
dc.contributor.other	Massachusetts Institute of Technology. Department of Aeronautics and Astronautics	en_US
dc.date.accessioned	2020-11-03T20:29:57Z
dc.date.available	2020-11-03T20:29:57Z
dc.date.copyright	2020	en_US
dc.date.issued	2020	en_US
dc.identifier.uri	https://hdl.handle.net/1721.1/128312
dc.description	Thesis: S.M., Massachusetts Institute of Technology, Department of Aeronautics and Astronautics, 2020	en_US
dc.description	Cataloged from PDF of thesis.	en_US
dc.description	Includes bibliographical references (pages 89-97).	en_US
dc.description.abstract	Learning optimal policies in the presence of the non-stationary policies of other simultaneously learning agents is a major challenge in multiagent reinforcement learning (MARL). The difficulty is compounded by other challenges, including multiagent credit assignment, the high dimensionality of the problems, and the lack of convergence guarantees. As a result, many experiences are often required to learn effective multiagent policies. This thesis introduces two frameworks that reduce the sample complexity of MARL. The first framework reduces sample complexity by exchanging knowledge between agents. In particular, recent work on agents that learn to teach their teammates has demonstrated that action advising accelerates team-wide learning.	en_US
dc.description.abstract	However, the prior work simplified the learning of advising policies by using simple function approximations and by considering advising only with primitive (low-level) actions, both of which limit the scalability of learning and teaching to more complex domains. This thesis introduces a novel learning-to-teach framework, called hierarchical multiagent teaching (HMAT), that improves scalability to complex environments by using a deep representation for student policies and by advising with more expressive extended-action sequences over multiple levels of temporal abstraction. Our empirical evaluations demonstrate that HMAT improves team-wide learning progress in large, complex domains where previous approaches fail. HMAT also learns teaching policies that can effectively transfer knowledge to different teammates with knowledge of different tasks, even when the teammates have heterogeneous action spaces.	en_US
dc.description.abstract	The second framework introduces the first policy gradient theorem based on meta-learning, which enables fast adaptation (i.e., requiring only a few iterations) with respect to the non-stationary fellow agents in MARL. The policy gradient theorem that we prove inherently includes both a self-shaping term, which considers the impact of a meta-agent's initial policy on its adapted policy, and an opponent-shaping term, which exploits the learning dynamics of the other agents. We demonstrate that our meta-policy gradient enables agents to meta-learn about different sources of non-stationarity in the environment and thereby improve their learning performance.	en_US
dc.description.statementofresponsibility	by Dong Ki Kim.	en_US
dc.format.extent	97 pages	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582	en_US
dc.subject	Aeronautics and Astronautics.	en_US
dc.title	Learning to teach and meta-learning for sample-efficient multiagent reinforcement learning	en_US
dc.type	Thesis	en_US
dc.description.degree	S.M.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Aeronautics and Astronautics	en_US
dc.identifier.oclc	1201259574	en_US
dc.description.collection	S.M. Massachusetts Institute of Technology, Department of Aeronautics and Astronautics	en_US
dspace.imported	2020-11-03T20:29:56Z	en_US
mit.thesis.degree	Master	en_US
mit.thesis.department	Aero	en_US


