Robust and Scalable Multiagent Reinforcement Learning in Adversarial Scenarios
Author(s)
Shen, Macheng
Advisor(s)
How, Jonathan P.
Leonard, John
Abstract
Multiagent decision-making is a ubiquitous problem with many real-world applications, such as autonomous driving, multi-player video games, and robot team sports. Key challenges of multiagent learning include uncertainty in other agents' behaviors and the curse of dimensionality arising from the high-dimensional joint observation, action, and policy spaces. These challenges are accentuated further in adversarial scenarios due to unknown agent intents and unexpected, possibly adversarial behaviors. This thesis presents approaches for robust and scalable multiagent learning, with the goal of efficiently building autonomous agents that can operate robustly in adversarial scenarios.

The capability to accurately infer an agent's unknown intent from its observed behavior is critical for robust decision-making. A key challenge is the high uncertainty in an adversary's actual behavior, including potential deception, which may differ significantly from any a priori behavior model. Modeling such deceptive behavior requires capturing both the interaction between the ego-agent and the adversaries and each agent's reasoning about the information available to the other. This thesis addresses this intent recognition problem with a game-theoretic opponent modeling approach built on a new diversity-driven belief-space ensemble training technique that achieves robustness against deception. To extend the ensemble approach to scenarios with multiple agents, the thesis presents a scalable multiagent learning technique that facilitates near-optimal joint policy learning through a sparse-attention mechanism; this mechanism yields focused parameter updates, which significantly improve sample efficiency. Moreover, the thesis contributes a novel implicit ensemble training approach that leverages multi-task learning and a deep generative policy distribution to achieve better robustness at much lower computational and memory cost than previous ensemble techniques.

The combination of robust intent recognition and scalable multiagent learning yields robust and scalable offline policy learning. However, a fully autonomous agent must also continually learn from, and adapt to, new environments and peer agents. Thus, the thesis also presents a safe adaptation approach that enables adaptation to a new opponent while maintaining low exploitability against any possible opponent exploitation. Together, these contributions facilitate building autonomous agents that make robust decisions in competitive multiagent scenarios under uncertainty and safely adapt to previously unseen peer agents, through computationally efficient learning.
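To make the sparse-attention idea concrete, the following is a minimal illustrative sketch of top-k sparse attention over peer agents: the ego-agent attends to only the few most relevant peers, so each learning update touches a focused slice of the joint observation. All names and the top-k selection rule here are assumptions for illustration, not the thesis's actual architecture.

```python
import numpy as np

def sparse_attention(query, keys, values, k=2):
    """Top-k sparse attention over peer agents (illustrative sketch)."""
    # Scaled dot-product score between the ego-agent's query and
    # one key per peer agent.
    scores = keys @ query / np.sqrt(query.shape[-1])   # shape (n_agents,)
    # Keep only the k highest-scoring peers; mask the rest to -inf
    # so they receive exactly zero attention weight.
    topk = np.argsort(scores)[-k:]
    masked = np.full_like(scores, -np.inf)
    masked[topk] = scores[topk]
    # Numerically stabilized softmax over the surviving scores.
    weights = np.exp(masked - masked[topk].max())
    weights /= weights.sum()
    # Attended feature: a weighted sum over only k peer-agent values.
    return weights @ values

# Toy usage: 5 peer agents, 8-dimensional features.
rng = np.random.default_rng(0)
query = rng.normal(size=8)        # ego-agent query
keys = rng.normal(size=(5, 8))    # one key per peer agent
values = rng.normal(size=(5, 8))  # one value per peer agent
attended = sparse_attention(query, keys, values, k=2)
```

Because the masked peers get exactly zero weight, the gradient of any downstream loss with respect to their value vectors vanishes, which is one simple way focused parameter updates can arise.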
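The diversity-driven ensemble training can likewise be illustrated with a toy objective: a mean pairwise Jensen-Shannon divergence between ensemble members' action distributions at the same belief state, added as a bonus to each member's return so the ensemble spreads over distinct (including deceptive) behaviors. This is a hedged sketch under assumed names; the thesis's actual diversity objective may differ.

```python
import numpy as np

def diversity_bonus(action_probs):
    """Mean pairwise Jensen-Shannon divergence across ensemble members'
    action distributions at a shared belief state (illustrative)."""
    n = len(action_probs)
    if n < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # JS divergence between members i and j via their mixture m.
            m = 0.5 * (action_probs[i] + action_probs[j])
            kl_im = np.sum(action_probs[i] * np.log(action_probs[i] / m))
            kl_jm = np.sum(action_probs[j] * np.log(action_probs[j] / m))
            total += 0.5 * (kl_im + kl_jm)
    return total / (n * (n - 1) / 2)

# Toy usage: three ensemble members over four actions.
probs = [np.array([0.7, 0.1, 0.1, 0.1]),
         np.array([0.1, 0.7, 0.1, 0.1]),
         np.array([0.25, 0.25, 0.25, 0.25])]
print(diversity_bonus(probs))  # larger when members disagree more
```

In a training loop, each member's policy loss would subtract a weighted `diversity_bonus` term (a hypothetical coupling, shown only to indicate where such a regularizer would plug in).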
Date issued
2022-05
Department
Massachusetts Institute of Technology. Department of Mechanical Engineering
Publisher
Massachusetts Institute of Technology