Robust and Scalable Multiagent Reinforcement Learning in Adversarial Scenarios
Author(s)
Shen, Macheng
Advisor(s)
How, Jonathan P.
Leonard, John
Abstract
Multiagent decision-making is a ubiquitous problem with many real-world applications, such as autonomous driving, multi-player video games, and robot team sports. Key challenges of multiagent learning include uncertainty in other agents' behaviors and the curse of dimensionality arising from the high-dimensional joint observation, action, and policy spaces. These challenges are accentuated further in adversarial scenarios due to unknown agent intents and unexpected, possibly adversarial behaviors. This thesis presents approaches for robust and scalable multiagent learning, with the goal of efficiently building autonomous agents that can operate robustly in adversarial scenarios.

The capability to accurately infer an agent's unknown intent from its observed behavior is critical for robust decision-making. A key challenge is the high uncertainty in an adversary's actual behavior, including potential deception, which may differ significantly from any a priori behavior model. Modeling such deceptive behavior requires capturing both the interaction between the ego-agent and the adversaries and each agent's reasoning about the information available to the other. This thesis addresses this intent recognition problem with a game-theoretic opponent modeling approach built on a new diversity-driven belief-space ensemble training technique that achieves robustness against deception. To extend the ensemble approach to scenarios with multiple agents, the thesis presents a scalable multiagent learning technique that facilitates near-optimal joint policy learning through a sparse-attention mechanism; this mechanism yields focused parameter updates, which significantly improve sample efficiency. Moreover, the thesis contributes a novel implicit ensemble training approach that leverages multi-task learning and a deep generative policy distribution to achieve better robustness at much lower computational and memory cost than previous ensemble techniques.

The combination of robust intent recognition and scalable multiagent learning yields robust and scalable offline policy learning. However, a fully autonomous agent must also continually learn from, and adapt to, new environments and peer agents. Thus, the thesis also presents a safe adaptation approach that enables adaptation to a new opponent while maintaining low exploitability against any possible opponent exploitation. Together, these contributions facilitate building autonomous agents that make robust decisions in competitive multiagent scenarios under uncertainty and safely adapt to previously unseen peer agents, through computationally efficient learning.
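To make the sparse-attention idea concrete, the following is a minimal illustrative sketch of top-k sparse attention over peer agents: the ego-agent attends to only the few most relevant peers, so each learning update touches a focused slice of the joint observation. All names and the top-k selection rule here are assumptions for illustration, not the thesis's actual architecture.

```python
import numpy as np

def sparse_attention(query, keys, values, k=2):
    """Top-k sparse attention over peer agents (illustrative sketch)."""
    # Scaled dot-product score between the ego-agent's query and
    # one key per peer agent.
    scores = keys @ query / np.sqrt(query.shape[-1])   # shape (n_agents,)
    # Keep only the k highest-scoring peers; mask the rest to -inf
    # so they receive exactly zero attention weight.
    topk = np.argsort(scores)[-k:]
    masked = np.full_like(scores, -np.inf)
    masked[topk] = scores[topk]
    # Numerically stabilized softmax over the surviving scores.
    weights = np.exp(masked - masked[topk].max())
    weights /= weights.sum()
    # Attended feature: a weighted sum over only k peer-agent values.
    return weights @ values

# Toy usage: 5 peer agents, 8-dimensional features.
rng = np.random.default_rng(0)
query = rng.normal(size=8)        # ego-agent query
keys = rng.normal(size=(5, 8))    # one key per peer agent
values = rng.normal(size=(5, 8))  # one value per peer agent
attended = sparse_attention(query, keys, values, k=2)
```

Because the masked peers get exactly zero weight, the gradient of any downstream loss with respect to their value vectors vanishes, which is one simple way focused parameter updates can arise.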
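The diversity-driven ensemble training can likewise be illustrated with a toy objective: a mean pairwise Jensen-Shannon divergence between ensemble members' action distributions at the same belief state, added as a bonus to each member's return so the ensemble spreads over distinct (including deceptive) behaviors. This is a hedged sketch under assumed names; the thesis's actual diversity objective may differ.

```python
import numpy as np

def diversity_bonus(action_probs):
    """Mean pairwise Jensen-Shannon divergence across ensemble members'
    action distributions at a shared belief state (illustrative)."""
    n = len(action_probs)
    if n < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # JS divergence between members i and j via their mixture m.
            m = 0.5 * (action_probs[i] + action_probs[j])
            kl_im = np.sum(action_probs[i] * np.log(action_probs[i] / m))
            kl_jm = np.sum(action_probs[j] * np.log(action_probs[j] / m))
            total += 0.5 * (kl_im + kl_jm)
    return total / (n * (n - 1) / 2)

# Toy usage: three ensemble members over four actions.
probs = [np.array([0.7, 0.1, 0.1, 0.1]),
         np.array([0.1, 0.7, 0.1, 0.1]),
         np.array([0.25, 0.25, 0.25, 0.25])]
print(diversity_bonus(probs))  # larger when members disagree more
```

In a training loop, each member's policy loss would subtract a weighted `diversity_bonus` term (a hypothetical coupling, shown only to indicate where such a regularizer would plug in).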
Date issued
2022-05
Department
Massachusetts Institute of Technology. Department of Mechanical Engineering
Publisher
Massachusetts Institute of Technology