Robustness and Adaptation via a Generative Model of Policies in Reinforcement Learning
Author(s)
Derek, Kenneth
Advisor
Isola, Phillip
Abstract
In the natural world, life has found countless ways to survive and often thrive. Between and even within species, each individual exists in a slightly different way, and this diversity lends robustness to life in general. In this work, we aim to incentivize diversity of agent policies while optimizing for an external reward. To this end, we introduce a generative model of policies that maps a low-dimensional latent space to an agent policy space. To learn a broad range of solutions, our generative model uses a diversity regularizer that incentivizes different agent behaviors given the same state. Each agent is assigned a latent vector that persists throughout its trajectory, and the generator learns to encode behavioral preferences in the latent space. Results show that our generator finds an array of distinct policies that express agent individuality. Of particular interest, we find that a diverse policy space allows rapid adaptation to unforeseen environmental ablations simply by optimizing generated policies in the low-dimensional latent space. We test this adaptability in an open-ended grid-world, as well as in a competitive, zero-sum, two-player soccer environment.
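The three ideas in the abstract (a generator that maps a latent vector to a policy, a diversity term that compares behaviors at the same state, and adaptation by searching only the latent space) can be illustrated with a toy sketch. This is not the thesis's implementation: the linear generator, the L1 distance used as the diversity measure, the random search, and every name below (`generated_policy`, `diversity`, `adapt_latent`, `score`) are assumptions chosen only to keep the example small and dependency-free.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generated_policy(weights, state, z):
    """Action distribution at `state` for latent `z`.

    Assumption: a toy linear generator, one weight row per action,
    applied to the concatenation of state and latent features.
    """
    features = list(state) + list(z)
    logits = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return softmax(logits)


def diversity(weights, state, latents):
    """Mean pairwise L1 distance between action distributions at one state.

    Assumption: a stand-in for a diversity regularizer; larger values mean
    that different latents behave more differently in the same state.
    """
    dists = [generated_policy(weights, state, z) for z in latents]
    pairs = [(i, j) for i in range(len(dists)) for j in range(i + 1, len(dists))]
    total = sum(
        sum(abs(a - b) for a, b in zip(dists[i], dists[j])) for i, j in pairs
    )
    return total / len(pairs)


def adapt_latent(score, latent_dim, trials=200, seed=0):
    """Random search over z only; the generator weights stay frozen.

    `score` is a hypothetical callable returning the reward of latent z
    in the changed environment.
    """
    rng = random.Random(seed)
    best_z, best_score = None, -math.inf
    for _ in range(trials):
        z = [rng.uniform(-1.0, 1.0) for _ in range(latent_dim)]
        s = score(z)
        if s > best_score:
            best_z, best_score = z, s
    return best_z, best_score


if __name__ == "__main__":
    # 3 actions, 2 state features, 2 latent dimensions (arbitrary toy values).
    weights = [[0.5, -0.2, 1.0, 0.0], [0.1, 0.3, -1.0, 0.5], [0.0, 0.0, 0.0, -1.0]]
    state = [1.0, 0.0]
    print(diversity(weights, state, [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]))
    # Hypothetical post-change reward: probability of taking action 0.
    print(adapt_latent(lambda z: generated_policy(weights, state, z)[0], 2))
```

Because adaptation here touches only the few latent coordinates and never the generator weights, the search space stays small, which is the intuition behind adapting in the low-dimensional latent space.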
Date issued
2021-06
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology