Interpolated Experience Replay for Improved Sample Efficiency of Model-Free Deep Reinforcement Learning Algorithms
Author(s)
Sander, Ryan M.
Advisor
Rus, Daniela L.
Karaman, Sertac
Abstract
The human brain is remarkably sample efficient, capable of learning complex behaviors from limited experience [16]. This sample efficiency is crucial for effectively training robust deep reinforcement learning agents on continuous control tasks: when limited experience is available, poor sample efficiency can yield suboptimal and unstable policies. To improve sample efficiency in these tasks, we propose Neighborhood Mixup Experience Replay (NMER) and Bayesian Interpolated Experience Replay (BIER), modular replay buffers that interpolate transitions with their closest neighbors in normalized state-action space. NMER preserves a locally linear approximation of the transition manifold by interpolating only transitions with similar state-action features. BIER expands upon NMER by predicting the interpolated transitions queried by NMER using learned Gaussian Process Regression models defined over a transition's neighborhood. These interpolated transitions, predicted via Bayesian linear smoothing, are then used to update the policy and value functions of deep reinforcement learning agents in a likelihood-weighted fashion. NMER and BIER achieve greater sample efficiency than other state-of-the-art replay buffers when evaluated with model-free, off-policy reinforcement learning algorithms on OpenAI Gym MuJoCo environments. This improved sample efficiency can enable agents to learn robust and generalizable policies on continuous control tasks in settings where data is limited, such as many real-world robotics tasks.
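As a rough illustration of the neighborhood-mixup idea described in the abstract, the sketch below interpolates a stored transition with its nearest neighbor in normalized state-action space. The function name, buffer layout, and Beta-distributed mixup coefficient are illustrative assumptions for this sketch, not the implementation from the thesis.

```python
import numpy as np

def nmer_sample(states, actions, rewards, next_states, rng=None):
    """Minimal NMER-style sketch: pick a stored transition, find its nearest
    neighbor in normalized state-action space, and mixup-interpolate the pair.
    All names and hyperparameters here are illustrative, not the thesis code."""
    rng = rng or np.random.default_rng()

    # Normalize state-action features so no dimension dominates the distance metric.
    sa = np.concatenate([states, actions], axis=1)
    sa_norm = (sa - sa.mean(axis=0)) / (sa.std(axis=0) + 1e-8)

    # Choose an anchor transition uniformly at random.
    i = rng.integers(len(sa_norm))

    # Nearest neighbor of the anchor (excluding itself) in normalized space.
    dists = np.linalg.norm(sa_norm - sa_norm[i], axis=1)
    dists[i] = np.inf
    j = int(np.argmin(dists))

    # Mixup coefficient drawn from a Beta distribution, as in standard mixup.
    lam = rng.beta(0.75, 0.75)
    mix = lambda a, b: lam * a + (1.0 - lam) * b

    # Return the interpolated (s, a, r, s') tuple for the agent's update.
    return (mix(states[i], states[j]),
            mix(actions[i], actions[j]),
            mix(rewards[i], rewards[j]),
            mix(next_states[i], next_states[j]))
```

Restricting interpolation to a transition's nearest neighbor is what keeps the synthetic sample close to the locally linear region of the transition manifold; BIER, as described above, would instead fit a local Gaussian Process over that neighborhood and weight the update by the predicted likelihood.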
Date issued
2021-06
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology