
dc.contributor.advisor: Rus, Daniela L.
dc.contributor.advisor: Karaman, Sertac
dc.contributor.author: Sander, Ryan M.
dc.date.accessioned: 2022-01-14T14:42:02Z
dc.date.available: 2022-01-14T14:42:02Z
dc.date.issued: 2021-06
dc.date.submitted: 2021-06-17T20:14:14.251Z
dc.identifier.uri: https://hdl.handle.net/1721.1/138972
dc.description.abstract: The human brain is remarkably sample efficient, capable of learning complex behaviors from limited experience [16]. This sample efficiency is crucial for training robust deep reinforcement learning agents on continuous control tasks: when experience is limited, poor sample efficiency can yield sub-optimal and unstable policies. To improve sample efficiency in these tasks, we propose Neighborhood Mixup Experience Replay (NMER) and Bayesian Interpolated Experience Replay (BIER), modular replay buffers that interpolate transitions with their closest neighbors in normalized state-action space. NMER preserves a locally linear approximation of the transition manifold by interpolating only transitions with similar state-action features. BIER expands upon NMER by predicting the interpolated transitions queried by NMER using learned Gaussian Process Regression models defined over a transition’s neighborhood. These interpolated transitions, predicted via Bayesian linear smoothing, are then used to update the policy and value functions of deep reinforcement learning agents in a likelihood-weighted fashion. NMER and BIER achieve greater sample efficiency than other state-of-the-art replay buffers when evaluated with model-free, off-policy reinforcement learning algorithms on OpenAI Gym MuJoCo environments. This improved sample efficiency can enable agents to learn robust and generalizable policies on continuous control tasks in settings where data is limited, such as many real-world robotics tasks.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright MIT
dc.rights.uri: http://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Interpolated Experience Replay for Improved Sample Efficiency of Model-Free Deep Reinforcement Learning Algorithms
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Electrical Engineering and Computer Science
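
The abstract describes NMER's sampling rule concretely: pair each sampled transition with its nearest neighbor in normalized state-action space, then form a convex (mixup) combination of the two. Below is a minimal Python sketch of that rule under stated assumptions; the NMERBuffer class name, the Beta(alpha, alpha) mixup coefficient, the z-score normalization, and the brute-force neighbor search are all illustrative choices, not the thesis implementation.

```python
import numpy as np

class NMERBuffer:
    """Replay buffer with Neighborhood Mixup sampling (illustrative sketch)."""

    def __init__(self, capacity, state_dim, action_dim, alpha=1.0):
        self.capacity, self.size, self.idx = capacity, 0, 0
        self.alpha = alpha  # Beta(alpha, alpha) mixup coefficient (assumed)
        self.states = np.zeros((capacity, state_dim))
        self.actions = np.zeros((capacity, action_dim))
        self.rewards = np.zeros((capacity, 1))
        self.next_states = np.zeros((capacity, state_dim))
        self.dones = np.zeros((capacity, 1))

    def add(self, s, a, r, s2, d):
        # Standard ring-buffer insertion.
        i = self.idx
        self.states[i], self.actions[i] = s, a
        self.rewards[i], self.next_states[i], self.dones[i] = r, s2, d
        self.idx = (self.idx + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def _normalized_features(self):
        # Z-score normalize state-action features so neighbor distances
        # are not dominated by any one dimension.
        feats = np.hstack([self.states[:self.size], self.actions[:self.size]])
        return (feats - feats.mean(0)) / (feats.std(0) + 1e-8)

    def sample(self, batch_size):
        assert self.size >= 2, "need at least two transitions to interpolate"
        feats = self._normalized_features()
        anchors = np.random.randint(0, self.size, batch_size)

        # Nearest neighbor (excluding self) in normalized state-action space.
        # Brute force keeps the sketch short; a KD-tree scales better.
        dists = np.linalg.norm(feats[None, :, :] - feats[anchors, None, :],
                               axis=-1)
        dists[np.arange(batch_size), anchors] = np.inf
        neighbors = dists.argmin(axis=1)

        # Mixup: convex combination of each transition with its neighbor.
        lam = np.random.beta(self.alpha, self.alpha, size=(batch_size, 1))

        def mix(x):
            return lam * x[anchors] + (1.0 - lam) * x[neighbors]

        # Note: interpolating terminal flags yields fractional values; a real
        # implementation might instead copy the flag from the nearer transition.
        return (mix(self.states), mix(self.actions), mix(self.rewards),
                mix(self.next_states), mix(self.dones))
```

With an off-policy learner such as SAC or TD3, sample(batch_size) would simply replace the usual uniform minibatch draw. Per the abstract, BIER would replace the convex combination step with Gaussian Process Regression models fit over each transition's neighborhood and would weight the resulting policy and value updates by predictive likelihood.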

