dc.contributor.advisor: Shah, Devavrat
dc.contributor.author: Yang, Cindy X.
dc.date.accessioned: 2022-01-14T14:51:44Z
dc.date.available: 2022-01-14T14:51:44Z
dc.date.issued: 2021-06
dc.date.submitted: 2021-06-17T20:15:01.317Z
dc.identifier.uri: https://hdl.handle.net/1721.1/139130
dc.description.abstract: Offline reinforcement learning, where a policy is learned from a fixed dataset of trajectories without further interaction with the environment, is one of the greatest challenges in reinforcement learning. Despite its compelling application to large, real-world datasets, existing RL benchmarks have struggled to perform well in the offline setting. In this thesis, we consider offline RL with heterogeneous agents (i.e., varying state dynamics) under severe data scarcity where only one historical trajectory per agent is observed. Under these conditions, we find that the performance of state-of-the-art offline and model-based RL methods degrades significantly. To tackle this problem, we present PerSim, a method to learn a personalized simulator for each agent leveraging historical data across all agents, prior to learning a policy. We achieve this by positing that the transition dynamics across agents are a latent function of latent factors associated with agents, actions, and units. Subsequently, we theoretically establish that this function is well-approximated by a “low-rank” decomposition of separable agent, state, and action latent functions. This representation suggests a simple, regularized neural network architecture to effectively learn the transition dynamics per agent, even with scarce, offline data. In extensive experiments performed on RL methods and popular benchmark environments from OpenAI Gym and MuJoCo, we show that PerSim consistently achieves improved performance, as measured by average reward and prediction error.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright MIT
dc.rights.uri: http://rightsstatements.org/page/InC-EDU/1.0/
dc.title: PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Latent Factor Representation
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Electrical Engineering and Computer Science