DSpace@MIT

Practical reinforcement learning using representation learning and safe exploration for large scale Markov decision processes


Simple item record

dc.contributor.advisor Jonathan P. How and Nicholas Roy. en_US
dc.contributor.author Geramifard, Alborz, 1980- en_US
dc.contributor.other Massachusetts Institute of Technology. Dept. of Aeronautics and Astronautics. en_US
dc.date.accessioned 2012-07-02T15:42:35Z
dc.date.available 2012-07-02T15:42:35Z
dc.date.copyright 2012 en_US
dc.date.issued 2012 en_US
dc.identifier.uri http://hdl.handle.net/1721.1/71455
dc.description Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2012. en_US
dc.description Cataloged from PDF version of thesis. en_US
dc.description Includes bibliographical references (p. 157-168). en_US
dc.description.abstract While creating intelligent agents that can solve stochastic sequential decision-making problems by interacting with the environment is the promise of Reinforcement Learning (RL), scaling existing RL methods to realistic domains such as planning for multiple unmanned aerial vehicles (UAVs) has remained a challenge for three main reasons: 1) RL methods often require large amounts of data to find reasonable policies, 2) the agent has limited computation time between interactions, and 3) although exploration is necessary to avoid convergence to local optima, in safety-critical domains visiting all parts of the planning space may lead to catastrophic outcomes. To address the first two challenges, this thesis introduces incremental Feature Dependency Discovery (iFDD), a representation expansion method with cheap per-time-step computational cost that can be combined with any online, value-based reinforcement learning method that uses binary features. In addition to convergence and computational complexity guarantees, iFDD coupled with SARSA achieves much faster learning (i.e., requires far fewer data samples) in planning domains, including two multi-UAV mission planning scenarios with hundreds of millions of state-action pairs. In particular, in a UAV mission planning domain, iFDD performed more than 12 times better than the best competitor given the same number of samples. The third challenge is addressed through a constructive relationship between a planner and a learner that mitigates learning risk while boosting the asymptotic performance and safety of the agent's behavior. The framework is an instance of the intelligent cooperative control architecture, in which a learner initially follows a safe policy generated by a planner and incrementally improves this baseline policy through interaction while avoiding behaviors believed to be risky. The new approach is shown to be superior in two multi-UAV task assignment scenarios; for example, in one case it reduced risk by 8% while improving the planner's performance by up to 30% (illustrative code sketches of both ideas follow the record below). en_US
dc.description.statementofresponsibility by Alborz Geramifard. en_US
dc.format.extent 168 p. en_US
dc.language.iso eng en_US
dc.publisher Massachusetts Institute of Technology en_US
dc.rights M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. en_US
dc.rights.uri http://dspace.mit.edu/handle/1721.1/7582 en_US
dc.subject Aeronautics and Astronautics. en_US
dc.title Practical reinforcement learning using representation learning and safe exploration for large scale Markov decision processes en_US
dc.type Thesis en_US
dc.description.degree Ph.D. en_US
dc.contributor.department Massachusetts Institute of Technology. Dept. of Aeronautics and Astronautics. en_US
dc.identifier.oclc 795174743 en_US
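
The abstract summarizes iFDD only at a high level; the precise algorithm is given in the thesis itself. As a minimal, hypothetical Python sketch of the core idea, assuming accumulated TD error serves as the relevance measure and pairwise conjunctions of active binary features serve as candidates (the class name IFDD, the threshold parameter, and the method names are illustrative, not the thesis's API):

    import itertools
    from collections import defaultdict

    class IFDD:
        """Sketch of incremental Feature Dependency Discovery (iFDD).

        Each feature is the conjunction of a set of base binary features,
        represented here as a frozenset of base-feature indices.
        """

        def __init__(self, num_base_features, threshold=1.0):
            # Start from the initial (base) binary features.
            self.features = {frozenset([i]) for i in range(num_base_features)}
            self.relevance = defaultdict(float)  # accumulated |TD error| per candidate
            self.threshold = threshold

        def active(self, base_active):
            """Learned features whose constituent base features are all active."""
            base = set(base_active)
            return [f for f in self.features if f <= base]

        def discover(self, base_active, td_error):
            """Accumulate |td_error| on pairwise conjunctions of active features
            and promote a candidate once its relevance crosses the threshold."""
            for f, g in itertools.combinations(self.active(base_active), 2):
                candidate = f | g
                if candidate in self.features:
                    continue
                self.relevance[candidate] += abs(td_error)
                if self.relevance[candidate] > self.threshold:
                    self.features.add(candidate)

A value-based learner such as SARSA would call discover after each temporal-difference update, so the representation grows only where TD error persists:

    rep = IFDD(num_base_features=6, threshold=2.0)
    rep.discover(base_active=[0, 3], td_error=1.5)
    rep.discover(base_active=[0, 3], td_error=1.2)
    print(frozenset({0, 3}) in rep.features)  # True: the conjunction was promoted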
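
The planner-learner coupling is likewise described only informally in the abstract. Below is a minimal sketch of that pattern, assuming an epsilon-greedy learner, a callable risk estimator, and a fixed risk cap; all of these names, signatures, and the risk-cap veto mechanism are assumptions for illustration, and the thesis's intelligent cooperative control framework is more elaborate:

    import random

    def safe_action(state, q_value, planner, risk, actions,
                    epsilon=0.1, risk_cap=0.05):
        """Choose the learner's action, deferring to the planner's safe
        baseline whenever the proposal is believed to be too risky.

        q_value(state, action) -> float: learner's current value estimate
        planner(state) -> action: safe baseline policy from the planner
        risk(state, action) -> float: estimated chance of a bad outcome
        """
        # Epsilon-greedy proposal from the learner.
        if random.random() < epsilon:
            proposal = random.choice(actions)
        else:
            proposal = max(actions, key=lambda a: q_value(state, a))
        # Veto risky proposals and fall back to the planner's baseline.
        if risk(state, proposal) > risk_cap:
            return planner(state)
        return proposal

Early in training the learner's proposals are vetoed often, so behavior stays close to the planner's safe baseline; as value estimates improve, the learner takes over wherever its proposals pass the risk check.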


Files in this item

Name               Size     Format  Description
795174743.pdf      19.61Mb  PDF     Preview, non-printable (open to all)
795174743-MIT.pdf  19.61Mb  PDF     Full printable version (MIT only)

This item appears in the following Collection(s)


MIT-Mirage