dc.contributor.advisor  Jonathan P. How and Nicholas Roy.  en_US
dc.contributor.author  Geramifard, Alborz, 1980-  en_US
dc.contributor.other  Massachusetts Institute of Technology. Dept. of Aeronautics and Astronautics.  en_US
dc.date.accessioned  2012-07-02T15:42:35Z
dc.date.available  2012-07-02T15:42:35Z
dc.date.copyright  2012  en_US
dc.date.issued  2012  en_US
dc.identifier.uri  http://hdl.handle.net/1721.1/71455
dc.description  Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2012.  en_US
dc.description  Cataloged from PDF version of thesis.  en_US
dc.description  Includes bibliographical references (p. 157-168).  en_US
dc.description.abstract  Creating intelligent agents that can solve stochastic sequential decision-making problems by interacting with the environment is the promise of Reinforcement Learning (RL), yet scaling existing RL methods to realistic domains such as planning for multiple unmanned aerial vehicles (UAVs) has remained a challenge due to three main factors: 1) RL methods often require large amounts of data to find reasonable policies, 2) the agent has limited computation time between interactions, and 3) while exploration is necessary to avoid convergence to local optima, in sensitive domains visiting all parts of the planning space may lead to catastrophic outcomes. To address the first two challenges, this thesis introduces incremental Feature Dependency Discovery (iFDD), a representation expansion method with cheap per-timestep computational complexity that can be combined with any online, value-based reinforcement learning method using binary features. In addition to convergence and computational complexity guarantees, when coupled with SARSA, iFDD achieves much faster learning (i.e., requires far fewer data samples) in planning domains, including two multi-UAV mission planning scenarios with hundreds of millions of state-action pairs. In particular, in a UAV mission planning domain, iFDD performed more than 12 times better than the best competitor given the same number of samples. The third challenge is addressed through a constructive relationship between a planner and a learner that mitigates learning risk while boosting the asymptotic performance and safety of the agent's behavior. The framework is an instance of the intelligent cooperative control architecture in which a learner initially follows a safe policy generated by a planner and incrementally improves this baseline policy through interaction while avoiding behaviors believed to be risky. The new approach is demonstrated to be superior in two multi-UAV task assignment scenarios. For example, in one case the proposed method reduced risk by 8% while improving the planner's performance by up to 30%. (An illustrative sketch of the iFDD expansion step follows this record.)  en_US
dc.description.statementofresponsibility  by Alborz Geramifard.  en_US
dc.format.extent  168 p.  en_US
dc.language.iso  eng  en_US
dc.publisher  Massachusetts Institute of Technology  en_US
dc.rights  M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.  en_US
dc.rights.uri  http://dspace.mit.edu/handle/1721.1/7582  en_US
dc.subject  Aeronautics and Astronautics.  en_US
dc.title  Practical reinforcement learning using representation learning and safe exploration for large scale Markov decision processes  en_US
dc.type  Thesis  en_US
dc.description.degree  Ph.D.  en_US
dc.contributor.department  Massachusetts Institute of Technology. Department of Aeronautics and Astronautics
dc.identifier.oclc  795174743  en_US
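
As a rough illustration of the iFDD idea summarized in the abstract above, the Python sketch below tracks temporal-difference error attributed to conjunctions of currently active binary features and promotes a conjunction to a full feature once its accumulated error crosses a threshold. This is a minimal sketch under assumed names and simplifications (the class name, the threshold psi, pairwise-only candidates, and the use of absolute TD error are illustrative choices), not the implementation described in the thesis.

# Illustrative sketch only: names and details are assumptions, not the thesis code.
from itertools import combinations


class IFDDSketch:
    """Grow a binary-feature representation by promoting conjunctions of
    active features whose accumulated TD error exceeds a threshold."""

    def __init__(self, num_base_features, psi=1.0):
        self.psi = psi  # discovery threshold (assumed name and value)
        # Start from the individual (base) binary features.
        self.features = [frozenset([i]) for i in range(num_base_features)]
        self.potentials = {}  # candidate conjunction -> accumulated |TD error|

    def active_features(self, base_active):
        """Indices of features (base or discovered) whose member base
        features are all active in the current state."""
        base = frozenset(base_active)
        return [k for k, f in enumerate(self.features) if f <= base]

    def discover(self, base_active, td_error):
        """Credit |TD error| to pairwise conjunctions of active features and
        add any conjunction whose accumulated credit exceeds psi."""
        active = self.active_features(base_active)
        existing = set(self.features)
        for i, j in combinations(active, 2):
            candidate = self.features[i] | self.features[j]
            if candidate in existing:
                continue
            self.potentials[candidate] = self.potentials.get(candidate, 0.0) + abs(td_error)
            if self.potentials[candidate] > self.psi:
                self.features.append(candidate)  # expand the representation online
                existing.add(candidate)

In an online agent this step would sit inside the SARSA loop: compute the TD error with the current feature set, update the value-function weights, then call discover so that subsequent steps can use any newly added conjunctive features.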

