Sampling-based algorithms for stochastic optimal control
Author(s): Huynh, Vu Anh
Massachusetts Institute of Technology. Department of Aeronautics and Astronautics.
Controlling dynamical systems in uncertain environments is fundamental in several fields, ranging from robotics and healthcare to economics and finance. In these applications, the required tasks can be modeled as continuous-time, continuous-space stochastic optimal control problems. Moreover, risk management is an important requirement of such problems, since it guarantees safety during the execution of control policies. However, even in their simplest versions, finding closed-form or exact algorithmic solutions for stochastic optimal control problems is computationally challenging. The main contribution of this thesis is the development of theoretical foundations, together with provably correct and efficient sampling-based algorithms, for solving stochastic optimal control problems in the presence of complex risk constraints. In the first part of the thesis, we consider these problems without risk constraints. We propose a novel algorithm, the incremental Markov Decision Process (iMDP), which incrementally computes anytime control policies that approximate an optimal policy arbitrarily well in terms of the expected cost. The main idea is to generate a sequence of finite discretizations of the original problem through random sampling of the state space. At each iteration, the discretized problem is a Markov Decision Process that serves as an incrementally refined model of the original problem. We show that the iMDP algorithm guarantees asymptotic optimality while maintaining low computational and space complexity. In the second part of the thesis, we consider risk constraints that are expressed either as bounds on trajectory performance or as bounds on the probability of failure. For the former, we present the first extended iMDP algorithm, which approximates an optimal feedback policy of the constrained problem arbitrarily well. For the latter, we present a martingale approach that diffuses a risk constraint into a martingale in order to construct time-consistent control policies.
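The iMDP loop summarized above (sample new states, treat the samples as a finite Markov Decision Process, and keep improving the cost-to-go on that growing model) can be illustrated with a toy example. The sketch below is a hypothetical one-dimensional problem, dx = u dt + σ dw on (0, 1), with a goal at x = 1 and an obstacle at x = 0. The class and parameter names (`ToyIncrementalMDP`, `SIGMA`, `CRASH_COST`, `DISCOUNT_RATE`), the two-point noise model, the nearest-neighbour projection, and the time-step schedule are all assumptions made for illustration. The sketch does not reproduce the locally consistent Markov chain approximation that iMDP builds on, and it carries none of the thesis's convergence guarantees.

```python
import bisect
import math
import random

# Hypothetical 1-D problem: dx = u dt + SIGMA dw on (0, 1).
# Reaching x >= 1 (goal) ends the task at no extra cost; hitting x <= 0 (obstacle) costs CRASH_COST.
SIGMA = 0.2
CONTROLS = (-1.0, 0.0, 1.0)   # assumed finite control set
CRASH_COST = 5.0
DISCOUNT_RATE = 0.1           # continuous-time rate; per-step factor is exp(-rate * dt)


class ToyIncrementalMDP:
    """Toy analogue of iMDP: random samples of the state space define a finite
    MDP that is refined, and re-solved, every time new samples are added."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.xs = []  # sorted sampled states
        self.J = {}   # cost-to-go estimate for each sampled state

    def time_step(self):
        n = max(len(self.xs), 2)
        return 0.5 * math.log(n) / n  # shrinks as the discretization is refined

    def value_at(self, y):
        """Boundary value outside (0, 1); otherwise the value of the nearest sample."""
        if y >= 1.0:
            return 0.0
        if y <= 0.0:
            return CRASH_COST
        i = bisect.bisect_left(self.xs, y)
        nearest = min(self.xs[max(i - 1, 0):i + 1], key=lambda s: abs(s - y))
        return self.J[nearest]

    def bellman(self, x):
        dt = self.time_step()
        step = SIGMA * math.sqrt(dt)  # two-point noise: +/- step, each with probability 1/2
        discount = math.exp(-DISCOUNT_RATE * dt)
        best = math.inf
        for u in CONTROLS:
            mean = x + u * dt
            expected = 0.5 * (self.value_at(mean + step) + self.value_at(mean - step))
            best = min(best, dt + discount * expected)  # running cost = elapsed time
        return best

    def iterate(self, new_samples=10, sweeps=10):
        for _ in range(new_samples):
            x = self.rng.random()
            # Warm-start the new state from its nearest neighbour, reusing earlier work.
            self.J[x] = self.value_at(x) if self.xs else 0.0
            bisect.insort(self.xs, x)
        for _ in range(sweeps):
            for x in reversed(self.xs):  # sweep from the goal side backwards
                self.J[x] = self.bellman(x)


mdp = ToyIncrementalMDP(seed=1)
for _ in range(20):
    mdp.iterate()
print(len(mdp.xs), round(mdp.J[mdp.xs[0]], 3), round(mdp.J[mdp.xs[-1]], 3))
```

Each call to `iterate` refines the model and re-solves it starting from the previous estimates, so a usable cost-to-go (and hence a greedy policy) is available after every call. This reuse is a miniature version of the anytime behaviour described above.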
The martingale represents the level of risk tolerance, which is contingent on the information available over time. By augmenting the system dynamics with the martingale, the original risk-constrained problem is transformed into a stochastic target problem. We present the second extended iMDP algorithm, which approximates an optimal feedback policy of the original problem arbitrarily well by sampling in the augmented state space and computing proper boundary values for the reformulated problem. In both cases, the sequences of policies returned by the extended algorithms are both probabilistically sound and asymptotically optimal. The effectiveness of these algorithms is demonstrated on robot motion planning and control problems in cluttered environments in the presence of process noise.
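The idea of a risk tolerance that evolves as a martingale can be made concrete in a simple discrete-time setting. The hypothetical helper `allocate_risk_budget` below takes the current risk tolerance q at some state, the transition probabilities to that state's successors, and the smallest failure probability achievable from each successor (assumed to be known). It splits q into per-successor budgets whose probability-weighted average is exactly q. That equality is the martingale property: on average, the tolerance neither grows nor shrinks as new information is revealed, yet each successor still receives enough budget to remain feasible. The linear split used here is an assumption chosen to keep the example short. It is not the thesis's construction, which works with continuous-time dynamics augmented by the martingale.

```python
def allocate_risk_budget(q, probs, min_fail):
    """Split the current risk budget q among successor states (hypothetical helper).

    probs[i]    : transition probability to successor i (assumed to sum to 1)
    min_fail[i] : smallest failure probability achievable from successor i (assumed known)
    Returns budgets q_i with sum_i probs[i] * q_i == q (martingale property) and
    min_fail[i] <= q_i <= 1, or None if no such split exists.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("a risk budget is a probability in [0, 1]")
    # Expected failure probability if every successor acts as safely as possible.
    floor = sum(p * f for p, f in zip(probs, min_fail))
    if q < floor - 1e-12:
        return None  # even the safest continuation violates the constraint
    slack = sum(p * (1.0 - f) for p, f in zip(probs, min_fail))
    if slack <= 0.0:
        return list(min_fail)  # every successor fails surely; then q == floor == 1
    # Move each budget the same fraction lam of the way from its floor towards 1.
    lam = min(1.0, max(0.0, (q - floor) / slack))
    return [f + lam * (1.0 - f) for f in min_fail]


probs = [0.5, 0.3, 0.2]
min_fail = [0.0, 0.1, 0.4]
budgets = allocate_risk_budget(0.2, probs, min_fail)
print([round(b, 4) for b in budgets])
print(sum(p * b for p, b in zip(probs, budgets)))  # equals the current budget, up to rounding
```

If the budget is below the probability-weighted floor, no split can keep every successor feasible, and the function returns `None`. This is roughly the discrete counterpart of the risk constraint becoming infeasible.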
Thesis: Ph.D., Massachusetts Institute of Technology, Department of Aeronautics and Astronautics, 2014.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 131-143).