
dc.contributor.advisor    Dimitri P. Bertsekas.    en_US
dc.contributor.author    Yu, Huizhen, Ph. D. Massachusetts Institute of Technology    en_US
dc.contributor.other    Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.    en_US
dc.date.accessioned    2007-01-10T15:36:44Z
dc.date.available    2007-01-10T15:36:44Z
dc.date.copyright    2006    en_US
dc.date.issued    2006    en_US
dc.identifier.uri    http://hdl.handle.net/1721.1/35299
dc.description    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006.    en_US
dc.description    This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.    en_US
dc.description    Includes bibliographical references (p. 165-169).    en_US
dc.description.abstract    We consider approximation methods for discrete-time infinite-horizon partially observable Markov and semi-Markov decision processes (POMDP and POSMDP). One of the main contributions of this thesis is a lower cost approximation method for finite-space POMDPs with the average cost criterion, and its extensions to semi-Markov partially observable problems and constrained POMDP problems, as well as to problems with the undiscounted total cost criterion. Our method is an extension of several lower cost approximation schemes, proposed individually by various authors, for discounted POMDP problems. We introduce a unified framework for viewing all of these schemes together with some new ones. In particular, we establish that due to the special structure of hidden states in a POMDP, there is a class of approximating processes, which are either POMDPs or belief MDPs, that provide lower bounds to the optimal cost function of the original POMDP problem. Theoretically, POMDPs with the long-run average cost criterion are still not fully understood.    en_US
dc.description.abstract    (cont.) The major difficulties relate to the structure of the optimal solutions, such as conditions for a constant optimal cost function, the existence of solutions to the optimality equations, and the existence of optimal policies that are stationary and deterministic. Thus, our lower bound result is useful not only in providing a computational method, but also in characterizing the optimal solution. We show that regardless of these theoretical difficulties, lower bounds of the optimal liminf average cost function can be computed efficiently by solving modified problems using multichain MDP algorithms, and the approximating cost functions can also be used to obtain suboptimal stationary control policies. We prove the asymptotic convergence of the lower bounds under certain assumptions. For semi-Markov problems and total cost problems, we show that the same method can be applied for computing lower bounds of the optimal cost function. For constrained average cost POMDPs, we show that lower bounds of the constrained optimal cost function can be computed by solving finite-dimensional LPs. We also consider reinforcement learning methods for POMDPs and MDPs. We propose an actor-critic type policy gradient algorithm that uses a structured policy known as a finite-state controller.    en_US
dc.description.abstract    (cont.) We thus provide an alternative to the earlier actor-only algorithm GPOMDP. Our work also clarifies the relationship between the reinforcement learning methods for POMDPs and those for MDPs. For average cost MDPs, we provide a convergence and convergence rate analysis for a least squares temporal difference (TD) algorithm, called LSPE, previously proposed for discounted problems. We use this algorithm in the critic portion of the policy gradient algorithm for POMDPs with finite-state controllers. Finally, we investigate the properties of the limsup and liminf average cost functions of various types of policies. We show various convexity and concavity properties of these cost functions, and we give a new necessary condition for the optimal liminf average cost to be constant. Based on this condition, we prove the near-optimality of the class of finite-state controllers under the assumption of a constant optimal liminf average cost. This result provides a theoretical guarantee for the finite-state controller approach.    en_US
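For orientation, the liminf and limsup average cost criteria referred to throughout the abstract are the standard ones. A minimal statement in LaTeX, with notation (g for the per-stage cost, x_t and u_t for state and control) assumed for the sketch rather than taken from the thesis:

    J^{-}_{\pi}(x) \;=\; \liminf_{N \to \infty} \frac{1}{N}\, \mathbb{E}^{\pi}_{x}\!\left[\sum_{t=0}^{N-1} g(x_t, u_t)\right],
    \qquad
    J^{+}_{\pi}(x) \;=\; \limsup_{N \to \infty} \frac{1}{N}\, \mathbb{E}^{\pi}_{x}\!\left[\sum_{t=0}^{N-1} g(x_t, u_t)\right],

with the optimal liminf average cost J^{*}(x) = \inf_{\pi} J^{-}_{\pi}(x); the lower bound methods described in the abstract bound J^{*} from below.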
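To illustrate the lower bound idea in the discounted setting that the thesis generalizes: for a cost-minimization POMDP the optimal cost function is concave in the belief, so evaluating successor beliefs by convex combinations of grid beliefs can only underestimate it, and value iteration with that interpolated backup converges to a lower bound. A minimal Python sketch; the two-state model, its numbers, the grid size, and the tolerance are all illustrative assumptions, not taken from the thesis:

    import numpy as np

    # Illustrative 2-state, 2-action, 2-observation discounted POMDP.
    # P[a][s, s']: transition probabilities; Z[a][s', o]: observation
    # probabilities; c[s, a]: per-stage cost. All numbers are made up.
    P = [np.array([[0.9, 0.1], [0.2, 0.8]]), np.array([[0.5, 0.5], [0.5, 0.5]])]
    Z = [np.array([[0.8, 0.2], [0.3, 0.7]]), np.array([[0.6, 0.4], [0.4, 0.6]])]
    c = np.array([[1.0, 2.0], [4.0, 0.5]])
    gamma = 0.95

    G = np.linspace(0.0, 1.0, 51)  # grid over beliefs b = (p, 1 - p)
    V = np.zeros_like(G)           # approximate cost at grid beliefs

    for _ in range(500):           # value iteration on the grid
        V_new = np.empty_like(V)
        for i, p in enumerate(G):
            b = np.array([p, 1.0 - p])
            q = []
            for a in (0, 1):
                total = b @ c[:, a]              # expected per-stage cost
                pred = P[a].T @ b                # predicted state distribution
                for o in (0, 1):
                    p_o = pred @ Z[a][:, o]      # P(o | b, a)
                    if p_o > 1e-12:
                        b_next = pred * Z[a][:, o] / p_o
                        # Linear interpolation between grid beliefs is a convex
                        # combination; for a concave cost function it
                        # underestimates, which is what makes this a lower bound.
                        total += gamma * p_o * np.interp(b_next[0], G, V)
                q.append(total)
            V_new[i] = min(q)
        delta = np.max(np.abs(V_new - V))
        V = V_new
        if delta < 1e-9:
            break

    # V now lower-bounds the optimal discounted cost at the grid beliefs.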
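The finite-state controller used by the actor-critic algorithm can be pictured as a stochastic policy with an internal memory state. A minimal sketch of one common softmax parameterization; the class name, the conditioning of actions and node transitions on (node, observation), and the dimensions are illustrative assumptions, not the thesis's exact construction:

    import numpy as np

    class FiniteStateController:
        """Stochastic policy with a finite internal (memory) state.

        theta_act[q, y]  parameterizes action probabilities at node q,
        observation y; theta_node[q, y] parameterizes the next-node
        distribution. Both are mapped to probabilities via softmax.
        """

        def __init__(self, n_nodes, n_obs, n_actions, rng=None):
            self.rng = rng or np.random.default_rng(0)
            self.theta_act = np.zeros((n_nodes, n_obs, n_actions))
            self.theta_node = np.zeros((n_nodes, n_obs, n_nodes))
            self.q = 0  # current internal node

        @staticmethod
        def _softmax(z):
            z = z - z.max()
            e = np.exp(z)
            return e / e.sum()

        def step(self, obs):
            """Sample an action, then transition the internal node."""
            pi_a = self._softmax(self.theta_act[self.q, obs])
            action = self.rng.choice(len(pi_a), p=pi_a)
            pi_q = self._softmax(self.theta_node[self.q, obs])
            self.q = self.rng.choice(len(pi_q), p=pi_q)
            return action

Under fixed parameters, the environment state together with the internal node forms a Markov chain, which is what makes policy gradient estimation and TD-type critics such as LSPE applicable; the abstract's actor-critic algorithm pairs such a controller with a learned critic in place of GPOMDP's actor-only Monte Carlo gradient estimates.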
dc.description.statementofresponsibility    by Huizhen Yu.    en_US
dc.format.extent    169 p.    en_US
dc.format.extent    940986 bytes
dc.format.extent    950151 bytes
dc.format.mimetype    application/pdf
dc.format.mimetype    application/pdf
dc.language.iso    eng    en_US
dc.publisher    Massachusetts Institute of Technology    en_US
dc.rights    M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.    en_US
dc.rights.uri    http://dspace.mit.edu/handle/1721.1/7582
dc.subject    Electrical Engineering and Computer Science.    en_US
dc.title    Approximate solution methods for partially observable Markov and semi-Markov decision processes    en_US
dc.title.alternative    Approximate solution methods for POMDP and POSMDP    en_US
dc.type    Thesis    en_US
dc.description.degree    Ph.D.    en_US
dc.contributor.department    Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc    74906972    en_US

