Bounds for Markov Decision Processes

Desai, Vijay V.; Farias, Vivek F.; Moallemi, Ciamac C.

dc.contributor.author	Desai, Vijay V.
dc.contributor.author	Farias, Vivek F.
dc.contributor.author	Moallemi, Ciamac C.
dc.date.accessioned	2019-02-21T15:11:11Z
dc.date.available	2019-02-21T15:11:11Z
dc.date.issued	2013
dc.identifier.isbn	9781118453988
dc.identifier.uri	http://hdl.handle.net/1721.1/120518
dc.description.abstract	We consider the problem of producing lower bounds on the optimal cost-to-go function of a Markov decision problem. We present two approaches to this problem: one based on the methodology of approximate linear programming (ALP) and another based on the so-called martingale duality approach. We show that these two approaches are intimately connected. Exploring this connection leads us to the problem of finding "optimal" martingale penalties within the martingale duality approach, which we dub the pathwise optimization (PO) problem. We show interesting cases where the PO problem admits a tractable solution and establish that these solutions produce tighter approximations than the ALP approach. © 2013 The Institute of Electrical and Electronics Engineers, Inc.	en_US
dc.publisher	John Wiley & Sons, Inc.	en_US
dc.relation.isversionof	http://dx.doi.org/10.1002/9781118453988.ch20	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	Non-MIT author website	en_US
dc.title	Bounds for Markov Decision Processes	en_US
dc.type	Article	en_US
dc.identifier.citation	Desai, Vijay V., Vivek F. Farias, and Ciamac C. Moallemi. “Bounds for Markov Decision Processes.” Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, edited by Frank L. Lewis and Derong Liu, John Wiley & Sons, Inc., 2013, pp. 452–473.	en_US
dc.contributor.department	Sloan School of Management	en_US
dc.contributor.mitauthor	Farias, Vivek F.
dc.relation.journal	Reinforcement Learning and Approximate Dynamic Programming for Feedback Control	en_US
dc.eprint.version	Original manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/BookItem	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dc.date.updated	2019-02-12T14:40:50Z
dspace.orderedauthors	Desai, Vijay V.; Farias, Vivek F.; Moallemi, Ciamac C.	en_US
dspace.embargo.terms	N	en_US
dc.identifier.orcid	https://orcid.org/0000-0002-5856-9246
mit.license	OPEN_ACCESS_POLICY	en_US

Files in this item

Name: bounds-2011.pdf
Size: 491.1Kb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
