dc.contributor.author | Desai, Vijay V. | |
dc.contributor.author | Farias, Vivek F. | |
dc.contributor.author | Moallemi, Ciamac C. | |
dc.date.accessioned | 2019-02-21T15:11:11Z | |
dc.date.available | 2019-02-21T15:11:11Z | |
dc.date.issued | 2013 | |
dc.identifier.isbn | 9781118453988 | |
dc.identifier.uri | http://hdl.handle.net/1721.1/120518 | |
dc.description.abstract | We consider the problem of producing lower bounds on the optimal cost-to-go function of a Markov decision problem. We present two approaches to this problem: one based on the methodology of approximate linear programming (ALP) and another based on the so-called martingale duality approach. We show that these two approaches are intimately connected. Exploring this connection leads us to the problem of finding "optimal" martingale penalties within the martingale duality approach, which we dub the pathwise optimization (PO) problem. We show interesting cases where the PO problem admits a tractable solution and establish that these solutions produce tighter approximations than the ALP approach. © 2013 The Institute of Electrical and Electronics Engineers, Inc. | en_US |
dc.publisher | John Wiley & Sons, Inc. | en_US |
dc.relation.isversionof | http://dx.doi.org/10.1002/9781118453988.ch20 | en_US |
dc.rights | Creative Commons Attribution-Noncommercial-Share Alike | en_US |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | en_US |
dc.source | Non-MIT author website | en_US |
dc.title | Bounds for Markov Decision Processes | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Desai, Vijay V., Vivek F. Farias, and Ciamac C. Moallemi. “Bounds for Markov Decision Processes.” Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, edited by Frank L. Lewis and Derong Liu, John Wiley & Sons, Inc., 2013, pp. 452–473. | en_US |
dc.contributor.department | Sloan School of Management | en_US |
dc.contributor.mitauthor | Farias, Vivek F. | |
dc.relation.journal | Reinforcement Learning and Approximate Dynamic Programming for Feedback Control | en_US |
dc.eprint.version | Original manuscript | en_US |
dc.type.uri | http://purl.org/eprint/type/BookItem | en_US |
eprint.status | http://purl.org/eprint/status/NonPeerReviewed | en_US |
dc.date.updated | 2019-02-12T14:40:50Z | |
dspace.orderedauthors | Desai, Vijay V.; Farias, Vivek F.; Moallemi, Ciamac C. | en_US |
dspace.embargo.terms | N | en_US |
dc.identifier.orcid | https://orcid.org/0000-0002-5856-9246 | |
mit.license | OPEN_ACCESS_POLICY | en_US |