Loss bounds for uncertain transition probabilities in Markov decision processes

Mastin, Andrew; Jaillet, Patrick

dc.contributor.author	Jaillet, Patrick
dc.contributor.author	Mastin, Dana Andrew
dc.date.accessioned	2014-05-09T14:23:59Z
dc.date.available	2014-05-09T14:23:59Z
dc.date.issued	2012-12
dc.identifier.isbn	978-1-4673-2066-5
dc.identifier.isbn	978-1-4673-2065-8
dc.identifier.isbn	978-1-4673-2063-4
dc.identifier.isbn	978-1-4673-2064-1
dc.identifier.uri	http://hdl.handle.net/1721.1/86896
dc.description.abstract	We analyze losses resulting from uncertain transition probabilities in Markov decision processes with bounded nonnegative rewards. We assume that policies are precomputed using exact dynamic programming with the estimated transition probabilities, but the system evolves according to different, true transition probabilities. Given a bound on the total variation error of estimated transition probability distributions, we derive upper bounds on the loss of expected total reward. The approach analyzes the growth of errors incurred by stepping backwards in time while precomputing value functions, which requires bounding a multilinear program. Loss bounds are given for the finite horizon undiscounted, finite horizon discounted, and infinite horizon discounted cases, and a tight example is shown.	en_US
dc.description.sponsorship	National Science Foundation (U.S.) (Grant 1029603)	en_US
dc.description.sponsorship	National Science Foundation (U.S.). Graduate Research Fellowship Program	en_US
dc.language.iso	en_US
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	en_US
dc.relation.isversionof	http://dx.doi.org/10.1109/CDC.2012.6426504	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	MIT web domain	en_US
dc.title	Loss bounds for uncertain transition probabilities in Markov decision processes	en_US
dc.type	Article	en_US
dc.identifier.citation	Mastin, Andrew, and Patrick Jaillet. “Loss Bounds for Uncertain Transition Probabilities in Markov Decision Processes.” 2012 IEEE 51st IEEE Conference on Decision and Control (CDC) (n.d.).	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science	en_US
dc.contributor.department	Massachusetts Institute of Technology. Laboratory for Information and Decision Systems	en_US
dc.contributor.mitauthor	Mastin, Dana Andrew	en_US
dc.contributor.mitauthor	Jaillet, Patrick	en_US
dc.relation.journal	Proceedings of the 2012 IEEE 51st IEEE Conference on Decision and Control (CDC)	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dspace.orderedauthors	Mastin, Andrew; Jaillet, Patrick	en_US
dc.identifier.orcid	https://orcid.org/0000-0002-8585-6566
mit.license	OPEN_ACCESS_POLICY	en_US
mit.metadata.status	Complete

Files in this item

Name: Jaillet_Loss bounds.pdf
Size: 420.5Kb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
