Loss bounds for uncertain transition probabilities in Markov decision processes
Author(s)
Jaillet, Patrick; Mastin, Dana Andrew
Open Access Policy
Terms of use
Creative Commons Attribution-Noncommercial-Share Alike
Abstract
We analyze losses resulting from uncertain transition probabilities in Markov decision processes with bounded nonnegative rewards. We assume that policies are precomputed using exact dynamic programming with the estimated transition probabilities, but the system evolves according to different, true transition probabilities. Given a bound on the total variation error of estimated transition probability distributions, we derive upper bounds on the loss of expected total reward. The approach analyzes the growth of errors incurred by stepping backwards in time while precomputing value functions, which requires bounding a multilinear program. Loss bounds are given for the finite horizon undiscounted, finite horizon discounted, and infinite horizon discounted cases, and a tight example is shown.
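The setup described in the abstract — precompute a policy by exact backward induction on estimated transition probabilities, then evaluate it under the true probabilities — can be illustrated with a small self-contained sketch. All sizes, reward values, and the perturbation scheme below are invented for illustration; mixing the true distributions with random noise at weight `eps` simply guarantees each estimated row is within total variation `eps` of the true one. The paper's actual loss bounds are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, horizon = 4, 3, 5
eps = 0.05  # per-distribution total variation error budget (illustrative)

# True transitions: P_true[s, a] is a distribution over next states.
P_true = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))

# Estimated transitions: mix each true row with a random distribution.
# TV(P_est[s,a], P_true[s,a]) = (eps/2) * sum|noise - P_true| <= eps.
noise = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
P_est = (1 - eps) * P_true + eps * noise

# Bounded nonnegative rewards, as assumed in the abstract.
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

def backward_induction(P, R, T):
    """Exact finite-horizon DP; returns value functions and a greedy policy."""
    n_s, n_a = R.shape
    V = np.zeros((T + 1, n_s))
    pi = np.zeros((T, n_s), dtype=int)
    for t in range(T - 1, -1, -1):
        Q = R + P @ V[t + 1]  # Q[s,a] = R[s,a] + sum_s' P[s,a,s'] V[t+1][s']
        pi[t] = Q.argmax(axis=1)
        V[t] = Q.max(axis=1)
    return V, pi

def evaluate_policy(P, R, pi, T):
    """Expected total reward of a fixed time-varying policy under model P."""
    n_s = R.shape[0]
    V = np.zeros((T + 1, n_s))
    idx = np.arange(n_s)
    for t in range(T - 1, -1, -1):
        a = pi[t]
        V[t] = R[idx, a] + P[idx, a] @ V[t + 1]
    return V

V_opt, _ = backward_induction(P_true, R, horizon)   # optimal under true model
_, pi_est = backward_induction(P_est, R, horizon)   # policy from estimated model
V_pi = evaluate_policy(P_true, R, pi_est, horizon)  # its true performance

# Loss of expected total reward, worst case over initial states.
loss = np.max(V_opt[0] - V_pi[0])
print(f"loss = {loss:.4f}")
```

Since `V_opt` is optimal under the true model, the loss is nonnegative, and with rewards in [0, 1] it can never exceed the horizon; the paper derives much sharper bounds as a function of the total variation error.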
Date issued
2012-12
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Laboratory for Information and Decision Systems
Journal
Proceedings of the 2012 IEEE 51st IEEE Conference on Decision and Control (CDC)
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Citation
Mastin, Andrew, and Patrick Jaillet. “Loss Bounds for Uncertain Transition Probabilities in Markov Decision Processes.” 2012 IEEE 51st IEEE Conference on Decision and Control (CDC) (2012).
Version: Author's final manuscript
ISBN
978-1-4673-2066-5
978-1-4673-2065-8
978-1-4673-2063-4
978-1-4673-2064-1