dc.contributor.author: Ahmed, Asrar
dc.contributor.author: Varakantham, Pradeep
dc.contributor.author: Lowalekar, Meghna
dc.contributor.author: Adulyasak, Yossiri
dc.contributor.author: Jaillet, Patrick
dc.date.accessioned: 2021-10-27T20:34:55Z
dc.date.available: 2021-10-27T20:34:55Z
dc.date.issued: 2017
dc.identifier.uri: https://hdl.handle.net/1721.1/136337
dc.description.abstract: © 2017 AI Access Foundation. All rights reserved. Markov Decision Processes (MDPs) are an effective model for representing decision processes in the presence of transition uncertainty and reward tradeoffs. However, because the transition and reward functions of an MDP are difficult to specify exactly, researchers have proposed uncertain MDP models and robustness objectives for solving them. Most approaches for computing robust policies have focused on maximin policies, which maximize the value in the worst case amongst all realisations of uncertainty. Given the overly conservative nature of maximin policies, recent work has proposed minimax regret as an ideal alternative to the maximin objective for robust optimization. However, existing algorithms for handling minimax regret are restricted to models with uncertainty over rewards only, and they are also limited in their scalability. Therefore, we provide a general model of uncertain MDPs that considers uncertainty over both transition and reward functions, as well as dependence of the uncertainty across different states and decision epochs. We provide a mixed integer linear program formulation for minimizing regret given a set of samples of the transition and reward functions in the uncertain MDP. In addition, we provide two myopic variants of regret, namely Cumulative Expected Myopic Regret (CEMR) and One Step Regret (OSR), that can be optimized in a scalable manner; specifically, we provide dynamic programming and policy iteration based algorithms to optimize CEMR and OSR, respectively. Finally, to demonstrate the effectiveness of our approaches, we provide comparisons on two benchmark problems from the literature. We observe that optimizing the myopic variants of regret, OSR and CEMR, is better than directly optimizing the regret.
dc.language.iso: en
dc.publisher: AI Access Foundation
dc.relation.isversionof: 10.1613/JAIR.5242
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.source: MIT web domain
dc.title: Sampling Based Approaches for Minimizing Regret in Uncertain Markov Decision Processes (MDPs)
dc.type: Article
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.relation.journal: Journal of Artificial Intelligence Research
dc.eprint.version: Author's final manuscript
dc.type.uri: http://purl.org/eprint/type/JournalArticle
eprint.status: http://purl.org/eprint/status/PeerReviewed
dc.date.updated: 2019-05-31T18:14:02Z
dspace.orderedauthors: Ahmed, A; Varakantham, P; Lowalekar, M; Adulyasak, Y; Jaillet, P
dspace.date.submission: 2019-05-31T18:14:03Z
mit.journal.volume: 59
mit.metadata.status: Authority Work and Publication Information Needed
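To make the minimax-regret objective described in the abstract concrete, the following is a minimal Python sketch. It is not the paper's mixed integer linear program, nor its CEMR or OSR algorithms: it only shows how the worst-case regret of a fixed deterministic policy is evaluated against a set of sampled transition and reward functions, and brute-forces the minimax-regret policy on a toy instance. All function names, the fixed start state, and the random samples are illustrative assumptions.

import numpy as np
from itertools import product

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    # Optimal value function of one sampled MDP, with P[s, a, s'] and R[s, a].
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * (P @ V)   # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

def policy_value(P, R, pi, gamma=0.95):
    # Exact value of a deterministic policy pi (one action index per state).
    n = P.shape[0]
    P_pi, R_pi = P[np.arange(n), pi], R[np.arange(n), pi]
    return np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)  # solves V = R_pi + gamma * P_pi V

def max_regret(samples, pi, start=0, gamma=0.95):
    # Worst-case regret of pi across sampled (P, R) pairs, measured at a start state.
    return max(value_iteration(P, R, gamma)[start] - policy_value(P, R, pi, gamma)[start]
               for P, R in samples)

# Toy usage: five random samples of a 3-state, 2-action MDP; brute-force the
# minimax-regret deterministic policy. This enumeration is feasible only for tiny
# state spaces; the paper's MILP and the scalable CEMR/OSR variants avoid it.
rng = np.random.default_rng(0)
def sample_mdp(S=3, A=2):
    P = rng.random((S, A, S))
    P /= P.sum(axis=2, keepdims=True)  # normalize rows into valid distributions
    return P, rng.random((S, A))

samples = [sample_mdp() for _ in range(5)]
best = min(product(range(2), repeat=3), key=lambda pi: max_regret(samples, np.array(pi)))
print("minimax-regret policy:", best)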

