Quickest change detection approach to optimal control in Markov decision processes with model changes

Banerjee, Taposh; Liu, Miao; How, Jonathan P.

dc.contributor.author	Banerjee, Taposh
dc.contributor.author	Liu, Miao
dc.contributor.author	How, Jonathan P
dc.date.accessioned	2018-04-13T20:12:50Z
dc.date.available	2018-04-13T20:12:50Z
dc.date.issued	2017-07
dc.date.submitted	2017-05
dc.identifier.isbn	978-1-5090-5992-8
dc.identifier.uri	http://hdl.handle.net/1721.1/114735
dc.description.abstract	Optimal control in non-stationary Markov decision processes (MDPs) is a challenging problem. The aim in such a control problem is to maximize the long-term discounted reward when the transition dynamics or the reward function can change over time. When prior knowledge of the change statistics is available, the standard Bayesian approach to this problem is to reformulate it as a partially observable MDP (POMDP) and solve it using approximate POMDP solvers, which are typically computationally demanding. In this paper, the problem is analyzed from the viewpoint of quickest change detection (QCD), a set of tools for detecting a change in the distribution of a sequence of random variables. Current methods applying QCD to such problems only passively detect changes by following prescribed policies, without optimizing the choice of actions for long-term performance. We demonstrate that ignoring the reward-detection trade-off can cause a significant loss in long-term rewards, and propose a two-threshold switching strategy to solve the issue. A non-Bayesian problem formulation is also proposed for scenarios where a Bayesian formulation cannot be defined. The performance of the proposed two-threshold strategy is examined through numerical analysis on a non-stationary MDP task, and the strategy outperforms the state-of-the-art QCD methods in both Bayesian and non-Bayesian settings.	en_US
dc.description.sponsorship	Lincoln Laboratory	en_US
dc.description.sponsorship	Northrop Grumman Corporation	en_US
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	en_US
dc.relation.isversionof	http://dx.doi.org/10.23919/ACC.2017.7962986	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	arXiv	en_US
dc.title	Quickest change detection approach to optimal control in Markov decision processes with model changes	en_US
dc.type	Article	en_US
dc.identifier.citation	Banerjee, Taposh, Miao Liu, and Jonathan P. How. “Quickest Change Detection Approach to Optimal Control in Markov Decision Processes with Model Changes.” 2017 American Control Conference (ACC), May 2017, Seattle, WA, USA, 2017.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Laboratory for Information and Decision Systems	en_US
dc.contributor.mitauthor	Banerjee, Taposh
dc.contributor.mitauthor	Liu, Miao
dc.contributor.mitauthor	How, Jonathan P
dc.relation.journal	2017 American Control Conference (ACC)	en_US
dc.eprint.version	Original manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dc.date.updated	2018-03-21T16:35:37Z
dspace.orderedauthors	Banerjee, Taposh; Liu, Miao; How, Jonathan P.	en_US
dspace.embargo.terms	N	en_US
dc.identifier.orcid	https://orcid.org/0000-0002-1648-8325
dc.identifier.orcid	https://orcid.org/0000-0001-8576-1930
mit.license	OPEN_ACCESS_POLICY	en_US

Files in this item

Name: 1609.06757.pdf
Size: 313.8Kb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
