Counterfactual off-policy evaluation with Gumbel-max structural causal models

Oberst, Michael; Sontag, David Alexander

dc.contributor.author	Oberst, Michael
dc.contributor.author	Sontag, David Alexander
dc.date.accessioned	2021-04-09T20:45:12Z
dc.date.available	2021-04-09T20:45:12Z
dc.date.issued	2019-06
dc.identifier.uri	https://hdl.handle.net/1721.1/130437
dc.description.abstract	We introduce an off-policy evaluation procedure for highlighting episodes where applying a reinforcement learned (RL) policy is likely to have produced a substantially different outcome than the observed policy. In particular, we introduce a class of structural causal models (SCMs) for generating counterfactual trajectories in finite partially observable Markov Decision Processes (POMDPs). We see this as a useful procedure for off-policy "debugging" in high-risk settings (e.g., healthcare); by decomposing the expected difference in reward between the RL and observed policy into specific episodes, we can identify episodes where the counterfactual difference in reward is most dramatic. This in turn can be used to facilitate review of specific episodes by domain experts. We demonstrate the utility of this procedure with a synthetic environment of sepsis management.	en_US
dc.language.iso	en
dc.publisher	MLResearch Press	en_US
dc.relation.isversionof	http://proceedings.mlr.press/v97/	en_US
dc.rights	Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.	en_US
dc.source	Proceedings of Machine Learning Research	en_US
dc.title	Counterfactual off-policy evaluation with Gumbel-max structural causal models	en_US
dc.type	Article	en_US
dc.identifier.citation	Oberst, Michael and David Sontag. "Counterfactual off-policy evaluation with Gumbel-max structural causal models." Proceedings of the 36th International Conference on Machine Learning, June 2019, Long Beach, California, MLResearch Press, 2019.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory	en_US
dc.relation.journal	Proceedings of the 36th International Conference on Machine Learning	en_US
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dc.date.updated	2021-04-06T18:37:22Z
dspace.orderedauthors	Oberst, M; Sontag, D	en_US
dspace.date.submission	2021-04-06T18:37:23Z
mit.license	PUBLISHER_POLICY
mit.metadata.status	Authority Work and Publication Information Needed
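
The abstract refers to Gumbel-max structural causal models for generating counterfactual trajectories. As a rough illustration only (not the authors' implementation), the sketch below applies the idea to a single categorical variable: it samples Gumbel noise consistent with an observed outcome, then reuses that noise under a different distribution. The function names, the example probabilities, and the requirement that all observed probabilities be strictly positive are assumptions made for this sketch.

```python
import math
import random


def sample_gumbel(rng, loc=0.0):
    """Draw one sample from a Gumbel(loc, 1) distribution."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
        u = rng.random()
    return loc - math.log(-math.log(u))


def posterior_gumbel_noise(probs, observed, rng):
    """Sample noise g such that argmax_i(log probs[i] + g[i]) == observed.

    Assumption: every entry of probs is strictly positive and they sum to 1.
    """
    # The maximum of the perturbed log-probabilities is Gumbel(0) distributed
    # and independent of which index attains it.
    max_value = sample_gumbel(rng)
    noise = []
    for i, p in enumerate(probs):
        if i == observed:
            value = max_value
        else:
            # Gumbel(log p) truncated to lie below max_value.
            g = sample_gumbel(rng, loc=math.log(p))
            value = -math.log(math.exp(-g) + math.exp(-max_value))
        noise.append(value - math.log(p))
    return noise


def counterfactual_outcome(cf_probs, noise):
    """Apply the same noise to a different (counterfactual) distribution."""
    scores = [
        math.log(q) + g if q > 0 else -math.inf
        for q, g in zip(cf_probs, noise)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    observed_probs = [0.6, 0.3, 0.1]  # hypothetical distribution under the observed policy
    cf_probs = [0.2, 0.3, 0.5]        # hypothetical distribution under the alternative policy
    noise = posterior_gumbel_noise(observed_probs, observed=0, rng=rng)
    print(counterfactual_outcome(cf_probs, noise))
```

Reusing the noise under the original distribution always reproduces the observed outcome, which is the consistency property that makes the result a counterfactual rather than a fresh sample.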

Files in this item

Name: oberst19a.pdf
Size: 1.417 MB
Format: PDF
Description: Published version

This item appears in the following Collection(s)

MIT Open Access Articles
