
dc.contributor.author: Oberst, Michael
dc.contributor.author: Sontag, David Alexander
dc.description.abstract: We introduce an off-policy evaluation procedure for highlighting episodes where applying a reinforcement learned (RL) policy is likely to have produced a substantially different outcome than the observed policy. In particular, we introduce a class of structural causal models (SCMs) for generating counterfactual trajectories in finite partially observable Markov Decision Processes (POMDPs). We see this as a useful procedure for off-policy "debugging" in high-risk settings (e.g., healthcare); by decomposing the expected difference in reward between the RL and observed policy into specific episodes, we can identify episodes where the counterfactual difference in reward is most dramatic. This in turn can be used to facilitate review of specific episodes by domain experts. We demonstrate the utility of this procedure with a synthetic environment of sepsis management. (en_US)
dc.publisher: MLResearch Press (en_US)
dc.rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. (en_US)
dc.source: Proceedings of Machine Learning Research (en_US)
dc.title: Counterfactual off-policy evaluation with Gumbel-max structural causal models (en_US)
dc.identifier.citation: Oberst, Michael and David Sontag. "Counterfactual off-policy evaluation with Gumbel-max structural causal models." Proceedings of the 36th International Conference on Machine Learning, June 2019, Long Beach, California, MLResearch Press, 2019. (en_US)
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory (en_US)
dc.relation.journal: Proceedings of the 36th International Conference on Machine Learning (en_US)
dc.eprint.version: Final published version (en_US)
dspace.orderedauthors: Oberst, M; Sontag, D (en_US)
mit.metadata.status: Authority Work and Publication Information Needed
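
The abstract refers to Gumbel-max structural causal models for generating counterfactual trajectories. As a rough illustration only, the sketch below shows the general Gumbel-max idea for a single categorical variable: infer noise consistent with an observed outcome, then replay that noise under different probabilities. It is not taken from the paper. The function names (`sample_posterior_gumbels`, `counterfactual_outcome`) and the example probabilities are hypothetical, and all probabilities are assumed to be strictly positive.

```python
import numpy as np

def sample_posterior_gumbels(log_p, observed, rng):
    """Sample Gumbel noise g such that argmax(log_p + g) == observed."""
    # The maximum of the perturbed logits follows Gumbel(logsumexp(log_p)).
    top = rng.gumbel(loc=np.logaddexp.reduce(log_p))
    # Each other perturbed logit is a Gumbel(log_p[i]) draw truncated below `top`.
    free = rng.gumbel(loc=log_p)
    perturbed = -np.log(np.exp(-top) + np.exp(-free))
    perturbed[observed] = top
    return perturbed - log_p  # subtract the logits to recover the noise terms

def counterfactual_outcome(p_obs, p_cf, observed, rng):
    """Replay the inferred noise under counterfactual probabilities p_cf."""
    noise = sample_posterior_gumbels(np.log(p_obs), observed, rng)
    return int(np.argmax(np.log(p_cf) + noise))

# Illustrative values (assumed): outcome 1 was observed under p_obs.
rng = np.random.default_rng(0)
p_obs = np.array([0.2, 0.5, 0.3])
p_cf = np.array([0.6, 0.1, 0.3])
samples = [counterfactual_outcome(p_obs, p_cf, 1, rng) for _ in range(1000)]
print(np.bincount(samples, minlength=3) / len(samples))
```

In this sketch, replaying the noise under unchanged probabilities always returns the observed outcome, which is the basic consistency requirement for a counterfactual.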
