DSpace@MIT


An Invisible Issue of Task Underspecification in Deep Reinforcement Learning Evaluations

Author(s)
Jayawardana, Vindula Muthushan
Thesis PDF (7.341 MB)
Advisor
Wu, Cathy
Terms of use
In Copyright - Educational Use Permitted. Copyright MIT. http://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Performance evaluations of Deep Reinforcement Learning (DRL) algorithms are an integral part of the scientific progress of the field. However, standard practices for evaluating the within-task generalization of DRL methods can be unreliable and misleading if applied carelessly. An important source of error lies in the dependence of reported outcomes on often arbitrarily selected point Markov decision processes (point MDPs), a consequence of task underspecification. A large class of DRL tasks, particularly real-world decision problems, induces a family of MDPs, each of which, perhaps confusingly, shares the same high-level problem definition. As a demonstrative example, consider the classic pendulum control task: it could be represented by a family of possible MDPs, each with a different pendulum mass, yet it is typically represented as a single MDP. This thesis argues that, for reliable downstream decision-making, performance evaluations of a DRL task should be carried out over a family of MDPs rather than over a point MDP, which may be subject to bias. The thesis first illustrates the pitfalls of point-MDP-based evaluations through benchmark DRL control tasks and a real-world case study in traffic signal control. It then presents significant inconsistencies between conclusions derived from point-MDP-based evaluations and those derived from MDP-family-based evaluations. Subsequently, to overcome the prohibitive cost of training DRL models on entire families of MDPs, it provides a series of recommendations for performing accurate yet efficient performance evaluations under a computational budget. This work contributes to bolstering the empirical rigor of reinforcement learning, especially as the outcomes of DRL trickle into downstream decision-making in real-world contexts.
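The abstract's central contrast, scoring a policy on one arbitrarily chosen point MDP versus averaging over a family of MDPs that differ only in pendulum mass, can be sketched with a toy example. Everything below (the linearized dynamics, the fixed linear policy, the mass range) is an illustrative assumption for exposition, not code or parameters from the thesis.

```python
import random

def rollout_return(mass, policy_gain=15.0, horizon=50):
    """Toy linearized-pendulum rollout near the upright position.
    Return is the negative cumulative squared angle (higher is better)."""
    theta, omega, total = 0.5, 0.0, 0.0   # initial tilt, angular velocity
    dt, g = 0.05, 9.8
    for _ in range(horizon):
        torque = -policy_gain * theta - 0.1 * omega   # one fixed linear policy
        omega += dt * (g * theta + torque / mass)     # heavier mass weakens control
        theta += dt * omega                           # semi-implicit Euler step
        total -= theta ** 2
    return total

# Point-MDP evaluation: a single, arbitrarily chosen mass.
point_score = rollout_return(mass=1.0)

# MDP-family evaluation: average over a distribution of masses.
rng = random.Random(0)
masses = [rng.uniform(0.5, 2.0) for _ in range(100)]
family_score = sum(rollout_return(m) for m in masses) / len(masses)

print(f"point-MDP return:  {point_score:.2f}")
print(f"family-MDP return: {family_score:.2f}")
```

Under these assumed dynamics the fixed policy stabilizes light pendulums but fails on masses beyond roughly policy_gain / g, so the point-MDP score at mass 1.0 looks far better than the family average: the same arbitrary point-MDP choice the abstract warns can bias a reported evaluation.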
Date issued
2022-09
URI
https://hdl.handle.net/1721.1/147493
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses

Content created by the MIT Libraries, CC BY-NC unless otherwise noted.