
dc.contributor.advisor: Wu, Cathy
dc.contributor.author: Jayawardana, Vindula Muthushan
dc.date.accessioned: 2023-01-19T19:54:04Z
dc.date.available: 2023-01-19T19:54:04Z
dc.date.issued: 2022-09
dc.date.submitted: 2022-10-19T18:57:20.427Z
dc.identifier.uri: https://hdl.handle.net/1721.1/147493
dc.description.abstract: Performance evaluations of Deep Reinforcement Learning (DRL) algorithms are an integral part of the scientific progress of the field. However, standard practices for evaluating the within-task generalization of DRL methods can be unreliable and misleading if not applied carefully. An important source of possible error is the reliance of reported outcomes on often arbitrarily selected point Markov decision processes (point MDPs), a consequence of task underspecification. A large class of DRL tasks, particularly in real-world decision problems, induces a family of MDPs, each of which, perhaps confusingly, shares the same high-level problem definition. As a demonstrative example, consider the classic pendulum control task: it could be represented by a family of possible MDPs, each with a different pendulum mass, yet it is typically represented as a single MDP. This thesis argues that, for reliable downstream decision-making, performance evaluations of a DRL task should be carried out over a family of MDPs rather than a point MDP, which may be subject to bias. The thesis first illustrates the pitfalls of point MDP based evaluations through benchmark DRL control tasks and a real-world case study in traffic signal control. It then presents significant inconsistencies between conclusions derived from point MDP based evaluations and those derived from MDP family based evaluations. Finally, to overcome the prohibitive cost of training DRL models on entire families of MDPs, it provides a series of recommendations for performing accurate yet efficient performance evaluations under a computational budget. This work contributes to bolstering the empirical rigor of reinforcement learning, especially as the outcomes of DRL trickle into downstream decision-making in real-world contexts.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright MIT
dc.rights.uri: http://rightsstatements.org/page/InC-EDU/1.0/
dc.title: An Invisible Issue of Task Underspecification in Deep Reinforcement Learning Evaluations
dc.type: Thesis
dc.description.degree: S.M.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.orcid: 0000-0002-2377-3757
mit.thesis.degree: Master
thesis.degree.name: Master of Science in Electrical Engineering and Computer Science
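
The pendulum example in the abstract can be made concrete with a small sketch. The following is a minimal illustration, not taken from the thesis: it assumes the Gymnasium package and its Pendulum-v1 environment, treats the unwrapped environment's mass attribute `m` as the underspecified parameter, and uses a hypothetical `policy` callable. It only shows the difference between reporting the return on one point MDP (a single nominal mass) and reporting returns over a family of MDPs (a range of masses).

    # Illustrative sketch (assumption: Gymnasium with classic-control envs installed).
    import numpy as np
    import gymnasium as gym

    def evaluate_over_family(policy, masses, episodes_per_mass=10, seed=0):
        """Return the mean episodic return of `policy` for each pendulum mass."""
        returns = {}
        for mass in masses:
            env = gym.make("Pendulum-v1")
            env.unwrapped.m = mass  # vary the underspecified parameter (pendulum mass)
            totals = []
            for ep in range(episodes_per_mass):
                obs, _ = env.reset(seed=seed + ep)
                done, total = False, 0.0
                while not done:
                    action = policy(obs)
                    obs, reward, terminated, truncated, _ = env.step(action)
                    total += reward
                    done = terminated or truncated
                totals.append(total)
            env.close()
            returns[mass] = float(np.mean(totals))
        return returns

    if __name__ == "__main__":
        # Hypothetical stand-in policy; a trained DRL policy would be used in practice.
        random_policy = lambda obs: np.random.uniform(-2.0, 2.0, size=(1,))
        # Point MDP evaluation looks only at the nominal mass (1.0);
        # family evaluation reports the whole dictionary, e.g. its mean and worst case.
        print(evaluate_over_family(random_policy, masses=[0.5, 1.0, 1.5], episodes_per_mass=3))

Under such a family-based protocol, one would report an aggregate such as the mean and worst-case return across masses, rather than the return at the nominal mass alone.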

