
dc.contributor.advisor: Wu, Cathy
dc.contributor.author: Jayawardana, Vindula Muthushan
dc.date.accessioned: 2023-01-19T19:54:04Z
dc.date.available: 2023-01-19T19:54:04Z
dc.date.issued: 2022-09
dc.date.submitted: 2022-10-19T18:57:20.427Z
dc.identifier.uri: https://hdl.handle.net/1721.1/147493
dc.description.abstract: Performance evaluations of Deep Reinforcement Learning (DRL) algorithms are an integral part of the scientific progress of the field. However, standard practices for evaluating the within-task generalization of DRL methods can be unreliable and misleading if not applied carefully. An important source of possible error is the reliance of reported outcomes on often arbitrarily selected point Markov decision processes (point MDPs), a consequence of task underspecification. A large class of DRL tasks, particularly in real-world decision problems, induces a family of MDPs, each of which, perhaps confusingly, shares the same high-level problem definition. As a demonstrative example, consider the classic pendulum control task: it could be represented by a family of possible MDPs, each with a different pendulum mass, yet it is typically represented as a single MDP. This thesis argues that, for reliable downstream decision-making, performance evaluations of a DRL task should be carried out over a family of MDPs rather than a point MDP, which may be subject to bias. The thesis first illustrates the pitfalls of point MDP based evaluations through benchmark DRL control tasks and a real-world case study in traffic signal control. It then presents significant inconsistencies between conclusions derived from point MDP based evaluations and those derived from MDP family based evaluations. Finally, to overcome the prohibitive cost of training DRL models on entire families of MDPs, it provides a series of recommendations for performing accurate yet efficient performance evaluations under a computational budget. This work contributes to bolstering the empirical rigor of reinforcement learning, especially as the outcomes of DRL trickle into downstream decision-making in real-world contexts.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright MIT
dc.rights.uri: http://rightsstatements.org/page/InC-EDU/1.0/
dc.title: An Invisible Issue of Task Underspecification in Deep Reinforcement Learning Evaluations
dc.type: Thesis
dc.description.degree: S.M.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.orcid: 0000-0002-2377-3757
mit.thesis.degree: Master
thesis.degree.name: Master of Science in Electrical Engineering and Computer Science
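
The pendulum example in the abstract can be made concrete with a small sketch. The following is a minimal illustration, not taken from the thesis: it assumes the Gymnasium package and its Pendulum-v1 environment, treats the unwrapped environment's mass attribute `m` as the underspecified parameter, and uses a hypothetical `policy` callable. It only shows the difference between reporting the return on one point MDP (a single nominal mass) and reporting returns over a family of MDPs (a range of masses).

    # Illustrative sketch (assumption: Gymnasium with classic-control envs installed).
    import numpy as np
    import gymnasium as gym

    def evaluate_over_family(policy, masses, episodes_per_mass=10, seed=0):
        """Return the mean episodic return of `policy` for each pendulum mass."""
        returns = {}
        for mass in masses:
            env = gym.make("Pendulum-v1")
            env.unwrapped.m = mass  # vary the underspecified parameter (pendulum mass)
            totals = []
            for ep in range(episodes_per_mass):
                obs, _ = env.reset(seed=seed + ep)
                done, total = False, 0.0
                while not done:
                    action = policy(obs)
                    obs, reward, terminated, truncated, _ = env.step(action)
                    total += reward
                    done = terminated or truncated
                totals.append(total)
            env.close()
            returns[mass] = float(np.mean(totals))
        return returns

    if __name__ == "__main__":
        # Hypothetical stand-in policy; a trained DRL policy would be used in practice.
        random_policy = lambda obs: np.random.uniform(-2.0, 2.0, size=(1,))
        # Point MDP evaluation looks only at the nominal mass (1.0);
        # family evaluation reports the whole dictionary, e.g. its mean and worst case.
        print(evaluate_over_family(random_policy, masses=[0.5, 1.0, 1.5], episodes_per_mass=3))

Under such a family-based protocol, one would report an aggregate such as the mean and worst-case return across masses, rather than the return at the nominal mass alone.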

