Irreversible Actions in Assistance Games with a Dynamic Goal

Author(s)
Mayer, Hendrik T.
Download: Thesis PDF (652.3 KB)
Advisor
Hadfield-Menell, Dylan
Terms of use
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0). Copyright retained by author(s). https://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract
Reinforcement Learning (RL) agents optimize reward functions to learn desirable policies in a variety of important real-world applications, such as self-driving cars and recommender systems. In practice, however, it can be very difficult to specify the correct reward function for a complex problem, a challenge known as reward misspecification. Impact measures provide metrics for determining how robust a particular agent's behavior is to reward misspecification. This thesis analyzes one particular impact measure: the frequency of irreversible actions that an agent takes. We study this impact measure using a time-varying model of the principal's preferences, a choice motivated by two primary considerations. First, many real-world scenarios involve a principal with time-varying preferences. Second, an agent that assumes time-varying preferences may be more averse to performing irreversible actions. We examine principal-agent (human-robot) assistance games in toy grid environments inspired by cooperative inverse reinforcement learning [1], where irreversible actions correspond to removing transitions from a POMDP. In these games, we focus on how the frequency of changes in the principal's preferences and the optimality of the principal influence the agent's willingness to take irreversible actions. In 2-node and 4-node assistance games, we find two main results. First, in the presence of a random or approximately optimal human, the robot performs more irreversible actions as the goal state changes position more often. Second, in the presence of an optimal human, the robot rarely performs irreversible actions.
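
To make the setup concrete, here is a minimal, hypothetical Python sketch of a 2-node assistance game with a time-varying goal, in the spirit of the environments described in the abstract. It is not the thesis's code: the myopic "commit" rule, the p_switch parameter, and all names are assumptions introduced for illustration. An irreversible action here deletes the transition from node 1 back to node 0, matching the abstract's notion of irreversibility as transition removal.

import random

# Hypothetical sketch (not the thesis's actual implementation): a 2-node
# environment where the robot can irreversibly delete the transition from
# node 1 back to node 0. The principal's goal node flips with probability
# p_switch per step (the time-varying goal). We estimate, over many
# episodes, how often a robot that assumes a static goal ends up taking
# the irreversible "lock-in" action.

def run_episode(p_switch, horizon=50, rng=None):
    rng = rng or random.Random()
    pos, goal = 0, 0
    edge_back = True          # transition 1 -> 0 still exists
    took_irreversible = False

    for _ in range(horizon):
        if rng.random() < p_switch:
            goal = 1 - goal   # the principal's preference changes

        if goal == 1:
            if pos == 0:
                pos = 1       # move toward the current goal
            elif edge_back:
                # A robot that assumes the goal is fixed "commits" to
                # node 1 by deleting the return edge: an irreversible act.
                edge_back = False
                took_irreversible = True
        elif pos == 1 and edge_back:
            pos = 0           # still reversible, so return to the goal

    return took_irreversible

def irreversible_rate(p_switch, episodes=2000, seed=0):
    rng = random.Random(seed)
    return sum(run_episode(p_switch, rng=rng) for _ in range(episodes)) / episodes

for p in (0.0, 0.05, 0.2, 0.5):
    print(f"p_switch={p:0.2f}: irreversible-action rate = {irreversible_rate(p):.3f}")

With p_switch = 0 the goal never moves and the robot never commits; as p_switch grows, the goal reaches node 1 more often and the lock-in rate rises, qualitatively matching the abstract's first result for non-optimal principals.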
Date issued
2024-05
URI
https://hdl.handle.net/1721.1/156753
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
