dc.contributor.advisor | Hadfield-Menell, Dylan | |
dc.contributor.author | Mayer, Hendrik T. | |
dc.date.accessioned | 2024-09-16T13:47:04Z | |
dc.date.available | 2024-09-16T13:47:04Z | |
dc.date.issued | 2024-05 | |
dc.date.submitted | 2024-07-11T14:36:54.185Z | |
dc.identifier.uri | https://hdl.handle.net/1721.1/156753 | |
dc.description.abstract | Reinforcement Learning (RL) agents optimize reward functions to learn desirable policies in a variety of important real-world applications, such as self-driving cars and recommender systems. In practice, however, it can be very difficult to specify the correct reward function for a complex problem, a failure known as reward misspecification. Impact measures provide metrics for determining how robust a particular agent's behavior is to reward misspecification. This thesis analyzes one such impact measure: the frequency of irreversible actions that an agent takes. We study this impact measure using a time-varying model of the principal's preferences. This choice was motivated by two primary considerations. First, many real-world scenarios involve a principal whose preferences change over time. Second, an agent that assumes time-varying preferences may be more averse to performing irreversible actions. In this thesis, we examine principal-agent (human-robot) assistance games in toy grid environments inspired by cooperative inverse reinforcement learning [1], where irreversible actions correspond to removing transitions from a POMDP. In these games, we focus on how the frequency of changes in the principal's preferences and the optimality of the principal influence the agent's willingness to take irreversible actions. In 2-node and 4-node assistance games, we find two main results. First, in the presence of a random or approximately optimal human, the robot performs more irreversible actions as the goal state changes position more often. Second, in the presence of an optimal human, the robot rarely performs irreversible actions. | |
dc.publisher | Massachusetts Institute of Technology | |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) | |
dc.rights | Copyright retained by author(s) | |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ | |
dc.title | Irreversible Actions in Assistance Games with a Dynamic Goal | |
dc.type | Thesis | |
dc.description.degree | M.Eng. | |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
mit.thesis.degree | Master | |
thesis.degree.name | Master of Engineering in Electrical Engineering and Computer Science | |