Show simple item record

dc.contributor.advisorHadfield-Menell, Dylan
dc.contributor.authorMayer, Hendrik T.
dc.date.accessioned2024-09-16T13:47:04Z
dc.date.available2024-09-16T13:47:04Z
dc.date.issued2024-05
dc.date.submitted2024-07-11T14:36:54.185Z
dc.identifier.urihttps://hdl.handle.net/1721.1/156753
dc.description.abstractReinforcement Learning (RL) agents optimize reward functions to learn desirable policies in a variety of important real-world applications such as self-driving cars and recommender systems. However, in practice, it can be very difficult to specify the correct reward function for a complex problem, in what is known as reward misspecifcation. Impact measures provide metrics to determine how robust a particular agent’s behavior is to reward misspecification. This thesis analyzes one particular impact measure: the frequency of irreversible actions that an agent takes. We study this impact measure using a time-varying model of the principal’s preferences. This choice was motivated by two primary considerations. First, many real-world scenarios consist of a principal with time-varying preferences. Second, an agent assuming time-varying preferences may be more averse to performing irreversible actions. In this thesis, we examine principal-agent (human-robot) assistance games in toy grid environments inspired by cooperative inverse reinforcement learning [1], where irreversible actions correspond to removing transitions from a POMDP. In these games, we focus on how the frequency of changes in the principal’s preferences and the optimality of the principal influence the agent’s willingness to take irreversible actions. In 2-node and 4-node assistance games, we find two main results. First, in the presence of a random or approximately optimal human, the robot performs more irreversible actions as the goal state changes position more often. Second, in the presence of an optimal human, the robot rarely performs irreversible actions.
dc.publisherMassachusetts Institute of Technology
dc.rightsAttribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
dc.rightsCopyright retained by author(s)
dc.rights.urihttps://creativecommons.org/licenses/by-nc-nd/4.0/
dc.titleIrreversible Actions in Assistance Games with a Dynamic Goal
dc.typeThesis
dc.description.degreeM.Eng.
dc.contributor.departmentMassachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degreeMaster
thesis.degree.nameMaster of Engineering in Electrical Engineering and Computer Science


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record