Mitigating Social Dilemmas in Multi-Agent Reinforcement Learning with Formal Contracting

Author(s)
Christoffersen, Phillip Johannes Kerr
Download
Thesis PDF (1.021 MB)
Advisor
Hadfield-Menell, Dylan
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
As society deploys more and more sophisticated artificial intelligence (AI) agents, it will be increasingly necessary for such agents, while pursuing their own objectives, to coexist in common environments in the physical or digital worlds. This may pose a challenge if the agents' objectives conflict with each other; in the worst case, this can prevent any given agent from fulfilling its own objectives (e.g., self-driving cars in a traffic jam). Situations such as these are termed social dilemmas. This thesis demonstrates that providing reinforcement learning (RL) agents with the software infrastructure to precommit to zero-sum incentive modifications (1) induces maximal social welfare in theory, and (2) when implemented with deep multi-agent reinforcement learning (MARL), also avoids social dilemmas in practice. Specifically, a novel algorithmic framework is proposed, termed formal contracting, which is formalized, studied game-theoretically, and investigated empirically. In formal contracting, before engaging in a given shared environment, agents are given the opportunity to negotiate a binding modification to all agents' objective functions, in order to provide incentives for the optimal use of shared resources. Within this framework, at all subgame-perfect equilibria (SPE), agents maximize social welfare, that is, the sum of all agents' objectives in the original environment. Moreover, studies in simple domains, such as the classic prisoner's dilemma, and in more complex ones, such as dynamic simulations of pollution management, show that this algorithmic framework can be implemented in MARL and does indeed lead to outcomes with superior welfare in social dilemmas. The thesis concludes with discussions of related work, limitations of the approach, and future work, particularly involving scaling this methodology to larger problem instances containing more agents than those studied.
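
A minimal sketch of the contracting idea on the classic prisoner's dilemma, under illustrative assumptions (the payoff values and transfer size below are not taken from the thesis): a zero-sum transfer, agreed on before play, penalizes unilateral defection and shifts the game's only pure equilibrium from mutual defection to mutual cooperation, while leaving total welfare unchanged in every outcome.

    # Sketch: a zero-sum transfer contract applied to a prisoner's dilemma.
    # Actions: 0 = cooperate, 1 = defect. Payoffs are (row player, column player).
    import itertools

    BASE = {
        (0, 0): (3, 3),
        (0, 1): (0, 5),
        (1, 0): (5, 0),
        (1, 1): (1, 1),
    }

    def contracted_payoffs(transfer):
        """Apply a zero-sum contract: whenever one agent defects against a
        cooperator, it pays `transfer` to the other agent. The sum of payoffs
        (social welfare) is unchanged in every outcome."""
        modified = {}
        for (a1, a2), (r1, r2) in BASE.items():
            t = transfer * ((a1 == 1 and a2 == 0) - (a2 == 1 and a1 == 0))
            modified[(a1, a2)] = (r1 - t, r2 + t)
        return modified

    def pure_nash_equilibria(payoffs):
        """Enumerate pure-strategy Nash equilibria of a 2x2 game."""
        eqs = []
        for a1, a2 in itertools.product((0, 1), repeat=2):
            r1, r2 = payoffs[(a1, a2)]
            best1 = all(r1 >= payoffs[(b, a2)][0] for b in (0, 1))
            best2 = all(r2 >= payoffs[(a1, b)][1] for b in (0, 1))
            if best1 and best2:
                eqs.append((a1, a2))
        return eqs

    print(pure_nash_equilibria(BASE))                   # [(1, 1)]: mutual defection
    print(pure_nash_equilibria(contracted_payoffs(3)))  # [(0, 0)]: mutual cooperation

The thesis studies this mechanism in the general MARL setting with learned policies and negotiated contracts; the snippet only illustrates why a zero-sum modification can remove the dilemma without changing the social welfare of any outcome.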
Date issued
2024-02
URI
https://hdl.handle.net/1721.1/153795
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
