dc.contributor.advisor  Andreas, Jacob
dc.contributor.author  Jacob, Athul Paul
dc.date.accessioned  2025-03-12T16:54:53Z
dc.date.available  2025-03-12T16:54:53Z
dc.date.issued  2024-09
dc.date.submitted  2025-03-04T18:31:27.999Z
dc.identifier.uri  https://hdl.handle.net/1721.1/158481
dc.description.abstract  This thesis addresses the challenge of developing strategic AI agents capable of effective decision-making and communication in human-centric multi-agent systems. While significant progress has been made in AI for strategic decision-making, creating agents that interact seamlessly with humans in multi-agent settings remains a challenge. This research explores the limitations of current approaches, such as self-play reinforcement learning (RL) and imitation learning (IL), and proposes novel methods to overcome these constraints.

Modeling human-like communication and decision-making is a crucial first step toward building effective strategic agents, and the initial part of the thesis addresses it through two approaches. We begin by developing piKL, a regret-minimization algorithm for modeling the actions of strong and human-like agents, which incorporates a cost term proportional to the KL divergence between a search policy and a human-imitation-learned policy. This approach improves reward while keeping behavior close to the imitation-learned policy, producing agents that predict human actions accurately while achieving strong performance in the benchmark game of no-press Diplomacy. We then develop a procedure for modeling populations of agents that communicate with humans in natural language. Our sample-efficient multitask training scheme for latent language policies (LLPs) improves the reward obtained by these policies while preserving the semantics of language in a complex real-time strategy game.

Building on these foundations, the second part of the thesis focuses on building strategic agents for human-centric multi-agent domains. It introduces the DiL-piKL planning algorithm and its extension, RL-DiL-piKL, which regularize self-play reinforcement learning and search toward a human-imitation-learned policy. These algorithms enable the training of Diplodocus, an agent that achieves expert human-level performance in no-press Diplomacy. A significant milestone is reached with Cicero, the first AI agent to achieve human-level performance in full-press Diplomacy, which integrates a language model (LM) with planning and reinforcement learning algorithms based on piKL.

The final part of the thesis revisits language generation tasks, applying piKL to model pragmatic communication and to improve LM truthfulness. It presents Regularized Conventions (ReCo), a model of pragmatic language understanding that outperforms existing best-response and rational speech act models across several datasets. It also introduces a novel approach to LM decoding that casts decoding as a regularized imperfect-information sequential signaling game, yielding the equilibrium-ranking algorithm, which consistently improves on existing LM decoding procedures.
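The KL-regularized objective behind piKL admits a compact summary. The following is a minimal sketch using assumed notation (the symbols τ, Q, and λ do not appear in this record): writing τ for the human-imitation-learned anchor policy, Q(a) for the estimated value of action a, and λ > 0 for the regularization strength, a piKL-style regularized search step selects

\[
\pi^{\star} \;=\; \arg\max_{\pi}\; \mathbb{E}_{a \sim \pi}\!\left[ Q(a) \right] \;-\; \lambda\, D_{\mathrm{KL}}\!\left( \pi \,\middle\|\, \tau \right),
\qquad\text{with closed form}\qquad
\pi^{\star}(a) \;\propto\; \tau(a)\, \exp\!\left( Q(a)/\lambda \right).
\]

A large λ keeps behavior close to human play, while a small λ prioritizes reward, matching the trade-off between human-likeness and performance that the abstract describes.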
dc.publisher  Massachusetts Institute of Technology
dc.rights  In Copyright - Educational Use Permitted
dc.rights  Copyright retained by author(s)
dc.rights.uri  https://rightsstatements.org/page/InC-EDU/1.0/
dc.title  Building Strategic AI Agents for Human-centric Multi-agent Systems
dc.type  Thesis
dc.description.degree  Ph.D.
dc.contributor.department  Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree  Doctoral
thesis.degree.name  Doctor of Philosophy

