Theoretical Foundations for Learning in Games and Dynamic Environments
Author(s)
Golowich, Noah
Advisor
Daskalakis, Constantinos
Moitra, Ankur
Abstract
Decision-making problems lie at the heart of numerous aspects of human and algorithmic behavior across our society, ranging from healthcare systems to financial systems to interactions with the physical world. A central challenge that arises across many decision-making problems is the presence of multiple agents, often with competing incentives. To understand how agents will act in such situations, it is often productive to compute equilibria, which have the property that no agent can deviate from them and improve their utility. An additional challenge is that decisions made by agents often change the state of the environment, which is modeled as dynamic. Thus, we need efficient algorithms for learning good policies, which tell the agent what to do as a function of the environment's state. An extensive body of work spanning economics, computer science, and statistics has developed models of these decision-making problems. This has led to many celebrated results, including, for instance, a considerable body of work studying the computational properties of Nash equilibria in normal-form games, and a long line of papers on reinforcement learning. However, many of these classical works suffer from a few shortcomings: first, they often do not account for the enormous state or action spaces available to agents in realistic decision-making settings, and second, many of them do not derive computationally efficient algorithms for the desired solution concepts. These shortcomings are brought to the forefront by the remarkable recent progress in artificial intelligence, which holds promise for solving decision-making problems with enormous state or action spaces but which is often bottlenecked by computation.
The objective of this thesis is to develop theoretical foundations for the computational aspects of such decision-making problems: for example, how do we efficiently compute equilibria in large games, and how can we efficiently learn near-optimal policies in complex environments? Some highlights of our results are listed below. First, we study problems in which there are multiple agents and the goal is to compute some notion of equilibrium:

• We show the first near-optimal rate of convergence to equilibrium for a no-regret learning algorithm in normal-form games, resolving a decade-long line of work that had aimed to establish increasingly better rates.

• We establish the first algorithm with sublinear swap regret against arbitrary adversaries that enjoys only polylogarithmic dependence on the number of actions, resolving a question of Blum and Mansour from 2007.

• As a corollary of the preceding result, we obtain the first polynomial-time algorithm for approximating a correlated equilibrium in extensive-form games (to constant approximation error), addressing a question of von Stengel and Forges from 2008. Additionally, we obtain near-optimal bounds on the communication and query complexity of approximating correlated equilibria in normal-form games (to constant approximation error), addressing several open problems in the literature.

• We give the first algorithm for the sequential calibration problem with calibration error beating that of the seminal work of Foster and Vohra from 1998.

Moving on to decision-making problems where the environment is modeled as dynamic (typically studied in the framework of reinforcement learning (RL)), our results include the following:

• We give the first end-to-end computationally efficient algorithms for learning a near-optimal policy in many fundamental reinforcement learning problems, such as (constant-action) Linear Bellman Complete MDPs and sparse linear MDPs.
• We give the first quasi-polynomial-time algorithm for finding a near-optimal policy in a general and well-motivated class of partially observable RL environments, and show that our bound is tight.

• We prove some (perhaps surprising) hardness results that arise in multi-agent RL problems. For instance, we show that it is computationally hard to implement no-regret learning algorithms in multi-agent RL environments even when the agents can coordinate on their choice of algorithm. This stands in stark contrast with simpler multi-agent learning settings (e.g., normal-form games), where no-regret learning has formed the bedrock for a wide array of developments over the last several decades.

• Nevertheless, we show that by adjusting the type of equilibrium appropriately, we can circumvent the above hardness results and derive computationally efficient decentralized algorithms for computing equilibria in multi-agent RL environments.

Many of the above results have inspired follow-up work, including applications to various problems in game theory, reinforcement learning, online learning, and related domains, as well as the formulation of new problems inspired by these results.
Date issued
2025-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology