Strategizing against online learners in normal form repeated games

Author(s)

Assos, Angelos

Advisor

Daskalakis, Constantinos

Terms of use

Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) Copyright retained by author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/

Abstract

With the advent of machine learning and AI, learning algorithms are becoming increasingly prevalent in online learning settings, where sequential decision-making is required. In such settings, the decisions of each agent can affect the utilities (or losses) of the other agents, and can also influence the decisions those agents make later in the interaction. Therefore, an agent who is good at anticipating the behavior of the other agents, in particular how they will make decisions in each round as a function of their experience thus far, could judiciously choose his own decisions over the rounds of the interaction so as to steer the other agents toward behavior that ultimately benefits his own utility. In this thesis, we study repeated two-player games involving two agents: a learner, who employs an online learning algorithm to choose his strategy in each round; and an optimizer, who knows the learner’s utility function, the learner’s parameters, and the learner’s online learning algorithm. The optimizer wants to plan ahead to maximize his own utility while taking the learner’s behavior into account. We study this setting in zero-sum and general-sum games. In zero-sum games, we provide algorithms for the optimizer that can efficiently exploit a learner that employs a specific online learning algorithm, in both discrete-time and continuous-time dynamics. Specifically, the learner employs the Multiplicative Weights Update (MWU) algorithm in the discrete-time games and the Replicator Dynamics in the continuous-time games. In general-sum games, we provide a negative result: unless P=NP, there is no Fully Polynomial Time Approximation Scheme (FPTAS) for maximizing the utility of an optimizer against a learner that best responds to the history in each round. We additionally provide exponential-time algorithms that efficiently strategize against a learner that uses MWU, as well as a new way of thinking about strategizing against online learners via calculus of variations.
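To make the discrete-time zero-sum setting concrete, the following is a minimal sketch of a learner running a textbook form of MWU against a fixed sequence of optimizer actions. It is an illustration only and is not taken from the thesis: the function names, the step size `eta`, the payoff matrix `A`, and the example action sequence are all assumptions chosen for the example, and the optimizer here simply replays a given sequence rather than planning one.

```python
import math

def mwu_learner_strategy(cum_utils, eta):
    """Mixed strategy with probabilities proportional to exp(eta * cumulative utility)."""
    m = max(cum_utils)  # subtract the max for numerical stability
    weights = [math.exp(eta * (u - m)) for u in cum_utils]
    total = sum(weights)
    return [w / total for w in weights]

def play_against_mwu(A, optimizer_actions, eta):
    """Simulate a repeated zero-sum game against an MWU learner.

    A[i][j] is the optimizer's payoff when the optimizer plays row i and the
    learner plays column j; the learner receives -A[i][j] (assumed convention).
    Returns the optimizer's total expected payoff and the learner's final strategy.
    """
    n_cols = len(A[0])
    cum_utils = [0.0] * n_cols  # learner's cumulative utility per column
    total = 0.0
    for i in optimizer_actions:
        y = mwu_learner_strategy(cum_utils, eta)
        # Optimizer's expected payoff this round against the learner's mix.
        total += sum(A[i][j] * y[j] for j in range(n_cols))
        # Learner observes the optimizer's action and updates every column.
        for j in range(n_cols):
            cum_utils[j] -= A[i][j]
    return total, mwu_learner_strategy(cum_utils, eta)

# Hypothetical example: matching pennies, with an arbitrary optimizer sequence.
A = [[1.0, -1.0], [-1.0, 1.0]]
total, final = play_against_mwu(A, [0, 0, 0, 1], eta=0.5)
```

Because the optimizer knows `A`, `eta`, and the update rule, he can predict the learner's mixed strategy in every round; the planning problem described in the abstract is the choice of the action sequence itself, which this sketch leaves as an input.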

Date issued

2025-02

URI

https://hdl.handle.net/1721.1/159121

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Collections

Graduate Theses