dc.contributor.advisor	Daskalakis, Constantinos
dc.contributor.author	Assos, Angelos
dc.date.accessioned	2025-04-14T14:06:51Z
dc.date.available	2025-04-14T14:06:51Z
dc.date.issued	2025-02
dc.date.submitted	2025-04-03T14:06:09.788Z
dc.identifier.uri	https://hdl.handle.net/1721.1/159121
dc.description.abstract	With the advent of machine learning and AI, learning algorithms are becoming increasingly prevalent in online learning settings, where sequential decision-making is required. In such settings, the decisions of each agent can affect the utilities (or losses) of the other agents, as well as influence the decisions made by other agents later in the interaction. Therefore, if an agent is good at anticipating the behavior of the other agents, in particular how they will make decisions in each round as a function of their experience thus far, he can try to make his own decisions judiciously over the rounds of the interaction so as to influence the other agents to behave in a way that ultimately benefits his own utility. In this thesis, we study repeated two-player games involving two agents: a learner, which employs an online learning algorithm to choose his strategy in each round; and an optimizer, which knows the learner’s utility function, parameters, and online learning algorithm. The optimizer wants to plan ahead to maximize his own utility while taking into account the learner’s behavior. We study this setting in zero-sum and general-sum games. In zero-sum games, we provide algorithms for the optimizer that can efficiently exploit a learner that employs a specific online learning algorithm, in both discrete- and continuous-time dynamics: the learner employs the Multiplicative Weights Update (MWU) algorithm in the discrete-time games and Replicator Dynamics in the continuous-time games. In general-sum games, we provide a negative result: unless P=NP, there is no Fully Polynomial Time Approximation Scheme (FPTAS) for maximizing the utility of an optimizer against a learner that best responds to the history in each round. We additionally provide exponential-time algorithms that efficiently strategize against a learner that uses MWU, as well as a new way of thinking about strategizing against online learners via calculus of variations.
dc.publisher	Massachusetts Institute of Technology
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
dc.rights	Copyright retained by author(s)
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title	Strategizing against online learners in normal form repeated games
dc.type	Thesis
dc.description.degree	M.Eng.
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree	Master
thesis.degree.name	Master of Engineering in Electrical Engineering and Computer Science
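
Note on the learning rule named in the abstract: the discrete-time learner uses the Multiplicative Weights Update (MWU) algorithm. As a point of reference only, the following is a minimal sketch of an MWU learner playing a small zero-sum matrix game against an optimizer; the payoff matrix, learning rate, horizon, and the optimizer's (here uniformly random) behavior are illustrative assumptions and are not taken from the thesis.

import numpy as np

rng = np.random.default_rng(0)

# Learner's loss matrix for a matching-pennies-style zero-sum game
# (rows = learner actions, columns = optimizer actions). Illustrative values only.
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])

eta, T = 0.1, 1000                  # assumed learning rate and number of rounds
weights = np.ones(A.shape[0])       # MWU keeps one weight per learner action
total_loss = 0.0

for t in range(T):
    p = weights / weights.sum()     # learner's mixed strategy this round
    j = rng.integers(A.shape[1])    # optimizer's move: uniformly random here,
                                    # a stand-in for a strategizing optimizer
    losses = A[:, j]                # loss of each learner action against column j
    total_loss += p @ losses        # learner's expected loss this round
    weights *= np.exp(-eta * losses)   # multiplicative weights update

print("learner's final mixed strategy:", weights / weights.sum())
print("learner's average per-round loss:", total_loss / T)

In the thesis the optimizer is not random; it plans its moves against this known update rule. The sketch is only meant to show the update that the optimizer is assumed to know.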

