dc.contributor.advisor	Daskalakis, Constantinos
dc.contributor.author	Assos, Angelos
dc.date.accessioned	2025-04-14T14:06:51Z
dc.date.available	2025-04-14T14:06:51Z
dc.date.issued	2025-02
dc.date.submitted	2025-04-03T14:06:09.788Z
dc.identifier.uri	https://hdl.handle.net/1721.1/159121
dc.description.abstract	With the advent of machine learning and AI, learning algorithms are becoming increasingly prevalent in online learning settings, where sequential decision-making is required. In such settings, the decisions of each agent can affect the utilities (or losses) of the other agents, as well as influence the decisions made by other agents later in the interaction. Therefore, if an agent is good at anticipating the behavior of the other agents, in particular how they will make decisions in each round as a function of their experience thus far, he can try to make his own decisions judiciously over the rounds of the interaction so as to influence the other agents to behave in a way that ultimately benefits his own utility. In this thesis, we study repeated two-player games involving two agents: a learner, which employs an online learning algorithm to choose his strategy in each round; and an optimizer, which knows the learner’s utility function, parameters, and online learning algorithm. The optimizer wants to plan ahead to maximize his own utility while taking into account the learner’s behavior. We study this setting in zero-sum and general-sum games. In zero-sum games, we provide algorithms for the optimizer that can efficiently exploit a learner that employs a specific online learning algorithm, in both discrete- and continuous-time dynamics: the learner employs the Multiplicative Weights Update (MWU) algorithm in the discrete-time games and Replicator Dynamics in the continuous-time games. In general-sum games, we provide a negative result: unless P=NP, there is no Fully Polynomial Time Approximation Scheme (FPTAS) for maximizing the utility of an optimizer against a learner that best responds to the history in each round. We additionally provide exponential-time algorithms that efficiently strategize against a learner that uses MWU, as well as a new way of thinking about strategizing against online learners via calculus of variations.
dc.publisher	Massachusetts Institute of Technology
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
dc.rights	Copyright retained by author(s)
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title	Strategizing against online learners in normal form repeated games
dc.type	Thesis
dc.description.degree	M.Eng.
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree	Master
thesis.degree.name	Master of Engineering in Electrical Engineering and Computer Science
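
Note on the learning rule named in the abstract: the discrete-time learner uses the Multiplicative Weights Update (MWU) algorithm. As a point of reference only, the following is a minimal sketch of an MWU learner playing a small zero-sum matrix game against an optimizer; the payoff matrix, learning rate, horizon, and the optimizer's (here uniformly random) behavior are illustrative assumptions and are not taken from the thesis.

import numpy as np

rng = np.random.default_rng(0)

# Learner's loss matrix for a matching-pennies-style zero-sum game
# (rows = learner actions, columns = optimizer actions). Illustrative values only.
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])

eta, T = 0.1, 1000                  # assumed learning rate and number of rounds
weights = np.ones(A.shape[0])       # MWU keeps one weight per learner action
total_loss = 0.0

for t in range(T):
    p = weights / weights.sum()     # learner's mixed strategy this round
    j = rng.integers(A.shape[1])    # optimizer's move: uniformly random here,
                                    # a stand-in for a strategizing optimizer
    losses = A[:, j]                # loss of each learner action against column j
    total_loss += p @ losses        # learner's expected loss this round
    weights *= np.exp(-eta * losses)   # multiplicative weights update

print("learner's final mixed strategy:", weights / weights.sum())
print("learner's average per-round loss:", total_loss / T)

In the thesis the optimizer is not random; it plans its moves against this known update rule. The sketch is only meant to show the update that the optimizer is assumed to know.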

