Problem-Independent Regrets on Expectation-Dependent Multi-Armed Bandits
Author(s)
Ai, Rui
Advisor
Simchi-Levi, David
Abstract
The independence axiom (IA) proposed by Von Neumann and Morgenstern [50] is the cornerstone of expected utility theory. However, empirical experiments show that the IA is often violated in the real world. We propose a new kind of multi-armed bandit problem in which the expectation of outcomes may influence the agent’s utility, which we call expectation-dependent multi-armed bandits, and use it to rationalize the choices of agents in Machina’s paradox without the IA. We design provably efficient algorithms with low minimax regret and show that their dependence on the time horizon T matches the corresponding regret lower bounds, establishing statistical optimality. Furthermore, because we are the first to consider bandits whose utility depends on both reality and expectation, this work provides a bridge between machine learning and economic behavior theory, shedding light on how to interpret some counterintuitive economic scenarios, such as the bounded rationality explored by Zhang et al. [54].
Date issued
2025-05
Department
Massachusetts Institute of Technology. Institute for Data, Systems, and Society
Publisher
Massachusetts Institute of Technology