Decision-Making Under Uncertainty: From Theory to Practice
Author(s)
Baek, Jackie
Advisor
Farias, Vivek F.
Abstract
The surge of data and technological advances over the past decade have immensely increased the use of algorithms to automate decisions across a plethora of problems. This thesis develops data-driven methodologies for sequential decision-making under uncertainty. Specifically, we develop solutions to practical issues that arise when operationalizing mathematical models, ranging from general methodologies to applications in healthcare and revenue management.
First, we study an issue of fairness that arises in online learning. It is well known that good online learning strategies must explore, but exploration carries a cost, stemming from playing actions that are eventually revealed to be sub-optimal. We study how this cost of exploration is distributed among groups in a bandit setting. We leverage the theory of axiomatic bargaining, and the Nash bargaining solution in particular, to formalize what constitutes a fair division of the cost of exploration across groups. On the one hand, we show that any regret-optimal policy strikingly results in the least fair outcome: such policies perversely exploit the most 'disadvantaged' groups when they can. More constructively, we derive policies that are optimally fair and simultaneously incur only a small 'price of fairness'. We illustrate the relative merits of our algorithmic framework with a case study on contextual bandits for warfarin dosing, in which we examine how the cost of exploration is shared across multiple racial and age groups.
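To give a feel for the bargaining viewpoint, the following is a minimal sketch (not the thesis's algorithm) of the Nash bargaining solution applied to dividing a fixed total exploration cost between two groups. The disagreement point for each group is the regret it would incur exploring on its own; all numbers are hypothetical.

```python
# Illustrative sketch: Nash bargaining over the division of a fixed total
# exploration cost T between two groups. R_g is the (hypothetical) regret
# group g would incur exploring alone, i.e., its disagreement point.

def nash_bargaining_split(R1, R2, T, steps=100_000):
    """Assign regrets r1 + r2 = T maximizing the product of the gains
    (R1 - r1) * (R2 - r2), i.e., the Nash bargaining solution."""
    lo = max(0.0, T - R2)           # r2 = T - r1 must satisfy 0 <= r2 <= R2
    hi = min(T, R1)                 # and 0 <= r1 <= R1
    best_r1, best_val = lo, -1.0
    for i in range(steps + 1):      # simple grid search over feasible r1
        r1 = lo + (hi - lo) * i / steps
        val = (R1 - r1) * (R2 - (T - r1))
        if val > best_val:
            best_val, best_r1 = val, r1
    return best_r1, T - best_r1

# The solution equalizes each group's regret *savings*: with R1 = 10,
# R2 = 4, and joint cost T = 6, the total saving is 8, so each group
# saves 4, meaning group 1 bears all of the joint exploration cost.
r1, r2 = nash_bargaining_split(10.0, 4.0, 6.0)
print(r1, r2)  # → 6.0 0.0
```

In this two-group case the product-maximizing split equalizes gains over the disagreement points, which is the standard closed-form behavior of the Nash bargaining solution with a linearly transferable surplus.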
Next, we study the classical problem of minimizing regret for multi-armed bandits. Several existing policies for this problem are provably asymptotically optimal, but it is well known that their empirical performance can vary greatly. We develop a new policy, dubbed TS-UCB, that combines ideas from two prominent multi-armed bandit policies: Thompson sampling and the upper confidence bound (UCB) algorithm. We show that TS-UCB achieves materially lower regret on a comprehensive suite of synthetic and real-world datasets, and we establish optimal regret guarantees for TS-UCB in both the K-armed and linear bandit models.
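As a rough illustration of how the two ingredients can be combined, here is a hypothetical hybrid in a Gaussian bandit: posterior samples (Thompson sampling's ingredient) set a target value, and a standardized-gap index (a UCB-style ingredient) selects the arm. This is an expository sketch under stated assumptions, not the exact TS-UCB rule from the thesis.

```python
import math
import random

# Illustrative Gaussian-bandit hybrid, for exposition only. Assumptions:
# rewards are N(mu_i, 1) with a flat prior, so after n_i pulls the
# posterior for arm i is N(mean_i, 1 / n_i).

def run_hybrid(true_means, horizon, seed=0):
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(horizon):
        if t < k:                       # pull each arm once to initialize
            arm = t
        else:
            means = [sums[i] / counts[i] for i in range(k)]
            stds = [1.0 / math.sqrt(counts[i]) for i in range(k)]
            # Thompson-style step: sample a target from the posteriors.
            target = max(rng.gauss(means[i], stds[i]) for i in range(k))
            # UCB-style step: pick the arm needing the fewest posterior
            # standard deviations to reach that target.
            arm = min(range(k), key=lambda i: (target - means[i]) / stds[i])
        counts[arm] += 1
        sums[arm] += true_means[arm] + rng.gauss(0.0, 1.0)
    return counts

counts = run_hybrid([0.2, 0.5, 0.9], horizon=2000)
```

The deterministic index step (given the sampled target) is what distinguishes this style of policy from pure Thompson sampling, whose arm choice is itself random.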
Lastly, we study a decision-making problem in revenue management. We consider the network revenue management problem, an online allocation problem in which products are sold to a stream of arriving customers and each product consumes a subset of capacity-constrained resources. We show that certain network structures can be exploited to improve both theoretical and empirical performance over existing, 'one-size-fits-all' approaches. Specifically, we study instances with a matroid sub-structure, which is motivated by several classical supply chain constraints involving postponement and process flexibility. We prove that our policy improves on existing theoretical guarantees under this structure, and these results are supported by numerical simulations.
Date issued
2022-09
Department
Massachusetts Institute of Technology. Operations Research Center
Publisher
Massachusetts Institute of Technology