Experimentation and Control in Online Platforms
Author(s)
Zheng, Andrew
Advisor
Farias, Vivek F.
Abstract
Decision-making in many online platforms is naturally modeled as control of a large-scale dynamical system. These are typically offline control problems: the platform collects fine-grained offline datasets, either via an experiment or by logging an incumbent policy, and hopes to use this data to evaluate new control policies or to improve existing ones. This thesis explores the statistical challenges involved in learning about policies in such an environment, where sample efficiency is paramount.
One ubiquitous problem is that of experimentation under "Markovian" interference, where interventions on some experimental units impact other units through modifications to the shared system state (such as a limited inventory). The best existing estimators for this problem are largely heuristic in nature. We formalize the problem of inference in such experiments as one of policy evaluation. Off-policy estimators, while unbiased, incur a large variance penalty relative to state-of-the-art heuristics. We introduce an on-policy estimator, the Differences-In-Q’s (DQ) estimator, which achieves a striking bias-variance trade-off: DQ can have exponentially smaller variance than off-policy evaluation, while incurring bias that is only second order in the impact of the intervention. Along the way, we introduce new techniques for achieving practical bias-variance trade-offs in off-policy evaluation more generally. Chief among DQ’s advantages is its effectiveness in practice: over the course of a six-month engagement, we implemented DQ on Douyin’s internal experimentation platforms, demonstrating that DQ dominates state-of-the-art alternatives and adapts readily to a variety of practical experimental settings and concerns.
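To make the idea concrete, the following is a minimal sketch of a DQ-style computation on a single logged trajectory from an A/B experiment. The precise estimator is developed in the thesis; the tabular TD(0) fit, the discounted rescaling, and all variable names below are illustrative assumptions.

    # Sketch of a Differences-in-Q's (DQ) style estimator for a single
    # on-policy trajectory under Markovian interference. Tabular TD(0)
    # and the (1 - gamma) rescaling are assumptions for illustration.
    import numpy as np

    def dq_estimate(states, actions, rewards, n_states, alpha=0.05, gamma=0.99):
        """states  : int array of visited (discretized) system states
           actions : binary int array, 1 = treated step, 0 = control step
           rewards : float array of per-step rewards"""
        # 1. Fit Q(s, a) for the *experimentation* policy by on-policy TD(0).
        Q = np.zeros((n_states, 2))
        for t in range(len(states) - 1):
            s, a, r = states[t], actions[t], rewards[t]
            s_next, a_next = states[t + 1], actions[t + 1]
            Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

        # 2. Replace the naive difference-in-means of rewards with a
        #    difference in average Q-values between treated and control
        #    steps; (1 - gamma) rescales to per-step reward units.
        treated = actions[:-1] == 1
        q_visited = Q[states[:-1], actions[:-1]]
        return (1 - gamma) * (q_visited[treated].mean() - q_visited[~treated].mean())

The key design choice is step 2: crediting each intervention with its downstream impact on the shared state via the estimated Q-values, at the cost of a bias that is only second order in the impact of the intervention.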
When more sophisticated experimental designs are available, a common alternative is to choose units of experimentation coarse enough to eliminate interference. ‘Region-split’ experiments on online platforms, where an intervention is applied to a single region over some experimental horizon, are one example of such a design; synthetic control is the state-of-the-art approach to inference in these experiments. Such experiments are expensive, since the opportunity cost of a sub-optimal intervention is borne by an entire region for the length of the experiment. More seriously, correct inference requires assumptions limiting the ‘non-stationarity’ of test and control units, assumptions we demonstrate fail in practice. So motivated, we propose a new adaptive approach to experimentation, dubbed Synthetically Controlled Thompson Sampling (SCTS), which robustly identifies the optimal treatment without the non-stationarity assumptions of the status quo, and minimizes the cost of experimentation by incurring near-optimal, square-root regret.
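A minimal sketch of an SCTS-style loop appears below, assuming a Gaussian posterior over the treatment effect and a least-squares synthetic control fit. The thesis develops the actual algorithm and its regret guarantee; every name, prior, and update rule here is an illustrative assumption.

    # Sketch of Synthetically Controlled Thompson Sampling (SCTS).
    # Gaussian effect posterior and least-squares fit are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)

    def fit_synthetic_control(y_pre, donors_pre):
        """Weights expressing the test region as a combination of donor
        (control) regions, fit on pre-treatment data by least squares."""
        w, *_ = np.linalg.lstsq(donors_pre, y_pre, rcond=None)
        return w

    def scts(y_pre, donors_pre, env, horizon, noise_var=1.0):
        """env(arm) -> (y_test, donors_now): outcome in the test region
        under the chosen arm (0 = control, 1 = treatment), plus the
        concurrent donor-region outcomes."""
        w = fit_synthetic_control(y_pre, donors_pre)
        mu, var = 0.0, 100.0  # Gaussian posterior over the effect tau
        for _ in range(horizon):
            # Thompson step: treat iff the sampled effect is positive.
            arm = int(rng.normal(mu, np.sqrt(var)) > 0.0)
            y_test, donors_now = env(arm)
            if arm == 1:
                # The synthetic counterfactual differences out common
                # shocks, leaving a noisy observation of tau.
                effect_obs = y_test - donors_now @ w
                var_new = 1.0 / (1.0 / var + 1.0 / noise_var)
                mu = var_new * (mu / var + effect_obs / noise_var)
                var = var_new
        return mu, var

Differencing against the synthetic control removes non-stationary shocks shared with the donor regions, which is what would let the Thompson sampling loop concentrate on the optimal treatment without the stationarity assumptions of a fixed-horizon region-split test.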
Date issued
2023-06

Department
Massachusetts Institute of Technology. Operations Research Center

Publisher
Massachusetts Institute of Technology