Experimentation and Control in Online Platforms

Author(s)
Zheng, Andrew
Download
Thesis PDF (2.487 MB)
Advisor
Farias, Vivek F.
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Decision-making in many online platforms is naturally modeled as control of a large-scale dynamical system. In particular, these are typically offline control problems: the platform collects fine-grained offline datasets, either via an experiment or logging of some incumbent policy, and hopes to use this data to evaluate new control policies or improve existing ones. This thesis explores the statistical challenges involved in learning about policies in such an environment, where sample efficiency is paramount.

One ubiquitous problem is that of experimentation under "Markovian" interference, where interventions on some experimental units impact other units through modifications to the shared system state (such as a limited inventory). The best existing estimators for this problem are largely heuristic in nature. We formalize the problem of inference in such experiments as one of policy evaluation. Off-policy estimators, while unbiased, incur a large penalty in variance relative to state-of-the-art heuristics. We introduce an on-policy estimator, the Differences-In-Q’s (DQ) estimator, which achieves a striking bias-variance tradeoff: DQ can have exponentially smaller variance than off-policy evaluation, while incurring bias that is only second order in the impact of the intervention. In the process, we introduce new techniques for achieving practical bias-variance trade-offs in off-policy evaluation more generally. Chief among DQ’s advantages is its effectiveness in practice. Over the course of a six-month engagement, we implemented DQ on Douyin’s internal experimentation platforms. There, we demonstrated that DQ dominates state-of-the-art alternatives, and adapts readily to a variety of practical experimental settings and concerns.

When more sophisticated experimental designs are available, a common alternative is to choose units of experimentation that are sufficiently coarse so as to eliminate interference. ‘Region-split’ experiments on online platforms, where an intervention is applied to a single region over some experimental horizon, are one example of such a setting. Synthetic control is the state-of-the-art approach to inference in such experiments. The cost of these experiments is high, since the opportunity cost of a sub-optimal intervention is borne by an entire region over the length of the experiment. More seriously, correct inference requires assumptions limiting the ‘non-stationarity’ of test and control units, which we demonstrate to fail in practice. So motivated, we propose a new adaptive approach to experimentation, dubbed Synthetically Controlled Thompson Sampling (SCTS), which robustly identifies the optimal treatment without the non-stationarity assumptions of the status quo, and minimizes the cost of experimentation by incurring near-optimal, square-root regret.
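
To make the Differences-In-Q’s idea described above concrete, here is a minimal tabular sketch (the `dq_estimate` helper and its signature are hypothetical, not code from the thesis): it runs TD(0) on the experiment's own logged trajectory, assumes 50/50 per-unit randomization and a discounted objective for simplicity, and reports the visitation-weighted difference between treatment and control Q-values. Differencing Q-values rather than raw rewards credits each arm with its downstream effect on the shared state, which is the source of the interference.

```python
import numpy as np

def dq_estimate(trajectory, n_states, gamma=0.99, lr=0.05):
    """Differences-in-Q's style treatment-effect estimate from one
    on-policy trajectory of an A/B test with per-unit randomization.

    trajectory: iterable of (state, treated, reward, next_state),
    with states indexed 0..n_states-1 and treated in {0, 1}.
    Tabular sketch; the thesis develops the estimator in far greater
    generality.
    """
    q = np.zeros((n_states, 2))   # Q(s, a): a = 0 control, a = 1 treatment
    visits = np.zeros(n_states)
    for s, a, r, s_next in trajectory:
        # TD(0) backup under the experiment's own 50/50 randomization:
        # the next state's value averages over the two arms.
        target = r + gamma * q[s_next].mean()
        q[s, a] += lr * (target - q[s, a])
        visits[s] += 1.0
    weights = visits / visits.sum()
    # DQ: state-visitation-weighted difference of the two Q-functions.
    return float(weights @ (q[:, 1] - q[:, 0]))
```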
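
Likewise, a schematic sketch of how synthetic control and Thompson sampling can be combined in the spirit of SCTS (illustrative only, not the thesis algorithm): donor weights are fit by unconstrained least squares, where classical synthetic control restricts them to the simplex, and the treatment-effect posterior is taken to be Gaussian with a flat prior and known noise variance. The `scts_step` name and signature are assumptions for this sketch.

```python
import numpy as np

def scts_step(rng, y_donors, y_test, treated, sigma2=1.0):
    """One round of an SCTS-style adaptive rule.

    y_donors: (T, K) outcomes of K never-treated donor regions so far.
    y_test:   (T,) outcomes of the test region so far.
    treated:  (T,) 0/1 indicator of periods where the intervention was on.
    Returns 1 to apply the treatment next period, else 0.
    """
    off = treated == 0
    # Synthetic control: donor weights fit on untreated periods only.
    w, *_ = np.linalg.lstsq(y_donors[off], y_test[off], rcond=None)
    # Effect estimates on treated periods: observed minus synthetic
    # counterfactual built from the donors.
    effects = y_test[~off] - y_donors[~off] @ w
    n = max(len(effects), 1)
    mean = effects.mean() if len(effects) else 0.0
    # Thompson step: sample the effect from its Gaussian posterior and
    # treat next period iff the sampled effect is positive.
    return int(rng.normal(mean, np.sqrt(sigma2 / n)) > 0)
```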
Date issued
2023-06
URI
https://hdl.handle.net/1721.1/151474
Department
Massachusetts Institute of Technology. Operations Research Center
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses
