Learning to Optimize Under Non-Stationarity
Author(s)
Cheung, Wang Chi; Simchi-Levi, David; Zhu, Ruihao
Terms of use
Open Access Policy
Creative Commons Attribution-Noncommercial-Share Alike
Abstract
© 2019 by the author(s). We introduce algorithms that achieve state-of-the-art dynamic regret bounds for the non-stationary linear stochastic bandit setting, which captures natural applications such as dynamic pricing and ads allocation in a changing environment. We show how the difficulty posed by non-stationarity can be overcome by a novel marriage between stochastic and adversarial bandit learning algorithms. Defining $d$, $B_T$, and $T$ as the problem dimension, the variation budget, and the total time horizon, respectively, our main contributions are the tuned Sliding Window UCB (SW-UCB) algorithm with optimal $\tilde{O}(d^{2/3}(B_T + 1)^{1/3} T^{2/3})$ dynamic regret, and the tuning-free bandit-over-bandit (BOB) framework, built on top of SW-UCB, with the best $\tilde{O}(d^{2/3}(B_T + 1)^{1/4} T^{3/4})$ dynamic regret attainable without prior knowledge of $B_T$.
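For concreteness, the following is a minimal sketch of a sliding-window UCB for linear bandits in the spirit of SW-UCB, not the authors' reference implementation: the class name `SWUCB` and the hyper-parameters `window`, `lam`, and `beta` are illustrative assumptions, whereas the paper derives the window length and confidence width from $d$, $T$, and $B_T$.

```python
import numpy as np
from collections import deque

class SWUCB:
    """Sketch of a sliding-window UCB for linear bandits (illustrative,
    not the paper's reference implementation). `window`, `lam`, and
    `beta` are assumed hyper-parameters; the paper tunes the window
    length using d, T, and the variation budget B_T."""

    def __init__(self, dim, window, lam=1.0, beta=1.0):
        self.dim = dim
        self.lam = lam                        # ridge regularizer
        self.beta = beta                      # confidence-width multiplier
        self.history = deque(maxlen=window)   # keep only the last `window` rounds

    def _estimate(self):
        # Regularized least squares over the sliding window only, so
        # observations from a drifted environment are eventually forgotten.
        V = self.lam * np.eye(self.dim)
        b = np.zeros(self.dim)
        for x, y in self.history:
            V += np.outer(x, x)
            b += y * x
        V_inv = np.linalg.inv(V)
        return V_inv @ b, V_inv

    def select(self, actions):
        # Optimistic choice: estimated reward plus an exploration
        # bonus measured in the V^{-1} norm.
        theta_hat, V_inv = self._estimate()
        scores = [x @ theta_hat + self.beta * np.sqrt(x @ V_inv @ x)
                  for x in actions]
        return int(np.argmax(scores))

    def update(self, action, reward):
        self.history.append((np.asarray(action, dtype=float), float(reward)))
```

The BOB framework removes the need to tune the window length by hand: it partitions the horizon into blocks and runs an adversarial meta-bandit (EXP3 in the paper) over a set of candidate window lengths, restarting SW-UCB with the selected window in each block.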
Date issued
2018
Department
Massachusetts Institute of Technology. Institute for Data, Systems, and Society; Statistics and Data Science Center (Massachusetts Institute of Technology)
Journal
AISTATS 2019 - 22nd International Conference on Artificial Intelligence and Statistics
Publisher
Proceedings of Machine Learning Research (PMLR)
Citation
Cheung, Wang Chi, Simchi-Levi, David, and Zhu, Ruihao. 2018. "Learning to Optimize Under Non-Stationarity." AISTATS 2019 - 22nd International Conference on Artificial Intelligence and Statistics, 89.
Version: Author's final manuscript
ISSN
2640-3498