Show simple item record

dc.contributor.author	Cheung, Wang Chi
dc.contributor.author	Simchi-Levi, David
dc.contributor.author	Zhu, Ruihao
dc.date.accessioned	2023-03-21T17:04:23Z
dc.date.available	2023-03-21T17:04:23Z
dc.date.issued	2022
dc.identifier.uri	https://hdl.handle.net/1721.1/148652
dc.description.abstract	We introduce data-driven decision-making algorithms that achieve state-of-the-art dynamic regret bounds for a collection of nonstationary stochastic bandit settings. These settings capture applications such as advertisement allocation, dynamic pricing, and traffic network routing in changing environments. We show how the difficulty posed by the (unknown a priori and possibly adversarial) nonstationarity can be overcome by an unconventional marriage between stochastic and adversarial bandit learning algorithms. Beginning with the linear bandit setting, we design and analyze a sliding window-upper confidence bound algorithm that achieves the optimal dynamic regret bound when the underlying variation budget is known. This budget quantifies the total amount of temporal variation of the latent environments. Boosted by the novel bandit-over-bandit framework that adapts to the latent changes, our algorithm can further enjoy nearly optimal dynamic regret bounds in a (surprisingly) parameter-free manner. We extend our results to other related bandit problems, namely the multiarmed bandit, generalized linear bandit, and combinatorial semibandit settings, which model a variety of operations research applications. In addition to the classical exploration-exploitation trade-off, our algorithms leverage the power of the "forgetting principle" in the learning processes, which is vital in changing environments. Extensive numerical experiments with synthetic datasets and a dataset of an online auto-loan company during the severe acute respiratory syndrome (SARS) epidemic period demonstrate that our proposed algorithms achieve superior performance compared with existing algorithms. This paper was accepted by George J. Shanthikumar, Management Science Special Section on Data-Driven Prescriptive Analytics.	en_US
dc.language.iso	en
dc.publisher	Institute for Operations Research and the Management Sciences (INFORMS)	en_US
dc.relation.isversionof	10.1287/MNSC.2021.4024	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	SSRN	en_US
dc.title	Hedging the Drift: Learning to Optimize Under Nonstationarity	en_US
dc.type	Article	en_US
dc.identifier.citation	Cheung, Wang Chi, Simchi-Levi, David and Zhu, Ruihao. 2022. "Hedging the Drift: Learning to Optimize Under Nonstationarity." Management Science, 68 (3).
dc.contributor.department	Massachusetts Institute of Technology. Department of Civil and Environmental Engineering	en_US
dc.relation.journal	Management Science	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dc.date.updated	2023-03-21T16:51:46Z
dspace.orderedauthors	Cheung, WC; Simchi-Levi, D; Zhu, R	en_US
dspace.date.submission	2023-03-21T16:51:47Z
mit.journal.volume	68	en_US
mit.journal.issue	3	en_US
mit.license	OPEN_ACCESS_POLICY
mit.metadata.status	Authority Work and Publication Information Needed	en_US
