Show simple item record

dc.contributor.author	Cheung, Wang Chi
dc.contributor.author	Simchi-Levi, David
dc.contributor.author	Zhu, Ruihao
dc.date.accessioned	2023-03-21T17:04:23Z
dc.date.available	2023-03-21T17:04:23Z
dc.date.issued	2022
dc.identifier.uri	https://hdl.handle.net/1721.1/148652
dc.description.abstract	We introduce data-driven decision-making algorithms that achieve state-of-the-art dynamic regret bounds for a collection of nonstationary stochastic bandit settings. These settings capture applications such as advertisement allocation, dynamic pricing, and traffic network routing in changing environments. We show how the difficulty posed by the (unknown a priori and possibly adversarial) nonstationarity can be overcome by an unconventional marriage between stochastic and adversarial bandit learning algorithms. Beginning with the linear bandit setting, we design and analyze a sliding window-upper confidence bound algorithm that achieves the optimal dynamic regret bound when the underlying variation budget is known. This budget quantifies the total amount of temporal variation of the latent environments. Boosted by the novel bandit-over-bandit framework that adapts to the latent changes, our algorithm can further enjoy nearly optimal dynamic regret bounds in a (surprisingly) parameter-free manner. We extend our results to other related bandit problems, namely the multiarmed bandit, generalized linear bandit, and combinatorial semibandit settings, which model a variety of operations research applications. In addition to the classical exploration-exploitation trade-off, our algorithms leverage the power of the "forgetting principle" in the learning processes, which is vital in changing environments. Extensive numerical experiments with synthetic datasets and a dataset of an online auto-loan company during the severe acute respiratory syndrome (SARS) epidemic period demonstrate that our proposed algorithms achieve superior performance compared with existing algorithms. This paper was accepted by George J. Shanthikumar, Management Science Special Section on Data-Driven Prescriptive Analytics.	en_US
dc.language.iso	en
dc.publisher	Institute for Operations Research and the Management Sciences (INFORMS)	en_US
dc.relation.isversionof	10.1287/MNSC.2021.4024	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	SSRN	en_US
dc.title	Hedging the Drift: Learning to Optimize Under Nonstationarity	en_US
dc.type	Article	en_US
dc.identifier.citation	Cheung, Wang Chi, Simchi-Levi, David and Zhu, Ruihao. 2022. "Hedging the Drift: Learning to Optimize Under Nonstationarity." Management Science, 68 (3).
dc.contributor.department	Massachusetts Institute of Technology. Department of Civil and Environmental Engineering	en_US
dc.relation.journal	Management Science	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dc.date.updated	2023-03-21T16:51:46Z
dspace.orderedauthors	Cheung, WC; Simchi-Levi, D; Zhu, R	en_US
dspace.date.submission	2023-03-21T16:51:47Z
mit.journal.volume	68	en_US
mit.journal.issue	3	en_US
mit.license	OPEN_ACCESS_POLICY
mit.metadata.status	Authority Work and Publication Information Needed	en_US
