Inventory Balancing with Online Learning

Cheung, Wang Chi; Ma, Will; Simchi-Levi, David; Wang, Xinshang

dc.contributor.author	Cheung, Wang Chi
dc.contributor.author	Ma, Will
dc.contributor.author	Simchi-Levi, David
dc.contributor.author	Wang, Xinshang
dc.date.accessioned	2023-03-21T17:09:10Z
dc.date.available	2023-03-21T17:09:10Z
dc.date.issued	2022
dc.identifier.uri	https://hdl.handle.net/1721.1/148653
dc.description.abstract	We study a general problem of allocating limited resources to heterogeneous customers over time under model uncertainty. Each type of customer can be serviced using different actions, each of which stochastically consumes some combination of resources and returns different rewards for the resources consumed. We consider a general model in which the resource consumption distribution associated with each customer type–action combination is not known but is consistent and can be learned over time. In addition, the sequence of customer types to arrive over time is arbitrary and completely unknown. We overcome both the challenges of model uncertainty and customer heterogeneity by judiciously synthesizing two algorithmic frameworks from the literature: inventory balancing, which “reserves” a portion of each resource for high-reward customer types that could later arrive based on competitive ratio analysis, and online learning, which “explores” the resource consumption distributions for each customer type under different actions based on regret analysis. We define an auxiliary problem, which allows for existing competitive ratio and regret bounds to be seamlessly integrated. Furthermore, we propose a new variant of upper confidence bound (UCB), dubbed lazyUCB, which conducts less exploration in a bid to focus on “exploitation” in view of the resource scarcity. Finally, we construct an information-theoretic family of counterexamples to show that our integrated framework achieves the best possible performance guarantee. We demonstrate the efficacy of our algorithms on both synthetic instances generated for the online matching with stochastic rewards problem under unknown probabilities and a publicly available hotel data set. Our framework is highly practical in that it requires no historical data (no fitted customer choice models or forecasting of customer arrival patterns) and can be used to initialize allocation strategies in fast-changing environments. This paper was accepted by J. George Shanthikumar, Management Science Special Section on Data-Driven Prescriptive Analytics.	en_US
dc.language.iso	en
dc.publisher	Institute for Operations Research and the Management Sciences (INFORMS)	en_US
dc.relation.isversionof	10.1287/MNSC.2021.4216	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	SSRN	en_US
dc.title	Inventory Balancing with Online Learning	en_US
dc.type	Article	en_US
dc.identifier.citation	Cheung, Wang Chi, Ma, Will, Simchi-Levi, David and Wang, Xinshang. 2022. "Inventory Balancing with Online Learning." Management Science, 68 (3).
dc.contributor.department	Massachusetts Institute of Technology. Department of Civil and Environmental Engineering	en_US
dc.relation.journal	Management Science	en_US
dc.eprint.version	Original manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dc.date.updated	2023-03-21T17:03:57Z
dspace.orderedauthors	Cheung, WC; Ma, W; Simchi-Levi, D; Wang, X	en_US
dspace.date.submission	2023-03-21T17:03:59Z
mit.journal.volume	68	en_US
mit.journal.issue	3	en_US
mit.license	OPEN_ACCESS_POLICY
mit.metadata.status	Authority Work and Publication Information Needed	en_US

Files in this item

Name: SSRN-id3236533.pdf
Size: 1.439Mb
Format: PDF
Description: Submitted version

This item appears in the following Collection(s)

MIT Open Access Articles
