Online and offline learning in operations
Author(s)Wang, Li,Ph D.Massachusetts Institute of Technology.
Massachusetts Institute of Technology. Operations Research Center.
MetadataShow full item record
With the rapid advancement of information technology and accelerated development of data science, the importance of integrating data into decision-making has never been stronger. In this thesis, we propose data-driven algorithms to incorporate learning from data in three operations problems, concerning both online learning and offline learning settings. First, we study a single product pricing problem with demand censoring in an offline data-driven setting. In this problem, a retailer is given a finite level of inventory, and faces a random demand that is price sensitive in a linear fashion with unknown parameters and distribution. Any unsatisfied demand is lost and unobservable. The retailer's objective is to use offline censored demand data to find an optimal price, maximizing her expected revenue with finite inventories.We characterize an exact condition for the identifiability of near-optimal algorithms, and propose a data-driven algorithm that guarantees near-optimality in the identifiable case and approaches best-achievable optimality gap in the unidentifiable case. Next, we study the classic multi-period joint pricing and inventory control problem in an offline data-driven setting. We assume the demand functions and noise distributions are unknown, and propose a data-driven approximation algorithm, which uses offline demand data to solve the joint pricing and inventory control problem. We establish a polynomial sample complexity bound, the number of data samples needed to guarantee a near-optimal profit. A simulation study suggests that the data-driven algorithm solves the dynamic program effectively. Finally, we study an online learning problem for product selection in urban warehouses managed by fast-delivery retailers. We distill the problem into a semi-bandit model with linear generalization.There are n products, each with a feature vector of dimension T. In each of the T periods, a retailer selects K products to offer, where T is much greater than T or b. We propose an online learning algorithm that iteratively shrinks the upper confidence bounds within each period. Compared to the standard UCB algorithm, we prove the new algorithm reduces the most dominant regret term by a factor of d, and experiments on datasets from Alibaba Group suggest it lowers the total regret by at least 10%..
Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, September, 2020Cataloged from PDF version of thesis.Includes bibliographical references (pages 213-219).
DepartmentMassachusetts Institute of Technology. Operations Research Center
Massachusetts Institute of Technology
Operations Research Center.