Online and offline learning in operations

Wang, Li

Author(s)

Wang, Li

Download1227096890-MIT.pdf (1.485Mb)

Other Contributors

Massachusetts Institute of Technology. Operations Research Center.

Advisor

David Simchi-Levi.

Terms of use

MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. http://dspace.mit.edu/handle/1721.1/7582

Metadata

Show full item record

Abstract

With the rapid advancement of information technology and accelerated development of data science, the importance of integrating data into decision-making has never been stronger. In this thesis, we propose data-driven algorithms to incorporate learning from data in three operations problems, concerning both online learning and offline learning settings. First, we study a single product pricing problem with demand censoring in an offline data-driven setting. In this problem, a retailer is given a finite level of inventory, and faces a random demand that is price sensitive in a linear fashion with unknown parameters and distribution. Any unsatisfied demand is lost and unobservable. The retailer's objective is to use offline censored demand data to find an optimal price, maximizing her expected revenue with finite inventories.

We characterize an exact condition for the identifiability of near-optimal algorithms, and propose a data-driven algorithm that guarantees near-optimality in the identifiable case and approaches best-achievable optimality gap in the unidentifiable case. Next, we study the classic multi-period joint pricing and inventory control problem in an offline data-driven setting. We assume the demand functions and noise distributions are unknown, and propose a data-driven approximation algorithm, which uses offline demand data to solve the joint pricing and inventory control problem. We establish a polynomial sample complexity bound, the number of data samples needed to guarantee a near-optimal profit. A simulation study suggests that the data-driven algorithm solves the dynamic program effectively. Finally, we study an online learning problem for product selection in urban warehouses managed by fast-delivery retailers. We distill the problem into a semi-bandit model with linear generalization.

There are n products, each with a feature vector of dimension T. In each of the T periods, a retailer selects K products to offer, where T is much greater than T or b. We propose an online learning algorithm that iteratively shrinks the upper confidence bounds within each period. Compared to the standard UCB algorithm, we prove the new algorithm reduces the most dominant regret term by a factor of d, and experiments on datasets from Alibaba Group suggest it lowers the total regret by at least 10%..

Description

Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, September, 2020

Cataloged from PDF version of thesis.

Includes bibliographical references (pages 213-219).

Date issued

2020

URI

https://hdl.handle.net/1721.1/129080

Department

Massachusetts Institute of Technology. Operations Research Center; Sloan School of Management

Publisher

Massachusetts Institute of Technology

Keywords

Operations Research Center.

Collections

Doctoral Theses