Online learning and optimization in operations management

Sun, Rui, Ph. D. Massachusetts Institute of Technology.

Other Contributors

Massachusetts Institute of Technology. Institute for Data, Systems, and Society.

Advisor

David Simchi-Levi.

Terms of use

MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. http://dspace.mit.edu/handle/1721.1/7582

Abstract

This thesis studies online learning and optimization problems in operations management, where decisions must be made under incomplete information and operational constraints in a dynamic environment. We first consider an online matching problem in which a central platform matches a limited number of resources to different groups of users who arrive sequentially over time. The platform does not know the reward of each matching option and must learn the true rewards from the matching results. We formulate the problem as a Markovian multi-armed bandit with budget constraints, and propose an innovative algorithm that is based on assembling the policies for each single arm. We prove a worst-case performance guarantee for the algorithm, and show numerically that it performs robustly compared to alternative heuristics. We next consider a revenue management problem with add-on discounts, in which a retailer offers discounts on selected supportive products (e.g. video games) to customers who have also purchased the core products (e.g. video game consoles). For the case where the products' demand functions are unknown, we propose a UCB-based learning algorithm that uses an FPTAS optimization algorithm as a subroutine to determine the prices of the different types of products. We show that the algorithm converges to the optimal full-information pricing policy. We also conduct numerical experiments with real-world data to illustrate the performance of our algorithm and the advantage of using the add-on discount strategy in practice. Finally, we consider a network revenue management problem in which a retailer aims to maximize revenue from multiple products with limited inventory. The retailer does not know the demand for the different products and must learn it from the sales data. To optimize the pricing decisions, we propose an efficient algorithm that combines the Thompson sampling technique and the online gradient descent method within a primal-dual framework. In contrast to traditional algorithms that rely on frequently solving linear programs, our algorithm does not need to solve any linear program and is therefore more computationally efficient. We analyze the performance guarantee of our algorithm, and demonstrate its fast running time through numerical experiments.
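To give a rough sense of how Thompson sampling and online gradient descent can be combined in a primal-dual scheme without solving a linear program, here is a minimal Python sketch. It is not the algorithm from the thesis. It assumes a single product, a small grid of candidate prices, Bernoulli purchase decisions with a Beta posterior per price, and a fixed step size. The function name `primal_dual_thompson_pricing` and all of its parameters are hypothetical and chosen only for illustration.

```python
import random


def primal_dual_thompson_pricing(prices, true_buy_prob, inventory, horizon,
                                 step_size=0.05, seed=0):
    """Illustrative sketch only (assumed setup, not the thesis's algorithm).

    prices         -- candidate prices (hypothetical price grid)
    true_buy_prob  -- purchase probability at each price, unknown to the learner
    inventory      -- number of units available over the whole horizon
    horizon        -- number of selling periods
    """
    rng = random.Random(seed)
    n = len(prices)
    alpha = [1.0] * n          # Beta posterior: 1 + observed sales per price
    beta = [1.0] * n           # Beta posterior: 1 + observed no-sales per price
    dual = 0.0                 # dual variable: shadow price of one inventory unit
    rate = inventory / horizon # inventory available per period on average
    revenue = 0.0
    remaining = inventory

    for _ in range(horizon):
        if remaining <= 0:
            break  # stop selling once inventory is exhausted

        # Thompson sampling: draw one demand estimate per price from the posterior.
        sampled = [rng.betavariate(alpha[k], beta[k]) for k in range(n)]

        # Primal step: maximize sampled revenue net of the inventory shadow price.
        k = max(range(n), key=lambda i: sampled[i] * (prices[i] - dual))

        # Simulate the customer's purchase decision at the chosen price.
        sold = 1 if rng.random() < true_buy_prob[k] else 0

        # Posterior update for the chosen price.
        alpha[k] += sold
        beta[k] += 1 - sold
        revenue += prices[k] * sold
        remaining -= sold

        # Dual step: projected online gradient update. The dual variable rises
        # when inventory is consumed faster than the per-period rate.
        dual = max(0.0, dual + step_size * (sold - rate))

    return revenue, remaining, dual


if __name__ == "__main__":
    result = primal_dual_thompson_pricing(
        prices=[5.0, 8.0, 12.0],
        true_buy_prob=[0.7, 0.4, 0.15],
        inventory=50,
        horizon=300,
    )
    print(result)
```

Each period costs one posterior sample per price and one scalar update of the dual variable, which is the kind of per-period work that replaces repeatedly re-solving a linear program. A multi-product network version would keep one dual variable per resource.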

Description

Thesis: Ph. D. in Social Engineering Systems and Statistics, Massachusetts Institute of Technology, School of Engineering, Institute for Data, Systems, and Society, September, 2020

Cataloged from student-submitted PDF version of thesis.

Includes bibliographical references (pages 161-167).

Date issued

2020

URI

https://hdl.handle.net/1721.1/129140

Department

Massachusetts Institute of Technology. Institute for Data, Systems, and Society; Massachusetts Institute of Technology. Engineering Systems Division

Publisher

Massachusetts Institute of Technology

Keywords

Institute for Data, Systems, and Society.

Collections

Doctoral Theses