Online learning and optimization in operations management
Author(s)
Sun, Rui, Ph. D. Massachusetts Institute of Technology.
Download
1227276680-MIT.pdf (3.566Mb)
Other Contributors
Massachusetts Institute of Technology. Institute for Data, Systems, and Society.
Advisor
David Simchi-Levi.
Abstract
In this thesis, we study online learning and optimization problems in operations management, where decisions must be made under incomplete information and operational constraints in a dynamic environment. We first consider an online matching problem in which a central platform must match a limited number of resources to different groups of users who arrive sequentially over time. The platform does not know the reward of each matching option and must learn the true rewards from the matching results. We formulate the problem as a Markovian multi-armed bandit with budget constraints, and propose an innovative algorithm that is based on assembling the policies for each single arm. We prove the algorithm's worst-case performance guarantee, and numerically show the algorithm's robust performance compared to alternative heuristics. We next consider a revenue management problem with add-on discounts, in which a retailer offers discounts on selected supportive products (e.g., video games) to customers who have also purchased the core products (e.g., video game consoles). When the products' demand functions are unknown, we propose a UCB-based learning algorithm that uses an FPTAS optimization algorithm as a subroutine to determine the prices of different types of products. We show that the algorithm can converge to the optimal full-information pricing policy. We also conduct numerical experiments with real-world data to illustrate the performance of our algorithm and the advantage of using the add-on discount strategy in practice. Finally, we consider a network revenue management problem in which a retailer aims to maximize revenue from multiple products with limited inventory. The retailer does not know the demand for the different products, and must learn it from the sales data. To optimize the pricing decisions, we propose an efficient algorithm that combines the Thompson sampling technique and the online gradient descent method within a primal-dual framework. In contrast to traditional algorithms that rely on frequently solving linear programs, our algorithm does not need to solve any linear program and is therefore more computationally efficient. We analyze the performance guarantee of our algorithm, and show the algorithm's fast running time through numerical experiments.
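The combination of Thompson sampling and online gradient descent mentioned at the end of the abstract can be illustrated with a much-simplified sketch. It is not the thesis's algorithm: it assumes a single product, a single inventory constraint, a finite price list, and Bernoulli demand with Beta posteriors, and every name in it (`ts_primal_dual_pricing`, `step_size`, `true_rates`, and so on) is hypothetical. The point it shows is that a dual price on inventory, updated by a gradient step, can stand in for repeatedly solving a linear program.

```python
import random


def ts_primal_dual_pricing(prices, true_rates, inventory, horizon,
                           step_size=0.05, seed=0):
    """Illustrative sketch only: one product, one resource, Bernoulli demand.

    prices      -- candidate prices (assumed finite price list)
    true_rates  -- purchase probability at each price (unknown to the learner)
    inventory   -- units available over the whole horizon
    horizon     -- number of selling periods
    """
    rng = random.Random(seed)
    n = len(prices)
    alpha = [1.0] * n          # Beta posterior parameters: successes + 1
    beta = [1.0] * n           # Beta posterior parameters: failures + 1
    dual = 0.0                 # dual price (shadow cost) of one inventory unit
    budget_rate = inventory / horizon  # inventory available per period
    remaining = inventory
    revenue = 0.0

    for _ in range(horizon):
        if remaining <= 0:
            break  # stop selling once inventory is exhausted
        # Thompson sampling: draw one demand rate per price from its posterior.
        sampled = [rng.betavariate(alpha[k], beta[k]) for k in range(n)]
        # Primal step: pick the price maximizing sampled dual-adjusted revenue.
        k = max(range(n), key=lambda j: (prices[j] - dual) * sampled[j])
        # Observe whether a sale occurs at the chosen price.
        sale = 1 if rng.random() < true_rates[k] else 0
        alpha[k] += sale
        beta[k] += 1 - sale
        remaining -= sale
        revenue += prices[k] * sale
        # Dual step: projected online gradient descent on the dual price.
        # The dual rises when consumption exceeds the per-period budget.
        dual = max(0.0, dual + step_size * (sale - budget_rate))

    return revenue, inventory - remaining, dual


if __name__ == "__main__":
    rev, sold, dual = ts_primal_dual_pricing(
        prices=[1.0, 2.0, 3.0], true_rates=[0.8, 0.5, 0.2],
        inventory=50, horizon=500)
    print(f"revenue={rev:.1f} sold={sold} final dual price={dual:.3f}")
```

In this sketch each period costs one posterior draw per price and one scalar update, with no optimization solver in the loop, which is the kind of computational saving the abstract attributes to avoiding linear programs.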
Description
Thesis: Ph. D. in Social Engineering Systems and Statistics, Massachusetts Institute of Technology, School of Engineering, Institute for Data, Systems, and Society, September, 2020. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 161-167).
Date issued
2020
Department
Massachusetts Institute of Technology. Institute for Data, Systems, and Society; Massachusetts Institute of Technology. Engineering Systems Division
Publisher
Massachusetts Institute of Technology
Keywords
Institute for Data, Systems, and Society.