Resource allocation problems in stochastic sequential decision making

Lakshmanan, Hariharan, 1980-

Other Contributors

Massachusetts Institute of Technology. Dept. of Civil and Environmental Engineering.

Advisor

Daniela Pucci de Farias and David Simchi-Levi.

Terms of use

M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582

Abstract

In this thesis, we study resource allocation problems that arise in the context of stochastic sequential decision making. The practical utility of optimal algorithms for these problems is limited by their high computational and storage requirements, and an increasing number of applications also require a decentralized solution. We develop computationally efficient techniques for approximately solving a certain class of these resource allocation problems, with a focus on decentralized algorithms where appropriate. The first resource allocation problem that we study is a stochastic sequential decision making problem with multiple decision makers (agents) and two main features: (1) partial observability, where each agent may not have complete information regarding the system, and (2) limited communication, where each agent may not be able to communicate with all other agents at all times. We formulate a Markov Decision Process (MDP) for this problem. Partial observability and limited communication impose additional computational constraints on the exact solution of the MDP. We propose a scheme for approximating the optimal Q function and the optimal value function associated with this MDP as a linear combination of preselected basis functions. We show that the proposed approximation scheme leads to decentralization of the agents' decisions, thereby enabling their implementation under limited communication. We propose a linear program, the ALP, for selecting the parameters used to combine the basis functions. We establish bounds relating the approximation error of the parameters selected by the ALP to the best possible error achievable with the chosen basis functions.
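The decentralization claim can be illustrated with a small sketch. It assumes, purely for illustration, that every basis function depends only on one agent's local observation and action. Under that assumption the approximate Q function is a sum of per-agent terms, so each agent can pick its action by maximizing its own term without communicating, and the result coincides with a centralized search over all joint actions. The names `local_score`, `track`, and `effort`, and the weights, are hypothetical; the weights are fixed by hand rather than selected by the ALP.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

# Hypothetical local basis function phi(x_i, a_i): it sees only one agent's
# observation and action (an illustrative assumption, not the thesis's basis).
Basis = Callable[[int, int], float]


def local_score(obs_i: int, a_i: int, weights_i: Sequence[float],
                bases_i: Sequence[Basis]) -> float:
    """Agent i's share of the approximate Q value: sum_m r_{i,m} * phi_{i,m}(x_i, a_i)."""
    return sum(w * phi(obs_i, a_i) for w, phi in zip(weights_i, bases_i))


def q_approx(obs, actions, weights, bases) -> float:
    """Approximate Q(x, a) as the sum of every agent's local score."""
    return sum(local_score(obs[i], actions[i], weights[i], bases[i])
               for i in range(len(obs)))


def decentralized_actions(obs, action_set, weights, bases) -> Tuple[int, ...]:
    """Each agent maximizes only its own term, using only its local observation."""
    return tuple(
        max(action_set, key=lambda a, i=i: local_score(obs[i], a, weights[i], bases[i]))
        for i in range(len(obs)))


def centralized_actions(obs, action_set, weights, bases) -> Tuple[int, ...]:
    """Brute-force search over all joint actions (exponential in the number of agents)."""
    return max(product(action_set, repeat=len(obs)),
               key=lambda a: q_approx(obs, a, weights, bases))


# Hypothetical instance: 3 agents, actions {0, 1, 2}, two basis functions per agent.
def track(x: int, a: int) -> float:
    return -float((a - x) ** 2)   # favors actions close to the local observation


def effort(x: int, a: int) -> float:
    return float(a)               # linear term in the action


ACTIONS = [0, 1, 2]
OBS = (0, 2, 1)
WEIGHTS = [[1.0, 0.3], [2.0, -0.5], [0.5, 0.2]]
BASES = [[track, effort]] * 3

print(decentralized_actions(OBS, ACTIONS, WEIGHTS, BASES))  # (0, 2, 1)
print(centralized_actions(OBS, ACTIONS, WEIGHTS, BASES))    # (0, 2, 1)
```

The centralized search visits 3^n joint actions while the decentralized rule needs only 3 evaluations per agent; the two agree here because the assumed basis makes the approximation separable across agents.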

Motivated by the need for a decentralized solution to the ALP, which is equivalent to a resource allocation problem with a separable, concave objective function, we analyze a general class of resource allocation problems with separable concave objective functions. We propose a distributed algorithm for this class of problems when the objective function is differentiable and establish its convergence and convergence rate properties. We develop a smoothing scheme for non-differentiable objective functions and extend the algorithm to this case. Finally, we build on these results to extend the decentralized algorithm to accommodate non-negativity constraints on the resources. Numerical investigations of the algorithm's performance show that it is competitive with its centralized counterpart. The second resource allocation problem that we study is that of optimally accepting or rejecting arriving orders in a Make-To-Order (MTO) manufacturing firm. We model the production facility of the MTO firm as a queue and view the production facility's time as a resource that must be optimally allotted between current and future orders. We consider the Order Acceptance Problem under two arrival processes, a Poisson process (OAP-P) and a Bernoulli process (OAP-B), and formulate both problems as MDPs. We provide insights into the structure of the optimal order acceptance policy for OAP-B under the assumption of First Come First Served (FCFS) scheduling of accepted orders.
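As an illustration of how a separable concave allocation problem can be solved with purely local exchanges, the sketch below uses a simple center-free gradient scheme; it is not claimed to be the distributed algorithm developed in the thesis. Each node compares its marginal utility with those of its neighbors and shifts resource toward the neighbor with the higher marginal utility. Because every pairwise transfer appears with opposite signs at its two endpoints, the total budget stays fixed without any central coordinator. It covers only the differentiable case; the smoothing and non-negativity extensions are not sketched. The instance (f_i(x) = w_i log x, three nodes on a path, budget 6, step size 0.2) is hypothetical, and the step size must be small relative to the curvature of the f_i for the iteration to converge.

```python
from typing import Callable, Dict, List, Sequence


def center_free_allocation(grads: Sequence[Callable[[float], float]],
                           neighbors: Dict[int, List[int]],
                           x0: Sequence[float], step: float, iters: int) -> List[float]:
    """Maximize sum_i f_i(x_i) subject to sum_i x_i = sum(x0), using neighbor exchanges only.

    grads[i] is f_i' (node i's marginal utility); neighbors must be symmetric.
    The update x_i += step * sum_{j in N(i)} (f_i'(x_i) - f_j'(x_j)) is an ascent
    direction that keeps sum(x) unchanged.
    """
    x = list(x0)
    for _ in range(iters):
        g = [grads[i](x[i]) for i in range(len(x))]      # local marginal utilities
        x = [x[i] + step * sum(g[i] - g[j] for j in neighbors[i])
             for i in range(len(x))]                     # synchronous update
    return x


# Hypothetical instance: f_i(x) = w_i * log(x) on the path 0 - 1 - 2, budget 6.
# The centralized optimum equalizes w_i / x_i, i.e. x_i = w_i * 6 / sum(w).
UTILITY_WEIGHTS = [1.0, 2.0, 3.0]
GRADS = [lambda x, w=w: w / x for w in UTILITY_WEIGHTS]
NEIGHBORS = {0: [1], 1: [0, 2], 2: [1]}

x = center_free_allocation(GRADS, NEIGHBORS, x0=[2.0, 2.0, 2.0], step=0.2, iters=2000)
print([round(v, 6) for v in x])   # approximately [1.0, 2.0, 3.0]
print(round(sum(x), 9))           # the budget is preserved
```

At the fixed point all neighboring marginal utilities are equal, which on a connected graph is exactly the equal-marginal-utility optimality condition of the centralized problem.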

We investigate a class of randomized order acceptance policies for OAP-B, called static policies, that are practically relevant because they are easy to implement, and we develop a procedure for computing the policy gradient of any static policy. Using these results for OAP-B, we propose four heuristics for OAP-P. We numerically investigate the performance of the proposed heuristics and compare it with that of other heuristics reported in the literature. One of our proposed heuristics, FCFS-ValueFunction, outperforms other heuristics under a variety of conditions while also being easy to implement.
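A static policy accepts each arriving order with a fixed, class-dependent probability that ignores the current backlog. The sketch below evaluates such a policy by simulation under a hypothetical OAP-B-style model: in each time slot an order arrives with probability `p`, its class fixes a revenue and an integer processing time, accepted orders are processed FCFS by a single facility, and each accepted order incurs a cost `hold_cost` per slot until it is completed. The model, its parameters, and all function names are assumptions. The gradient is estimated by central finite differences with common random numbers, a generic substitute for, not a reproduction of, the policy-gradient procedure in the thesis.

```python
import random
from typing import List, Sequence


def _pick_class(u: float, probs: Sequence[float]) -> int:
    """Map a uniform draw u to an order class via the cumulative class probabilities."""
    cumulative = 0.0
    for k, q in enumerate(probs):
        cumulative += q
        if u < cumulative:
            return k
    return len(probs) - 1  # guard against rounding when probs sum to ~1


def average_profit(theta: Sequence[float], p: float, class_probs: Sequence[float],
                   revenues: Sequence[float], proc_times: Sequence[int],
                   hold_cost: float, horizon: int, seed: int = 0) -> float:
    """Monte Carlo estimate of profit per slot under static policy theta (hypothetical model)."""
    rng = random.Random(seed)
    backlog = 0  # committed work (in slots) not yet processed
    total = 0.0
    for _ in range(horizon):
        # Draw all three uniforms every slot so runs with different theta
        # share the same random numbers (common random numbers).
        u_arrive, u_class, u_accept = rng.random(), rng.random(), rng.random()
        if u_arrive < p:
            k = _pick_class(u_class, class_probs)
            if u_accept < theta[k]:  # static policy: ignores the backlog
                # Under FCFS, the order finishes after all committed work plus its own.
                sojourn = backlog + proc_times[k]
                total += revenues[k] - hold_cost * sojourn
                backlog += proc_times[k]
        backlog = max(backlog - 1, 0)  # the facility works for one slot
    return total / horizon


def fd_gradient(theta: Sequence[float], delta: float = 0.05, **model) -> List[float]:
    """Central finite-difference gradient of average_profit, clipped to [0, 1]."""
    grad = []
    for k in range(len(theta)):
        up, down = list(theta), list(theta)
        up[k] = min(1.0, theta[k] + delta)
        down[k] = max(0.0, theta[k] - delta)
        diff = average_profit(up, **model) - average_profit(down, **model)
        grad.append(diff / (up[k] - down[k]))
    return grad


model = dict(p=0.6, class_probs=[0.5, 0.5], revenues=[10.0, 4.0],
             proc_times=[2, 1], hold_cost=0.5, horizon=50_000, seed=1)
print(average_profit([1.0, 1.0], **model))
print(fd_gradient([1.0, 0.5], **model))
```

Tracking only the committed backlog suffices here because, under FCFS, a newly accepted order's completion time is the work already ahead of it plus its own processing time; a state-dependent policy would instead condition the acceptance decision on `backlog`.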

Description

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Civil and Environmental Engineering, 2009.

Includes bibliographical references (p. 159-162).

Date issued

2009

URI

http://hdl.handle.net/1721.1/47736

Department

Massachusetts Institute of Technology. Department of Civil and Environmental Engineering

Publisher

Massachusetts Institute of Technology

Keywords

Civil and Environmental Engineering.

Collections

Doctoral Theses