Conservation Laws, Extended Polymatroids and Multi-Armed Bandit Problems; A Unified Approach to Indexable Systems
Author(s)
Bertsimas, Dimitris J.; Nino-Mora, Jose
Abstract
We show that if performance measures in stochastic and dynamic scheduling problems satisfy generalized conservation laws, then the feasible space of achievable performance is a polyhedron called an extended polymatroid, which generalizes the usual polymatroids introduced by Edmonds. Optimization of a linear objective over an extended polymatroid is solved by an adaptive greedy algorithm, which leads to an optimal solution having an indexability property (indexable systems). Under a certain condition, the indices have a stronger decomposition property (decomposable systems). The following classical problems can be analyzed using our theory: multi-armed bandit problems, branching bandits, multiclass queues, multiclass queues with feedback, and deterministic scheduling problems. Interesting consequences of our results include: (1) a characterization of indexable systems as systems that satisfy generalized conservation laws, (2) a sufficient condition for indexable systems to be decomposable, (3) a new linear programming proof of the decomposability property of Gittins indices in multi-armed bandit problems, (4) a unified and practical approach to sensitivity analysis of indexable systems, (5) a new characterization of the indices of indexable systems as sums of dual variables and a new interpretation of the indices in terms of retirement options in the context of branching bandits, (6) the first rigorous analysis of the indexability of undiscounted branching bandits, (7) a new algorithm to compute the indices of indexable systems (in particular Gittins indices), which is as fast as the fastest known algorithm, (8) a unification of the algorithm of Klimov for multiclass queues and the algorithm of Gittins for multi-armed bandits as special cases of the same algorithm, (9) closed-form formulae for the performance of the optimal policy, and (10) an understanding of the nondependence of the indices on some of the parameters of the stochastic scheduling problem.
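To make the adaptive greedy idea concrete, here is a minimal Python sketch. It assumes a normalization chosen for illustration rather than taken from this record: maximize sum_i R_i x_i subject to sum_{i in S} A_i^S x_i >= b(S) for every proper subset S of E, with equality for S = E, and every coefficient A_i^S > 0. The function names `adaptive_greedy` and `vertex` and the callables `A(i, S)` and `b(S)` are hypothetical. Working downward from E, each step picks the element with the largest reduced ratio, records that ratio as the dual value of the current set, and removes the element. Each element's index is then the sum of the dual values of the nested sets that contain it, in the spirit of item (5) above. `vertex` solves the tight constraints for the nested sets S_1, ..., S_n by forward substitution, giving the performance vector that matches the computed order.

```python
from typing import Callable, Dict, FrozenSet, Hashable, List, Tuple

# Hypothetical signatures: A(i, S) is the coefficient of element i in the
# constraint for set S; b(S) is that constraint's right-hand side.
Coef = Callable[[Hashable, FrozenSet[Hashable]], float]
RHS = Callable[[FrozenSet[Hashable]], float]


def adaptive_greedy(R: Dict[Hashable, float], A: Coef):
    """Assumed LP (our normalization): max sum_i R[i] x[i] subject to
    sum_{i in S} A(i, S) x[i] >= b(S) for proper S, equality for S = E,
    with every A(i, S) > 0. Only R and A are needed for the indices.

    Returns (pi, nu, gamma): S_k = {pi[0], ..., pi[k-1]}, nu[k-1] is the
    dual value of S_k, and gamma[i] sums nu over the S_k containing i.
    """
    remaining = frozenset(R)
    fixed: List[Tuple[FrozenSet[Hashable], float]] = []  # (S_n, nu_n), (S_{n-1}, nu_{n-1}), ...
    order: List[Hashable] = []                           # pi_n, pi_{n-1}, ...
    while remaining:
        # Every set fixed so far contains `remaining`, so each candidate's
        # reward is reduced by the contributions of all earlier duals.
        ratios = {
            i: (R[i] - sum(A(i, T) * y for T, y in fixed)) / A(i, remaining)
            for i in remaining
        }
        best = max(ratios, key=ratios.get)  # ties are broken arbitrarily
        fixed.append((remaining, ratios[best]))
        order.append(best)
        remaining = remaining - {best}
    pi = order[::-1]
    nu = [y for _, y in reversed(fixed)]
    gamma: Dict[Hashable, float] = {}
    running = 0.0
    for k in range(len(pi) - 1, -1, -1):  # gamma[pi_k] = nu_k + ... + nu_n
        running += nu[k]
        gamma[pi[k]] = running
    return pi, nu, gamma


def vertex(pi: List[Hashable], A: Coef, b: RHS) -> Dict[Hashable, float]:
    """Performance vector making the constraints for S_1, ..., S_n tight."""
    x: Dict[Hashable, float] = {}
    for k, i in enumerate(pi):
        S = frozenset(pi[: k + 1])
        # Forward substitution: earlier elements of S_k are already solved.
        x[i] = (b(S) - sum(A(j, S) * x[j] for j in pi[:k])) / A(i, S)
    return x


if __name__ == "__main__":
    R = {0: 3.0, 1: 5.0, 2: 1.0}
    pi, nu, gamma = adaptive_greedy(R, lambda i, S: 1.0)
    # With A identically 1, the indices reduce to the rewards themselves.
    print(pi, nu, gamma)
```

Under this assumed normalization, and because every A_i^S > 0, each dual value after the first step is nonpositive. The indices are therefore nondecreasing along pi, so pi_n carries the largest index. The A = 1 case is a quick sanity check: the indices equal the rewards. The sketch does not try to control tie-breaking in the argmax.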
Most importantly, our approach provides a unified treatment of several classical problems in stochastic and dynamic scheduling and can address their variations within the same framework, such as discounted versus undiscounted cost criteria, rewards versus taxes, preemption versus nonpreemption, discrete versus continuous time, work-conserving versus idling policies, and linear versus nonlinear objective functions.
Date issued
1993-03
Publisher
Massachusetts Institute of Technology, Operations Research Center
Series/Report no.
Operations Research Center Working Paper;OR 277-93