Online learning with sample path constraints
Author(s)
Mannor, Shie; Tsitsiklis, John N.; Yu, Jia Yuan
Terms of use
Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
Abstract
We study online learning where a decision maker interacts with Nature with the objective
of maximizing her long-term average reward subject to some sample path average
constraints. We define the reward-in-hindsight as the highest reward the decision maker
could have achieved, while satisfying the constraints, had she known Nature's choices in
advance. We show that in general the reward-in-hindsight is not attainable. The convex
hull of the reward-in-hindsight function is, however, attainable. For the important case of
a single constraint, the convex hull turns out to be the highest attainable function. Using
a calibrated forecasting rule, we provide an explicit strategy that attains this convex hull.
We also measure the performance of heuristic methods based on non-calibrated forecasters
in experiments involving a CPU power management problem.
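To make the reward-in-hindsight concrete, the following is a minimal illustrative sketch (not the paper's algorithm): in a toy two-action repeated game with hypothetical reward and cost matrices, we compute the best average reward a stationary mixed action could have achieved against Nature's empirical outcome frequencies, subject to an average-cost constraint. The numbers, the matrices `R` and `C`, the threshold `c0`, and the use of a simple grid search (rather than an exact linear program) are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy instance: 2 decision-maker actions, 2 Nature outcomes.
R = np.array([[1.0, 0.6],   # reward r(a, w)
              [0.4, 0.2]])
C = np.array([[0.9, 0.7],   # cost c(a, w)
              [0.1, 0.3]])
c0 = 0.5                    # sample-path average-cost threshold (assumed)

def reward_in_hindsight(freq, R, C, c0, grid=1001):
    """Best average reward over mixed actions (p, 1-p) against the empirical
    outcome frequencies `freq`, subject to average cost <= c0.
    Grid search stands in for an exact LP solve, for illustration only."""
    best = -np.inf
    for p in np.linspace(0.0, 1.0, grid):
        mix = np.array([p, 1.0 - p])
        if mix @ C @ freq <= c0:           # feasibility: constraint met
            best = max(best, mix @ R @ freq)
    return best

freq = np.array([0.5, 0.5])  # Nature's empirical distribution, in hindsight
# The constraint binds here: the feasible optimum is about 0.55, below the
# unconstrained best response's average reward of 0.8.
print(reward_in_hindsight(freq, R, C, c0))
```

In this toy instance the constraint is active, which is exactly the regime the paper studies: the decision maker cannot simply play the reward-maximizing action, and the attainable performance is characterized through the convex hull of this reward-in-hindsight function.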
Date issued
2009
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Laboratory for Information and Decision Systems
Journal
Journal of Machine Learning Research
Publisher
MIT Press
Citation
Mannor, Shie, John N. Tsitsiklis, and Jia Yuan Yu. “Online Learning with Sample Path Constraints.” J. Mach. Learn. Res. 10 (2009): 569-590.
Version: Original manuscript
ISSN
1532-4435