dc.contributor.author | Farias, Vivek F. | |
dc.contributor.author | Moallemi, Ciamac C. | |
dc.contributor.author | Van Roy, Benjamin | |
dc.contributor.author | Weissman, Tsachy | |
dc.date.accessioned | 2010-10-13T19:43:17Z | |
dc.date.available | 2010-10-13T19:43:17Z | |
dc.date.issued | 2010-04 | |
dc.date.submitted | 2007-07 | |
dc.identifier.issn | 0018-9448 | |
dc.identifier.other | INSPEC Accession Number: 11256626 | |
dc.identifier.uri | http://hdl.handle.net/1721.1/59294 | |
dc.description.abstract | We consider an agent interacting with an unmodeled environment. At each time, the agent makes an observation, takes an action, and incurs a cost. Its actions can influence future observations and costs. The goal is to minimize the long-term average cost. We propose a novel algorithm, known as the active LZ algorithm, for optimal control based on ideas from the Lempel-Ziv scheme for universal data compression and prediction. We establish that, under the active LZ algorithm, if there exists an integer K such that the future is conditionally independent of the past given a window of K consecutive actions and observations, then the average cost converges to the optimum. Experimental results involving the game of Rock-Paper-Scissors illustrate the merits of the algorithm. | en_US |
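The abstract describes the active LZ algorithm only at a high level. As a rough illustration of the Lempel-Ziv ingredient it builds on, the sketch below implements an LZ78-style context tree that estimates next-symbol probabilities from a symbol stream. This is a minimal sketch of the underlying idea, not the paper's algorithm: the names (`LZContextTree`, `update`, `predict`) are invented for illustration, a small fixed alphabet and add-one smoothing are assumed, and the paper's coupling of such a tree with dynamic-programming value estimates and an exploration schedule is omitted.

```python
from collections import defaultdict

class LZContextTree:
    """LZ78-style context tree over a symbol stream (illustrative only)."""

    def __init__(self):
        # counts[context][symbol] = how often `symbol` followed `context`,
        # where `context` is the tuple of symbols parsed so far in the
        # current LZ78 phrase.
        self.counts = defaultdict(lambda: defaultdict(int))
        self.context = ()

    def update(self, symbol):
        # Record the transition, then descend into the tree. A transition
        # seen for the first time ends the current LZ78 phrase: the tree
        # grows by one node and parsing restarts at the root.
        is_new = self.counts[self.context][symbol] == 0
        self.counts[self.context][symbol] += 1
        self.context = () if is_new else self.context + (symbol,)

    def predict(self, symbol, alphabet_size=2):
        # Add-one (Laplace) smoothed estimate of P(symbol | context);
        # this smoothing is a generic choice, not necessarily the
        # estimator used in the paper.
        node = self.counts[self.context]
        total = sum(node.values())
        return (node[symbol] + 1) / (total + alphabet_size)

# Toy usage on an alternating binary stream: print the smoothed probability
# assigned to each incoming symbol before it is observed, then update.
tree = LZContextTree()
for s in [0, 1, 0, 1, 0, 1, 0, 1]:
    print("P(next = %d | context) = %.2f" % (s, tree.predict(s)))
    tree.update(s)
```

In the paper's Rock-Paper-Scissors experiment the same construction would apply over the interleaved action/observation alphabet; the control side of the active LZ algorithm, which selects actions against these estimated probabilities via value iteration, is not shown here.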
dc.description.sponsorship | National Science Foundation (U.S.) (MKIDS Program grant ECS-9985229) | en_US |
dc.description.sponsorship | Benchmark Stanford Graduate Fellowship | en_US |
dc.language.iso | en_US | |
dc.publisher | Institute of Electrical and Electronics Engineers | en_US |
dc.relation.isversionof | http://dx.doi.org/10.1109/TIT.2010.2043762 | en_US |
dc.rights | Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. | en_US |
dc.source | IEEE | en_US |
dc.subject | value iteration | en_US |
dc.subject | reinforcement learning | en_US |
dc.subject | optimal control | en_US |
dc.subject | dynamic programming | en_US |
dc.subject | Lempel-Ziv | en_US |
dc.subject | context tree | en_US |
dc.title | Universal Reinforcement Learning | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Farias, V. F., et al. “Universal Reinforcement Learning.” IEEE Transactions on Information Theory 56.5 (2010): 2441-2454. © Copyright 2010 IEEE | en_US |
dc.contributor.department | Sloan School of Management | en_US |
dc.contributor.approver | Farias, Vivek F. | |
dc.contributor.mitauthor | Farias, Vivek F. | |
dc.relation.journal | IEEE Transactions on Information Theory | en_US |
dc.eprint.version | Final published version | en_US |
dc.type.uri | http://purl.org/eprint/type/JournalArticle | en_US |
eprint.status | http://purl.org/eprint/status/PeerReviewed | en_US |
dspace.orderedauthors | Farias, Vivek F.; Moallemi, Ciamac C.; Van Roy, Benjamin; Weissman, Tsachy | en |
dc.identifier.orcid | https://orcid.org/0000-0002-5856-9246 | |
mit.license | PUBLISHER_POLICY | en_US |
mit.metadata.status | Complete | |