Show simple item record

dc.contributor.author: Farias, Vivek F.
dc.contributor.author: Moallemi, Ciamac C.
dc.contributor.author: Van Roy, Benjamin
dc.contributor.author: Weissman, Tsachy
dc.date.accessioned: 2010-10-13T19:43:17Z
dc.date.available: 2010-10-13T19:43:17Z
dc.date.issued: 2010-04
dc.date.submitted: 2007-07
dc.identifier.issn: 0018-9448
dc.identifier.other: INSPEC Accession Number: 11256626
dc.identifier.uri: http://hdl.handle.net/1721.1/59294
dc.description.abstract: We consider an agent interacting with an unmodeled environment. At each time, the agent makes an observation, takes an action, and incurs a cost. Its actions can influence future observations and costs. The goal is to minimize the long-term average cost. We propose a novel algorithm, known as the active LZ algorithm, for optimal control based on ideas from the Lempel-Ziv scheme for universal data compression and prediction. We establish that, under the active LZ algorithm, if there exists an integer K such that the future is conditionally independent of the past given a window of K consecutive actions and observations, then the average cost converges to the optimum. Experimental results involving the game of Rock-Paper-Scissors illustrate merits of the algorithm. (en_US)
dc.description.sponsorship: National Science Foundation (U.S.) (MKIDS Program grant ECS-9985229) (en_US)
dc.description.sponsorship: Benchmark Stanford Graduate Fellowship (en_US)
dc.language.iso: en_US
dc.publisher: Institute of Electrical and Electronics Engineers (en_US)
dc.relation.isversionof: http://dx.doi.org/10.1109/TIT.2010.2043762 (en_US)
dc.rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. (en_US)
dc.source: IEEE (en_US)
dc.subject: value iteration (en_US)
dc.subject: reinforcement learning (en_US)
dc.subject: optimal control (en_US)
dc.subject: dynamic programming (en_US)
dc.subject: Lempel-Ziv (en_US)
dc.subject: context tree (en_US)
dc.title: Universal Reinforcement Learning (en_US)
dc.type: Article (en_US)
dc.identifier.citation: Farias, V.F. et al. “Universal Reinforcement Learning.” IEEE Transactions on Information Theory 56.5 (2010): 2441-2454. © Copyright 2010 IEEE (en_US)
dc.contributor.department: Sloan School of Management (en_US)
dc.contributor.approver: Farias, Vivek F.
dc.contributor.mitauthor: Farias, Vivek F.
dc.relation.journal: IEEE Transactions on Information Theory (en_US)
dc.eprint.version: Final published version (en_US)
dc.type.uri: http://purl.org/eprint/type/JournalArticle (en_US)
eprint.status: http://purl.org/eprint/status/PeerReviewed (en_US)
dspace.orderedauthors: Farias, Vivek F.; Moallemi, Ciamac C.; Van Roy, Benjamin; Weissman, Tsachy (en)
dc.identifier.orcid: https://orcid.org/0000-0002-5856-9246
mit.license: PUBLISHER_POLICY (en_US)
mit.metadata.status: Complete
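The abstract above grounds the active LZ algorithm in the Lempel-Ziv scheme for universal compression and prediction. As background only (this is not the authors' active LZ algorithm itself), a minimal sketch of LZ78-style incremental parsing, which builds the phrase dictionary that such context-tree predictors grow over the observation stream, might look like:

```python
def lz78_parse(sequence):
    """Incrementally parse a sequence into distinct phrases (LZ78).

    Each new phrase is the shortest prefix of the remaining input not
    seen before; the phrase dictionary acts as a context tree that
    deepens where the source places its probability mass.
    """
    phrases = {"": 0}  # phrase -> index; the root is the empty phrase
    parsed = []
    current = ""
    for symbol in sequence:
        candidate = current + symbol
        if candidate in phrases:
            current = candidate       # keep extending the current phrase
        else:
            phrases[candidate] = len(phrases)
            parsed.append(candidate)  # emit the completed new phrase
            current = ""
    if current:
        parsed.append(current)        # flush a trailing partial phrase
    return parsed

# Example: the string "ababbaabb" parses into five distinct phrases.
print(lz78_parse("ababbaabb"))  # → ['a', 'b', 'ab', 'ba', 'abb']
```

The paper's algorithm additionally interleaves actions and costs into the parsed context and couples the resulting predictions with dynamic-programming value estimates; see the full text at the DOI above.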

