
dc.contributor.advisor: Farias, Vivek F.
dc.contributor.author: Wu, Farrell Eldrian S.
dc.date.accessioned: 2023-11-02T20:05:43Z
dc.date.available: 2023-11-02T20:05:43Z
dc.date.issued: 2023-09
dc.date.submitted: 2023-10-03T18:21:06.833Z
dc.identifier.uri: https://hdl.handle.net/1721.1/152649
dc.description.abstract: In this work, we propose a model-free reinforcement learning algorithm for infinite-horizon, average-reward decision processes in which the transition function has a finite yet unknown dependence on history, and in which the induced Markov Decision Process is assumed to be weakly communicating. The algorithm combines the Lempel-Ziv (LZ) parsing tree structure for states introduced in [4] with the optimistic Q-learning approach of [9]. We mathematically analyze the algorithm toward establishing sublinear regret, providing major steps of that proof. In doing so, we reduce the proof to showing sublinearity of a key quantity related to the sum of an uncertainty metric over the steps. Simulations of the algorithm are left to later work.
dc.publisher: Massachusetts Institute of Technology
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: Information-theoretic Algorithms for Model-free Reinforcement Learning
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Electrical Engineering and Computer Science
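
To make the combination described in the abstract concrete, the following is a minimal sketch, not the thesis's algorithm: an LZ78-style parsing tree that maps the observation stream to a node id serving as the agent's state, paired with a tabular Q-learning update that adds a count-based optimism bonus. The environment interface (`env.step`), the bonus constant `c_bonus`, the discount `gamma`, and the step-size schedule are illustrative assumptions, not details taken from the record.

```python
# A hedged sketch combining an LZ78-style parsing tree over observations
# with tabular optimistic Q-learning. All constants and the environment
# interface are hypothetical placeholders, not the thesis's construction.
import math
from collections import defaultdict


class LZStateTree:
    """LZ78-style parsing tree: the current node id is used as the state."""

    def __init__(self):
        self.children = [{}]   # children[node][observation] -> child node id
        self.node = 0          # current node; parsing starts at the root

    def step(self, obs):
        """Advance along the tree; on a miss, grow a leaf and restart at the root."""
        kids = self.children[self.node]
        if obs in kids:
            self.node = kids[obs]
        else:
            kids[obs] = len(self.children)   # current phrase ends: new node
            self.children.append({})
            self.node = 0
        return self.node


def optimistic_q_learning(env, actions, horizon=10_000, c_bonus=1.0, gamma=0.99):
    """Tabular Q-learning with a count-based optimism bonus over LZ-tree states."""
    tree = LZStateTree()
    Q = defaultdict(lambda: 1.0 / (1.0 - gamma))   # optimistic initialization
    N = defaultdict(int)                           # visit counts per (state, action)
    state = 0
    for _ in range(horizon):
        # act greedily with respect to the optimistic Q estimates
        action = max(actions, key=lambda a: Q[(state, a)])
        obs, reward = env.step(action)             # hypothetical environment interface
        next_state = tree.step(obs)
        N[(state, action)] += 1
        n = N[(state, action)]
        lr = 1.0 / math.sqrt(n)                    # illustrative decaying step size
        bonus = c_bonus / math.sqrt(n)             # count-based optimism bonus
        target = reward + bonus + gamma * max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += lr * (target - Q[(state, action)])
        state = next_state
    return Q
```

One reason such a state construction is attractive: in LZ78 parsing the number of phrases, and hence tree nodes, grows sublinearly in the length of the observation stream, which is the kind of structural property a regret analysis along the lines sketched in the abstract would plausibly exploit.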

