dc.contributor.advisor | Farias, Vivek F. | |
dc.contributor.author | Wu, Farrell Eldrian S. | |
dc.date.accessioned | 2023-11-02T20:05:43Z | |
dc.date.available | 2023-11-02T20:05:43Z | |
dc.date.issued | 2023-09 | |
dc.date.submitted | 2023-10-03T18:21:06.833Z | |
dc.identifier.uri | https://hdl.handle.net/1721.1/152649 | |
dc.description.abstract | In this work, we propose a model-free reinforcement learning algorithm for infinite-horizon, average-reward decision processes where the transition function has a finite yet unknown dependence on history, and where the induced Markov Decision Process is assumed to be weakly communicating. This algorithm combines the Lempel-Ziv (LZ) parsing tree structure for states introduced in [4] with the optimistic Q-learning approach of [9]. We mathematically analyze the algorithm toward showing sublinear regret, providing major steps of such a proof. In doing so, we reduce the proof to showing sublinearity of a key quantity related to the sum of an uncertainty metric at each step. Simulations of the algorithm are left to later work. | |
dc.publisher | Massachusetts Institute of Technology | |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) | |
dc.rights | Copyright retained by author(s) | |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ | |
dc.title | Information-theoretic Algorithms for Model-free Reinforcement Learning | |
dc.type | Thesis | |
dc.description.degree | M.Eng. | |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
mit.thesis.degree | Master | |
thesis.degree.name | Master of Engineering in Electrical Engineering and Computer Science | |