
dc.contributor.advisor: Daskalakis, Constantinos
dc.contributor.advisor: Rakhlin, Alexander
dc.contributor.author: Chen, Fan
dc.date.accessioned: 2025-03-27T16:58:58Z
dc.date.available: 2025-03-27T16:58:58Z
dc.date.issued: 2025-02
dc.date.submitted: 2025-03-04T17:27:24.828Z
dc.identifier.uri: https://hdl.handle.net/1721.1/158934
dc.description.abstract: We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs). In this model, the learner interacts with an MDP drawn at the beginning of each epoch from an unknown mixture of MDPs. To sidestep known impossibility results, we consider several notions of δ-separation of the constituent MDPs. The main thrust of this paper is in establishing a nearly-sharp statistical threshold for the horizon length necessary for efficient learning. On the computational side, we show that under a weaker assumption of separability under the optimal policy, there is a quasi-polynomial algorithm with time complexity scaling in terms of the statistical threshold. We further show a near-matching time complexity lower bound under the exponential time hypothesis.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Near-Optimal Learning and Planning in Separated Latent MDPs
dc.type: Thesis
dc.description.degree: S.M.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Science in Electrical Engineering and Computer Science


