On the Learnability of General Reinforcement-Learning Objectives

Author(s)
Yang, Cambridge
Download
Thesis PDF (7.362 MB)
Advisor
Carbin, Michael
Terms of use
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0); copyright retained by author(s). https://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract
Reinforcement learning enables agents to learn decision-making policies in unknown environments to achieve specified objectives. Traditionally, these objectives are expressed through reward functions, enabling well-established guarantees on learning near-optimal policies with high probability, a property known as probably approximately correct (PAC) learnability. However, reward functions often serve as imperfect surrogates for true objectives, leading to reward hacking and undermining these guarantees. This thesis formalizes the specification and learnability of general reinforcement-learning objectives beyond rewards, addressing fundamental questions of expressivity and policy learnability. I examine three increasingly expressive classes of objectives: (1) Linear Temporal Logic (LTL) objectives, which extend conventional scalar rewards to temporal specifications of behavior and have garnered recent attention; (2) computable objectives, encompassing a broad class of structured, algorithmically definable objectives; and (3) non-computable objectives, representing general objectives beyond the computable class. For LTL objectives, I prove that only finitary LTL objectives are PAC-learnable, while infinite-horizon LTL objectives are inherently intractable under the PAC-MDP framework. Extending this result, I establish a general criterion: an objective is PAC-learnable if it is continuous and computable. This criterion makes it possible to establish PAC-learnability for existing classes of objectives whose learnability was previously unknown, and it informs the design of new, learnable objective specifications. Finally, for non-computable objectives, I introduce limit PAC-learnability, a practical relaxation in which a sequence of computable, PAC-learnable objectives approximates a non-computable objective. I formalize a universal representation of non-computable objectives using nested limits of computable functions and provide sufficient conditions under which limit PAC-learnability holds. By establishing a theoretical foundation for general RL objectives, this thesis advances our understanding of which objectives are learnable, how they can be specified, and how agents can effectively learn policies to optimize them. These results contribute to the broader goal of designing intelligent agents that align with expressive, formally defined objectives, moving beyond the limitations of reward-based surrogates.
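As a reading aid for two notions the abstract relies on, the LaTeX sketch below states the standard (epsilon, delta)-style guarantee that "near-optimal with high probability" refers to, and one plausible shape of the nested-limit representation of a non-computable objective. The symbols J (objective value), \hat{\pi} (the learned policy), and f_{k_1,\ldots,k_n} (computable approximants) are illustrative assumptions, not the thesis's own notation.

\[
  \Pr\bigl[\, J(\hat{\pi}) \ \ge\ \sup_{\pi} J(\pi) - \epsilon \,\bigr] \ \ge\ 1 - \delta,
  \qquad \text{after a number of samples polynomial in } 1/\epsilon,\ 1/\delta \text{ (and the problem size)},
\]
\[
  g \;=\; \lim_{k_1 \to \infty}\, \lim_{k_2 \to \infty} \cdots \lim_{k_n \to \infty} f_{k_1, k_2, \ldots, k_n},
  \qquad \text{each } f_{k_1,\ldots,k_n} \text{ computable and PAC-learnable.}
\]

On this reading, limit PAC-learnability relaxes the usual requirement: rather than demanding a PAC guarantee for the non-computable objective g directly, it asks for PAC guarantees on the computable approximants whose nested limits recover g.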
Date issued
2025-05
URI
https://hdl.handle.net/1721.1/164131
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses
