Meta-Learning Exploration Strategies with Decision Transformers

Welch, Ryan

Author(s)

Welch, Ryan

DownloadThesis PDF (1.216Mb)

Advisor

Uhler, Caroline

Terms of use

In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/

Metadata

Show full item record

Abstract

The problem of pure exploration in sequential decision-making is to identify strategies for efficiently gathering information to uncover hidden properties of an environment. This challenge arises in many practical domains, including clinical diagnostics, recommender systems, and educational testing, where data collection is costly and the effectiveness of exploration is critical. Efficient exploration in these contexts strongly depends on exploiting underlying structural relationships within the environment. For instance, recognizing that multiple medical tests may provide overlapping information can reduce the number of tests required to make a diagnosis. Existing exploration approaches drawn from reinforcement learning and active hypothesis testing typically rely on heuristic strategies that require explicit prior assumptions about such structural information. However, when this information is unknown, heuristic methods often lead to redundant exploration, significantly limiting their practical utility in high-stakes domains. Furthermore, these existing approaches do not leverage past experience to improve their exploration efficiency over time. To overcome these limitations, we introduce In-Context Pure Exploration (ICPE), a novel meta-learning framework capable of autonomously discovering and exploiting latent environmental structures across related tasks to guide efficient exploration. ICPE leverages the in-context learning and sequence-modeling capabilities of transformers, combined with supervised learning and deep reinforcement learning techniques to learn exploration strategies directly from experience. Through extensive experiments on synthetic and semi-synthetic exploration tasks, we demonstrate that ICPE is able to efficiently explore in deterministic, stochastic and highly structured environments without relying on any explicit inductive biases. Our results highlight the potential of ICPE to enable more practical exploration strategies suitable for real-world decision-making contexts.

Date issued

2025-05

URI

https://hdl.handle.net/1721.1/162961

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Collections

Graduate Theses