Meta-Learning Exploration Strategies with Decision Transformers
Author(s)
Welch, Ryan
DownloadThesis PDF (1.216Mb)
Advisor
Uhler, Caroline
Terms of use
Metadata
Show full item recordAbstract
The problem of pure exploration in sequential decision-making is to identify strategies for efficiently gathering information to uncover hidden properties of an environment. This challenge arises in many practical domains, including clinical diagnostics, recommender systems, and educational testing, where data collection is costly and the effectiveness of exploration is critical. Efficient exploration in these contexts strongly depends on exploiting underlying structural relationships within the environment. For instance, recognizing that multiple medical tests may provide overlapping information can reduce the number of tests required to make a diagnosis. Existing exploration approaches drawn from reinforcement learning and active hypothesis testing typically rely on heuristic strategies that require explicit prior assumptions about such structural information. However, when this information is unknown, heuristic methods often lead to redundant exploration, significantly limiting their practical utility in high-stakes domains. Furthermore, these existing approaches do not leverage past experience to improve their exploration efficiency over time. To overcome these limitations, we introduce In-Context Pure Exploration (ICPE), a novel meta-learning framework capable of autonomously discovering and exploiting latent environmental structures across related tasks to guide efficient exploration. ICPE leverages the in-context learning and sequence-modeling capabilities of transformers, combined with supervised learning and deep reinforcement learning techniques to learn exploration strategies directly from experience. Through extensive experiments on synthetic and semi-synthetic exploration tasks, we demonstrate that ICPE is able to efficiently explore in deterministic, stochastic and highly structured environments without relying on any explicit inductive biases. Our results highlight the potential of ICPE to enable more practical exploration strategies suitable for real-world decision-making contexts.
Date issued
2025-05Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer SciencePublisher
Massachusetts Institute of Technology