Compositional Policy Priors

Wingate, David; Diuk, Carlos; O'Donnell, Timothy; Tenenbaum, Joshua; Gershman, Samuel

Other Contributors

Computational Cognitive Science

Advisor

Joshua Tenenbaum

Abstract

This paper describes a probabilistic framework for incorporating structured inductive biases into reinforcement learning. These inductive biases arise from policy priors, probability distributions over optimal policies. Borrowing recent ideas from computational linguistics and Bayesian nonparametrics, we define several families of policy priors that express compositional, abstract structure in a domain. Compositionality is expressed using probabilistic context-free grammars, enabling a compact representation of hierarchically organized sub-tasks. Useful sequences of sub-tasks can be cached and reused by extending the grammars nonparametrically using Fragment Grammars. We present Monte Carlo methods for performing inference, and show how structured policy priors lead to substantially faster learning in complex domains compared to methods without inductive biases.
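To make the idea of a grammar-based policy prior concrete, here is a minimal sketch of sampling a sequence of primitive actions from a toy probabilistic context-free grammar. The grammar, its symbol names (`TASK`, `FETCH`, `DELIVER`, `goto_key`, and so on), and the rule probabilities are all hypothetical and chosen only for illustration; they are not taken from the report. The sketch covers only the PCFG part and does not attempt the Fragment Grammar caching or the Monte Carlo inference described above.

```python
import random

# Hypothetical toy grammar (an assumption for illustration, not from the report).
# Each nonterminal maps to a list of (probability, expansion) rules.
# Any symbol that is not a key of GRAMMAR is treated as a primitive action.
GRAMMAR = {
    "TASK": [(0.5, ["FETCH", "DELIVER"]), (0.5, ["FETCH", "TASK"])],
    "FETCH": [(0.7, ["goto_key", "pickup"]),
              (0.3, ["goto_key", "goto_key", "pickup"])],
    "DELIVER": [(1.0, ["goto_door", "drop"])],
}


def sample(symbol, rng):
    """Recursively expand `symbol`.

    Returns (actions, prob): the list of primitive actions produced and the
    probability of the particular derivation that produced them.
    """
    if symbol not in GRAMMAR:
        # Primitive action: nothing to expand, contributes probability 1.
        return [symbol], 1.0
    rules = GRAMMAR[symbol]
    weights = [p for p, _ in rules]
    # Pick one rule in proportion to its probability.
    prob, expansion = rng.choices(rules, weights=weights, k=1)[0]
    actions = []
    for child in expansion:
        child_actions, child_prob = sample(child, rng)
        actions.extend(child_actions)
        prob *= child_prob
    return actions, prob


if __name__ == "__main__":
    rng = random.Random(0)
    actions, prob = sample("TASK", rng)
    print(actions, prob)
```

Under this toy prior, short action sequences built from few sub-tasks receive higher probability than long ones, which is one simple way a grammar can encode a preference for compact, hierarchically organized behavior.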

Date issued

2013-04-12

URI

http://hdl.handle.net/1721.1/78573

Series/Report no.

MIT-CSAIL-TR-2013-007

Collections

CSAIL Technical Reports (July 1, 2003 - present)