Discovering the Language of Actions

Sharma, Pratyusha

Author(s)

Sharma, Pratyusha

DownloadThesis PDF (4.511Mb)

Advisor

Torralba, Antonio

Andreas, Jacob

Terms of use

In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/

Metadata

Show full item record

Abstract

This thesis takes a look at discovering language-like discrete infinities for actions. How can a stream of continuous data be parsed into skills/concepts and can we tie the decision of what may be the right set of skills with the problem of generating plans over a continuous action space as in the original stream of data? Can we utilize supervision from aligning parallel language instructions to scaffold the discovery of these named primitives of actions from interactions? Here, we present a framework for learning hierarchical policies from demonstrations, using sparse natural language annotations to guide the discovery of reusable skills for autonomous decision-making. It is formulated as a generative model of action sequences in which goals generate sequences of high-level subtask descriptions, and these descriptions generate sequences of low-level actions. The thesis describes how to train this model using primarily unannotated demonstrations by parsing demonstrations into sequences of named high-level subtasks, using only a small number of seed annotations to ground language in action. In trained models, the space of natural language commands indexes a combinatorial library of skills; agents can use these skills to plan by generating high-level instruction sequences tailored to novel goals. The approach is evaluated in the ALFRED household simulation environment, providing natural language annotations for only 10% of demonstrations. It completes more than twice as many tasks as a standard approach to learning from demonstrations, matching the performance of instruction following models with access to ground-truth plans during both training and evaluation. 1 1Code, data, and additional visualizations are available at https://sites.google.com/view/ skill-induction-latent-lang/.

Date issued

2022-02

URI

https://hdl.handle.net/1721.1/143293

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Collections

Graduate Theses