
dc.contributor.advisor: Torralba, Antonio
dc.contributor.advisor: Andreas, Jacob
dc.contributor.author: Sharma, Pratyusha
dc.date.accessioned: 2022-06-15T13:10:21Z
dc.date.available: 2022-06-15T13:10:21Z
dc.date.issued: 2022-02
dc.date.submitted: 2022-03-04T20:59:48.549Z
dc.identifier.uri: https://hdl.handle.net/1721.1/143293
dc.description.abstract: This thesis explores how to discover language-like discrete infinities for actions. How can a stream of continuous data be parsed into skills or concepts, and can the choice of the right set of skills be tied to the problem of generating plans over a continuous action space, as in the original stream of data? Can supervision from aligned parallel language instructions scaffold the discovery of these named action primitives from interactions? Here, we present a framework for learning hierarchical policies from demonstrations, using sparse natural language annotations to guide the discovery of reusable skills for autonomous decision-making. It is formulated as a generative model of action sequences in which goals generate sequences of high-level subtask descriptions, and these descriptions generate sequences of low-level actions. The thesis describes how to train this model using primarily unannotated demonstrations by parsing demonstrations into sequences of named high-level subtasks, using only a small number of seed annotations to ground language in action. In trained models, the space of natural language commands indexes a combinatorial library of skills; agents can use these skills to plan by generating high-level instruction sequences tailored to novel goals. The approach is evaluated in the ALFRED household simulation environment, providing natural language annotations for only 10% of demonstrations. It completes more than twice as many tasks as a standard approach to learning from demonstrations, matching the performance of instruction-following models with access to ground-truth plans during both training and evaluation. Code, data, and additional visualizations are available at https://sites.google.com/view/skill-induction-latent-lang/.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright MIT
dc.rights.uri: http://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Discovering the Language of Actions
dc.type: Thesis
dc.description.degree: S.M.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Science in Electrical Engineering and Computer Science

