
dc.contributor.advisor: Andreas, Jacob D.
dc.contributor.advisor: Tenenbaum, Joshua B.
dc.contributor.author: Grand, Gabriel J.
dc.date.accessioned: 2023-07-31T19:31:14Z
dc.date.available: 2023-07-31T19:31:14Z
dc.date.issued: 2023-06
dc.date.submitted: 2023-07-13T14:21:11.694Z
dc.identifier.uri: https://hdl.handle.net/1721.1/151322
dc.description.abstract: Large language models (LLMs) are growing highly adept at language-guided program synthesis: translating natural language specifications into code to solve programming tasks. Nevertheless, current approaches require searching through a vast space of strings, often needing thousands of guesses to discover solutions to difficult tasks at inference time. In contrast, human programmers learn to solve problems on-the-fly by building up hierarchical libraries of abstractions: symbolic expressions that encapsulate reusable functionality. In this work, we draw on models of library learning from the programming languages (PL) literature, enriching them with the ability to perform search and abstraction learning with LLMs. We introduce Lilo, a neurosymbolic framework for Library Induction from Language Observations, which consists of three components: an LLM synthesizer, a symbolic compression module, and an auto-documentation (AutoDoc) procedure. Drawing on human language as a source of commonsense knowledge, Lilo learns abstractions that would be intractable to discover with traditional enumerative search. In our evaluations against DreamCoder, a state-of-the-art library learning algorithm, we find that Lilo solves more tasks while achieving faster search times and comparable computational costs. A central aspect of Lilo is a neurosymbolic integration between the LLM synthesizer and Stitch, a high-performance program compression algorithm that identifies useful abstractions in lambda calculus expressions. Lilo augments Stitch with AutoDoc, which generates human-readable names and docstrings for abstractions using an LLM. In addition to improving interpretability, we find that AutoDoc crucially assists Lilo’s synthesizer to infer the semantics of abstractions.
In sum, Lilo offers an optimistic “better together” vision where human programmers work in tandem with LLMs and PL tools, building up shared libraries of abstractions to enable creative solutions to complex software problems. Code for this work is available at: github.com/gabegrand/lilo.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Discovering Abstractions from Language via Neurosymbolic Program Synthesis
dc.type: Thesis
dc.description.degree: S.M.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Science in Electrical Engineering and Computer Science

