Discovering Abstractions from Language via Neurosymbolic Program Synthesis

Grand, Gabriel J.

dc.contributor.advisor	Andreas, Jacob D.
dc.contributor.advisor	Tenenbaum, Joshua B.
dc.contributor.author	Grand, Gabriel J.
dc.date.accessioned	2023-07-31T19:31:14Z
dc.date.available	2023-07-31T19:31:14Z
dc.date.issued	2023-06
dc.date.submitted	2023-07-13T14:21:11.694Z
dc.identifier.uri	https://hdl.handle.net/1721.1/151322
dc.description.abstract	Large language models (LLMs) are growing highly adept at language-guided program synthesis: translating natural language specifications into code to solve programming tasks. Nevertheless, current approaches require searching through a vast space of strings, often needing thousands of guesses to discover solutions to difficult tasks at inference time. In contrast, human programmers learn to solve problems on-the-fly by building up hierarchical libraries of abstractions: symbolic expressions that encapsulate reusable functionality. In this work, we draw on models of library learning from the programming languages (PL) literature, enriching them with the ability to perform search and abstraction learning with LLMs. We introduce Lilo, a neurosymbolic framework for Library Induction from Language Observations, which consists of three components: an LLM synthesizer, a symbolic compression module, and an auto-documentation (AutoDoc) procedure. Drawing on human language as a source of commonsense knowledge, Lilo learns abstractions that would be intractable to discover with traditional enumerative search. In our evaluations against DreamCoder, a state-of-the-art library learning algorithm, we find that Lilo solves more tasks while achieving faster search times and comparable computational costs. A central aspect of Lilo is a neurosymbolic integration between the LLM synthesizer and Stitch, a high-performance program compression algorithm that identifies useful abstractions in lambda calculus expressions. Lilo augments Stitch with AutoDoc, which generates human-readable names and docstrings for abstractions using an LLM. In addition to improving interpretability, we find that AutoDoc crucially assists Lilo’s synthesizer to infer the semantics of abstractions. 
In sum, Lilo offers an optimistic “better together” vision where human programmers work in tandem with LLMs and PL tools, building up shared libraries of abstractions to enable creative solutions to complex software problems. Code for this work is available at: github.com/gabegrand/lilo.
dc.publisher	Massachusetts Institute of Technology
dc.rights	In Copyright - Educational Use Permitted
dc.rights	Copyright retained by author(s)
dc.rights.uri	https://rightsstatements.org/page/InC-EDU/1.0/
dc.title	Discovering Abstractions from Language via Neurosymbolic Program Synthesis
dc.type	Thesis
dc.description.degree	S.M.
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree	Master
thesis.degree.name	Master of Science in Electrical Engineering and Computer Science

Files in this item

Name: grand-grandg-sm-eecs-2023-thes ...
Size: 6.104Mb
Format: PDF
Description: Thesis PDF

This item appears in the following Collection(s)

Graduate Theses
