Discovering Abstractions from Language via Neurosymbolic Program Synthesis

Author(s)

Grand, Gabriel J.

Advisor

Andreas, Jacob D.

Tenenbaum, Joshua B.

Terms of use

In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/

Abstract

Large language models (LLMs) are becoming highly adept at language-guided program synthesis: translating natural language specifications into code that solves programming tasks. Nevertheless, current approaches require searching through a vast space of strings, often needing thousands of guesses at inference time to discover solutions to difficult tasks. In contrast, human programmers learn to solve problems on the fly by building up hierarchical libraries of abstractions: symbolic expressions that encapsulate reusable functionality. In this work, we draw on models of library learning from the programming languages (PL) literature and enrich them with the ability to perform search and abstraction learning with LLMs. We introduce Lilo, a neurosymbolic framework for Library Induction from Language Observations, which consists of three components: an LLM synthesizer, a symbolic compression module, and an auto-documentation (AutoDoc) procedure. Drawing on human language as a source of commonsense knowledge, Lilo learns abstractions that would be intractable to discover with traditional enumerative search. In our evaluations against DreamCoder, a state-of-the-art library learning algorithm, we find that Lilo solves more tasks while achieving faster search times and comparable computational costs. A central aspect of Lilo is a neurosymbolic integration between the LLM synthesizer and Stitch, a high-performance program compression algorithm that identifies useful abstractions in lambda calculus expressions. Lilo augments Stitch with AutoDoc, which uses an LLM to generate human-readable names and docstrings for abstractions. In addition to improving interpretability, we find that AutoDoc crucially helps Lilo’s synthesizer infer the semantics of abstractions.
In sum, Lilo offers an optimistic “better together” vision where human programmers work in tandem with LLMs and PL tools, building up shared libraries of abstractions to enable creative solutions to complex software problems. Code for this work is available at: github.com/gabegrand/lilo.
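As a rough, non-authoritative illustration of the compression idea described in the abstract, the toy Python sketch below finds the repeated subexpression whose extraction most shrinks a small corpus of programs. It is not Stitch: Stitch searches for lambda abstractions that take arguments, whereas this sketch, assuming a deliberately simplified cost model (one unit per tree node), only extracts argument-free repeated subtrees. The tuple-based program encoding, the helper names (`size`, `subtrees`, `best_abstraction`, `rewrite`), the example corpus, and the placeholder name `fn_0` are all hypothetical.

```python
from collections import Counter
from typing import Iterator, Optional, Tuple, Union

# Hypothetical encoding: a program is a primitive token (str)
# or an application node, written as a tuple of subprograms.
Program = Union[str, tuple]


def size(p: Program) -> int:
    """Count the nodes in a program tree (our simplified cost model)."""
    if isinstance(p, str):
        return 1
    return 1 + sum(size(child) for child in p)


def subtrees(p: Program) -> Iterator[tuple]:
    """Yield every compound subtree of p; bare primitives are skipped."""
    if isinstance(p, tuple):
        yield p
        for child in p:
            yield from subtrees(child)


def best_abstraction(corpus: list) -> Tuple[Optional[tuple], int]:
    """Return the repeated subtree whose extraction saves the most nodes, and its savings."""
    counts = Counter(t for prog in corpus for t in subtrees(prog))
    best, best_savings = None, 0
    for tree, uses in counts.items():
        s = size(tree)
        # Each use shrinks from s nodes to 1 token; the definition costs s nodes once.
        savings = uses * (s - 1) - s
        if savings > best_savings:
            best, best_savings = tree, savings
    return best, best_savings


def rewrite(p: Program, tree: tuple, name: str) -> Program:
    """Replace every occurrence of `tree` inside p with the primitive `name`."""
    if p == tree:
        return name
    if isinstance(p, tuple):
        return tuple(rewrite(child, tree, name) for child in p)
    return p


# Hypothetical corpus of "solved task" programs sharing a common subexpression.
corpus = [
    ("move", ("scale", "x", "2"), "up"),
    ("turn", ("scale", "x", "2")),
    ("draw", ("scale", "x", "2"), ("scale", "x", "2")),
]
abstraction, saved = best_abstraction(corpus)
print(abstraction, saved)  # → ('scale', 'x', '2') 8
compressed = [rewrite(p, abstraction, "fn_0") for p in corpus]
print(compressed)  # → [('move', 'fn_0', 'up'), ('turn', 'fn_0'), ('draw', 'fn_0', 'fn_0')]
```

Here the corpus shrinks from 23 nodes to 11, plus 4 nodes for the new definition. The opaque placeholder `fn_0` is the kind of name that, in Lilo, AutoDoc would replace with an LLM-generated name and docstring.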

Date issued

2023-06

URI

https://hdl.handle.net/1721.1/151322

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Collections

Graduate Theses