MIT Libraries DSpace@MIT

Discovering Abstractions from Language via Neurosymbolic Program Synthesis

Author(s)
Grand, Gabriel J.
Advisor
Andreas, Jacob D.
Tenenbaum, Joshua B.
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Large language models (LLMs) are becoming highly adept at language-guided program synthesis: translating natural language specifications into code to solve programming tasks. Nevertheless, current approaches require searching through a vast space of strings, often needing thousands of guesses to discover solutions to difficult tasks at inference time. In contrast, human programmers learn to solve problems on the fly by building up hierarchical libraries of abstractions: symbolic expressions that encapsulate reusable functionality. In this work, we draw on models of library learning from the programming languages (PL) literature, enriching them with the ability to perform search and abstraction learning with LLMs. We introduce Lilo, a neurosymbolic framework for Library Induction from Language Observations, which consists of three components: an LLM synthesizer, a symbolic compression module, and an auto-documentation (AutoDoc) procedure. Drawing on human language as a source of commonsense knowledge, Lilo learns abstractions that would be intractable to discover with traditional enumerative search. In our evaluations against DreamCoder, a state-of-the-art library learning algorithm, we find that Lilo solves more tasks while achieving faster search times and comparable computational costs. A central aspect of Lilo is a neurosymbolic integration between the LLM synthesizer and Stitch, a high-performance program compression algorithm that identifies useful abstractions in lambda calculus expressions. Lilo augments Stitch with AutoDoc, which uses an LLM to generate human-readable names and docstrings for abstractions. In addition to improving interpretability, we find that AutoDoc crucially helps Lilo’s synthesizer infer the semantics of abstractions.
In sum, Lilo offers an optimistic “better together” vision where human programmers work in tandem with LLMs and PL tools, building up shared libraries of abstractions to enable creative solutions to complex software problems. Code for this work is available at: github.com/gabegrand/lilo.
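
The compress-then-document idea described in the abstract can be illustrated with a toy sketch. This is not Lilo's or Stitch's code or API: every name below (`subtrees`, `best_abstraction`, `rewrite`, `autodoc`) is hypothetical, programs are nested tuples standing in for lambda calculus expressions, and the compression step only finds exactly repeated subexpressions, which is far simpler than what Stitch does. The `name_fn` callback is a stand-in for the LLM call that AutoDoc would make.

```python
from collections import Counter

# Hypothetical toy example; none of these names come from Lilo or Stitch.
# Programs are nested tuples standing in for lambda calculus expressions.


def subtrees(expr):
    """Yield every compound (tuple) subexpression of expr, including expr itself."""
    if isinstance(expr, tuple):
        yield expr
        for child in expr:
            yield from subtrees(child)


def size(expr):
    """Count the leaf symbols in an expression."""
    return sum(size(c) for c in expr) if isinstance(expr, tuple) else 1


def best_abstraction(programs):
    """Pick the repeated subtree that saves the most symbols when replaced by one name."""
    counts = Counter(t for p in programs for t in subtrees(p))
    savings = {t: n * (size(t) - 1) for t, n in counts.items() if n > 1}
    return max(savings, key=savings.get) if savings else None


def rewrite(expr, body, name):
    """Replace every occurrence of body inside expr with name."""
    if expr == body:
        return name
    if isinstance(expr, tuple):
        return tuple(rewrite(c, body, name) for c in expr)
    return expr


def autodoc(body, name_fn):
    """Stand-in for AutoDoc: name_fn plays the role of an LLM naming the abstraction."""
    return name_fn(body)


programs = [
    ("map", ("lambda", "x", ("+", "x", "1")), "xs"),
    ("map", ("lambda", "x", ("+", "x", "1")), "ys"),
    ("filter", "even", ("map", ("lambda", "x", ("+", "x", "1")), "zs")),
]

body = best_abstraction(programs)
name = autodoc(body, lambda b: "increment")  # a real system would query an LLM here
library = {name: body}
compressed = [rewrite(p, body, name) for p in programs]
```

In this toy corpus the repeated `("lambda", "x", ("+", "x", "1"))` expression is extracted, named `increment`, and substituted back, so the first program becomes `("map", "increment", "xs")`. A readable name of this kind is what the abstract suggests helps a synthesizer reuse the abstraction in later tasks.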
Date issued
2023-06
URI
https://hdl.handle.net/1721.1/151322
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses