Code Summarization and Program Synthesis with Large Language Models

Lam, Kelly

Author(s)

Lam, Kelly

DownloadThesis PDF (559.2Kb)

Advisor

Cafarella, Michael

Terms of use

In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/

Metadata

Show full item record

Abstract

Automatic source code summarization and generation are naturally complimentary operations because they bridge the gap between natural-language text and executable programs, allowing users to flow between the two modes. Even though large language models, have become increasingly popular, it is unclear how effective they are with code summarization and generation, especially as we examine longer source code segments or more complicated prompts for generation. In this thesis, we will formalize the automatic code summarization and generation problems, identify some cases where large-language models can perform poorly, propose some techniques to correct the initial bad results, and evaluate our results against appropriate baselines using suitable evaluation metrics.

Date issued

2024-05

URI

https://hdl.handle.net/1721.1/156757

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Collections

Graduate Theses