Using existing knowledge for transfer and regularization for program synthesis with genetic programming

Wick, Jordan(Jordan M.)

Author(s)

Wick, Jordan(Jordan M.)

Download1193031850-MIT.pdf (1.195Mb)

Other Contributors

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.

Advisor

Erik Hemberg and Una-May O'Reilly.

Terms of use

MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. http://dspace.mit.edu/handle/1721.1/7582

Metadata

Show full item record

Abstract

In normal Genetic Programming (GP), test case performance is the only signal the population has to improve on. However, human programmers use other signals to guide them - they know what "good" code looks like. They often reuse program patterns across multiple functions, showing Transferability of knowledge between programming tasks. Pieces of code also become more or less likely in different contexts - you don't see many For loops nested 4 layers deep in codebases generated by humans. In this thesis, Transferability is explored in the context of Grammatical Evolution, looking at how a population of problems being optimized to solve one problem can be used to aid in solving a similar problem. To do this, methods were created to parse existing solutions from a codebase into the Grammatical Evolution representation, and operators were implemented that switch the GE objective throughout the process of evolution. A "humanlike" objective was defined which takes into account the distribution of AST nodes within different program contexts. This was used as Regularization during GE, in that programs that strayed further from the humanlike distribution of nodes received a penalty. It was found that optimizing for one problem first can make it easier to find a solution to other similar problems, especially when the solution to one problem is used in the other - however, the amount of pre-optimization and the choice of problem are of great imprtance. Additionally, optimizing directly for code to become more "humanlike" via the defined measure was not effective in allowing the population solve test cases more efficiently, although selecting directly for these metrics did change the distribution of these metrics in the resulting populations. This shows that while surrogate objectives can improve performance, they need to be chosen carefully.

Description

Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, May, 2020

Cataloged from the official PDF of thesis.

Includes bibliographical references (pages 81-84).

Date issued

2020

URI

https://hdl.handle.net/1721.1/127545

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Keywords

Electrical Engineering and Computer Science.

Collections

Graduate Theses