Turaco: Complexity-Guided Data Sampling for Training Neural Surrogates of Programs
Author(s)
Renda, Alex; Ding, Yi; Carbin, Michael
Publisher with Creative Commons License
Creative Commons Attribution
Terms of use
Abstract
Programmers and researchers are increasingly developing surrogates of programs, models of a subset of the observable behavior of a given program, to solve a variety of software development challenges. Programmers train surrogates from measurements of the behavior of a program on a dataset of input examples. A key challenge of surrogate construction is determining what training data to use to train a surrogate of a given program.
We present a methodology for sampling datasets to train neural-network-based surrogates of programs. We first characterize the proportion of data to sample from each region of a program's input space (corresponding to different execution paths of the program) based on the complexity of learning a surrogate of the corresponding execution path. We next provide a program analysis to determine the complexity of different paths in a program. We evaluate these results on a range of real-world programs, demonstrating that complexity-guided sampling results in empirical improvements in accuracy.
Date issued
2023-10-16
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Journal
Proceedings of the ACM on Programming Languages
Publisher
ACM
Citation
Renda, Alex, Ding, Yi and Carbin, Michael. 2023. "Turaco: Complexity-Guided Data Sampling for Training Neural Surrogates of Programs." Proceedings of the ACM on Programming Languages, 7 (OOPSLA2).
Version: Final published version
ISSN
2475-1421