| dc.contributor.author | Renda, Alex | |
| dc.contributor.author | Ding, Yi | |
| dc.contributor.author | Carbin, Michael | |
| dc.date.accessioned | 2023-11-17T18:44:13Z | |
| dc.date.available | 2023-11-17T18:44:13Z | |
| dc.date.issued | 2023-10-16 | |
| dc.identifier.issn | 2475-1421 | |
| dc.identifier.uri | https://hdl.handle.net/1721.1/153003 | |
| dc.description.abstract | Programmers and researchers are increasingly developing surrogates of programs, models of a subset of the observable behavior of a given program, to solve a variety of software development challenges. Programmers train surrogates from measurements of the behavior of a program on a dataset of input examples. A key challenge of surrogate construction is determining what training data to use to train a surrogate of a given program. We present a methodology for sampling datasets to train neural-network-based surrogates of programs. We first characterize the proportion of data to sample from each region of a program's input space (corresponding to different execution paths of the program) based on the complexity of learning a surrogate of the corresponding execution path. We next provide a program analysis to determine the complexity of different paths in a program. We evaluate these results on a range of real-world programs, demonstrating that complexity-guided sampling results in empirical improvements in accuracy. | en_US |
| dc.publisher | ACM | en_US |
| dc.relation.isversionof | https://doi.org/10.1145/3622856 | en_US |
| dc.rights | Creative Commons Attribution | en_US |
| dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | en_US |
| dc.source | Association for Computing Machinery | en_US |
| dc.title | Turaco: Complexity-Guided Data Sampling for Training Neural Surrogates of Programs | en_US |
| dc.type | Article | en_US |
| dc.identifier.citation | Renda, Alex, Ding, Yi and Carbin, Michael. 2023. "Turaco: Complexity-Guided Data Sampling for Training Neural Surrogates of Programs." Proceedings of the ACM on Programming Languages, 7 (OOPSLA2). | |
| dc.contributor.department | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory | |
| dc.relation.journal | Proceedings of the ACM on Programming Languages | en_US |
| dc.identifier.mitlicense | PUBLISHER_CC | |
| dc.eprint.version | Final published version | en_US |
| dc.type.uri | http://purl.org/eprint/type/JournalArticle | en_US |
| eprint.status | http://purl.org/eprint/status/PeerReviewed | en_US |
| dc.date.updated | 2023-11-01T07:57:43Z | |
| dc.language.rfc3066 | en | |
| dc.rights.holder | The author(s) | |
| dspace.date.submission | 2023-11-01T07:57:44Z | |
| mit.journal.volume | 7 | en_US |
| mit.journal.issue | OOPSLA2 | en_US |
| mit.license | PUBLISHER_CC | |
| mit.metadata.status | Authority Work and Publication Information Needed | en_US |