
dc.contributor.author: Mattis, Toni
dc.contributor.author: Böhme, Lukas
dc.contributor.author: Krebs, Eva
dc.contributor.author: Rinard, Martin C.
dc.contributor.author: Hirschfeld, Robert
dc.date.accessioned: 2024-08-05T17:05:39Z
dc.date.available: 2024-08-05T17:05:39Z
dc.date.issued: 2024-03-11
dc.identifier.isbn: 979-8-4007-0634-9
dc.identifier.uri: https://hdl.handle.net/1721.1/155934
dc.description: ‹Programming› Companion ’24, March 11–15, 2024, Lund, Sweden [en_US]
dc.description.abstract: Feedback during programming is desirable, but its usefulness depends on immediacy and relevance to the task. Unit and regression testing are practices to ensure programmers can obtain feedback on their changes; however, running a large test suite is rarely fast, and only a few results are relevant. Identifying tests relevant to a change can help programmers in two ways: upcoming issues can be detected earlier during programming, and relevant tests can serve as examples to help programmers understand the code they are editing. In this work, we describe an approach to evaluate how well large language models (LLMs) and embedding models can judge the relevance of a test to a change. We construct a dataset by applying faulty variations of real-world code changes and measuring whether the model could nominate the failing tests beforehand. We found that, while embedding models perform best on such a task, even simple information retrieval models are surprisingly competitive. In contrast, pre-trained LLMs are of limited use as they focus on confounding aspects like coding styles. We argue that the high computational cost of AI models is not always justified, and tool developers should also consider non-AI models for code-related retrieval and recommendation tasks. Lastly, we generalize from unit tests to live examples and outline how our approach can benefit live programming environments. [en_US]
dc.publisher: ACM | Companion Proceedings of the 8th International Conference on the Art, Science, and Engineering of Programming [en_US]
dc.relation.isversionof: 10.1145/3660829.3660837 [en_US]
dc.rights: Creative Commons Attribution [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en_US]
dc.source: Association for Computing Machinery [en_US]
dc.title: Faster Feedback with AI? A Test Prioritization Study [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Mattis, Toni, Böhme, Lukas, Krebs, Eva, Rinard, Martin C. and Hirschfeld, Robert. 2024. "Faster Feedback with AI? A Test Prioritization Study."
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.identifier.mitlicense: PUBLISHER_CC
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dc.date.updated: 2024-08-01T07:49:57Z
dc.language.rfc3066: en
dc.rights.holder: The author(s)
dspace.date.submission: 2024-08-01T07:49:58Z
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed [en_US]
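
The abstract above frames test prioritization as a retrieval task: given a code change, rank the available unit tests by their likely relevance so that probable failures run first. As a rough, hypothetical illustration of the kind of simple information-retrieval baseline the abstract mentions (not the authors' implementation; all identifiers and data below are invented), one could rank tests by TF-IDF cosine similarity between the changed code and each test's source:

    # Hypothetical sketch of an information-retrieval baseline for test prioritization.
    # This is not the paper's implementation; identifiers and data are illustrative.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def rank_tests_by_relevance(change_text, tests):
        """Order test names by TF-IDF cosine similarity between the change and each test's source."""
        names = list(tests)
        bodies = [tests[name] for name in names]
        vectorizer = TfidfVectorizer(token_pattern=r"[A-Za-z_]\w+")  # identifier-like tokens
        matrix = vectorizer.fit_transform(bodies + [change_text])
        scores = cosine_similarity(matrix[len(bodies)], matrix[:len(bodies)]).ravel()
        return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)

    # Illustrative usage: a change to parsing logic should surface the parser test first.
    tests = {
        "test_parse_number": "def test_parse_number(): assert parse('42') == 42",
        "test_render_html": "def test_render_html(): assert render(tree).startswith('<html>')",
    }
    change = "def parse(source): return int(source.strip())"
    for name, score in rank_tests_by_relevance(change, tests):
        print(name, round(score, 3))

Swapping the TF-IDF vectors for vectors produced by a code embedding model would give the embedding-based variant the abstract compares against; the ranking step itself would stay the same.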

