
dc.contributor.author: Mattis, Toni
dc.contributor.author: Böhme, Lukas
dc.contributor.author: Krebs, Eva
dc.contributor.author: Rinard, Martin C.
dc.contributor.author: Hirschfeld, Robert
dc.date.accessioned: 2024-08-05T17:05:39Z
dc.date.available: 2024-08-05T17:05:39Z
dc.date.issued: 2024-03-11
dc.identifier.isbn: 979-8-4007-0634-9
dc.identifier.uri: https://hdl.handle.net/1721.1/155934
dc.description: ‹Programming› Companion ’24, March 11–15, 2024, Lund, Sweden [en_US]
dc.description.abstract: Feedback during programming is desirable, but its usefulness depends on immediacy and relevance to the task. Unit and regression testing are practices to ensure programmers can obtain feedback on their changes; however, running a large test suite is rarely fast, and only a few results are relevant. Identifying tests relevant to a change can help programmers in two ways: upcoming issues can be detected earlier during programming, and relevant tests can serve as examples to help programmers understand the code they are editing. In this work, we describe an approach to evaluate how well large language models (LLMs) and embedding models can judge the relevance of a test to a change. We construct a dataset by applying faulty variations of real-world code changes and measuring whether the model could nominate the failing tests beforehand. We found that, while embedding models perform best on such a task, even simple information retrieval models are surprisingly competitive. In contrast, pre-trained LLMs are of limited use as they focus on confounding aspects like coding styles. We argue that the high computational cost of AI models is not always justified, and tool developers should also consider non-AI models for code-related retrieval and recommendation tasks. Lastly, we generalize from unit tests to live examples and outline how our approach can benefit live programming environments. [en_US]
dc.publisher: ACM | Companion Proceedings of the 8th International Conference on the Art, Science, and Engineering of Programming [en_US]
dc.relation.isversionof: 10.1145/3660829.3660837 [en_US]
dc.rights: Creative Commons Attribution [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en_US]
dc.source: Association for Computing Machinery [en_US]
dc.title: Faster Feedback with AI? A Test Prioritization Study [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Mattis, Toni, Böhme, Lukas, Krebs, Eva, Rinard, Martin C. and Hirschfeld, Robert. 2024. "Faster Feedback with AI? A Test Prioritization Study."
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.identifier.mitlicense: PUBLISHER_CC
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dc.date.updated: 2024-08-01T07:49:57Z
dc.language.rfc3066: en
dc.rights.holder: The author(s)
dspace.date.submission: 2024-08-01T07:49:58Z
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed [en_US]
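
The abstract above frames test prioritization as a retrieval task: given a code change, rank the available unit tests by their likely relevance so that probable failures run first. As a rough, hypothetical illustration of the kind of simple information-retrieval baseline the abstract mentions (not the authors' implementation; all identifiers and data below are invented), one could rank tests by TF-IDF cosine similarity between the changed code and each test's source:

    # Hypothetical sketch of an information-retrieval baseline for test prioritization.
    # This is not the paper's implementation; identifiers and data are illustrative.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def rank_tests_by_relevance(change_text, tests):
        """Order test names by TF-IDF cosine similarity between the change and each test's source."""
        names = list(tests)
        bodies = [tests[name] for name in names]
        vectorizer = TfidfVectorizer(token_pattern=r"[A-Za-z_]\w+")  # identifier-like tokens
        matrix = vectorizer.fit_transform(bodies + [change_text])
        scores = cosine_similarity(matrix[len(bodies)], matrix[:len(bodies)]).ravel()
        return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)

    # Illustrative usage: a change to parsing logic should surface the parser test first.
    tests = {
        "test_parse_number": "def test_parse_number(): assert parse('42') == 42",
        "test_render_html": "def test_render_html(): assert render(tree).startswith('<html>')",
    }
    change = "def parse(source): return int(source.strip())"
    for name, score in rank_tests_by_relevance(change, tests):
        print(name, round(score, 3))

Swapping the TF-IDF vectors for vectors produced by a code embedding model would give the embedding-based variant the abstract compares against; the ranking step itself would stay the same.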

