LLMs in Citation Intent Classification: Progress, Precision, and Reproducibility Challenges
Author(s)
Fogelson, Alex; Thompson, Neil; Trišović, Ana
Publisher Policy
Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
Abstract
Understanding the intent behind scientific citations is critical for advancing scholarly search and knowledge mapping. This paper reflects on the methodological use of large language models (LLMs) for multi-class citation intent classification. Our experiments evaluating a diverse range of models and approaches reveal striking disagreement among state-of-the-art (SotA) systems. This inconsistency suggests that citation intent classification remains a challenging task for LLMs, raising questions about the robustness, reliability, and replicability of current methods. Moreover, our findings highlight a concerning dependency on proprietary LLMs that, even with access to compute resources, were necessary to achieve sufficient accuracy. This introduces new challenges, as silent updates, lack of versioning, and opaque training pipelines pose threats to methodological transparency and long-term reproducibility in LLM-enabled research.
Description
ACM REP ’25, Vancouver, BC, Canada
Date issued
2025-10-21
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Publisher
ACM | ACM Conference on Reproducibility and Replicability
Citation
Alex Fogelson, Ana Trišović, and Neil Thompson. 2025. LLMs in Citation Intent Classification: Progress, Precision, and Reproducibility Challenges. In Proceedings of the 3rd ACM Conference on Reproducibility and Replicability (ACM REP '25). Association for Computing Machinery, New York, NY, USA, 250–253.
Version: Final published version
ISBN
979-8-4007-1958-5