MIT Libraries logoDSpace@MIT

MIT
View Item 
  • DSpace@MIT Home
  • MIT Open Access Articles
  • MIT Open Access Articles
  • View Item
  • DSpace@MIT Home
  • MIT Open Access Articles
  • MIT Open Access Articles
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Study on LLMs for Promptagator-Style Dense Retriever Training

Author(s)
Gwon, Daniel; Jedidi, Nour; Lin, Jimmy
Thumbnail
Download3746252.3760960.pdf (553.4Kb)
Publisher with Creative Commons License

Publisher with Creative Commons License

Creative Commons Attribution

Terms of use
Creative Commons Attribution https://creativecommons.org/licenses/by/4.0/
Metadata
Show full item record
Abstract
Promptagator demonstrated that Large Language Models (LLMs) with few-shot prompts can be used as task-specific query generators for fine-tuning domain-specialized dense retrieval models. However, the original Promptagator approach relied on proprietary and large-scale LLMs which users may not have access to or may be prohibited from using with sensitive data. In this work, we study the impact of open-source LLMs at accessible scales (≤14B parameters) as an alternative. Our results demonstrate that open-source LLMs as small as 3B parameters can serve as effective Promptagator-style query generators. We hope our work will inform practitioners with reliable alternatives for synthetic data generation and give insights to maximize fine-tuning results for domain-specific applications. Our code is available at https://www.github.com/mitll/promptodile
Description
CIKM ’25, Seoul, Republic of Korea
Date issued
2025-11-10
URI
https://hdl.handle.net/1721.1/164284
Department
Lincoln Laboratory
Publisher
ACM|Proceedings of the 34th ACM International Conference on Information and Knowledge Management
Citation
Daniel Gwon, Nour Jedidi, and Jimmy Lin. 2025. Study on LLMs for Promptagator-Style Dense Retriever Training. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management (CIKM '25). Association for Computing Machinery, New York, NY, USA, 4748–4752.
Version: Final published version
ISBN
979-8-4007-2040-6

Collections
  • MIT Open Access Articles

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

My Account

Login

Statistics

OA StatisticsStatistics by CountryStatistics by Department
MIT Libraries
PrivacyPermissionsAccessibilityContact us
MIT
Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.