Savaal: A system for automatically generating high-quality questions from unseen documents
Author(s)
Chandler, Joseph A.
Advisor
Balakrishnan, Hari
Abstract
Assessing human understanding through exams and quizzes is fundamental to learning and advancement in both educational and professional settings. However, current solutions for automating the generation of challenging questions from educational materials and documents are insufficient, often producing superficial or irrelevant questions. While LLMs have been shown to excel at tasks like question answering, their use for question generation is underexplored for general domains and at scale. This work presents Savaal, a scalable question-generation system that generates higher-order questions from documents, as well as a real-world system implementation for general use. Savaal accomplishes the following goals: (i) scalability, generating hundreds of questions from any document; (ii) depth of understanding, synthesizing higher-order concepts to test learners’ understanding of the material; and (iii) domain independence, generalizing broadly to any field. Rather than naively providing the entire document in context to an LLM, Savaal breaks down the process of generating questions into a three-stage pipeline. We demonstrate that Savaal outperforms the direct prompting baseline as evaluated by 76 human experts on 71 documents across conference papers and PhD dissertations. We additionally contribute a general system for serving Savaal in real-world scenarios. We demonstrate that our system is scalable, enabling fault-tolerant and horizontal scaling of each individual component in response to fluctuations in usage. Moreover, our architecture enables interactive use by individuals and collaboration in groups, reflecting real-world organizations like classrooms or enterprises. We hope that the system enables scalable question generation for educational and corporate use cases.
Date issued
2025-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology