DSpace@MIT


Scaling Automatic Question Generation to Large Documents: A Concept-Driven Approach

Author(s)
Noorbakhsh, Kimia
Advisor
Balakrishnan, Hari
Alizadeh, Mohammad
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Assessing and enhancing human learning through question-answering is vital, especially when dealing with large documents, yet automating this process remains challenging. While large language models (LLMs) excel at summarization and answering queries, their ability to generate meaningful questions from lengthy texts remains underexplored. We propose Savaal, a scalable question-generation system with three objectives: (i) scalability, enabling question generation from hundreds of pages of text; (ii) depth of understanding, producing questions that go beyond factual recall to test conceptual reasoning; and (iii) domain independence, automatically generating questions across diverse knowledge areas. Instead of providing an LLM with large documents as context, Savaal improves results with a three-stage processing pipeline. Our evaluation with 76 human experts on 71 papers and PhD dissertations shows that, compared to a direct-prompting LLM baseline, Savaal generates questions that better test depth of understanding by 6.5× for dissertations and 1.5× for papers. Notably, as document length increases, Savaal’s advantages in question quality and cost become more pronounced.
Date issued
2025-05
URI
https://hdl.handle.net/1721.1/163722
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
