Retrieval-Augmented Generation for Large Language Models: Enhancing Applied Economic Reasoning and Forecasting
Author(s)
Quintero, Sebastian
Advisor
Kim, In Song
Abstract
Large Language Models (LLMs) have shown promise in applied economics, particularly in interpretive reasoning and unstructured data analysis. However, their reliance on static pre-training data limits their effectiveness in dynamic economic environments, where factual grounding and real-time adaptability are crucial. This thesis investigates the role of Retrieval-Augmented Generation (RAG) as a scalable solution to these limitations by injecting external knowledge at inference time. RAG enables the integration of dynamic, unstructured data sources to refine predictions and contextualize outputs based on real-time or domain-specific information. This adaptability allows LLMs to tailor responses to current observations while enhancing factual reliability. We evaluate two core domains: (1) applied economic reasoning using a custom multiple-choice test bank on international trade, and (2) time series forecasting of macroeconomic indicators using textual embeddings derived from official U.S. economic reports. On the reasoning task, RAG consistently improves model accuracy across multiple prompting methods and question formats. Notably, GPT-4o achieves a 3.59% gain in Natural Question (NQ) accuracy when RAG is applied with Chain-of-Thought prompting, while Gemini 2.0 Flash sees improvements primarily under In-Context Learning. Hypothetical Document Embedding (HyDE) further enhances performance for GPT-4o, yielding the highest accuracy across all configurations. In the forecasting domain, we augment Time Series Language Models (TSLMs) with sentiment vectors derived from FinBERT, which classifies each economic report into a probability distribution over positive, neutral, and negative tone. These embeddings are averaged over a rolling set of retrieved documents and passed to the model as exogenous inputs. 
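The sentiment-augmentation step described above can be sketched roughly as follows. The function name, window size, and the ordering of tone classes are illustrative assumptions; the per-report probability distributions are assumed to come from a FinBERT-style classifier, which is not invoked here:

```python
import numpy as np

def rolling_sentiment_vector(report_probs, k=5):
    """Average the (positive, neutral, negative) tone distributions of the
    k most recently retrieved reports into a single exogenous input vector.

    report_probs: list of length-3 arrays, ordered oldest -> newest,
    each summing to 1 (hypothetical FinBERT outputs)."""
    window = np.asarray(report_probs[-k:])  # rolling set of retrieved docs
    return window.mean(axis=0)              # one 3-dim sentiment vector

# Hypothetical FinBERT outputs for four retrieved economic reports
probs = [np.array([0.7, 0.2, 0.1]),
         np.array([0.1, 0.3, 0.6]),
         np.array([0.2, 0.5, 0.3]),
         np.array([0.4, 0.4, 0.2])]
exog = rolling_sentiment_vector(probs, k=4)
print(exog)  # averaged tone distribution, still sums to 1
```

The resulting vector would then be supplied to the TSLM alongside the numeric series as an exogenous covariate.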
Across four state-of-the-art TSLMs, Moirai demonstrates the most consistent gains when augmented with RAG, outperforming its zero-shot baseline in relative root mean squared error (RRMSE) under optimized retrieval conditions. Fine-tuning experiments suggest that retrieval depth (k) has limited impact on short-term forecasts but can meaningfully affect performance at longer horizons. These findings underscore the effectiveness of RAG as a modular enhancement to both reasoning and forecasting with LLMs, and position document-grounded architectures like Moirai as particularly well-suited for hybrid tasks involving structured and unstructured economic data.
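The RRMSE comparison against the zero-shot baseline can be illustrated with a minimal sketch. The exact formulation used in the thesis is not reproduced here; this assumes one common definition, the candidate's RMSE divided by the baseline's RMSE, so values below 1 indicate the augmented model outperforms the baseline:

```python
import numpy as np

def rrmse(y_true, y_pred, y_baseline):
    """RMSE of a candidate forecast relative to a baseline forecast.
    Values < 1 mean the candidate beats the baseline on this series."""
    def rmse(a, b):
        a, b = np.asarray(a, float), np.asarray(b, float)
        return np.sqrt(np.mean((a - b) ** 2))
    return rmse(y_true, y_pred) / rmse(y_true, y_baseline)

# Toy macroeconomic series (illustrative values only)
y_true = [2.0, 2.5, 3.0]
y_rag  = [2.1, 2.4, 3.1]   # RAG-augmented forecast
y_zero = [2.4, 2.0, 3.5]   # zero-shot baseline forecast
print(rrmse(y_true, y_rag, y_zero))  # below 1: augmentation helps here
```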
Date issued
2025-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology