Retrieval-Augmented Generation for Large Language Models: Enhancing Applied Economic Reasoning and Forecasting
Author(s)
Quintero, Sebastian
Advisor
Kim, In Song
Abstract
Large Language Models (LLMs) have shown promise in applied economics, particularly in interpretive reasoning and unstructured data analysis. However, their reliance on static pre-training data limits their effectiveness in dynamic economic environments, where factual grounding and real-time adaptability are crucial. This thesis investigates the role of Retrieval-Augmented Generation (RAG) as a scalable solution to these limitations by injecting external knowledge at inference time. RAG enables the integration of dynamic, unstructured data sources to refine predictions and contextualize outputs based on real-time or domain-specific information. This adaptability allows LLMs to tailor responses to current observations while enhancing factual reliability. We evaluate two core domains: (1) applied economic reasoning using a custom multiple-choice test bank on international trade, and (2) time series forecasting of macroeconomic indicators using textual embeddings derived from official U.S. economic reports. On the reasoning task, RAG consistently improves model accuracy across multiple prompting methods and question formats. Notably, GPT-4o achieves a 3.59% gain in Natural Question (NQ) accuracy when RAG is applied with Chain-of-Thought prompting, while Gemini 2.0 Flash sees improvements primarily under In-Context Learning. Hypothetical Document Embedding (HyDE) further enhances performance for GPT-4o, yielding the highest accuracy across all configurations. In the forecasting domain, we augment Time Series Language Models (TSLMs) with sentiment vectors derived from FinBERT, which classifies each economic report into a probability distribution over positive, neutral, and negative tone. These embeddings are averaged over a rolling set of retrieved documents and passed to the model as exogenous inputs. 
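The sentiment-augmentation step described above can be sketched roughly as follows. The function name, window size, and the ordering of tone classes are illustrative assumptions; the per-report probability distributions are assumed to come from a FinBERT-style classifier, which is not invoked here:

```python
import numpy as np

def rolling_sentiment_vector(report_probs, k=5):
    """Average the (positive, neutral, negative) tone distributions of the
    k most recently retrieved reports into a single exogenous input vector.

    report_probs: list of length-3 arrays, ordered oldest -> newest,
    each summing to 1 (hypothetical FinBERT outputs)."""
    window = np.asarray(report_probs[-k:])  # rolling set of retrieved docs
    return window.mean(axis=0)              # one 3-dim sentiment vector

# Hypothetical FinBERT outputs for four retrieved economic reports
probs = [np.array([0.7, 0.2, 0.1]),
         np.array([0.1, 0.3, 0.6]),
         np.array([0.2, 0.5, 0.3]),
         np.array([0.4, 0.4, 0.2])]
exog = rolling_sentiment_vector(probs, k=4)
print(exog)  # averaged tone distribution, still sums to 1
```

The resulting vector would then be supplied to the TSLM alongside the numeric series as an exogenous covariate.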
Across four state-of-the-art TSLMs, Moirai demonstrates the most consistent gains when augmented with RAG, outperforming its zero-shot baseline in relative root mean squared error (RRMSE) under optimized retrieval conditions. Fine-tuning experiments suggest that retrieval depth (k) has limited impact on short-term forecasts but can meaningfully affect performance at longer horizons. These findings underscore the effectiveness of RAG as a modular enhancement to both reasoning and forecasting with LLMs, and position document-grounded architectures like Moirai as particularly well-suited for hybrid tasks involving structured and unstructured economic data.
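The RRMSE comparison against the zero-shot baseline can be illustrated with a minimal sketch. The exact formulation used in the thesis is not reproduced here; this assumes one common definition, the candidate's RMSE divided by the baseline's RMSE, so values below 1 indicate the augmented model outperforms the baseline:

```python
import numpy as np

def rrmse(y_true, y_pred, y_baseline):
    """RMSE of a candidate forecast relative to a baseline forecast.
    Values < 1 mean the candidate beats the baseline on this series."""
    def rmse(a, b):
        a, b = np.asarray(a, float), np.asarray(b, float)
        return np.sqrt(np.mean((a - b) ** 2))
    return rmse(y_true, y_pred) / rmse(y_true, y_baseline)

# Toy macroeconomic series (illustrative values only)
y_true = [2.0, 2.5, 3.0]
y_rag  = [2.1, 2.4, 3.1]   # RAG-augmented forecast
y_zero = [2.4, 2.0, 3.5]   # zero-shot baseline forecast
print(rrmse(y_true, y_rag, y_zero))  # below 1: augmentation helps here
```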
Date issued
2025-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology