Detecting Errors in Financial Data: A Multi-Agent LLM and Synthetic Data Approach
Author(s)
Liu, Katherine
DownloadThesis PDF (1.299Mb)
Advisor
Gupta, Amar
Terms of use
Metadata
Show full item recordAbstract
With the high volume of activity flowing through financial institutions, detecting potential errors remains a critical challenge. This paper addresses two key areas where errors may occur: business name registrations and transactions within valid accounts. Traditional string-matching methods struggle to accurately identify incorrectly written business names that closely resemble existing ones, while existing error detection models for transaction data often suffer from class imbalance, leading to reduced performance on minority incorrect transaction cases. To address these issues, this paper proposes two novel approaches. First, a hybrid method integrating multi-agent Large Language Models (LLMs) with existing string-matching techniques enhances the detection of incorrect business names by capturing subtle variations beyond conventional edit-distance metrics, improving the recall from 0.815 for the baseline model to 0.987 using the proposed method. Second, an improved tabular data generation method for credit card transactions is introduced, leveraging LLMs and class balancing to generate high-quality synthetic data. Using this data to train error detection systems results in a decrease of the false negative rate from 23.47% to 12.84%. Together, these methods enhance the performance of error detection systems, enabling financial institutions to enhance the experiences of their clients.
Date issued
2025-05Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer SciencePublisher
Massachusetts Institute of Technology