| dc.contributor.advisor | Gupta, Amar | |
| dc.contributor.author | Liu, Katherine | |
| dc.date.accessioned | 2025-10-06T17:36:51Z | |
| dc.date.available | 2025-10-06T17:36:51Z | |
| dc.date.issued | 2025-05 | |
| dc.date.submitted | 2025-06-23T14:02:53.715Z | |
| dc.identifier.uri | https://hdl.handle.net/1721.1/162958 | |
| dc.description.abstract | With the high volume of activity flowing through financial institutions, detecting potential errors remains a critical challenge. This thesis addresses two key areas where errors may occur: business name registrations and transactions within valid accounts. Traditional string-matching methods struggle to identify incorrectly written business names that closely resemble existing ones, while current error detection models for transaction data often suffer from class imbalance, which reduces their performance on the minority class of incorrect transactions. To address these issues, this thesis proposes two novel approaches. First, a hybrid method that integrates multi-agent Large Language Models (LLMs) with existing string-matching techniques improves the detection of incorrect business names by capturing subtle variations beyond conventional edit-distance metrics, raising recall from 0.815 for the baseline model to 0.987. Second, an improved tabular data generation method for credit card transactions leverages LLMs and class balancing to generate high-quality synthetic data. Training error detection systems on this data reduces the false negative rate from 23.47% to 12.84%. Together, these methods strengthen error detection systems, enabling financial institutions to improve their clients' experiences. | |
| dc.publisher | Massachusetts Institute of Technology | |
| dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) | |
| dc.rights | Copyright retained by author(s) | |
| dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ | |
| dc.title | Detecting Errors in Financial Data: A Multi-Agent LLM and Synthetic Data Approach | |
| dc.type | Thesis | |
| dc.description.degree | M.Eng. | |
| dc.description.degree | S.B. | |
| dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
| dc.identifier.orcid | https://orcid.org/0009-0003-6686-2228 | |
| mit.thesis.degree | Master | |
| mit.thesis.degree | Bachelor | |
| thesis.degree.name | Master of Engineering in Electrical Engineering and Computer Science | |
| thesis.degree.name | Bachelor of Science in Computer Science and Engineering | |