Leveraging Large Language Models (LLMs) for Automated Extraction and Processing of Complex Ordering Forms
Author(s)
Daqqah, Bilal H.
Advisor
Gupta, Amar
Abstract
Data extraction from business documents is a critical but under-exploited area capable of unlocking significant value from vast document archives. Traditional methods that rely on manual intervention or outsourcing are inefficient, error-prone, and costly, and commercial deep-learning-based and OCR solutions still struggle with highly unstructured documents. This thesis explores the use of Large Language Models (LLMs) to automate the extraction and processing of ordering forms and procurement documents, in collaboration with SiliconExperts. These documents contain complex codes used in electronic component procurement, which guide the manufacture and specification of parts. We developed an end-to-end pipeline comprising four key modules: Page Classification, OCR and Table Extraction, LLM Inference, and Code Combination Generation. Two approaches for key-value extraction were compared: one-shot prompting with in-context learning using GPT-4 Turbo with Vision (GPT-4V), and a fine-tuned GPT-3.5 model. Of the two, the GPT-4V approach demonstrated superior performance. The pipeline generated correct code combinations with high accuracy, although data quality issues adversely affected precision and performance. This research highlights the potential of LLMs to transform document processing workflows, bridging the gap between academic advancements and practical business applications.
Date issued
2024-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology