
dc.contributor.advisor: Gupta, Amar
dc.contributor.author: Daqqah, Bilal H.
dc.date.accessioned: 2024-09-03T21:08:02Z
dc.date.available: 2024-09-03T21:08:02Z
dc.date.issued: 2024-05
dc.date.submitted: 2024-07-11T14:36:16.375Z
dc.identifier.uri: https://hdl.handle.net/1721.1/156567
dc.description.abstract: Data extraction from business documents is a critical but under-exploited area capable of unlocking significant value from vast document archives. Traditional methods relying on manual intervention or outsourcing are inefficient, error-prone, and costly, and commercial Deep Learning-based and OCR solutions still struggle with highly unstructured documents. This thesis explores the use of Large Language Models (LLMs) to automate the extraction and processing of ordering forms and procurement documents in collaboration with SiliconExperts. These documents contain complex codes used in electronic component procurement, which guide the manufacture and specification of parts. We developed an end-to-end pipeline comprising four key modules: Page Classification, OCR and Table Extraction, LLM Inference, and Code Combination Generation. Two approaches for key-value extraction were compared: one-shot prompting with in-context learning using GPT-4 Turbo with Vision (GPT-4V) and a fine-tuned GPT-3.5 model, in which the GPT-4V approach demonstrated superior performance. The pipeline effectively generated correct code combinations with high accuracy, although data quality issues impacted precision and performance. This research highlights the potential of LLMs to transform document processing workflows, bridging the gap between academic advancements and practical business applications.
dc.publisher: Massachusetts Institute of Technology
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: Leveraging Large Language Models (LLMs) for Automated Extraction and Processing of Complex Ordering Forms
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Electrical Engineering and Computer Science

