DSpace@MIT (MIT Libraries)


Development of an End-to-End Pipeline for Custom Key-Value Extraction from Commercial Invoices

Author(s)
Mohan, Abhishek
Advisor
Gupta, Amar
Terms of use
In Copyright - Educational Use Permitted. Copyright MIT. http://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Manual extraction of information from business documents is inefficient, which has motivated the development of automated processing solutions. Among business documents, commercial invoices pose additional difficulties because layouts are diverse and scan quality varies. Commercially available solutions perform invoice extraction, yet they lack the flexibility to handle tasks unique to a particular dataset and its associated complications. Using sample documents provided by a leading electronic component distributor, we investigated approaches for extracting key-value information from a complex dataset of invoices. This thesis details the development of a highly accurate, end-to-end data pipeline that accomplishes this task. The pipeline takes a multi-module approach that integrates image processing, optical character recognition, custom algorithms, and machine learning-based matching, organized into successive stages that allow effective and efficient key-value extraction from invoice documents. Combined with an intuitive web interface, the custom pipeline offers strong performance and the flexibility to be generalized to other business documents in future work.
Date issued
2023-02
URI
https://hdl.handle.net/1721.1/150308
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
