An Artificial Intelligence Based Approach to Automate Document Processing in Business Area
Author(s)
Chen, Ta Hang
Advisor
Gupta, Amar
Szolovits, Peter
Rhodes, Donna H.
Abstract
Automatic document processing has long been a strategy for business executives to improve operational efficiency. With Optical Character Recognition (OCR) and machine learning techniques, businesses can apply Artificial Intelligence (AI) to automate the process. However, introducing an AI application into a business is challenging; such efforts can easily fail because of the complex interplay between technical and organizational components. This thesis considers document processing from a sociotechnical system perspective and leverages a four-step system analysis approach to identify the critical components.
This research also proposes a machine learning model that uses a Support Vector Machine (SVM) as the classifier and Word2vec embeddings as document features to classify business documents. The proposed model reaches a Macro F1-score of 0.872 on scanned business documents from the RVL-CDIP dataset. It outperforms two commonly used rule-based algorithms, RIPPER and PART, suggesting that the proposed model is potentially suitable for deployment in business settings to classify documents.
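For readers unfamiliar with the reported metric, the following is a minimal sketch of how a Macro F1-score is computed: the F1-score is calculated separately for each class and the per-class scores are averaged with equal weight, so rare document classes count as much as common ones. The function name `macro_f1` and the three example labels are illustrative assumptions, not taken from the thesis, and the toy predictions are not results from the thesis.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels and predictions, for illustration only.
y_true = ["invoice", "invoice", "letter", "memo", "memo", "memo"]
y_pred = ["invoice", "letter", "letter", "memo", "memo", "invoice"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

In this toy case the per-class F1 scores are 0.5 (invoice), about 0.667 (letter), and 0.8 (memo), so the macro average is about 0.656.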
Date issued
2021-06
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; System Design and Management Program.
Publisher
Massachusetts Institute of Technology