dc.contributor.advisor: Gupta, Amar
dc.contributor.author: Lee, Samuel S.
dc.date.accessioned: 2024-09-16T13:46:04Z
dc.date.available: 2024-09-16T13:46:04Z
dc.date.issued: 2024-05
dc.date.submitted: 2024-07-11T14:36:20.728Z
dc.identifier.uri: https://hdl.handle.net/1721.1/156738
dc.description.abstract: As the number of documents processed by businesses across the world increases daily, the demand for streamlined and automated document processing methods grows. However, commercial methods for information extraction from documents do not generalize well across different document formats, as each solution is tailored to specific types of documents. This thesis provides an overview of a hybrid document processing pipeline designed to extract key-value pairs from technical specification documents with high accuracy. Two different phases of the pipeline are introduced, both employing rule-based methods and machine learning to cover a variety of document types. The first is an earlier iteration that extracts information from a simpler collection of documents, and the second is the current iteration designed to handle a much larger dataset containing more complex documents. Lastly, the initial stages of a module designed for key-value extraction from a specific type of technical specification document are also proposed.
dc.publisher: Massachusetts Institute of Technology
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: A Hybrid Approach for Key-Value Extraction from Technical Specification Documents
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Electrical Engineering and Computer Science

