MIT Libraries logoDSpace@MIT

MIT
View Item 
  • DSpace@MIT Home
  • MIT Libraries
  • MIT Theses
  • Graduate Theses
  • View Item
  • DSpace@MIT Home
  • MIT Libraries
  • MIT Theses
  • Graduate Theses
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

A Hybrid Approach for Key-Value Extraction from Technical Specification Documents

Author(s)
Lee, Samuel S.
Thumbnail
DownloadThesis PDF (3.175Mb)
Advisor
Gupta, Amar
Terms of use
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) Copyright retained by author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/
Metadata
Show full item record
Abstract
As the number of documents processed by businesses across the world increases daily, the demand for streamlined and automated document processing methods grows. However, commercial methods for information extraction from documents do not generalize well across different document formats, as each solution is tailored to specific types of documents. This thesis provides an overview of a hybrid document processing pipeline designed to extract key-value pairs from technical specification documents with high accuracy. Two different phases of the pipeline are introduced, both employing rule-based methods and machine learning to cover a variety of document types. The first is an earlier iteration that extracts information from a simpler collection of documents, and the second is the current iteration designed to handle a much larger dataset containing more complex documents. Lastly, the initial stages of a module designed for key-value extraction from a specific type of technical specification document is also proposed.
Date issued
2024-05
URI
https://hdl.handle.net/1721.1/156738
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

My Account

Login

Statistics

OA StatisticsStatistics by CountryStatistics by Department
MIT Libraries
PrivacyPermissionsAccessibilityContact us
MIT
Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.