dc.contributor.advisor: Hosoi, Anette E. "Peko"
dc.contributor.author: Neupane, Pragya
dc.date.accessioned: 2025-08-21T17:02:13Z
dc.date.available: 2025-08-21T17:02:13Z
dc.date.issued: 2025-05
dc.date.submitted: 2025-06-16T14:46:54.467Z
dc.identifier.uri: https://hdl.handle.net/1721.1/162445
dc.description.abstract: Tables in scientific literature are rich sources of structured data, yet their complex and variable formats pose challenges for automated extraction. This thesis focuses on improving the reliability of Table Structure Recognition (TSR) using the Table Transformer (TATR) model, with a specific application to childhood obesity intervention studies. While fine-tuning TATR on a domain-specific dataset improves detection metrics, persistent errors such as overlapping rows and misclassified header columns remain. Through a systematic post-hoc error analysis of 175 scientific tables, we identify these dominant failure modes and develop lightweight post-processing modules: an overlap-aware row filtering algorithm and an OCR-enhanced column boundary correction method. Importantly, instead of relying on computationally expensive large language models (LLMs), this approach leverages efficient, interpretable techniques tailored to the domain-specific structure of public health tables. Our combined method reduces the proportion of structurally erroneous tables from 46.3% to an estimated 9.7–12.6%, improving the semantic alignment and interpretability of model outputs. This work contributes a transparent and scalable pipeline that enhances the trustworthiness of automated table extraction systems, with direct relevance to evidence-based decision-making in public health.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Analyzing Inconsistent Results of Table Transformer for Improved Data Extraction in Childhood Obesity Intervention Literature
dc.type: Thesis
dc.description.degree: S.M.
dc.description.degree: S.M.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.contributor.department: Massachusetts Institute of Technology. Institute for Data, Systems, and Society
dc.identifier.orcid: 0009-0000-2788-2793
mit.thesis.degree: Master
thesis.degree.name: Master of Science in Technology and Policy
thesis.degree.name: Master of Science in Electrical Engineering and Computer Science