MIT Libraries logoDSpace@MIT

MIT
View Item 
  • DSpace@MIT Home
  • MIT Libraries
  • MIT Theses
  • Graduate Theses
  • View Item
  • DSpace@MIT Home
  • MIT Libraries
  • MIT Theses
  • Graduate Theses
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Analyzing Inconsistent Results of Table Transformer for Improved Data Extraction in Childhood Obesity Intervention Literature

Author(s)
Neupane, Pragya
Thumbnail
DownloadThesis PDF (8.659Mb)
Advisor
Hosoi, Anette E. "Peko"
Terms of use
In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/
Metadata
Show full item record
Abstract
Tables in scientific literature are rich sources of structured data, yet their complex and variable formats pose challenges for automated extraction. This thesis focuses on improving the reliability of Table Structure Recognition (TSR) using the Table Transformer (TATR) model, with a specific application to childhood obesity intervention studies. While fine-tuning TATR on a domain-specific dataset improves detection metrics, persistent errors such as overlapping rows and misclassified header columns remain. Through a systematic post-hoc error analysis of 175 scientific tables, we identify these dominant failure modes and develop lightweight post-processing modules: an overlap-aware row filtering algorithm and an OCR-enhanced column boundary correction method. Importantly, instead of relying on computationally expensive large language models (LLMs), this approach leverages efficient, interpretable techniques tailored to the domain-specific structure of public health tables. Our combined method reduces the proportion of structurally erroneous tables from 46.3% to an estimated 9.7–12.6%, improving the semantic alignment and interpretability of model outputs. This work contributes a transparent and scalable pipeline that enhances the trustworthiness of automated table extraction systems, with direct relevance to evidence-based decision-making in public health.
Date issued
2025-05
URI
https://hdl.handle.net/1721.1/162445
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Institute for Data, Systems, and Society
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

My Account

Login

Statistics

OA StatisticsStatistics by CountryStatistics by Department
MIT Libraries
PrivacyPermissionsAccessibilityContact us
MIT
Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.