| dc.contributor.advisor | Hosoi, Anette E. "Peko" | |
| dc.contributor.author | Neupane, Pragya | |
| dc.date.accessioned | 2025-08-21T17:02:13Z | |
| dc.date.available | 2025-08-21T17:02:13Z | |
| dc.date.issued | 2025-05 | |
| dc.date.submitted | 2025-06-16T14:46:54.467Z | |
| dc.identifier.uri | https://hdl.handle.net/1721.1/162445 | |
| dc.description.abstract | Tables in scientific literature are rich sources of structured data, yet their complex and variable formats pose challenges for automated extraction. This thesis focuses on improving the reliability of Table Structure Recognition (TSR) using the Table Transformer (TATR) model, with a specific application to childhood obesity intervention studies. While fine-tuning TATR on a domain-specific dataset improves detection metrics, persistent errors such as overlapping rows and misclassified header columns remain. Through a systematic post-hoc error analysis of 175 scientific tables, we identify these dominant failure modes and develop lightweight post-processing modules: an overlap-aware row filtering algorithm and an OCR-enhanced column boundary correction method. Importantly, instead of relying on computationally expensive large language models (LLMs), this approach leverages efficient, interpretable techniques tailored to the domain-specific structure of public health tables. Our combined method reduces the proportion of structurally erroneous tables from 46.3% to an estimated 9.7–12.6%, improving the semantic alignment and interpretability of model outputs. This work contributes a transparent and scalable pipeline that enhances the trustworthiness of automated table extraction systems, with direct relevance to evidence-based decision-making in public health. | |
| dc.publisher | Massachusetts Institute of Technology | |
| dc.rights | In Copyright - Educational Use Permitted | |
| dc.rights | Copyright retained by author(s) | |
| dc.rights.uri | https://rightsstatements.org/page/InC-EDU/1.0/ | |
| dc.title | Analyzing Inconsistent Results of Table Transformer for Improved Data Extraction in Childhood Obesity Intervention Literature | |
| dc.type | Thesis | |
| dc.description.degree | S.M. | |
| dc.description.degree | S.M. | |
| dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
| dc.contributor.department | Massachusetts Institute of Technology. Institute for Data, Systems, and Society | |
| dc.identifier.orcid | 0009-0000-2788-2793 | |
| mit.thesis.degree | Master | |
| thesis.degree.name | Master of Science in Technology and Policy | |
| thesis.degree.name | Master of Science in Electrical Engineering and Computer Science | |