Clinical trend discovery and analysis of Taiwanese health insurance claims data
Author(s): Pillai, Divya P
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Data-driven analysis can improve our understanding of medicine, and data from electronic health records and labs has been used successfully in predictive tasks. Less advanced analysis has been done on health insurance claims data, which can be rich and more structured but is large in scale. Taiwan has had nationalized health insurance for twenty years; its National Health Insurance Research Database (NHIRD) contains records of insurance claims, including medications, prescriptions, and treatment costs for both inpatient and outpatient visits, spanning sixteen years and a million patients. The NHIRD enables longitudinal studies of a patient's medical progression as well as aggregation and generalization to population-level insights. We conducted preliminary exploration of data trends in aggregate, such as diagnosis code frequency and average treatment cost over time. An infrastructure to perform large-scale queries and handle results was required to effectively use the NHIRD for research applications. After indexing database tables to improve query performance, we created a pipeline in Python to connect to and query the database, analyze data for hypothesis discovery and hypothesis testing, convert Taiwanese codes to international standards, and produce plots and graphs. This pipeline was used to examine drug side effects and comorbidities observed across a population, accounting for demographic variables. We also studied patient-specific longitudinal matrices of medical events, which were highly sparse. We attempted quantitative imputation methods to densify these matrices, but because the data was binary (indicating the presence of an event at a given time), categorical, and irregular, advanced imputation offered limited benefit. Nevertheless, we discovered interesting patterns in cohorts of diabetes patients treated with various classes of drugs. This information can be exploited in computational phenotyping and other learning methods, and combined with other data sources it could increase the accuracy of clinical predictive tasks.
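The aggregate exploration described in the abstract (indexing tables, then querying diagnosis code frequency and average treatment cost over time) can be illustrated with a minimal sketch. It is not the thesis pipeline: the `claims` table, its column names, the index name, and the sample rows are all hypothetical, and an in-memory SQLite database stands in for the actual NHIRD database server, which the abstract does not specify.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical claims table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE claims ("
        "patient_id INTEGER, visit_year INTEGER, "
        "diagnosis_code TEXT, cost REAL)"
    )
    # Made-up sample rows; the codes are ICD-9-style values used only
    # as placeholders, not data from the NHIRD.
    rows = [
        (1, 2000, "250", 100.0),
        (2, 2000, "250", 300.0),
        (1, 2001, "250", 150.0),
        (3, 2001, "401", 80.0),
    ]
    conn.executemany("INSERT INTO claims VALUES (?, ?, ?, ?)", rows)
    # Index the columns used for grouping so the aggregate query
    # does not need a full unsorted scan on a large table.
    conn.execute(
        "CREATE INDEX idx_claims_year_code "
        "ON claims (visit_year, diagnosis_code)"
    )
    return conn


def yearly_diagnosis_trends(conn):
    """Return (year, code, claim count, average cost) per year and code."""
    query = (
        "SELECT visit_year, diagnosis_code, COUNT(*), AVG(cost) "
        "FROM claims "
        "GROUP BY visit_year, diagnosis_code "
        "ORDER BY visit_year, diagnosis_code"
    )
    return conn.execute(query).fetchall()


conn = build_demo_db()
trends = yearly_diagnosis_trends(conn)
for year, code, count, avg_cost in trends:
    print(year, code, count, avg_cost)
```

Each returned row gives one point on a per-code trend line, so the result could be passed directly to a plotting step of the kind the abstract mentions.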
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 61-62).
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Massachusetts Institute of Technology
Electrical Engineering and Computer Science.