Deconstructing complex diseases: identification of new phenotypical sub-clusters of Type 2 diabetes using machine learning
Identification of new phenotypical sub-clusters of Type 2 diabetes using machine learning
Massachusetts Institute of Technology. Engineering and Management Program.
System Design and Management Program.
Steven J. Spear.
Advances in data science and technology promise to help clinicians diagnose and treat certain conditions. But there are other complex and poorly characterized illnesses for which the drivers and dependent variables are not understood well enough to take full advantage of the copious patient data that may exist. For these diseases, new techniques need to be explored to gain a better understanding of the nature of the disease, its subtypes, causes, consequences, and presentation. Modern genetics has shown that these diseases often have multiple subtypes, as well as multiple phenotypes as indicated by new laboratory data. Examples include common and important illnesses such as Type 2 diabetes (T2D), which affects approximately 30 million Americans; Crohn's disease, with 1 million sufferers in the United States; epilepsy, with 3.4 million Americans affected; and migraines, with another 3.2 million in the United States.
Our research explores how machine learning (ML) can be applied to these less well understood complex diseases to improve clinical translation and management. This thesis discusses how unsupervised machine learning techniques can be used for complex phenotype clustering to identify subtypes of T2D for better clinical management and treatment. T2D is a complex, heterogeneous disease affecting the world's population at rapidly increasing rates. While clinicians now better understand the heterogeneity of the disease, T2D treatment strategies remain largely based on populations rather than on a specific patient's subtype.
This thesis explores the concept of using data analytics and ML to identify subtypes of T2D as the first step in moving towards precision medicine and treatments. This thesis includes (a) a characterization of T2D as a heterogeneous disease, (b) a review of existing research attempts to dissect the disease into subtypes based on phenotypes and gene expression, and their limitations, (c) a phenotype clustering analysis of T2D patients using unsupervised machine learning techniques and the MIMIC-III database, and (d) an analysis of the resulting clusters/subgroups in different ways to understand their clinical significance. Through multiple iterations of the clustering experiment, this thesis (a) provides a good proof of concept for sub-classification of T2D patients using unsupervised machine learning techniques such as clustering and dimension reduction, (b) establishes a data pipeline and clustering model framework that can be applied to richer datasets, (c) suggests various experiment design options for further analysis, and (d) establishes a direction for future work, including advanced modelling techniques and predictive analytics for complex diseases.
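As a rough illustration of what phenotype clustering can look like in code, the following is a minimal sketch and not the thesis's actual pipeline. The abstract does not name a specific algorithm or feature set, so k-means is used here as a stand-in, with a simple deterministic farthest-point initialization. The three features (HbA1c, BMI, age) and all patient values are hypothetical and are not drawn from MIMIC-III. Dimension reduction is omitted for brevity.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column so features on different scales contribute equally."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centroids(points, k):
    """Deterministic farthest-point initialization (an illustrative choice)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(points, k, max_iters=50):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [mean(c) for c in zip(*members)]
    return labels, centroids

# Hypothetical patients: [HbA1c (%), BMI (kg/m^2), age (years)].
patients = [
    [6.8, 24.0, 45], [7.0, 25.5, 50], [6.9, 23.5, 48], [7.1, 26.0, 52],
    [9.5, 35.0, 63], [9.8, 36.5, 66], [9.2, 34.0, 61], [10.0, 37.0, 68],
]

labels, centroids = kmeans(standardize(patients), k=2)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

On real clinical data the number of clusters is not known in advance, so a sketch like this would need to be paired with some form of cluster-validity check and clinical review of the resulting groups.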
Thesis: S.M. in Engineering and Management, Massachusetts Institute of Technology, System Design and Management Program, 2019. Cataloged from PDF version of thesis. Includes bibliographical references (pages 62-64).
Department: Massachusetts Institute of Technology. Engineering and Management Program; System Design and Management Program
Massachusetts Institute of Technology
Engineering and Management Program., System Design and Management Program.