Integrative analysis of heterogeneous genomic datasets to discover genetic etiology of autism spectrum disorders
Author(s)
Nazeen, Sumaiya
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Bonnie A. Berger.
Abstract
Understanding the genetic background of complex diseases is crucial to medical research, with implications for diagnosis, treatment, and drug development. Because molecular approaches to this challenge are time-consuming and costly, computational approaches offer an efficient alternative. Such approaches aim to predict and prioritize genes for a particular disease of interest. State-of-the-art gene prediction and prioritization methods rely on the observation that disease-causing genes share some form of functional similarity, whether in sequence, phenotype, position in the protein-protein interaction (PPI) network, or functional annotation. Another increasingly accepted view is that human diseases result from perturbations of molecular networks, and that genes causing the same or similar diseases tend to lie close to one another in these networks. These observations underlie a large collection of computational approaches for finding previously unknown disease-associated genes. Most of these methods are built on protein interactome networks, integrated with other large-scale omics data, to infer how likely a gene is to be associated with a disease. In this thesis, we address the outstanding challenge of understanding the genetic etiology of autism spectrum disorder (ASD), a group of complex neurodevelopmental disorders that share the common feature of dysfunctional reciprocal social interaction. We introduce three novel methods for computing how likely a given gene is to be involved in ASDs, based on copy number variations (CNVs), phenotype similarity, and protein interactome network topology. We also adapt, for the first time, a random walk with restarts algorithm to ASD gene prioritization. Finally, we present a novel integrative approach that combines CNV, phenotype similarity, and network topology information with existing knowledge from the literature.
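The abstract does not spell out how the random walk with restarts is customized for ASD. As a minimal sketch, assuming the textbook formulation, a walker on an undirected PPI graph repeatedly applies p(t+1) = (1 - r) W p(t) + r p0, where W is the column-normalized adjacency matrix, p0 is uniform over known (seed) disease genes, and r is the restart probability. Genes are then ranked by their steady-state probability. The function name, the restart value of 0.7, and the toy gene names are hypothetical choices for illustration, not the thesis's settings.

```python
from typing import Dict, List, Set, Tuple


def random_walk_with_restart(
    edges: List[Tuple[str, str]],
    seeds: Set[str],
    restart: float = 0.7,  # hypothetical restart probability r
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Score each node by its steady-state RWR probability from the seed set.

    Iterates p <- (1 - restart) * W p + restart * p0, where W is the
    column-normalized adjacency matrix of an undirected graph and p0 is
    uniform over the seed genes. Mass on nodes with no neighbors is not
    redistributed, so scores sum to 1 only when every node has an edge.
    """
    # Build an undirected adjacency list from the edge list.
    neighbors: Dict[str, Set[str]] = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    for s in seeds:
        neighbors.setdefault(s, set())

    nodes = sorted(neighbors)
    # Restart vector: equal probability on every seed gene.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)

    for _ in range(max_iter):
        # One walk step: each node splits its mass equally among its neighbors.
        spread = {n: 0.0 for n in nodes}
        for n in nodes:
            deg = len(neighbors[n])
            if deg == 0:
                continue
            share = p[n] / deg
            for m in neighbors[n]:
                spread[m] += share
        # Mix the walk step with a jump back to the seeds.
        new_p = {n: (1 - restart) * spread[n] + restart * p0[n] for n in nodes}
        delta = sum(abs(new_p[n] - p[n]) for n in nodes)
        p = new_p
        if delta < tol:
            break
    return p


if __name__ == "__main__":
    # Hypothetical toy interactome: a path A-B-C-D-E with A as the only seed.
    toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
    scores = random_walk_with_restart(toy_edges, {"A"})
    ranking = sorted(scores, key=scores.get, reverse=True)
    print(ranking)  # expected ranking: A, B, C, D, E
```

Because each update contracts the distance between iterates by a factor of (1 - r), the loop converges for any r in (0, 1]; a larger r keeps the scores closer to the seeds, while a smaller r lets them diffuse further through the network.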
Our integrative approach outperforms the individual schemes in identifying and ranking ASD-related genes. Our candidate gene set offers a number of biological insights: it is overrepresented in several signaling, cell-adhesion, and neurological pathways, molecular functions, and biological processes that merit further investigation in connection with ASDs. We also find evidence for a connection between gastrointestinal disorders, particularly inflammatory bowel disease (IBD), and ASDs. The subnetworks we identify suggest that subclasses of disorders may exist along the autism spectrum.
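The abstract does not state which statistical test establishes that the candidate set is "overrepresented" in a pathway. A common choice for this kind of claim is a one-sided hypergeometric test, sketched below under that assumption. Given a background of N genes, K of which are annotated to a pathway, the p-value is the probability that a random set of n genes contains at least k annotated genes. The function name and argument order are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: number of genes in the background
    K: number of background genes annotated to the pathway
    n: number of candidate genes
    k: number of candidate genes annotated to the pathway
    """
    if not (0 <= K <= N and 0 <= n <= N and k >= 0):
        raise ValueError("counts must satisfy 0 <= K, n <= N and k >= 0")
    total = comb(N, n)
    # The overlap cannot exceed the pathway size or the candidate set size.
    upper = min(K, n)
    # Sum exact integer counts of the upper tail, then divide once.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 20,000 background genes, a 100-gene pathway,
    # 200 candidate genes, 10 of which fall in the pathway (about 1 expected).
    print(hypergeom_enrichment_p(20000, 100, 200, 10))
```

Working with exact integer binomial coefficients avoids underflow in the individual terms. When many pathways or annotation terms are tested at once, the resulting p-values would normally be corrected for multiple testing before any term is called overrepresented.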
Description
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014. Cataloged from PDF version of thesis. Includes bibliographical references (pages 99-109).
Date issued
2014
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.