
dc.contributor.advisor | Tommi S. Jaakkola and Hui Ge. | en_US
dc.contributor.author | Missiuro, Patrycja Vasilyev, 1976- | en_US
dc.contributor.other | Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. | en_US
dc.date.accessioned | 2010-08-26T15:21:02Z
dc.date.available | 2010-08-26T15:21:02Z
dc.date.copyright | 2010 | en_US
dc.date.issued | 2010 | en_US
dc.identifier.uri | http://hdl.handle.net/1721.1/57541
dc.description | Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. | en_US
dc.description | This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. | en_US
dc.description | Cataloged from student submitted PDF version of thesis. | en_US
dc.description | Includes bibliographical references (p. 191-204). | en_US
dc.description.abstract | The presented work develops a set of machine learning and other computational techniques to investigate and predict gene properties across a variety of biological datasets. In particular, our main goal is the discovery of genetic interactions based on sparse and incomplete information. In our development, we use gene data from two model organisms, Caenorhabditis elegans and Saccharomyces cerevisiae. Our first method, information flow, uses circuit theory to evaluate the importance of a protein in an interactome. We find that proteins with high information flow (i-flow) scores mediate information exchange between functional modules. We also show that higher information flow scores correlate strongly with the likelihood of observing lethality or pleiotropy, as well as genetic interactions. Our metric significantly outperforms other established network metrics such as degree or betweenness. Next, we show how Bayesian sets can be applied to gain intuition as to which datasets are the most relevant for predicting genetic interactions. In order to apply this method directly to microarray data, we extend Bayesian sets to handle continuous variables. Using Bayesian sets, we show that genetically interacting genes tend to share phenotypes but are not necessarily co-localized. Additionally, they have similar temporal expression profiles during development and aging. One of the major difficulties in dealing with biological data is the problem of incomplete datasets. We describe a novel application of collaborative filtering (CF) to predict missing values in biological datasets. | en_US
dc.description.abstract | (cont.) We adapt factorization-based and neighborhood-aware CF [13] to deal with a mixture of continuous and discrete entries. We use collaborative filtering to impute missing values, to assess how much information relevant to genetic interactions is present, and, finally, to predict genetic interactions. We also show how CF can reduce input dimensionality. Our last development is the application of Support Vector Machines (SVM), an adapted machine learning classification method, to predicting genetic interactions. We find that an SVM with a nonlinear radial basis function (RBF) kernel has greater predictive power than CF. Its performance, however, greatly benefits from using CF to fill in missing entries in the input data. We show that SVM performance further improves if we constrain the group of genes to a specific functional category. Throughout this thesis, we emphasize the features of the studied datasets and explain our findings from a biological perspective. In this respect, we hope that this work possesses an independent biological significance. The final step would be to confirm our predictions experimentally. This would allow us to gain new insights into C. elegans biology: specific genes orchestrating developmental and regulatory pathways, response to stress, etc. | en_US
dc.description.statementofresponsibility | by Patrycja Vasilyev Missiuro. | en_US
dc.format.extent | 204 p. | en_US
dc.language.iso | eng | en_US
dc.publisher | Massachusetts Institute of Technology | en_US
dc.rights | M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. | en_US
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US
dc.subject | Electrical Engineering and Computer Science. | en_US
dc.title | Predicting genetic interactions in Caenorhabditis elegans using machine learning | en_US
dc.type | Thesis | en_US
dc.description.degree | Ph.D. | en_US
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc | 635990963 | en_US

