dc.contributor.author: Loh, Po-Ru
dc.contributor.author: Tucker, George Jay
dc.contributor.author: Berger, Bonnie
dc.date.accessioned: 2012-02-08T17:16:30Z
dc.date.available: 2012-02-08T17:16:30Z
dc.date.issued: 2011-12
dc.date.submitted: 2011-04
dc.identifier.issn: 1932-6203
dc.identifier.uri: http://hdl.handle.net/1721.1/69039
dc.description.abstract: A major goal of large-scale genomics projects is to enable the use of data from high-throughput experimental methods to predict complex phenotypes such as disease susceptibility. The DREAM5 Systems Genetics B Challenge solicited algorithms to predict soybean plant resistance to the pathogen Phytophthora sojae from training sets including phenotype, genotype, and gene expression data. The challenge test set was divided into three subcategories, one requiring prediction based on only genotype data, another on only gene expression data, and the third on both genotype and gene expression data. Here we present our approach, primarily using regularized regression, which received the best-performer award for subchallenge B2 (gene expression only). We found that despite the availability of 941 genotype markers and 28,395 gene expression features, optimal models determined by cross-validation experiments typically used fewer than ten predictors, underscoring the importance of strong regularization in noisy datasets with far more features than samples. We also present substantial analysis of the training and test setup of the challenge, identifying high variance in performance on the gold standard test sets. [en_US]
dc.description.sponsorship: National Science Foundation (U.S.). Graduate Research Fellowship Program [en_US]
dc.description.sponsorship: National Defense Science and Engineering Graduate Fellowship [en_US]
dc.language.iso: en_US
dc.publisher: Public Library of Science [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1371/journal.pone.0029095 [en_US]
dc.rights: Creative Commons Attribution [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by/2.5/ [en_US]
dc.source: PLoS [en_US]
dc.title: Phenotype Prediction Using Regularized Regression on Genetic Data in the DREAM5 Systems Genetics B Challenge [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Loh, Po-Ru, George Tucker, and Bonnie Berger. “Phenotype Prediction Using Regularized Regression on Genetic Data in the DREAM5 Systems Genetics B Challenge.” Ed. Mark Isalan. PLoS ONE 6.12 (2011): e29095. Web. 8 Feb. 2012. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Mathematics [en_US]
dc.contributor.approver: Berger Leighton, Bonnie
dc.contributor.mitauthor: Loh, Po-Ru
dc.contributor.mitauthor: Tucker, George Jay
dc.contributor.mitauthor: Berger, Bonnie
dc.relation.journal: PLoS ONE [en_US]
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dspace.orderedauthors: Loh, Po-Ru; Tucker, George; Berger, Bonnie [en]
dc.identifier.orcid: https://orcid.org/0000-0002-2724-7228
mit.license: PUBLISHER_CC [en_US]
mit.metadata.status: Complete

