Exploiting ontology graph for predicting sparsely annotated gene function

Wang, Sheng; Cho, Hyunghoon; Zhai, ChengXiang; Berger, Bonnie; Peng, Jian

dc.contributor.author	Wang, Sheng
dc.contributor.author	Zhai, ChengXiang
dc.contributor.author	Peng, Jian
dc.contributor.author	Cho, Hyunghoon
dc.contributor.author	Berger Leighton, Bonnie
dc.date.accessioned	2016-10-13T17:54:36Z
dc.date.available	2016-10-13T17:54:36Z
dc.date.issued	2015-06
dc.identifier.issn	1367-4803
dc.identifier.issn	1460-2059
dc.identifier.uri	http://hdl.handle.net/1721.1/104798
dc.description.abstract	Motivation: Systematically predicting gene (or protein) function based on molecular interaction networks has become an important tool in refining and enhancing the existing annotation catalogs, such as the Gene Ontology (GO) database. However, functional labels with only a few (<10) annotated genes, which constitute about half of the GO terms in yeast, mouse and human, pose a unique challenge in that any prediction algorithm that independently considers each label faces a paucity of information and thus is prone to capture non-generalizable patterns in the data, resulting in poor predictive performance. There exist a variety of algorithms for function prediction, but none properly address this ‘overfitting’ issue of sparsely annotated functions, or do so in a manner scalable to tens of thousands of functions in the human catalog. Results: We propose a novel function prediction algorithm, clusDCA, which transfers information between similar functional labels to alleviate the overfitting problem for sparsely annotated functions. Our method is scalable to datasets with a large number of annotations. In a cross-validation experiment in yeast, mouse and human, our method greatly outperformed previous state-of-the-art function prediction algorithms in predicting sparsely annotated functions, without sacrificing the performance on labels with sufficient information. Furthermore, we show that our method can accurately predict genes that will be assigned a functional label that has no known annotations, based only on the ontology graph structure and genes associated with other labels, which further suggests that our method effectively utilizes the similarity between gene functions.	en_US
dc.description.sponsorship	National Institute of General Medical Sciences (U.S.) (Grant 1U54GM114838)	en_US
dc.language.iso	en_US
dc.publisher	Oxford University Press	en_US
dc.relation.isversionof	http://dx.doi.org/10.1093/bioinformatics/btv260	en_US
dc.rights	Creative Commons Attribution-NonCommercial 4.0 International	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc/4.0/	en_US
dc.source	Oxford University Press	en_US
dc.title	Exploiting ontology graph for predicting sparsely annotated gene function	en_US
dc.type	Article	en_US
dc.identifier.citation	Wang, Sheng, Hyunghoon Cho, ChengXiang Zhai, Bonnie Berger, and Jian Peng. “Exploiting Ontology Graph for Predicting Sparsely Annotated Gene Function.” Bioinformatics 31, no. 12 (June 13, 2015): i357–i364.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Mathematics	en_US
dc.contributor.mitauthor	Cho, Hyunghoon
dc.contributor.mitauthor	Berger Leighton, Bonnie
dc.relation.journal	Bioinformatics	en_US
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dspace.orderedauthors	Wang, Sheng; Cho, Hyunghoon; Zhai, ChengXiang; Berger, Bonnie; Peng, Jian	en_US
dspace.embargo.terms	N	en_US
dc.identifier.orcid	https://orcid.org/0000-0002-2713-0150
dc.identifier.orcid	https://orcid.org/0000-0002-2724-7228
mit.license	PUBLISHER_CC	en_US

Files in this item

Name: Berger_Exploiting ontology.pdf
Size: 638.2Kb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
