Gene expression prediction using low-rank matrix completion

Kapur, Arnav; Marwah, Kshitij; Alterovitz, Gil

dc.contributor.author	Kapur, Arnav
dc.contributor.author	Marwah, Kshitij
dc.contributor.author	Alterovitz, Gil
dc.date.accessioned	2016-08-26T18:57:23Z
dc.date.available	2016-08-26T18:57:23Z
dc.date.issued	2016-06
dc.date.submitted	2015-11
dc.identifier.issn	1471-2105
dc.identifier.uri	http://hdl.handle.net/1721.1/104048
dc.description.abstract	Background: High-throughput biological information and data have grown exponentially over the past decade, supported by technologies such as microarrays and RNA-Seq. Most data generated using such methods encode large amounts of rich information and are used to determine diagnostic and prognostic biomarkers. Although data storage costs have fallen, the process of capturing data using these technologies is still expensive. Moreover, the time required for the assay, from sample preparation to raw value measurement, is excessive (on the order of days). There is an opportunity to reduce both the cost and the time of generating such expression datasets. Results: We propose a framework in which complete gene expression values can be reliably predicted in silico from partial measurements. This is achieved by modelling expression data as a low-rank matrix and then applying recently discovered matrix completion techniques based on nonlinear convex optimisation. We evaluated prediction of gene expression data based on 133 studies, sourced from a combined total of 10,921 samples. We show that such datasets can be constructed with a low relative error even at high missing value rates (>50 %), and that the predicted datasets can be reliably used as surrogates for further analysis. Conclusion: This method has potentially far-reaching applications, including how bio-medical data is sourced and generated, and transcriptomic prediction by optimisation. We show that gene expression data can be computationally constructed, thereby potentially reducing the costs of gene expression profiling. In conclusion, this method shows great promise for opening new avenues of research on low-rank matrix completion in the biological sciences.	en_US
dc.publisher	BioMed Central	en_US
dc.relation.isversionof	http://dx.doi.org/10.1186/s12859-016-1106-6	en_US
dc.rights	Creative Commons Attribution	en_US
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	en_US
dc.source	BioMed Central	en_US
dc.title	Gene expression prediction using low-rank matrix completion	en_US
dc.type	Article	en_US
dc.identifier.citation	Kapur, Arnav, Kshitij Marwah, and Gil Alterovitz. “Gene Expression Prediction Using Low-Rank Matrix Completion.” BMC Bioinformatics 17.1 (2016): n. pag.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory	en_US
dc.contributor.mitauthor	Alterovitz, Gil	en_US
dc.relation.journal	BMC Bioinformatics	en_US
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dc.date.updated	2016-08-03T08:14:04Z
dc.language.rfc3066	en
dc.rights.holder	Kapur et al.
dspace.orderedauthors	Kapur, Arnav; Marwah, Kshitij; Alterovitz, Gil	en_US
dspace.embargo.terms	N	en_US
dc.identifier.orcid	https://orcid.org/0000-0002-5952-9844
mit.license	PUBLISHER_CC	en_US
mit.metadata.status	Complete

Files in this item

Name: 12859_2016_Article_1106.pdf
Size: 1.104 MB
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
