Detecting Novel Associations in Large Data Sets
Author(s)
Reshef, David N.; Reshef, Yakir; Grossman, Sharon Rachel; Finucane, Hilary Kiyo; McVean, Gilean; Turnbaugh, Peter J.; Mitzenmacher, Michael; Sabeti, Pardis C.; Lander, Eric Steven; ...
Open Access Policy
Creative Commons Attribution-Noncommercial-Share Alike
Abstract
Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.
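The abstract describes MIC only in words. A minimal sketch, assuming MIC is the largest normalized mutual information I / log min(x, y) over x-by-y grids with xy < B(n), where B(n) = n^0.6 (the exponent is the `alpha` parameter below). Everything here is illustrative: the helper names `_equal_freq_bins`, `_mutual_information`, and `mic_equipartition` are hypothetical, and fixed rank-based (equal-frequency) grids stand in for the optimized grid search the authors describe. Because the grids are not optimized, this sketch can only underestimate the exact grid maximum that MIC is defined by.

```python
import math
import random
from collections import Counter


def _equal_freq_bins(values, k):
    """Hypothetical helper: map each value to one of k rank-based bins of
    (nearly) equal size. Tied values may be split across bins here."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins


def _mutual_information(a_bins, b_bins):
    """Mutual information (in nats) of the empirical joint distribution of
    two discrete labelings of the same points."""
    n = len(a_bins)
    joint = Counter(zip(a_bins, b_bins))
    pa, pb = Counter(a_bins), Counter(b_bins)
    return sum(
        (c / n) * math.log(c * n / (pa[a] * pb[b]))
        for (a, b), c in joint.items()
    )


def mic_equipartition(xs, ys, alpha=0.6):
    """Hypothetical simplified MIC: maximize normalized mutual information
    over equal-frequency x-by-y grids with x * y < n ** alpha.
    Not the authors' algorithm; a lower bound on the exact grid maximum."""
    n = len(xs)
    bound = n ** alpha
    best = 0.0
    for x in range(2, int(bound) + 1):
        x_bins = _equal_freq_bins(xs, x)
        for y in range(2, int(bound) + 1):
            if x * y >= bound:
                break  # larger y only makes the grid bigger
            mi = _mutual_information(x_bins, _equal_freq_bins(ys, y))
            # Dividing by log(min(x, y)) keeps each score in [0, 1].
            best = max(best, mi / math.log(min(x, y)))
    return best


if __name__ == "__main__":
    n = 200
    xs = [-1 + 2 * i / (n - 1) for i in range(n)]
    # Noiseless parabola: functional but not monotonic; close to 1.0.
    print(mic_equipartition(xs, [x * x for x in xs]))
    # Independent uniform noise: far below 1.
    rng = random.Random(0)
    u = [rng.random() for _ in range(n)]
    v = [rng.random() for _ in range(n)]
    print(mic_equipartition(u, v))
```

The parabola case illustrates the property the abstract emphasizes: a noiseless functional relationship whose Pearson correlation is essentially zero still scores close to 1, while independent noise scores far lower. Fixed equal-frequency grids were chosen only for brevity; they can miss relationships whose informative boundaries fall between quantiles.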
Date issued
2011-12
Department
Whitaker College of Health Sciences and Technology; Massachusetts Institute of Technology. Department of Biology; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal
Science
Publisher
American Association for the Advancement of Science (AAAS)
Citation
Reshef, D. N., Y. A. Reshef, H. K. Finucane, S. R. Grossman, G. McVean, P. J. Turnbaugh, E. S. Lander, M. Mitzenmacher, and P. C. Sabeti. “Detecting Novel Associations in Large Data Sets.” Science 334, no. 6062 (December 15, 2011): 1518-1524.
Version: Author's final manuscript
ISSN
0036-8075 (print)
1095-9203 (online)