Winning Models for Grade Point Average, Grit, and Layoff in the Fragile Families Challenge
Author(s)
Rigobon, Daniel E; Jahani, Eaman; Suhara, Yoshihiko; AlGhoneim, Khaled; Alghunaim, Abdulaziz; Pentland, Alex Sandy; Almaatouq, Abdullah
Publisher with Creative Commons License
Creative Commons Attribution
Terms of use
Abstract
In this article, the authors discuss and analyze their approach to the Fragile Families Challenge. The data consisted of more than 12,000 features (covariates) about the children and their parents, schools, and overall environments from birth to age 9. The authors’ modular and collaborative approach parallelized prediction tasks and relied primarily on existing data science techniques, including (1) data preprocessing: elimination of low variance features, imputation of missing data, and construction of composite features; (2) feature selection through univariate mutual information and extraction of nonzero least absolute shrinkage and selection operator coefficients; (3) three machine learning models: random forest, elastic net, and gradient-boosted trees; and finally (4) prediction aggregation according to performance. The top-performing submissions produced winning out-of-sample predictions for three outcomes: grade point average, grit, and layoff. However, predictions were at most 20 percent better than a baseline that predicted the mean value of the training data for each outcome.
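The comparison against a mean-value baseline in the last sentence can be made concrete with a small sketch. The abstract does not name the error metric, so mean squared error is assumed here; the function names and the toy numbers are hypothetical and are not taken from the article or from the Fragile Families data.

```python
from statistics import mean


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def relative_improvement(y_train, y_holdout, y_pred):
    """Fractional error reduction versus always predicting the training mean.

    Assumption: error is measured as mean squared error on held-out data.
    """
    baseline = [mean(y_train)] * len(y_holdout)
    return 1.0 - mse(y_holdout, y_pred) / mse(y_holdout, baseline)


# Toy numbers, invented for illustration only (not Fragile Families data).
y_train = [2.0, 3.0, 4.0]    # baseline prediction is their mean, 3.0
y_holdout = [2.0, 4.0]       # true held-out outcomes
y_pred = [2.5, 3.5]          # a model's held-out predictions

improvement = relative_improvement(y_train, y_holdout, y_pred)
print(improvement)  # 0.75
```

Under this reading, a value of 0.20 would correspond to a model whose held-out error is 20 percent lower than that of the training-mean baseline, and a value of 0 would mean no gain over the baseline.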
Date issued
2019
Department
Massachusetts Institute of Technology. Department of Mechanical Engineering; MIT Connection Science (Research institute); Massachusetts Institute of Technology. Media Laboratory; Sloan School of Management
Journal
Socius: Sociological Research for a Dynamic World
Publisher
SAGE Publications