dc.contributor.author: Sundsøy, Pål
dc.contributor.author: Bjelland, Johannes
dc.contributor.author: Bengtsson, Linus
dc.contributor.author: de Montjoye, Yves-Alexandre
dc.contributor.author: Jahani, Eaman
dc.contributor.author: Pentland, Alex Paul
dc.date.accessioned: 2017-05-17T15:09:55Z
dc.date.available: 2017-05-17T15:09:55Z
dc.date.issued: 2017-05
dc.date.submitted: 2016-11
dc.identifier.issn: 2193-1127
dc.identifier.uri: http://hdl.handle.net/1721.1/109143
dc.description.abstract [en_US]: Mobile phones are one of the fastest-growing technologies in the developing world, with global penetration rates reaching 90%. Mobile phone data, also called call detail records (CDR), are generated every time a phone is used and are recorded by carriers at scale. CDR have generated groundbreaking insights in public health, official statistics, and logistics. However, because most phones in developing countries are prepaid, the data lack key information about the user, including gender and other demographic variables. This precludes numerous uses of the data in social science and development economics research. It also severely hinders the development of humanitarian applications, such as using mobile phone data to target aid toward the most vulnerable groups during crises. We developed a framework to extract more than 1,400 features from standard mobile phone data and used them to predict useful individual characteristics and group estimates. Here we present a systematic cross-country study of the applicability of machine learning for low-cost dataset augmentation. We validate our framework by showing how it can be used to reliably predict gender and other information for more than half a million people in two countries. We show that standard machine learning algorithms trained on only 10,000 users are sufficient to predict individuals' gender, using only metadata, with an accuracy ranging from 74.3% to 88.4% in a developed country and from 74.5% to 79.7% in a developing country. This accuracy is significantly higher than that of previous approaches and, once calibrated, gives highly accurate estimates of the gender balance in groups. Performance suffers only marginally if we reduce the training set to 5,000 users, but decreases significantly with smaller training sets.

Finally, we show, using factor analysis, that our indicators capture a wide range of behavioral traits, and that the framework can be used to predict other indicators of vulnerability, such as age or socio-economic status. Mobile phone data have great potential for good, and our framework allows these data to be augmented with vulnerability and other information at a fraction of the cost.
dc.publisher [en_US]: Springer
dc.relation.isversionof [en_US]: http://dx.doi.org/10.1140/epjds/s13688-017-0099-3
dc.rights [en_US]: Creative Commons Attribution
dc.rights.uri [en_US]: http://creativecommons.org/licenses/by/4.0/
dc.source [en_US]: Springer Berlin Heidelberg
dc.title [en_US]: Improving official statistics in emerging markets using machine learning and mobile phone data
dc.type [en_US]: Article
dc.identifier.citation [en_US]: Jahani, Eaman; Sundsøy, Pål; Bjelland, Johannes; Bengtsson, Linus; Pentland, Alex ‘Sandy’ and de Montjoye, Yves-Alexandre. "Improving official statistics in emerging markets using machine learning and mobile phone data." EPJ Data Science 6, no. 3 (May 2017): 1-21. © 2017 The Author(s)
dc.contributor.department [en_US]: Massachusetts Institute of Technology. Institute for Data, Systems, and Society
dc.contributor.department [en_US]: Program in Media Arts and Sciences (Massachusetts Institute of Technology)
dc.contributor.mitauthor: Jahani, Eaman
dc.contributor.mitauthor: Pentland, Alex Paul
dc.relation.journal [en_US]: EPJ Data Science
dc.eprint.version [en_US]: Final published version
dc.type.uri [en_US]: http://purl.org/eprint/type/JournalArticle
eprint.status [en_US]: http://purl.org/eprint/status/PeerReviewed
dc.date.updated: 2017-05-17T04:54:17Z
dc.language.rfc3066: en
dc.rights.holder: The Author(s)
dspace.orderedauthors [en_US]: Jahani, Eaman; Sundsøy, Pål; Bjelland, Johannes; Bengtsson, Linus; Pentland, Alex ‘Sandy’; de Montjoye, Yves-Alexandre
dspace.embargo.terms [en_US]: N
dc.identifier.orcid: https://orcid.org/0000-0003-3879-4275
dc.identifier.orcid: https://orcid.org/0000-0002-8053-9983
mit.license [en_US]: PUBLISHER_CC
mit.metadata.status: Complete