DSpace@MIT

Improving official statistics in emerging markets using machine learning and mobile phone data

Author(s)
Sundsøy, Pål; Bjelland, Johannes; Bengtsson, Linus; de Montjoye, Yves-Alexandre; Jahani, Eaman; Pentland, Alex Paul; ...
Download: 13688_2017_Article_99.pdf (2.478 MB)

Publisher with Creative Commons License

Creative Commons Attribution

Terms of use
Creative Commons Attribution http://creativecommons.org/licenses/by/4.0/
Abstract
Mobile phones are one of the fastest-growing technologies in the developing world, with global penetration rates reaching 90%. Mobile phone data, also called call detail records (CDR), are generated every time a phone is used and are recorded by carriers at scale. CDR have generated groundbreaking insights in public health, official statistics, and logistics. However, because most phones in developing countries are prepaid, the data lack key information about the user, including gender and other demographic variables. This precludes numerous uses of these data in social science and development economics research. It also severely hinders the development of humanitarian applications, such as using mobile phone data to target aid towards the most vulnerable groups during crises. We developed a framework to extract more than 1,400 features from standard mobile phone data and used them to predict useful individual characteristics and group estimates. Here we present a systematic cross-country study of the applicability of machine learning for dataset augmentation at low cost. We validate our framework by showing how it can be used to reliably predict gender and other information for more than half a million people in two countries. We show that standard machine learning algorithms trained on only 10,000 users are sufficient to predict individuals' gender with an accuracy ranging from 74.3% to 88.4% in a developed country and from 74.5% to 79.7% in a developing country, using only metadata. This is significantly higher than previous approaches and, once calibrated, gives highly accurate estimates of gender balance in groups. Performance suffers only marginally if we reduce the training set to 5,000 users, but decreases significantly with smaller training sets. Finally, we use factor analysis to show that our indicators capture a wide range of behavioral traits, and we show that the framework can be used to predict other indicators of vulnerability such as age or socio-economic status.
Mobile phone data have great potential for social good, and our framework allows these data to be augmented with vulnerability and other information at a fraction of the cost.
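The abstract notes that individual gender predictions must be calibrated before they yield accurate estimates of gender balance in groups, but this record does not describe how. As one hedged illustration, the Python sketch below applies adjusted classify-and-count, a standard quantification correction that is not necessarily the authors' procedure. It assumes that the classifier's true and false positive rates, measured on held-out labelled users, also hold for the group being estimated. The function names `adjusted_share` and `rates` are hypothetical.

```python
"""Sketch: correcting a classifier's group-level share (adjusted classify-and-count).

Assumption: this standard quantification correction is shown for illustration
only; it is not necessarily the calibration used in the paper.
"""


def rates(y_true, y_pred):
    """Return (tpr, fpr) from parallel lists of 0/1 labels and 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)


def adjusted_share(predicted_share, tpr, fpr):
    """Estimate the true share of the positive class (e.g. one gender) in a group.

    predicted_share: fraction of group members the classifier labels positive.
    tpr, fpr: rates measured on held-out labelled users (assumed to transfer).
    """
    if tpr <= fpr:
        raise ValueError("classifier must beat chance: tpr must exceed fpr")
    # The observed share is tpr * p + fpr * (1 - p); solve this for p.
    estimate = (predicted_share - fpr) / (tpr - fpr)
    # Sampling noise can push the estimate outside [0, 1], so clip it.
    return min(1.0, max(0.0, estimate))


if __name__ == "__main__":
    # Toy held-out set: 5 positives (4 caught) and 5 negatives (1 false alarm).
    tpr, fpr = rates([1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
                     [1, 1, 1, 1, 0, 1, 0, 0, 0, 0])
    # A group whose true share is 0.3 would show 0.8*0.3 + 0.2*0.7 = 0.38 predicted.
    print(round(adjusted_share(0.38, tpr, fpr), 3))
```

With these toy rates, a naive count would report 38% of the group as positive, while the corrected estimate recovers 30%. The correction matters most when the two classes are unbalanced in the group, because errors on the majority class then inflate the raw count.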
Date issued
2017-05
URI
http://hdl.handle.net/1721.1/109143
Department
Massachusetts Institute of Technology. Institute for Data, Systems, and Society; Program in Media Arts and Sciences (Massachusetts Institute of Technology)
Journal
EPJ Data Science
Publisher
Springer
Citation
Jahani, Eaman; Sundsøy, Pål; Bjelland, Johannes; Bengtsson, Linus; Pentland, Alex ‘Sandy’ and de Montjoye, Yves-Alexandre. "Improving official statistics in emerging markets using machine learning and mobile phone data." EPJ Data Science 6, no. 3 (May 2017): 1-21. © 2017 The Author(s)
Version: Final published version
ISSN
2193-1127

Collections
  • MIT Open Access Articles