Multimodal Data Fusion for Estimating Electricity Access and Demand
Author(s)
Lee, Stephen J.
Advisor
Pérez-Arriaga, Ignacio J.
Fisher III, John W.
Abstract
Electric power is a key enabler for economic development; nevertheless, 770 million
people live without electricity access and 3.5 billion have unreliable connections.
There is general consensus that the global community is off-track to realize the
United Nations’ Sustainable Development Goal #7 (SDG7) target of “universal access
to affordable, reliable and modern energy services” by the year 2030. Under the International
Energy Agency’s (IEA) central “Stated Policies Scenario,” 670 million people
are expected to remain without electricity access in 2030.
Simultaneously, we as a global community are off-track to achieve the Paris
Agreement ambitions to limit global warming to 1.5 degrees Celsius relative to preindustrial
levels. A 2021 U.N. report notes that national mitigation pledges for 2030 will
collectively produce only one-seventh of the emissions reductions necessary to achieve
the 1.5 degree goal. Electricity and heat together account for 31.9% of all greenhouse
gas (GHG) emissions globally, and the electric power sector’s role extends beyond its own
emissions: in virtually all credible pathways towards climate stabilization, power
sector emissions must be cut to near-zero by mid-century, and the power sector must
also expand to electrify, and thereby decarbonize, a larger share of total energy use.
The IEA’s “Net Zero by 2050” roadmap projects that electricity
demand in “emerging market and developing economies” will need to be more than double
the electricity demand in “advanced economies” by mid-century. Our development
and climate imperatives both rest upon electricity demand in low- and middle-income
countries.
This dissertation attempts to advance the state of the art in understanding,
estimating, and forecasting electricity demand in underserved contexts. We present
four technical chapters towards these ends.
First, we assess the importance of accurately estimating aggregate demand levels
by performing sensitivity analyses using technoeconomic optimization models. We find
that improved demand forecasting methods are essential for right-sizing system designs. Over the domain of aggregate demand values modeled, the
average cost of service provision ranges from $0.13/kWh to $0.37/kWh. This nearly
three-fold difference demonstrates the critical influence of economies of scale and improved
grid utilization on cost. We additionally find that characterizing building-level
consumer type diversity plays a critical role in the outcome of high-resolution infrastructure
plans. For our “central demand case,” we show that modeling a diversity of
consumer types results in least-cost plans that are 9% less costly than plans
that assume there is only one customer type. When comparing supply
technology shares for cost-optimal designs, modeling consumer type diversity
decreases the prescribed grid extension share from 89% to 77%.
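The economies-of-scale effect behind the cost range above can be illustrated with a deliberately simple average-cost calculation. This is a minimal sketch under stated assumptions, not the technoeconomic optimization model used in the thesis: the function name `average_cost_per_kwh` and the fixed and variable cost figures are hypothetical, chosen only to show why spreading fixed costs over more delivered energy lowers the average cost per kWh.

```python
def average_cost_per_kwh(annual_demand_kwh: float, fixed_cost: float,
                         variable_cost_per_kwh: float) -> float:
    """Average cost of service: annualized fixed cost spread over delivered
    energy, plus a constant per-kWh variable cost (simplifying assumption)."""
    if annual_demand_kwh <= 0:
        raise ValueError("annual demand must be positive")
    return fixed_cost / annual_demand_kwh + variable_cost_per_kwh


# Hypothetical inputs for illustration only (not taken from the thesis).
FIXED_COST = 20_000.0   # annualized capital cost of network and generation, $/yr
VARIABLE_COST = 0.10    # fuel plus operation and maintenance, $/kWh

for demand in (50_000, 100_000, 200_000, 400_000):
    cost = average_cost_per_kwh(demand, FIXED_COST, VARIABLE_COST)
    print(f"{demand:>8,} kWh/yr -> ${cost:.3f}/kWh")
```

With these made-up numbers, the average cost falls from $0.500/kWh at 50,000 kWh/yr to $0.150/kWh at 400,000 kWh/yr. A full planning model also captures network layout, technology choice, and reliability, so this sketch only shows the direction of the effect, not its magnitude.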
In our second technical chapter, we apply machine learning systems for probabilistic
data fusion to the problem of forecasting annual electricity demand at the country level
for all African countries. We provide a novel set of probabilistic forecasts for the
continent while addressing missing data issues and employing a rigorous framework for
cross-validating and backtesting model results.
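Rolling-origin backtesting of probabilistic annual forecasts, with gaps in the record, can be sketched as follows. This is a hedged illustration only: it swaps in a simple random walk with drift on log demand as the forecaster (not the thesis's machine learning fusion model), the function names are hypothetical, and the demand series is synthetic. The point is the evaluation loop: fit only on years before each forecast origin, skip years with missing targets, and score whether the realized value falls inside the predictive interval.

```python
import math
import statistics
from typing import List, Optional, Tuple


def drift_forecast(history: List[Optional[float]], horizon: int,
                   z: float = 1.645) -> Tuple[float, float, float]:
    """Random walk with drift on log demand (a stand-in baseline).

    Returns (median, lower, upper) for the year `horizon` steps after the end of
    `history`. Missing years (None) are skipped: growth rates use only
    consecutive observed pairs, and the forecast starts from the last observed
    year. z = 1.645 gives a central 90% Gaussian interval on the log scale.
    """
    logs = [math.log(v) if v is not None else None for v in history]
    growth = [b - a for a, b in zip(logs, logs[1:]) if a is not None and b is not None]
    if len(growth) < 2:
        raise ValueError("need at least two observed year-over-year changes")
    mu = statistics.mean(growth)
    sigma = statistics.stdev(growth)
    last_idx = max(i for i, v in enumerate(logs) if v is not None)
    steps = (len(logs) - 1 - last_idx) + horizon
    center = logs[last_idx] + steps * mu
    spread = z * sigma * math.sqrt(steps)
    return math.exp(center), math.exp(center - spread), math.exp(center + spread)


def rolling_origin_backtest(series: List[Optional[float]], min_train: int,
                            horizon: int = 1) -> float:
    """Expanding-window backtest: fit on series[:t], then score the forecast
    for series[t + horizon - 1]. Returns the empirical interval coverage."""
    hits = total = 0
    for t in range(min_train, len(series) - horizon + 1):
        target = series[t + horizon - 1]
        if target is None:
            continue  # a missing year cannot be scored
        _, lower, upper = drift_forecast(series[:t], horizon)
        hits += lower <= target <= upper
        total += 1
    return hits / total if total else float("nan")


# Synthetic annual demand (TWh) with gaps; illustrative only, not real data.
synthetic = [10.0, 10.6, None, 11.9, 12.5, 13.4, 14.0, None, 15.8, 16.5, 17.7, 18.4]
coverage = rolling_origin_backtest(synthetic, min_train=5)
print(f"empirical coverage of the nominal 90% interval: {coverage:.2f}")
```

Comparing empirical coverage against the nominal level is one common check of whether predictive intervals are calibrated; a real study would use many countries and additional scores.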
In our third technical chapter, we show how machine learning systems for probabilistic
data fusion can be used for estimating electricity access rates at building-level
resolutions in low-access countries. Estimating electricity access is a key component of
understanding electricity demand because aggregated consumption statistics only reflect
demand from buildings with electricity access. Without access information, there
is significant ambiguity when attempting to attribute aggregated consumption values to
individual buildings. We train and evaluate our model using data describing electrified
and non-electrified buildings in Rwanda, and we achieve state-of-the-art results relative
to existing methods in the literature. For our test set in Rwanda, our method achieves
an accuracy score of 80.7% while the closest published baseline in the literature achieves
70.9%. Our system additionally enables explicit uncertainty quantification and has the
potential to be scaled across the whole African continent.
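Accuracy and per-building uncertainty can both be read off probabilistic access predictions. The sketch below is a hypothetical illustration, not the thesis's evaluation code: it assumes the model outputs a probability that each building is electrified, thresholds it at 0.5 for accuracy, and uses Bernoulli predictive entropy as one simple uncertainty measure. The probabilities and labels are made up.

```python
import math
from typing import List, Sequence


def accuracy(probs: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> float:
    """Fraction of buildings whose thresholded P(electrified) matches the label."""
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and the same length")
    correct = sum((p >= threshold) == bool(y) for p, y in zip(probs, labels))
    return correct / len(labels)


def bernoulli_entropy(p: float) -> float:
    """Predictive entropy in bits: 1.0 at p = 0.5, 0.0 at p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def most_uncertain(probs: Sequence[float], k: int) -> List[int]:
    """Indices of the k buildings with the highest predictive entropy."""
    return sorted(range(len(probs)), key=lambda i: bernoulli_entropy(probs[i]), reverse=True)[:k]


# Hypothetical predicted P(electrified) and ground-truth labels for six buildings.
probs = [0.92, 0.15, 0.55, 0.71, 0.08, 0.49]
labels = [1, 0, 0, 1, 0, 1]
print(f"accuracy: {accuracy(probs, labels):.3f}")
print("most uncertain buildings:", most_uncertain(probs, k=2))
```

Here four of six buildings are classified correctly, and the two predictions closest to 0.5 (indices 5 and 2) are flagged as most uncertain, which is where explicit uncertainty quantification can direct field verification.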
In our final technical chapter, we develop novel methods for estimating building-level
electricity demand. Challengingly, ground truth metered consumption datasets
in low-access countries are often accompanied only by noisy geolocation data. This
issue is exacerbated by the fact that meter and building connections reflect many-to-many
relationships: there may be many electricity meters residing within a single
building, and there may also be many buildings connected to a single meter.
While our consumption data is logged at the meter level, machine learning features of
interest can only be extracted at the building level. Because standard supervised machine
learning models cannot express this complexity, we develop an application-tailored
model based on a neural network (NN)-embedded probabilistic graphical model (PGM)
for probabilistic data fusion. The PGM-based approach allows us to explicitly define
potential relationships between meters and nearby buildings, while the NN models employed
enable us to effectively extract information from multimodal features at the
building level. As a result, our model reflects a principled approach to training and
running building-level demand estimation models using only meter-level ground truth information.
We also make several additional contributions: we provide
probabilistic building-level output; we train and test in Rwanda, a country
for which building-level estimates are not currently available; and we provide demand estimates
for commercial and industrial consumers in addition to residential consumers.
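The many-to-many structure between meters and buildings can be made concrete by enumerating candidate links from noisy meter coordinates. This is a minimal sketch under explicit assumptions, not the thesis's PGM construction: coordinates are assumed to be projected and in metres, a building is a candidate for a meter if its centroid lies within a hypothetical radius of the meter's reported location, and all IDs and positions are invented.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # projected coordinates in metres (assumption)


def candidate_links(meters: Dict[str, Point], buildings: Dict[str, Point],
                    radius_m: float) -> Dict[str, List[str]]:
    """meter_id -> building_ids whose centroid lies within radius_m of the
    meter's reported (noisy) location."""
    return {
        m_id: [b_id for b_id, (bx, by) in buildings.items()
               if math.hypot(mx - bx, my - by) <= radius_m]
        for m_id, (mx, my) in meters.items()
    }


def invert_links(links: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """building_id -> meter_ids that list the building as a candidate."""
    by_building: Dict[str, List[str]] = defaultdict(list)
    for m_id, b_ids in links.items():
        for b_id in b_ids:
            by_building[b_id].append(m_id)
    return dict(by_building)


# Hypothetical toy layout.
meters = {"m1": (0.0, 0.0), "m2": (5.0, 2.0), "m3": (60.0, 0.0)}
buildings = {"b1": (3.0, 1.0), "b2": (20.0, 0.0), "b3": (62.0, 4.0)}
links = candidate_links(meters, buildings, radius_m=25.0)
print(links)                # each meter may have several candidate buildings
print(invert_links(links))  # each building may have several candidate meters
```

In this toy layout, meters m1 and m2 each have two candidate buildings (b1 and b2), and each of those buildings has two candidate meters. A standard supervised model expects one feature vector per label, which is exactly what this bipartite structure fails to provide.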
From a methodological standpoint, ours is the first machine learning model that embeds
and trains NNs within PGMs employing Markov chain Monte Carlo (MCMC)
sampling algorithms for inference. This application serves as an example of a novel
combination of these individually important classes of algorithms.
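The general idea of training a learned model inside a sampler over latent meter-to-building links can be sketched in a toy setting. This is a hypothetical illustration, not the thesis's model: a linear regressor `predict` stands in for the neural network, each meter is assumed to draw its consumption from exactly one candidate building (the thesis handles richer relationships), the likelihood is assumed Gaussian, and the data are synthetic. The loop alternates a sampling step over discrete links with a gradient step on the regressor, in the style of Monte Carlo EM.

```python
import math
import random
from typing import Dict, List, Tuple

Theta = Tuple[float, float]


def predict(theta: Theta, x: float) -> float:
    # Linear stand-in for the building-level neural network: f(x) = w * x + b.
    w, b = theta
    return w * x + b


def gibbs_step(y: Dict[str, float], cands: Dict[str, List[str]], feats: Dict[str, float],
               theta: Theta, sigma: float, rng: random.Random) -> Dict[str, str]:
    """Draw each meter's link from its exact conditional over candidate buildings,
    p(z_m = b | y_m, theta) proportional to N(y_m; f(x_b), sigma^2). In this toy the
    links are conditionally independent given theta, so they are resampled as one block."""
    assign = {}
    for m, options in cands.items():
        logw = [-0.5 * ((y[m] - predict(theta, feats[b])) / sigma) ** 2 for b in options]
        top = max(logw)  # subtract the max for numerical stability
        weights = [math.exp(lw - top) for lw in logw]
        assign[m] = rng.choices(options, weights=weights, k=1)[0]
    return assign


def sgd_step(theta: Theta, assign: Dict[str, str], y: Dict[str, float],
             feats: Dict[str, float], lr: float) -> Theta:
    """One gradient step on half the mean squared error under the sampled links."""
    w, b = theta
    gw = gb = 0.0
    for m, bld in assign.items():
        err = predict(theta, feats[bld]) - y[m]
        gw += err * feats[bld]
        gb += err
    n = len(assign)
    return (w - lr * gw / n, b - lr * gb / n)


def fit(y, cands, feats, sweeps=500, sigma=0.5, lr=0.05, seed=0):
    """Alternate sampling of links (Monte Carlo E-step) with model updates (M-step)."""
    rng = random.Random(seed)
    theta: Theta = (0.0, 0.0)
    assign: Dict[str, str] = {}
    for _ in range(sweeps):
        assign = gibbs_step(y, cands, feats, theta, sigma, rng)
        theta = sgd_step(theta, assign, y, feats, lr)
    return theta, assign


# Synthetic toy data: true links m1->b1, m2->b3, m3->b4, m4->b6 with y = 2x + 1.
feats = {"b1": 0.5, "b2": 1.0, "b3": 1.5, "b4": 2.0, "b5": 2.5, "b6": 3.0}
cands = {"m1": ["b1", "b2"], "m2": ["b2", "b3"], "m3": ["b4", "b5"], "m4": ["b5", "b6"]}
y = {"m1": 2.0, "m2": 4.0, "m3": 5.0, "m4": 7.0}
theta, assign = fit(y, cands, feats)
print("theta:", theta, "links:", assign)
```

The design choice to illustrate is that the regressor never sees a fixed meter-to-building pairing; it is trained on links sampled from their posterior given the current model. Retaining the samples across sweeps, rather than only the last one, would additionally give a distribution over links and building-level predictions.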
Taken together, the methods and studies presented in this dissertation enable
improved deployment of continuous electricity infrastructure planning across all low- and
middle-income countries worldwide. We hope the research community continues to
catalyze progress towards continuous planning methodologies and to map efficient
pathways for achieving our global climate and development goals.
Date issued
2023-09
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology