The Impact of Internal Variability on Benchmarking Deep Learning Climate Emulators

Lütjens, Björn; Ferrari, Raffaele; Watson‐Parris, Duncan; Selin, Noelle E

dc.contributor.author	Lütjens, Björn
dc.contributor.author	Ferrari, Raffaele
dc.contributor.author	Watson‐Parris, Duncan
dc.contributor.author	Selin, Noelle E
dc.date.accessioned	2025-11-07T15:37:32Z
dc.date.available	2025-11-07T15:37:32Z
dc.date.issued	2025-08-26
dc.identifier.uri	https://hdl.handle.net/1721.1/163594
dc.description.abstract	Full-complexity Earth system models (ESMs) are computationally very expensive, limiting their use in exploring the climate outcomes of multiple emission pathways. More efficient emulators that approximate ESMs can directly map emissions onto climate outcomes, and benchmarks are being used to evaluate their accuracy on standardized tasks and data sets. We investigate a popular benchmark in data-driven climate emulation, ClimateBench, on which deep learning-based emulators are currently achieving the best performance. We compare these deep learning emulators with a linear regression-based emulator, akin to pattern scaling, and show that it outperforms the incumbent 100M-parameter deep learning foundation model, ClimaX, on 3 out of 4 regionally resolved climate variables, notably surface temperature and precipitation. While emulating surface temperature is expected to be predominantly linear, this result is surprising for emulating precipitation. Precipitation is a much more noisy variable, and we show that deep learning emulators can overfit to internal variability noise at low frequencies, degrading their performance in comparison to a linear emulator. We address the issue of overfitting by increasing the number of climate simulations per emission pathway (from 3 to 50) and updating the benchmark targets with the respective ensemble averages from the MPI-ESM1.2-LR model. Using the new targets, we show that linear pattern scaling continues to be more accurate on temperature, but can be outperformed by a deep learning-based technique for emulating precipitation. We publish our code and data at https://github.com/blutjens/climate-emulator.	en_US
dc.language.iso	en
dc.publisher	Wiley	en_US
dc.relation.isversionof	https://doi.org/10.1029/2024MS004619	en_US
dc.rights	Creative Commons Attribution	en_US
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	en_US
dc.source	Wiley	en_US
dc.title	The Impact of Internal Variability on Benchmarking Deep Learning Climate Emulators	en_US
dc.type	Article	en_US
dc.identifier.citation	Lütjens, B., Ferrari, R., Watson-Parris, D., & Selin, N. E. (2025). The impact of internal variability on benchmarking deep learning climate emulators. Journal of Advances in Modeling Earth Systems, 17, e2024MS004619.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Earth, Atmospheric, and Planetary Sciences	en_US
dc.contributor.department	MIT Institute for Data, Systems, and Society	en_US
dc.relation.journal	Journal of Advances in Modeling Earth Systems	en_US
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dc.date.updated	2025-11-07T15:28:06Z
dspace.orderedauthors	Lütjens, B; Ferrari, R; Watson‐Parris, D; Selin, NE	en_US
dspace.date.submission	2025-11-07T15:28:11Z
mit.journal.volume	17	en_US
mit.journal.issue	8	en_US
mit.license	PUBLISHER_CC
mit.metadata.status	Authority Work and Publication Information Needed	en_US

Files in this item

Name: J Adv Model Earth Syst - 2025 - ...
Size: 7.480Mb
Format: PDF
Description: Published version

This item appears in the following Collection(s)

MIT Open Access Articles