Earth system science increasingly relies on machine learning to analyze complex, multivariate, spatiotemporal data. However, the validity of these models depends critically on the assumption that training and deployment data share similar statistical properties, a condition often violated in real-world environmental applications. This presentation addresses the risks posed by non-stationary training data distributions that arise from climate change, evolving land use, or changes in sensors over time. We show how such distribution shifts can degrade model performance, bias predictions, and lead to misleading scientific conclusions. Through a set of examples, we illustrate the mechanisms and consequences of non-stationarity. We then discuss methodological solutions, including domain adaptation, continual learning, and uncertainty quantification, that help mitigate these effects and improve model robustness. By combining insights from machine learning and Earth system science, this talk aims to raise awareness of distributional risks and to promote the development of adaptive, interpretable, and trustworthy models for understanding and predicting Earth’s dynamic systems.
Institutions