Synthetic Multi-Fidelity Climate Modeling Dataset for Physics-Informed Deep Learning (SMCD-PIDL)

Published: 8 May 2026 | Version 1 | DOI: 10.17632/z8n3gc397h.1
Contributor:
Narender Palugula

Description

This dataset was generated for the study entitled “A Scalable Hybrid Physics-Guided Multi-Fidelity Learning Framework for High-Accuracy and Uncertainty-Aware Climate Anomaly Prediction under Sparse and Heterogeneous Data Conditions.” It consists of synthetic climate and environmental observations designed to simulate sparse, heterogeneous, and multi-fidelity data conditions for climate anomaly prediction. It includes atmospheric, environmental, spatial, temporal, and uncertainty-related variables used to train, validate, and evaluate the proposed physics-guided learning framework. The dataset supports experimental analysis in settings where real-world climate observations may be incomplete, unevenly distributed, or affected by uncertainty. It can be used to evaluate climate anomaly classification, uncertainty-aware prediction, and hybrid physics-guided machine learning models.

Steps to reproduce

The dataset can be reproduced by generating synthetic climate and environmental variables according to the methodology described in the associated manuscript. First, define the input variables representing atmospheric, spatial, temporal, environmental, and uncertainty-related conditions. Next, generate synthetic samples within realistic value ranges for each variable to simulate sparse and heterogeneous climate observations. Then introduce missingness and uncertainty to represent incomplete and noisy climate data conditions. Assign the target anomaly class based on the combined influence of selected climate and environmental indicators. Finally, use the generated dataset to train, validate, and test the proposed physics-guided multi-fidelity learning framework.
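
The steps above can be sketched in Python as follows. This is only an illustrative outline, not the procedure from the associated manuscript: the variable names, value ranges, noise level, missingness rate, scoring weights, class thresholds, and split ratios are all assumptions chosen for demonstration. The sketch also computes the anomaly class from the clean values before noise and missingness are applied, which is likewise an assumption.

```python
import numpy as np


def generate_synthetic_climate(n_samples=1000, missing_rate=0.1,
                               noise_std=0.05, seed=0):
    """Hypothetical sketch; names, ranges, and rules are NOT from the manuscript."""
    rng = np.random.default_rng(seed)

    # Define variables and draw samples within assumed value ranges.
    features = {
        "temperature_c": rng.uniform(-30.0, 45.0, n_samples),
        "humidity_pct": rng.uniform(0.0, 100.0, n_samples),
        "pressure_hpa": rng.uniform(950.0, 1050.0, n_samples),
        "latitude": rng.uniform(-90.0, 90.0, n_samples),
        "longitude": rng.uniform(-180.0, 180.0, n_samples),
        "day_of_year": rng.integers(1, 366, n_samples).astype(float),
    }

    # Assign the anomaly class from a weighted combination of standardized
    # indicators (assumed weights), using the clean values.
    def zscore(x):
        return (x - x.mean()) / x.std()

    score = (0.5 * zscore(features["temperature_c"])
             + 0.3 * zscore(features["humidity_pct"])
             - 0.2 * zscore(features["pressure_hpa"]))
    # Assumed thresholds: top 10% -> class 2, next 20% -> class 1, rest -> 0.
    thresholds = np.quantile(score, [0.7, 0.9])
    labels = np.digitize(score, thresholds)

    # Introduce uncertainty (Gaussian noise) and missingness (NaN) into the
    # atmospheric variables only; spatial and temporal fields stay complete.
    for name in ("temperature_c", "humidity_pct", "pressure_hpa"):
        values = features[name]
        noisy = values + rng.normal(0.0, noise_std * values.std(), n_samples)
        noisy[rng.random(n_samples) < missing_rate] = np.nan
        features[name] = noisy

    # Split indices for training, validation, and testing (assumed 70/15/15).
    order = rng.permutation(n_samples)
    n_train = int(0.70 * n_samples)
    n_val = int(0.15 * n_samples)
    splits = {
        "train": order[:n_train],
        "val": order[n_train:n_train + n_val],
        "test": order[n_train + n_val:],
    }
    return features, labels, splits


features, labels, splits = generate_synthetic_climate()
print({name: int(np.isnan(col).sum()) for name, col in features.items()})
print(np.bincount(labels))
```

Fixing the seed makes the generated samples repeatable across runs. To match the published files, the ranges and labelling rule would need to be replaced with those given in the manuscript.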

Categories

Environmental Science