Model selection for extremal dependence structures using deep learning: Application to environmental data.
Description
This repository contains all the code, data, and resources used in our study, “Model Selection for Extremal Dependence Structures Using Deep Learning: Application to Environmental Data.” The goal of this research is to better understand and model the spatial dependence structure of extreme 2 m air temperatures across Iraq. We focus on selecting the most appropriate max-stable dependence structure using a deep learning approach.

Our approach uses convolutional neural networks (CNNs) to learn spatial dependence patterns from datasets simulated from max-stable models fitted to the 2 m air temperature data. The networks are trained to recognize which model and covariance structure best fit the data. We propose two selection strategies:
• Scheme 1: A single CNN (CNN-C) that predicts both the max-stable model and its covariance function at once.
• Scheme 2: A two-stage approach, in which one CNN (CNN-M) predicts the model family, and a second CNN, chosen among CNN-S (for Schlather), CNN-G (for Geometric), and CNN-E (for Extremal-t), then determines the specific covariance function.

To evaluate performance, we compare the CNN-based results with a classical model selection method, the Composite Likelihood Information Criterion (CLIC). We also validate our findings using a parametric bootstrap approach based on extremal coefficients.

What the code and dataset support:
• Simulating spatial dependence structures under different max-stable processes
• Fitting models using composite likelihood and comparing them using CLIC
• Training CNNs to classify spatial dependence structures
• Validating model selection using extremal coefficient diagnostics
• Comparing deep learning-based selection with traditional statistical methods, e.g., CLIC
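The two-stage logic of Scheme 2 can be illustrated with a minimal sketch. It is written in plain Python purely for illustration (the repository's own scripts are in R) and is not the authors' implementation. The family names and network names come from the description above; the covariance labels (`cov_A`, `cov_B`, `cov_C`), the function `select_two_stage`, and the probability vectors are hypothetical placeholders standing in for real CNN outputs.

```python
# Model families named in the description; order is an assumption.
FAMILIES = ["Schlather", "Geometric", "Extremal-t"]

# Placeholder covariance labels (hypothetical; the real candidates
# are defined in the repository's R scripts).
COVARIANCES = ["cov_A", "cov_B", "cov_C"]

# Which second-stage network handles which family (from the description).
STAGE_TWO = {"Schlather": "CNN-S", "Geometric": "CNN-G", "Extremal-t": "CNN-E"}


def argmax(probs):
    """Index of the largest entry in a list of class probabilities."""
    return max(range(len(probs)), key=lambda i: probs[i])


def select_two_stage(family_probs, covariance_probs_by_network):
    """Pick the family from the CNN-M output, then the covariance from
    the output of the family-specific network.

    covariance_probs_by_network maps a network name to its class
    probabilities; in practice only the selected network needs to run.
    """
    family = FAMILIES[argmax(family_probs)]
    network = STAGE_TWO[family]
    covariance = COVARIANCES[argmax(covariance_probs_by_network[network])]
    return family, network, covariance


# Made-up probabilities, for illustration only.
example_family_probs = [0.1, 0.7, 0.2]
example_outputs = {
    "CNN-S": [0.90, 0.05, 0.05],
    "CNN-G": [0.20, 0.30, 0.50],
    "CNN-E": [0.40, 0.40, 0.20],
}
print(select_two_stage(example_family_probs, example_outputs))
```

Scheme 1 corresponds to a single argmax over all model-covariance pairs, whereas the sketch above splits the decision so that each second-stage network only has to separate covariance functions within one family.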
Steps to reproduce
To reproduce the results or adapt the workflow to your own data or study area, follow the steps below.
1. Start with R/01_main_workflow.R. This is the main script that runs the entire pipeline. It handles model fitting, simulation of the dependence structure of max-stable processes and computation of pairwise concurrence probabilities, training of the convolutional neural networks (CNNs), and the final evaluation of both the statistical and the deep learning-based model selection methods.
2. Fit and validate models. Use R/02_model_fitting.R to fit spatial max-stable models. Then run R/05_model_validation.R to validate the selected models using extremal coefficients. This step assesses how well the models capture spatial dependence across different distances.
3. Train and test CNNs. Use R/03_model_building_CNN.R to build and train CNNs under the two schemes: a single CNN for joint model-covariance classification (CNN-C), and a hierarchical structure in which CNN-M is followed by CNN-S, CNN-G, or CNN-E. Use R/04_CNN_testing_and_prediction.R to test the CNNs on new simulated fields and generate model selection predictions.

Project Organization
• R/: Contains all scripts for running the workflow: simulation, model fitting, CNN training/testing, validation routines, plotting, and utility functions
• data/: Includes the ERA5 hourly 2 m temperature data (.nc) and all preprocessed .RData files used for training, validation, and comparison
• cnn_models/: Stores pretrained CNN weights for all five classification networks (CNN-C, CNN-M, CNN-S, CNN-G, CNN-E)

Requirements
• R version ≥ 4.2.0
• Key R packages: SpatialExtremes, keras, tensorflow, ggplot2
• Deep learning components use the keras package linked with TensorFlow. For setup instructions, see: https://tensorflow.rstudio.com

Data Source
• ERA5 hourly 2 m temperature data for Iraq (NetCDF file: iraq_temperature_data.nc)
• Spatial coordinates of 50 selected locations across Iraq
• All model outputs, transformed data, and evaluation results are stored in .RData format for reproducibility
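The validation in step 2 relies on extremal coefficients. As a rough illustration of what such a diagnostic measures, the sketch below estimates the pairwise extremal coefficient between two sites with the F-madogram estimator, theta = (1 + 2*nu) / (1 - 2*nu), where nu is half the mean absolute difference of the empirical CDF values. It is plain Python for illustration only and is an assumption about one common estimator, not necessarily the one used in R/05_model_validation.R; the function names and the sample values are hypothetical.

```python
def empirical_cdf_values(x):
    """Rank-based empirical CDF values rank / (n + 1).

    Ties are broken by position, which is adequate for continuous data.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    ranks = [0] * n
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return [r / (n + 1) for r in ranks]


def extremal_coefficient(z1, z2):
    """F-madogram estimate of the pairwise extremal coefficient.

    z1 and z2 are block maxima (e.g. annual maxima) observed at two
    sites over the same blocks. The result is clipped to [1, 2]:
    1 means complete dependence, 2 means independence.
    """
    if len(z1) != len(z2) or not z1:
        raise ValueError("z1 and z2 must be non-empty and of equal length")
    u1 = empirical_cdf_values(z1)
    u2 = empirical_cdf_values(z2)
    nu = 0.5 * sum(abs(a - b) for a, b in zip(u1, u2)) / len(u1)
    theta = (1 + 2 * nu) / (1 - 2 * nu)
    return min(max(theta, 1.0), 2.0)


# Made-up maxima at two hypothetical sites, for illustration only.
site_a = [41.2, 43.5, 42.1, 44.8, 43.0]
site_b = [40.9, 43.9, 41.7, 45.1, 42.6]
print(extremal_coefficient(site_a, site_b))
```

Computing this quantity for every pair of sites and plotting it against inter-site distance gives an empirical curve that can be compared with the extremal coefficient function implied by a fitted max-stable model.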
Institutions
- Université Claude Bernard Lyon 1
- University of Mosul