Ecological Informatics

ISSN: 1574-9541

Datasets associated with articles published in Ecological Informatics

15 results
  • SDesti: An R package for the analysis of aquatic benthos environmental studies’ data
    Data analysis is one of the most relevant steps of aquatic benthic environmental monitoring and research studies, and should be a fundamental consideration in both the planning (i.e., defining appropriate sampling design strategies) and implementation phases (application of appropriate standardized sampling procedures). A common objective of these studies is to identify relationships between environmental stressors and benthic bioindicator metrics. However, assessing these relationships is a complex process. Multivariate regression model adjustment coupled with forward and backward model selection routines is an appropriate complementary statistical analysis tool to test for the existence of statistically significant associations between a non-autocorrelated biological response and each variable within a group of environmental covariates included in a model. With this in mind, we developed SDesti, a user-friendly R package to analyze benthos data (number of individuals, biomass, chlorophyll concentration, or biological indices, excluding beta diversity metrics). SDesti contains four user accessible functions. AnalysisDescriptives() and Estimation() give information on the quality, homogeneity and representativeness of the data for one sampling campaign for one site. TimeLineAnalysisDescriptives() performs the descriptive analysis that usually precedes the adjustment of a regression model. TimeLineAnalysis() automatically adjusts an adequate regression model (linear, Poisson, quasipoisson, or negative binomial) and also returns the necessary measures and graphics to evaluate the quality of the adjustment and verify the model assumptions. SDesti greatly simplifies the process of data analysis and can be easily used by non-statisticians. 
    The package includes a complete manual with detailed information on the data structure requirements, the variable nomenclature rules and program operating procedures, the data analysis (complemented with examples), and the interpretation of the results (type ??SDesti in the R console). SDesti eliminates redundancy, reduces human error and, coupled with a suitable sampling design and standard sampling and sample treatment procedures, helps improve the consistency of results in environmental studies. The SDesti binary for Windows users and installation instructions can be found below. The binary was compiled for R version 4.3.2. Refer to the program PDF manual for a detailed description of the data structures, functions, data analyses and interpretation of results: type ??SDesti in the R or RStudio console and select the PDF file. Note: RStudio 2023.09.1 has a bug that produces an error message when trying to open PDF vignettes (program manuals); use the R 4.3.2 console to open SDesti's PDF manual.
    • Dataset
  • Data for: Classification and regression with random forests as a standard method for presence-only data SDMs: A future conservation example using China tree species
    This compressed file contains the following data sets from an ensemble prediction with two different methods of selecting pseudo-absence data sets (SRE, 2 degree) and eight different methods of transforming numerical predictions into binary predictions. (1) Figure 2: Model accuracy for numerical prediction of random forests regression (RT) and classification (CT) algorithms. (2) Figure 3: Optimal threshold and model accuracy for binary predictions produced by eight threshold-selecting methods. (3) Figure 4: Spatial correspondence (as judged by the first axis of principal component analysis) among binary predictions produced by eight threshold approaches. (4) Figure 5: Spatial correspondence in binary predictions (as judged by McNemar tests) for pairwise comparisons among threshold approaches. (5) Table 1: Species range shifts predicted by classification (CT) and regression (RT) algorithms of random forests. (6) Table S1_Ecological requirements, biological characteristics and niche properties for the 52 tree species. (7) Table S2_Species range shifts estimated based on numerical prediction of RT. (8) Species distribution maps for 52 forest trees (Raw data file, Species distribution maps). (9) Supplementary figures and tables. (10) R codes & R functions used in the study.
    • Dataset
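The description above mentions McNemar tests for pairwise comparison of binary predictions. As a rough illustration only (this is not the authors' R code, and the function name `mcnemar` and the example vectors are hypothetical), a McNemar test without continuity correction on two binary prediction vectors could be sketched as follows:

```python
import math


def mcnemar(pred_a, pred_b):
    """McNemar test (no continuity correction) for two paired 0/1 vectors.

    Returns (statistic, p_value). Only discordant pairs contribute.
    """
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction vectors must have the same length")
    # b: cells predicted present by A but absent by B; c: the reverse.
    b = sum(1 for x, y in zip(pred_a, pred_b) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(pred_a, pred_b) if x == 0 and y == 1)
    if b + c == 0:
        return 0.0, 1.0  # the two predictions agree everywhere
    stat = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical binary predictions from two threshold approaches.
approach_a = [1, 1, 1, 1, 0, 0, 1, 0]
approach_b = [0, 0, 0, 1, 0, 1, 1, 0]
statistic, p = mcnemar(approach_a, approach_b)
```

A small p-value would suggest that the two threshold approaches disagree systematically in one direction; with few discordant cells an exact binomial version of the test is usually preferred.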
  • Data for: Habitat-Net: Segmentation of habitat images using deep learning
    Training data and test data for Habitat-Net. Test data results for segmentation of canopy and understory images processed: (1) manually by two observers, (2) using a simple thresholding script in Python, (3) using U-Net, and (4) using Habitat-Net.
    • Dataset
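For readers unfamiliar with the "simple thresholding" baseline mentioned above, the general idea can be sketched as below. This is not the script distributed with the dataset: the cutoff of 128, the assumption that dark pixels are vegetation against a bright sky, and the function names are all hypothetical choices made for illustration.

```python
def threshold_mask(gray, threshold=128):
    """Return a 0/1 mask: 1 where a grayscale pixel is darker than the cutoff.

    `gray` is a list of rows of 0-255 intensities. Treating dark pixels as
    vegetation is an assumption for this sketch.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def cover_fraction(mask):
    """Fraction of pixels labelled 1 in a 0/1 mask."""
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask is empty")
    return sum(sum(row) for row in mask) / total


# Hypothetical 2x2 grayscale image.
image = [[10, 200], [90, 250]]
mask = threshold_mask(image)
fraction = cover_fraction(mask)
```

A fixed global cutoff like this is sensitive to lighting, which is one reason learned segmentation models are compared against it.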
  • Data for: A template model to simulate the spread and management cost of invasive plant species at landscape scale
    Equations of the NetLogo kudzu model template
    • Dataset
  • Data for: A template model to simulate the spread and management cost of invasive plant species at landscape scale
    Raster map necessary to run the model
    • Dataset
  • Image data used in "Automated wildlife image classification: An active learning tool for ecological applications"
    Bavarian image data used in Bothmann et al. (2023) "Automated wildlife image classification: An active learning tool for ecological applications" (doi.org/10.1016/j.ecoinf.2023.102231), preprint at https://arxiv.org/abs/2303.15823
    • Dataset
  • Image data used in "Automated wildlife image classification: An active learning tool for ecological applications"
    Bavarian image data used in Bothmann et al. (2023) "Automated wildlife image classification: An active learning tool for ecological applications" (doi.org/10.1016/j.ecoinf.2023.102231), preprint at https://arxiv.org/abs/2303.15823
    • Dataset
  • sytbru/CV-clustered-data: First release
    R scripts related to the manuscript "Dealing with clustered samples for assessing map accuracy by cross-validation" in Ecological Informatics.
    DATA
    The input data are expected to be in a directory "data". The data can be downloaded from Zenodo: DOI:10.5281/zenodo.6513429
      agb.tif = above ground biomass (AGB) map
      AGBstack.tif = covariates used for predicting AGB
      aggArea.tif = coarse grid used for simulation in the model-based methods
      ocs.tif = soil organic carbon stock (OCS) map
      OCSstack.tif = covariates used for predicting OCS
      strata.xxx = geo-strata (shp) used for generating the clustered samples
      TOTmask.tif = mask of the area covered by the covariates
    RUNNING THE SCRIPTS
    First, the samples need to be prepared by running the scripts sample_*.R. Next, the cross-validation scripts named CV_*.R can be run. Start by running "CV_random.R", as the other CV_*.R scripts depend on the results it produces. The script "CV_model_based.R" should be run before running "CV_heteroscedastic.R". The script figs.R can be used for reproducing several of the figures shown in the manuscript. Here it is assumed that the full set of results has been generated (see WARNING below).
    WARNING
    Note that running the (single core) scripts with the full sample size and number of replications as used in the paper requires a very long time to complete. Set n_samp, n_CV and nsim to numbers << 100 to check the approach without reproducing all the results. The code can easily be adapted to run on multiple cores.
    • Software/Code
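The run order described above can be summarised in a short helper. This is only a sketch of the stated dependencies, not part of the repository; the function `run_order` is hypothetical, and the file names `sample_a.R`, `sample_b.R` and `CV_spatial.R` are made-up stand-ins for whatever the sample_*.R and CV_*.R patterns match.

```python
def run_order(scripts):
    """Sort script names so that each stated dependency runs first."""

    def rank(name):
        if name.startswith("sample_"):
            return 0  # sample preparation comes first
        if name == "CV_random.R":
            return 1  # the other CV scripts depend on its results
        if name == "CV_model_based.R":
            return 2  # must run before CV_heteroscedastic.R
        if name.startswith("CV_"):
            return 3  # remaining cross-validation scripts
        if name == "figs.R":
            return 4  # figures assume the full set of results exists
        return 5  # anything unrecognised goes last

    return sorted(scripts, key=lambda name: (rank(name), name))


# Hypothetical directory listing; only some of these names appear in the text.
found = [
    "figs.R", "CV_heteroscedastic.R", "sample_b.R", "CV_random.R",
    "CV_model_based.R", "CV_spatial.R", "sample_a.R",
]
ordered = run_order(found)
```

The resulting list could then be passed one script at a time to `Rscript`, keeping in mind the warning about run time at full sample size.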
  • Data files belonging to the paper "Dealing with clustered samples for assessing map accuracy by cross-validation"
    Mapping of environmental variables often relies on map accuracy assessment through cross-validation with the data used for calibrating the underlying mapping model. When the data points are spatially clustered, conventional cross-validation leads to optimistically biased estimates of map accuracy. Several papers have promoted spatial cross-validation as a means to tackle this over-optimism. Many of these papers blame spatial autocorrelation as the cause of the bias and propagate the widespread misconception that spatial proximity of calibration points to validation points invalidates classical statistical validation of maps. In the paper related to these data, we present and evaluate alternative cross-validation approaches for assessing map accuracy from clustered sample data.
    The study area is western Europe, constrained in the north at 52° latitude and at -10° and 24° longitude. The projection is IGNF:ETRS89LAEA (Lambert azimuthal equal area projection).
    Files:
      agb.tif = above ground biomass (AGB) map from version 3 of the 2017 CCI-Biomass product (https://catalogue.ceda.ac.uk/uuid/5f331c418e9f4935b8eb1b836f8a91b8)
      AGBstack.tif = covariates used for predicting AGB
      aggArea.tif = coarse grid used for simulation in the model-based methods
      ocs.tif = soil organic carbon stock (OCS) map (0-30 cm) from Soilgrids (https://www.isric.org/explore/soilgrids)
      OCSstack.tif = covariates used for predicting OCS
      strata.xxx = 100 compact geo-strata (ESRI shape) created with the spcosa package; used for generating clustered samples
      TOTmask.tif = mask of the area covered by the covariates
    Details and data sources of the covariates in AGBstack.tif and OCSstack.tif:
      Name | Description | Source | Note
      ai | Aridity Index | https://chelsa-climate.org/downloads/ | Version 2.1
      bio1 | Mean annual air temperature [°C] | https://chelsa-climate.org/downloads/ | Version 2.1
      bio5 | Mean daily maximum air temperature of the warmest month [°C] | https://chelsa-climate.org/downloads/ | Version 2.1
      bio7 | Annual range of air temperature [°C] | https://chelsa-climate.org/downloads/ | Version 2.1
      bio12 | Annual precipitation [kg/m2] | https://chelsa-climate.org/downloads/ | Version 2.1
      bio15 | Precipitation seasonality [kg/m2] | https://chelsa-climate.org/downloads/ | Version 2.1
      gdd10 | Growing degree days heat sum above 10°C | https://chelsa-climate.org/downloads/ | Version 2.1
      clay | Clay content [g/kg] of the 0-5cm layer | https://soilgrids.org/ | Only used for AGB
      sand | Sand content [g/kg] of the 0-5cm layer | https://soilgrids.org/ | as above
      pH | Acidity (pH(water)) of the 0-5cm layer | https://soilgrids.org/ | as above
      glc2017 | Landcover 2017 | https://land.copernicus.eu/global/products/lc, reclassified to: closed forest, open forest, natural non-forest veg., bare & sparse veg., cropland, built-up, water | Categorical variable
      dem | Elevation | https://www.eea.europa.eu/data-and-maps/data/copernicus-land-monitoring-service-eu-dem |
      cosasp | Cosine of slope aspect | Computed with the terra package from elevation | Computed @25m resolution; next aggregated to 0.5km
      sinasp | Sine of slope aspect | Computed with the terra package from elevation | as above
      slope | Slope | Computed with the terra package from elevation | as above
      TPI | Topographic position index | Computed with the terra package from elevation | as above
      TRI | Terrain ruggedness index | Computed with the terra package from elevation | as above
      TWI | Topographic wetness index | Computed with SAGA from 500m resolution (aggregated) dem |
      gedi | Forest height | https://glad.umd.edu/dataset/gedi | Zone: NAFR
      xcoord | X coordinate | Using a mask created from the other covariates |
      ycoord | Y coordinate | Using a mask created from the other covariates |
      Dcoast | Distance from coast | Using a land mask created from the other covariates |
    • Dataset
  • Rocky Shore Samples: Bayesian Networks as a novel tool to enhance interpretability and predictive power of ecological models.
    Our study sites were two continuous sections of rocky shore, 10 m in length, at East Sands, St Andrews, Scotland (56°20'04.0"N 2°46'23.2"W). Both sites were at a tidal height of 2.9 m above chart datum. These sites were selected based on initial inspection of community structure to ensure that both macroalgae and barnacle stands were present, but were otherwise haphazardly selected from other potential sites at this height on the shore. All sampling occurred in May 2020, and only during low tide. Fifty haphazardly placed 50x50 cm double-strung quadrats were placed at each site. Grazer counts for Littorina littorea (littorinids) and Patella vulgata (limpets) were obtained for each quadrat. Percentage cover estimates for barnacles (Semibalanus balanoides and Chthamalus stellatus), macroalgae (Ascophyllum nodosum and Fucus vesiculosus) and microalgae (biofilm) were obtained by photographing the quadrat frames using an iPhone XR. Other grazer species were extremely rare, and accounted for < 2% of grazers found.
    • Dataset