Multimodal_Cactaceae_Dataset_25
Description
This data article presents a multimodal, non-invasive dataset documenting the physiology, growth, and phenological stages of Stenocereus queretaroensis (commonly known as pitayo), a columnar cactus of high agronomic, nutritional, and cultural importance in the arid and semi-arid regions of southern Zacatecas, Mexico. Although this species has traditionally been harvested from wild populations, initiatives for its controlled cultivation are emerging as part of regional efforts to improve fruit quality, extend shelf life, and ensure consistent production. However, controlled cultivation remains relatively rare and understudied, and there is a lack of systematic, publicly available data describing its developmental cycle and physiological responses under managed conditions. To address this gap, we designed and executed a comprehensive data collection campaign between January and May 2025, spanning multiple phenological stages from vegetative growth through flowering and fruiting. A total of 109 healthy, fruit-bearing plants were selected according to inclusion criteria that prioritized structural integrity, absence of disease symptoms, and accessibility for imaging and in-field measurements. Data acquisition combined high-resolution multispectral drone imaging with ground-based spectrophotometry in the 400–700 nm range, enabling the capture of both canopy-level spatial information and precise point-based reflectance signatures. Imaging was performed using a DJI Mavic 3M equipped with RGB and multispectral sensors. For each plant, one nadir RGB photograph and a corresponding set of multispectral captures were obtained. 
These data were processed to generate three complementary products: (i) original RGB images and baseline NDVI maps provided by the manufacturer’s software, (ii) spatially adjusted RGB frames cropped, scaled, and registered to NDVI outputs to ensure pixel-level alignment and allow region-of-interest (ROI) analysis, and (iii) recalculated NDVI images derived directly from NIR and Red bands to produce independent vegetation index maps. Spectral measurements were acquired with a YS45 field spectrophotometer using a standardized white reference for reflectance calibration. All spectral data were exported in .csv/.txt format and subsequently processed in MATLAB, where reflectance curves were stored as .mat files and plotted as PNG graphs for visual inspection. Metadata for each sample include phenological stage, acquisition date and time, geographic coordinates, temperature, illumination conditions, and sensor configuration parameters. Together, these components form a hierarchically organized dataset that facilitates temporal tracking of individual plants and robust integration of image-derived features with field-collected spectra. Preprocessing steps included reflectance normalization, geometric registration of images, and standardization of file naming conventions, resulting in a structured dataset ready for computational analysis.
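The recalculated NDVI product in (iii) follows the standard definition NDVI = (NIR - Red) / (NIR + Red), applied pixel by pixel. The sketch below is illustrative only: it is written in plain Python rather than the MATLAB used for the dataset, and the function name ndvi, the eps guard, the choice to return 0.0 where both bands are zero, and the sample values are all assumptions rather than part of the published scripts.

```python
def ndvi(nir, red, eps=1e-9):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red) for two equally sized 2-D grids."""
    if len(nir) != len(red) or any(len(a) != len(b) for a, b in zip(nir, red)):
        raise ValueError("NIR and Red bands must have the same shape")
    out = []
    for nir_row, red_row in zip(nir, red):
        row = []
        for n, r in zip(nir_row, red_row):
            denom = n + r
            # Assumption: pixels with no signal in either band are mapped to 0.0
            row.append((n - r) / denom if abs(denom) > eps else 0.0)
        out.append(row)
    return out


# Hypothetical 2x2 reflectance grids, for illustration only
nir_band = [[0.60, 0.50], [0.30, 0.0]]
red_band = [[0.20, 0.50], [0.10, 0.0]]
ndvi_map = ndvi(nir_band, red_band)
```

Both inputs are assumed to be reflectance grids that are already registered to each other, so that the same index refers to the same ground location in the NIR and Red bands.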
Files
Steps to reproduce
The data were obtained through systematic field sampling of Stenocereus queretaroensis plants in commercial orchards and experimental plots between January and May 2025. A total of 109 plants were selected based on uniform growth stage and health status, then labeled for traceability. For each plant, one original RGB image and corresponding multispectral images (Green 560 ± 16 nm, Red 650 ± 16 nm, Red Edge 730 ± 16 nm, and Near-Infrared 860 ± 16 nm) were captured using a calibrated multispectral camera mounted on a tripod under consistent lighting conditions. All image acquisitions followed the same standardized protocol regarding time of day and sensor positioning to ensure reproducibility. Raw images were stored in a structured database and tagged with metadata including date, time, plant ID, and geographic coordinates. Data preprocessing, including image adjustment and NDVI calculation, was performed using MATLAB Release 2024a. Additionally, spectral signatures in the visible range (VIS) were extracted from each multispectral image by selecting regions of interest (ROI) on the plant surface and computing mean reflectance values for each spectral band, generating a representative spectral profile for each plant. Quality control included verifying image sharpness, spectral calibration, and metadata completeness. The final dataset, along with preprocessing scripts, has been documented and version-controlled to enable reproducibility.
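The ROI step described above, taking the mean reflectance per band over a selected region, can be sketched as follows. This is a minimal illustration in plain Python, not the authors' MATLAB code: the rectangular ROI format (row0, row1, col0, col1), the function names roi_mean and spectral_profile, the band keys, and all pixel values are hypothetical.

```python
def roi_mean(band, roi):
    """Mean of a 2-D band over a rectangular ROI (row0, row1, col0, col1), end-exclusive."""
    row0, row1, col0, col1 = roi
    values = [v for row in band[row0:row1] for v in row[col0:col1]]
    if not values:
        raise ValueError("ROI selects no pixels")
    return sum(values) / len(values)


def spectral_profile(bands, roi):
    """Map each band name to its mean reflectance inside the same ROI."""
    return {name: roi_mean(band, roi) for name, band in bands.items()}


# Hypothetical 2x3 reflectance grids; the third column lies outside the ROI
bands = {
    "green_560": [[0.10, 0.12, 0.90], [0.14, 0.16, 0.90]],
    "red_650": [[0.08, 0.08, 0.90], [0.10, 0.10, 0.90]],
    "red_edge_730": [[0.30, 0.30, 0.90], [0.30, 0.30, 0.90]],
    "nir_860": [[0.50, 0.60, 0.90], [0.50, 0.60, 0.90]],
}
profile = spectral_profile(bands, roi=(0, 2, 0, 2))
```

A rectangle is used here only for brevity. An ROI drawn freely on the plant surface would be handled the same way, with a per-pixel mask replacing the slice bounds.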
Institutions
- Universidad Autonoma de Zacatecas