Data for Differentiation of Nephelium lappaceum producers in Honduras: perspectives for the inclusive rural bioeconomy

Published: 17 November 2025 | Version 1 | DOI: 10.17632/skss2s6gry.1

Description

The methodology applied multivariate statistical techniques to primary data collected from 314 producers. Principal Component Analysis (PCA) was used to reduce dimensionality, followed by K-means clustering to define the producer typologies. Kruskal-Wallis tests were then applied to confirm that the socioeconomic and organizational differences among the identified clusters were statistically significant.
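The workflow described above (standardize, PCA, K-means, Kruskal-Wallis) can be sketched as follows. This is a minimal illustration, not the authors' script: the data are synthetic stand-ins generated in the code, and the function names `typify` and `kruskal_by_cluster`, the choice of 2 components and 3 clusters, and the random seeds are all assumptions that do not come from the dataset record.

```python
import numpy as np
from scipy.stats import kruskal
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def typify(X, n_components=2, n_clusters=3, seed=0):
    """Standardize the variables, project onto principal components,
    then cluster the component scores with K-means.
    n_components and n_clusters are illustrative defaults (assumptions)."""
    Z = StandardScaler().fit_transform(X)
    scores = PCA(n_components=n_components).fit_transform(Z)
    labels = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=seed
    ).fit_predict(scores)
    return scores, labels


def kruskal_by_cluster(values, labels):
    """Kruskal-Wallis H test of one variable across the cluster groups."""
    groups = [values[labels == k] for k in np.unique(labels)]
    return kruskal(*groups)


# Synthetic stand-in for the survey matrix: 314 rows, 4 numeric variables.
# These numbers are made up for the sketch and are NOT the published data.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [5, 5, 0, 0], [0, 5, 5, 5]], dtype=float)
sizes = (105, 105, 104)
X = np.vstack(
    [rng.normal(c, 1.0, size=(m, 4)) for c, m in zip(centers, sizes)]
)

scores, labels = typify(X)
result = kruskal_by_cluster(X[:, 0], labels)
print("cluster sizes:", np.bincount(labels))
print("Kruskal-Wallis p-value for variable 0:", result.pvalue)
```

With the real survey file, `X` would be replaced by the numeric columns selected for typification, and the test would be repeated for each variable of interest. The number of components and clusters should be chosen from the data (for example from explained variance and a cluster-validity criterion), not taken from this sketch.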

Steps to reproduce

Data Collection and Analytical Sample Justification

1. Sampling Design and Initial Calculation (n=314)

The study focused on the Department of Atlántida, Honduras. The target population (sampling frame) was established at 1,696 registered rambutan producers, based on the latest available regional agricultural census data. A sample size formula for a finite population was applied, using a 95% confidence level and a maximum tolerable error of 5%. This calculation gave a required sample size of 314 producers. Simple random sampling was used to select the 314 participants from the official producer lists, so that the sample is statistically representative of the department.

2. Instrument, Execution, and Quality Control

Data were collected through a structured survey administered by trained field personnel between March and November 2024. The instrument was designed to capture the socioeconomic, productive, organizational, and technological variables needed for the subsequent typification. Digital platforms (e.g., CommCare or similar) were used for data entry, allowing real-time georeferencing and immediate quality checks during fieldwork. Ethical approval was obtained from [Insert Name of Ethics Committee], and informed consent was secured from all participants before the interviews began, ensuring anonymity and voluntary participation.

3. Data Cleaning and Final Analytical Sample (n=314)

After the 314 surveys were completed, the dataset was cleaned and validated before multivariate analysis to ensure data quality and consistency for the statistical model.

Quality Control and Data Consistency: Records were systematically validated for integrity and completeness across all critical variables used in the Principal Component Analysis (PCA), such as annual income, total production area, and organizational participation.

Outlier Treatment: Given the marked asymmetry confirmed in the descriptive analysis (Table 2), extreme outliers were handled through robust statistical methods (e.g., non-parametric transformation or Winsorizing) so that the full representative sample size was maintained.

Final Analytical Sample: After this quality control process, the sample used for the multivariate statistical analyses was finalized at 314 producers (n=314). The full statistically representative sample was therefore retained as the effective analytical sample, which was considered adequate for the intended PCA and K-means clustering procedures.
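The sample size of 314 can be checked with a short calculation. The record does not state which finite-population formula was used, so the sketch below assumes the common proportion-based form n = N z² p q / (e² (N − 1) + z² p q), with z = 1.96 for 95% confidence and maximum variability p = 0.5; both the formula choice and p = 0.5 are assumptions, and the function name is hypothetical.

```python
import math


def finite_population_sample_size(N, z=1.96, e=0.05, p=0.5):
    """Sample size for estimating a proportion in a finite population.

    N: population size (sampling frame)
    z: normal quantile for the confidence level (1.96 for 95%)
    e: maximum tolerable error
    p: assumed proportion (0.5 is the conservative choice; an assumption here)
    """
    q = 1.0 - p
    n = (N * z**2 * p * q) / (e**2 * (N - 1) + z**2 * p * q)
    # Round up so the error bound is not exceeded.
    return math.ceil(n)


n_required = finite_population_sample_size(1696)
print(n_required)  # 314
```

Under these assumptions the unrounded value is about 313.4, which rounds up to the 314 producers reported for the 1,696-producer frame.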

Institutions

  • Universidad Nacional Autonoma de Honduras
  • Universidad Nacional Autonoma de Nicaragua Leon

Categories

Culture in Rural Development, Bivariate Analysis, Heterogeneity Characterisation, Agricultural Diversification, Circular Bioeconomy
