Data mining models to predict timber production across Colombian departments - Dataset

Published: 9 May 2026 | Version 1 | DOI: 10.17632/7gjg9s77yp.1

Description

This repository contains the data and code supporting the research article "Data mining models to predict timber production across Colombian departments", developed as a master's thesis at Universidad Cooperativa de Colombia. The study applies machine learning and time series techniques to forecast quarterly timber mobilization volumes at the departmental level in Colombia, following the CRISP-DM methodology in R/RStudio 4.4.1.

File 1 – Base_de_datos_relacionada_con_madera_movilizada_proveniente_de_Plantaciones_Forestales_Comerciales.xlsx: Raw open-access database published by the Colombian Agricultural Institute (ICA), retrieved from the Colombian Open Data Portal (datos.gov.co). It contains 53,856 records across 9 variables: year, semester, quarter, department, municipality, timber species, product type, data source, and mobilized volume (m³). It covers 28 departments, 699 municipalities, and 144 timber species between 2012 and 2022. The file is provided without modifications, preserving the original structure as downloaded from the official source.

File 2 – tesis.R: Fully commented R script (795 lines) implementing the complete analytical pipeline: data cleaning and preprocessing; spatio-temporal analysis with annual choropleth maps (GADM cartography); missing value imputation using KSSA with automatic best-fit selection (departments with more than 30% missing values are excluded); and predictive modeling that fits five algorithms per department (ARIMA, Prophet, GLMNET, Random Forest, Prophet Boost) with a 90/10 temporal train/test split. Models are evaluated using RMSE, MAE, MAPE, SMAPE, and MASE, and four-quarter-ahead forecasts are generated for each department. Main packages: tidymodels, modeltime, kssa, ggplot2, sf, ranger.

Keywords: Timber production; Colombia; Data mining; Machine learning; ARIMA; Random Forest; CRISP-DM; Time series; Open data; R.
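For readers who want to see how a 90/10 temporal split and the five reported accuracy metrics fit together, the following is a minimal Python sketch. It is not taken from tesis.R, which is written in R. The function names (`temporal_split`, `accuracy_metrics`), the toy 44-quarter series, the naive last-value forecast, the floor rounding of the cut point, and the use of a one-step in-sample naive forecast as the MASE scale are all assumptions made for illustration.

```python
import math


def temporal_split(series, train_fraction=0.9):
    """Split an ordered series into (train, test) without shuffling.

    Assumption: the cut point is floor(n * train_fraction); the R script
    may round differently.
    """
    cut = int(len(series) * train_fraction)
    cut = max(1, min(cut, len(series) - 1))  # keep both parts non-empty
    return series[:cut], series[cut:]


def accuracy_metrics(train, actual, predicted):
    """Return RMSE, MAE, MAPE, SMAPE and MASE for a set of forecasts."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE and SMAPE are expressed in percent; MAPE is undefined if an
    # actual value is zero.
    mape = 100 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    smape = 100 * sum(
        abs(e) / ((abs(a) + abs(p)) / 2)
        for e, a, p in zip(errors, actual, predicted)
    ) / n
    # MASE scale (assumption): in-sample MAE of a one-step naive forecast.
    naive_mae = sum(
        abs(train[i] - train[i - 1]) for i in range(1, len(train))
    ) / (len(train) - 1)
    mase = mae / naive_mae
    return {"rmse": rmse, "mae": mae, "mape": mape, "smape": smape, "mase": mase}


# Toy data (made up): 44 quarters, matching the 2012-2022 span, with a
# linear trend. These are NOT values from the ICA database.
volumes = [100.0 + 2.0 * i for i in range(44)]

train, test = temporal_split(volumes, train_fraction=0.9)

# Placeholder model: repeat the last training value for every test quarter.
forecast = [train[-1]] * len(test)

metrics = accuracy_metrics(train, test, forecast)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the split is by position rather than random, the test quarters are always the most recent ones, which avoids leaking future observations into training. Any of the five models named in the description could take the place of the naive forecast here.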

Categories

Agricultural Science, Computer Science, Environmental Science, Biological Science Tools
