A Comprehensive Spatio-Temporal Dataset for Agricultural Price Forecasting: Integrated Market, Weather, and Event Features (2014–2024)

Published: 4 December 2025 | Version 1 | DOI: 10.17632/ds9jmxp9zy.1
Contributors:
Ravindra Shelar, Amit Hajare, Siddesh Mangarule

Description

This dataset is an all-India spatio-temporal compilation of onion market data integrated with gridded climate and event-based features for the period 2014–2024. It was assembled entirely from publicly available government sources: Agmarknet for daily market arrivals and prices, and the India Meteorological Department (IMD) for gridded rainfall and temperature data. Each record represents a unique combination of state, district, market, variety, and date, which enables fine-grained analysis of agricultural markets across India. To capture weather–market interactions, the dataset includes daily rainfall, maximum temperature (Tmax), minimum temperature (Tmin), derived and rolling climatic features, and long-window climatic signals relevant to onion growth cycles. Event-level variables (festivals, elections, and supply-disruption periods) are included to model non-seasonal volatility. All features were engineered under strict anti-data-leakage rules, which makes the dataset suitable for machine learning, deep learning, and econometric forecasting tasks. The dataset is intended for research on time-series forecasting, price volatility, supply chain modeling, climate–market interactions, and agricultural risk assessment, and can be used as a cleaned, ready-to-use resource for academic, industrial, and policy-oriented studies of Indian agricultural markets.
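As an illustration of what leakage-safe rolling and lagged features can look like, the sketch below shifts each series by one day before aggregating, so the feature for a given date uses only earlier dates. It is not the authors' code: the column names (`state`, `district`, `market`, `variety`, `date`, `rainfall`), the function name, and the window length are assumptions chosen for the example, and the demo values are made up.

```python
import pandas as pd

KEYS = ["state", "district", "market", "variety"]  # assumed column names


def add_leak_free_features(df, window=7):
    """Add lagged and rolling rainfall features that use only past days."""
    df = df.sort_values(KEYS + ["date"]).copy()
    grouped = df.groupby(KEYS, sort=False)
    # Lag by one day so the value for day t never includes day t itself.
    df["rain_lag1"] = grouped["rainfall"].shift(1)
    # Shift first, then roll: the window covers days t-window .. t-1 only.
    df["rain_roll_sum"] = grouped["rainfall"].transform(
        lambda s: s.shift(1).rolling(window, min_periods=window).sum()
    )
    return df


# Tiny made-up series for a single (hypothetical) market and variety.
demo = pd.DataFrame({
    "state": "S", "district": "D", "market": "M", "variety": "V",
    "date": pd.date_range("2020-01-01", periods=5, freq="D"),
    "rainfall": [1.0, 2.0, 3.0, 4.0, 5.0],
})
out = add_leak_free_features(demo, window=3)
print(out[["date", "rainfall", "rain_lag1", "rain_roll_sum"]])
```

With a 3-day window, the first three rows have no complete history and stay missing; the fourth row sums the first three days only. Grouping by the full record key keeps one market's history from leaking into another's.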

Files

Steps to reproduce

This dataset was created through a multi-stage data assembly and preprocessing pipeline that combined official agricultural and meteorological data sources across India from 2014 to 2024. Daily market-level onion price and arrival data were obtained from the Directorate of Marketing and Inspection’s Agmarknet portal, which provides information at the state–district–market–variety level. Weather variables, namely daily rainfall, maximum temperature (Tmax), and minimum temperature (Tmin), were taken from the official India Meteorological Department (IMD) gridded datasets, accessed through the IMDLIB Python package. Each market was matched to its nearest IMD grid cell using latitude–longitude coordinates. A standardized Python-based workflow was used to clean and merge the data and to engineer features. This included handling missing values, converting the raw data into a continuous daily time series, enforcing anti-data-leakage rules, and creating rolling, lagged, and long-window climate features aligned with onion growth stages. Event-based features (festivals, elections, and supply-disruption periods) were integrated from curated calendars to capture market anomalies. All processing steps were implemented in Python 3.10 using Pandas, NumPy, IMDLIB, Scikit-learn, and custom scripts. The workflow is reproducible: given the same raw Agmarknet and IMD inputs, the code produces an identical cleaned and integrated dataset suitable for forecasting and analytical tasks.
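The market-to-grid matching step can be sketched as follows. This is a minimal illustration using great-circle (haversine) distance on a regular latitude–longitude grid, not the authors' implementation: the function name, the grid extent, the 0.25° spacing, and the market coordinates are all assumptions for the example, and no IMDLIB calls are used.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def nearest_grid_cell(lat, lon, grid_lats, grid_lons):
    """Return (lat, lon) of the grid-cell centre closest to a point."""
    glat, glon = np.meshgrid(grid_lats, grid_lons, indexing="ij")
    phi1, phi2 = np.radians(lat), np.radians(glat)
    dphi = phi2 - phi1
    dlmb = np.radians(glon - lon)
    # Haversine formula for great-circle distance to every grid centre.
    a = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlmb / 2) ** 2
    dist_km = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
    i, j = np.unravel_index(np.argmin(dist_km), dist_km.shape)
    return float(glat[i, j]), float(glon[i, j])


# Assumed grid: 0.25-degree spacing over an illustrative bounding box.
grid_lats = np.arange(6.5, 38.75, 0.25)
grid_lons = np.arange(66.5, 100.25, 0.25)

# Hypothetical market coordinates (not taken from the dataset).
cell = nearest_grid_cell(20.04, 74.24, grid_lats, grid_lons)
print(cell)
```

A brute-force search over all cells is enough at this scale; for many thousands of markets, a spatial index such as a k-d tree would be a reasonable alternative.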

Categories

Agricultural Science, Agricultural Economics, Data Science, Machine Learning, India, Weather, Time Series Modeling, Onion, Agriculture

Licence