A Multi-Parameter Dataset for Machine Learning Based Fruit Spoilage Prediction in an IoT-Enabled Cold Storage System
Description
This dataset was compiled as part of a project to design a cold storage system that combats post-harvest food loss in developing regions by integrating IoT technology with predictive machine learning. The project, developed for smallholder farmers in Uganda, aims to monitor and proactively control the environmental conditions in cold storage units to extend the shelf life of perishable goods.

The dataset is structured for machine learning applications and serves as training and validation data for predictive models. It contains environmental readings collected in a controlled cold storage environment. The data is organized into a comma-separated values (CSV) file with 10,996 entries in the following six columns:

- Fruit: A categorical variable indicating the type of fruit being stored (e.g., Orange, Pineapple, Banana, Tomato).
- Temp: The temperature inside the cold storage unit, measured in degrees Celsius (°C).
- Humid: The relative humidity (RH) of the environment, measured as a percentage (%).
- Light: The intensity of light exposure, measured in lux.
- CO2: The concentration of carbon dioxide (CO₂) in the air, measured in parts per million (ppm).
- Class: A binary classification label (Good or Bad) that serves as the target variable for the predictive model, indicating whether the environmental conditions are optimal or suboptimal for spoilage prevention.

The data's primary purpose is to provide a basis for training predictive models that classify environmental conditions and assess spoilage risk. The dataset is a resource for researchers and practitioners in fields such as smart agriculture, food science, embedded systems, and machine learning. It can be used to:

- Train, validate, and test new predictive models for food spoilage.
- Analyze the correlation between specific environmental factors (temperature, humidity, CO₂, and light) and fruit spoilage outcomes.
- Support the development of low-cost, intelligent monitoring systems for cold chain logistics and food preservation.

This dataset and the associated project are intended to contribute to achieving the United Nations Sustainable Development Goals (SDGs), particularly those related to food security and sustainable agriculture.
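
As a starting point for working with the six-column layout described above, the following is a minimal Python sketch that parses the CSV, checks that the expected columns are present, converts the four sensor readings to numbers, and encodes the Class label as an integer. It uses only the standard library. Several details are assumptions rather than facts about the published file: that the header row uses exactly the column names listed in this description, that the labels are spelled "Good" and "Bad", and that Good is mapped to 1. The names `load_records`, `class_counts_by_fruit`, and the `label` key are hypothetical, and the two sample rows are invented placeholders, not values taken from the dataset.

```python
import csv
import io
from collections import Counter

# Column names as listed in the dataset description (assumed to match the CSV header).
EXPECTED_COLUMNS = ["Fruit", "Temp", "Humid", "Light", "CO2", "Class"]
NUMERIC_COLUMNS = ["Temp", "Humid", "Light", "CO2"]
# Assumption: labels are spelled exactly "Good" and "Bad"; Good -> 1 is an arbitrary choice.
LABEL_TO_INT = {"Good": 1, "Bad": 0}


def load_records(handle):
    """Parse CSV text from an open text handle into a list of typed records."""
    reader = csv.DictReader(handle)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    records = []
    for row in reader:
        record = {"Fruit": row["Fruit"].strip()}
        for name in NUMERIC_COLUMNS:
            record[name] = float(row[name])  # raises ValueError on non-numeric cells
        record["label"] = LABEL_TO_INT[row["Class"].strip()]  # KeyError on unknown label
        records.append(record)
    return records


def class_counts_by_fruit(records):
    """Count (fruit, label) pairs, a quick check of class balance per fruit."""
    return Counter((r["Fruit"], r["label"]) for r in records)


# Invented placeholder rows for illustration only; NOT real values from the dataset.
SAMPLE_CSV = """Fruit,Temp,Humid,Light,CO2,Class
Orange,10,85,5,400,Good
Banana,30,60,120,900,Bad
"""

if __name__ == "__main__":
    sample_records = load_records(io.StringIO(SAMPLE_CSV))
    print(class_counts_by_fruit(sample_records))
```

To run this against the actual file, pass an open file handle instead of the `io.StringIO` sample, for example `load_records(open(path, newline=""))`, where `path` is wherever the CSV was saved. Checking the per-fruit class counts before training is a cheap way to spot class imbalance that could bias a spoilage classifier.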