NexTech14_Waste_Dataset: A Benchmark Image Collection for Intelligent Waste Classification and Recycling Systems
Description
The NexTech14_Waste_Dataset is a high-quality, curated image dataset designed to support research in AI-driven waste classification, smart recycling, and sustainable technology. It defines 11 waste categories: battery, e-waste, food-waste, garden-waste, other-organic-waste, glass, metal, paper, plastic, textile-fabric, and general trash. Each image was labeled according to its visual characteristics and material composition.

The labeling scheme groups the 11 categories into three principal waste classes:
- Hazardous: battery, e-waste
- Organic: food-waste, garden-waste, other-organic-waste
- Recyclable: glass, metal, paper, plastic, textile-fabric, trash
Although the dataset carries 11 specific labels, every image is ultimately assigned to one of the three principal classes (Hazardous, Organic, or Recyclable).

The dataset comprises over 8,996 RGB images in .jpg and .png format. Each image has been manually verified, preprocessed, and standardized to 224×224 pixels, with attention to balanced class representation and visual consistency. The images capture real-world variations in lighting, texture, and background to improve robustness in model training and evaluation.

After augmentation, the dataset was partitioned in a 70:15:15 ratio for training (train), validation (val), and testing (test). The splitfolders library was used with a fixed random seed (seed = 42) to ensure reproducibility and unbiased evaluation.

The dataset is suited to Convolutional Neural Networks (CNNs), Vision Transformers (ViTs), and hybrid deep learning frameworks, and supports reproducible experimentation in automated waste recognition, robotics-based sorting, and environmental monitoring. Structured with clear folder hierarchies and annotated metadata, NexTech14_Waste_Dataset serves as a benchmark for academic research and industrial innovation.
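The grouping of the 11 specific labels into the three principal classes can be sketched as a simple lookup table. This is a minimal illustration, not the authors' code: the label strings are assumed to match the category names listed in the description (the actual folder names in the release may differ), and `to_principal` is a hypothetical helper.

```python
# Grouping of the 11 fine-grained labels into the three principal classes,
# as listed in the description. The exact label strings are an assumption;
# adjust them to the folder names actually found in the dataset.
PRINCIPAL_CLASSES = {
    "Hazardous": ["battery", "e-waste"],
    "Organic": ["food-waste", "garden-waste", "other-organic-waste"],
    "Recyclable": ["glass", "metal", "paper", "plastic", "textile-fabric", "trash"],
}

# Invert the grouping so each fine label points at its principal class.
FINE_TO_PRINCIPAL = {
    fine: principal
    for principal, fine_labels in PRINCIPAL_CLASSES.items()
    for fine in fine_labels
}


def to_principal(fine_label: str) -> str:
    """Return the principal class for a fine label (hypothetical helper)."""
    key = fine_label.strip().lower()
    if key not in FINE_TO_PRINCIPAL:
        raise KeyError(f"Unknown waste label: {fine_label!r}")
    return FINE_TO_PRINCIPAL[key]
```

For example, `to_principal("battery")` yields `"Hazardous"`, so a model trained on the 11 labels can have its predictions collapsed to the three principal classes at evaluation time.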
Released under the CC BY-NC 4.0 license, it encourages open collaboration toward building intelligent, eco-sustainable systems for a cleaner future.
Steps to reproduce
All preprocessing, labeling, and dataset organization have been completed. The dataset is ready to use for training, validation, and testing of deep learning models. Users can load the data directly into frameworks such as TensorFlow, Keras, or PyTorch without additional setup.
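Before loading the data into a framework, it can help to check that the split folders contain what is expected. The sketch below uses only the Python standard library and assumes a layout of `<root>/<split>/<class>/<image>` with splits named `train`, `val`, and `test`; this layout and the helper names `index_split` and `summarize` are assumptions, not details confirmed by the dataset page.

```python
from pathlib import Path

SPLITS = ("train", "val", "test")     # split folder names (assumed)
IMAGE_SUFFIXES = {".jpg", ".png"}     # extensions named in the description


def index_split(root, split):
    """Return sorted (image_path, class_name) pairs for one split."""
    split_dir = Path(root) / split
    samples = []
    # Each subfolder of the split is treated as one class.
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((image_path, class_dir.name))
    return samples


def summarize(root):
    """Count images per split; splits missing on disk are skipped."""
    return {
        split: len(index_split(root, split))
        for split in SPLITS
        if (Path(root) / split).is_dir()
    }
```

If the folders do follow this one-subfolder-per-class layout, each split directory can also be passed to directory-based loaders such as torchvision's `ImageFolder`.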
Institutions
- Khwaja Yunus Ali University