Published: 12 June 2024 | Version 1 | DOI: 10.17632/hfyth5t3gg.1
Tohidur Rahaman, Tanmay Sarkar


The dataset comprises over 500 images of tomatoes (Solanum lycopersicum), categorized into two classes: "good" and "bad." These images were captured using a Redmi 9 Power mobile camera against a black background under daylight conditions.

**Data Description:**

1. **Classes:**
   - Good: Represents healthy tomatoes exhibiting desirable characteristics such as uniform color, shape, and absence of blemishes, bruises, or signs of disease.
   - Bad: Encompasses tomatoes displaying signs of damage, disease, or other undesirable traits such as discoloration, rot, deformities, or pest infestation.
2. **Image Collection:**
   - The dataset consists of over 500 images, with a substantial number depicting both good and bad instances of tomatoes.
   - Images were captured under consistent daylight conditions to ensure uniformity and minimize environmental variability.
   - A black background was employed to enhance tomato visibility and isolate the subject.
3. **Data Source:**
   - Images were captured using a Redmi 9 Power mobile camera, ensuring consistent image quality and resolution across the dataset.
   - Daylight conditions were chosen to provide natural lighting, reducing artificial effects on tomato appearance.
4. **Annotation:**
   - Each image is labeled according to its class (good or bad), facilitating supervised learning tasks.
   - Annotations may include bounding boxes or masks outlining the tomato area to aid in localization tasks.
5. **Data Preprocessing:**
   - Preprocessing techniques such as resizing, normalization, and background removal may have been applied to the images to improve model performance and reduce computational complexity.
   - Metadata such as image resolution, format, and capture settings may accompany the dataset for reference.
6. **Data Distribution:**
   - The dataset maintains a balanced distribution between good and bad tomatoes, ensuring equal representation of both classes.
   - Randomization techniques may have been utilized during data collection and organization to prevent biases in model training.
7. **Potential Applications:**
   - The dataset can be used for various machine learning tasks, including classification, object detection, and image segmentation, particularly in agricultural applications.
   - Applications may include automated sorting systems for tomato quality control, disease detection, and yield optimization.
8. **Limitations:**
   - Despite efforts to ensure data consistency and quality, variations in lighting conditions, camera angles, and tomato orientation may introduce some degree of variability.
   - The dataset primarily focuses on tomatoes of Solanum lycopersicum and may not generalize well to other tomato varieties or environmental conditions.
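Because the description states a balanced two-class distribution without giving per-class counts, it may be worth checking the balance after download and keeping it when splitting the data. The sketch below is one possible way to do that with the Python standard library only. It assumes a hypothetical layout in which the extracted archive contains one subfolder per class (for example `good/` and `bad/`); the actual folder names, file extensions, and the helper names `list_labeled_images`, `class_counts`, and `stratified_split` are illustrative assumptions, not part of the published dataset.

```python
import random
from collections import Counter
from pathlib import Path

# Assumed image extensions; adjust to whatever the downloaded archive contains.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_labeled_images(root):
    """Return sorted (path, label) pairs, using each subfolder name as the label.

    Assumes a hypothetical layout such as root/good/*.jpg and root/bad/*.jpg.
    """
    root = Path(root)
    pairs = []
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() in IMAGE_EXTENSIONS:
                pairs.append((image_path, class_dir.name))
    return pairs


def class_counts(pairs):
    """Count how many images carry each label."""
    return Counter(label for _, label in pairs)


def stratified_split(pairs, test_fraction=0.2, seed=0):
    """Split per class so train and test keep roughly the same class ratio."""
    rng = random.Random(seed)
    by_label = {}
    for path, label in pairs:
        by_label.setdefault(label, []).append((path, label))

    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)  # seeded shuffle, so the split is reproducible
        n_test = round(len(items) * test_fraction)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test
```

A typical use would be to call `list_labeled_images` on the extracted folder, print `class_counts` to confirm that the two classes are of similar size, and then pass the pairs to `stratified_split` before training a classifier. Splitting per class, instead of shuffling the whole list once, keeps the good/bad ratio similar in both subsets even if the counts turn out not to be exactly equal.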



Biological Classification, Characterization of Food