ChemEq25 :- A Novel Multiclass Dataset for Detecting Chemical Laboratory Apparatus Using Image Processing and Deep Learning Approach

Published: 3 March 2025 | Version 1 | DOI: 10.17632/zptphkynt6.1

Description

The "ChemEq25" dataset comprises 4,599 annotated images of chemical laboratory equipment, collected at three locations in Dhaka, Bangladesh: the Bangladesh Atomic Energy Chemical Laboratory, the Bio-Chemical Laboratory of United International University, and the Chemistry Laboratory of Dhaka Imperial College. It is designed for machine learning applications, in particular real-time equipment detection. Images were captured under varying lighting conditions and backgrounds to improve model robustness. Each item was photographed from multiple angles; the images were then auto-oriented and resized to 640×640 pixels, which introduces slight stretching. The dataset covers 25 classes of common lab equipment and is divided into training (70%), validation (20%), and test (10%) sets.
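As a rough illustration of what the 70/20/10 split means for 4,599 images, the sketch below computes approximate set sizes. The function name `split_counts` and the rounding rule (round training and validation to the nearest image, give the remainder to the test set) are assumptions made for this example; the actual per-set counts in the published files may differ by an image or two.

```python
def split_counts(total, train_pct=70, valid_pct=20):
    """Approximate split sizes; the test set receives whatever remains."""
    # Integer arithmetic: adding 50 before dividing by 100 rounds to the nearest image.
    train = (total * train_pct + 50) // 100
    valid = (total * valid_pct + 50) // 100
    test = total - train - valid
    return train, valid, test


print(split_counts(4599))  # (3219, 920, 460)
```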

Steps to reproduce

The "ChemEq25" dataset is a curated collection of 4,599 annotated images of chemical laboratory equipment, designed to advance machine learning applications in real-time equipment detection. Collected from three key laboratories in Dhaka, Bangladesh, the dataset captures diverse imaging conditions, angles, and backgrounds to enhance model robustness. Preprocessed for consistency, it includes standardized images and detailed annotations, organized into training, validation, and testing sets. This resource is tailored to support the development of accurate and reliable models for chemical laboratory equipment recognition.

Data Collection: Collect data through fieldwork in controlled laboratory environments. Focus on three key locations: the Bangladesh Atomic Energy Chemical Laboratory at the University of Dhaka, the UIU Bio-Chemical Laboratory at United International University, and Dhaka Imperial College, all in Dhaka, Bangladesh. Use the primary cameras of four different smartphones to capture a diverse range of imaging qualities and perspectives. This ensures the dataset reflects real-world variability in lighting, angles, and backgrounds.

Data Preprocessing: Apply preprocessing steps to enhance consistency and model accuracy. First, use the "Auto-Orient" feature to correct image rotation issues caused by EXIF data. Then, resize all images to 640×640 pixels to standardize dimensions and improve processing efficiency. These steps ensure uniformity and better model performance across diverse scenarios.

Dataset Organization: Split the dataset into 70% training, 20% validation, and 10% testing. Organize the data into three primary directories: Train, Valid, and Test. Each directory should contain two subdirectories:
- Image: Store the actual images of chemical laboratory equipment. Ensure each image is uniquely named to prevent conflicts and enable traceability.
- Label: Store annotations in text files corresponding to each image. Include class labels and bounding box coordinates for object detection tasks.

This structure ensures clarity, accessibility, and efficient model training and evaluation.
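The organization step can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the authors' tooling: the function name `organize`, the fixed random seed, the rounding rule, and the assumption that each annotation file shares its image's base name with a `.txt` extension are all choices made for this example. The directory names Train, Valid, Test, Image, and Label follow the description above.

```python
import random
import shutil
from pathlib import Path

# Split names and percentages as described in the text.
SPLITS = (("Train", 70), ("Valid", 20), ("Test", 10))


def organize(image_dir, label_dir, out_dir, seed=0):
    """Copy image/label pairs into <out_dir>/<split>/Image and <out_dir>/<split>/Label."""
    image_dir, label_dir, out_dir = Path(image_dir), Path(label_dir), Path(out_dir)
    images = sorted(p for p in image_dir.iterdir() if p.is_file())

    # Unique base names prevent one image's label from overwriting another's.
    stems = [p.stem for p in images]
    if len(stems) != len(set(stems)):
        raise ValueError("image names must be unique (ignoring the extension)")

    # Seeded shuffle so the split is reproducible (the seed is an assumption).
    random.Random(seed).shuffle(images)

    counts = {}
    start = 0
    for index, (split, percent) in enumerate(SPLITS):
        is_last = index == len(SPLITS) - 1
        # Round to the nearest image; the last split takes whatever remains.
        size = len(images) - start if is_last else (len(images) * percent + 50) // 100
        chunk = images[start:start + size]
        start += len(chunk)

        image_out = out_dir / split / "Image"
        label_out = out_dir / split / "Label"
        image_out.mkdir(parents=True, exist_ok=True)
        label_out.mkdir(parents=True, exist_ok=True)

        for image in chunk:
            # Assumed naming convention: <image base name>.txt holds the annotation.
            label = label_dir / f"{image.stem}.txt"
            if not label.is_file():
                raise FileNotFoundError(f"no annotation file for {image.name}")
            shutil.copy2(image, image_out / image.name)
            shutil.copy2(label, label_out / label.name)
        counts[split] = len(chunk)
    return counts
```

Calling `organize("raw/images", "raw/labels", "ChemEq25")` (hypothetical paths) returns a dictionary with the number of pairs copied into each split. Copying rather than moving leaves the source folders untouched, so the split can be regenerated with a different seed.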

Institutions

United International University

Categories

Computer Vision, Image Processing, Research Equipment, Pattern Recognition, Laboratory
