A Microbiological Image Repository of Escherichia coli and Klebsiella pneumoniae Bacterial Colonies on MacConkey Agar

Published: 3 March 2026 | Version 1 | DOI: 10.17632/kx6gz3wmcf.1
Contributors:

Description

This dataset consists of images of two bacterial strains streaked on MacConkey agar plates. The colonies were photographed under two shooting conditions, “controlled” and “uncontrolled”, as described in the “Steps to reproduce” section. The bacterial strains are:
1- Escherichia coli (E. coli): 168 images under controlled conditions and 3,532 images under uncontrolled conditions.
2- Klebsiella pneumoniae (K. pneumoniae): 152 images under controlled conditions and 3,513 images under uncontrolled conditions.

IN THE REPOSITORY, YOU WILL FIND:
Group 1 consists of images taken from 39 plates of K. pneumoniae and 36 plates of E. coli, under controlled and uncontrolled conditions.
Group 2 consists of images taken from 25 plates of K. pneumoniae and E. coli, under controlled and uncontrolled conditions.
An Excel sheet detailing the numbers of images in the folders.

NOTABLE FINDING: Baseline CNNs trained on this data achieved high accuracy, indicating that phone images provide sufficient discriminative signal without expert inspection. Please refer to:
1) S. A. Nagro et al., "Automatic Identification of Single Bacterial Colonies Using Deep and Transfer Learning," in IEEE Access, vol. 10, pp. 120181-120190, 2022, DOI: https://doi.org/10.1109/ACCESS.2022.3221958
2) M. Kutbi et al., "Leveraging Smartphone Imaging and Deep Transfer Learning for Bacterial Colony Classification: From Uncontrolled to Controlled Settings," in IEEE Access, DOI: https://doi.org/10.1109/ACCESS.2025.3625648

HOW THIS DATASET CAN BE USED: This dataset can be used in any research concerned with recognizing features of bacterial colonies. It can also be used to train and evaluate deep learning models for colony classification, and it supports studies on robustness, generalization, and practical deployment that advance computer vision and AI applications in microbiology. By combining clinically important bacteria with controlled and uncontrolled imaging, the dataset offers a realistic and accessible resource for researchers developing AI methods that perform reliably in laboratory and non-laboratory environments.

IMPORTANT NOTE: This dataset was expanded to include different types of bacterial strains and culture media. The expanded dataset can be found at DOI: 10.17632/v54x8jdx5x.1

IF YOU USE THIS DATASET, PLEASE REFERENCE THE FOLLOWING:
1. DOI: https://doi.org/10.1109/ACCESS.2022.3221958
2. DOI: https://doi.org/10.1109/ACCESS.2025.3625648
3. DOI: https://doi.org/10.17632/kx6gz3wmcf.1
4. DOI: https://doi.org/10.17632/v54x8jdx5x.1
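
As a quick consistency check on the image counts given in this description, the per-strain and per-condition totals can be tallied as in the minimal sketch below. The dictionary layout and the `summarize` helper are illustrative assumptions; only the four counts come from the description. The grand total agrees with the 7,365 images stated under “Steps to reproduce”.

```python
# Image counts exactly as stated in the dataset description.
COUNTS = {
    "E. coli": {"controlled": 168, "uncontrolled": 3532},
    "K. pneumoniae": {"controlled": 152, "uncontrolled": 3513},
}


def summarize(counts):
    """Return per-strain totals, per-condition totals, and the grand total."""
    per_strain = {strain: sum(by_cond.values()) for strain, by_cond in counts.items()}
    per_condition = {}
    for by_cond in counts.values():
        for condition, n in by_cond.items():
            per_condition[condition] = per_condition.get(condition, 0) + n
    return per_strain, per_condition, sum(per_strain.values())


per_strain, per_condition, total = summarize(COUNTS)
print(per_strain)     # {'E. coli': 3700, 'K. pneumoniae': 3665}
print(per_condition)  # {'controlled': 320, 'uncontrolled': 7045}
print(total)          # 7365
```

The two strains are nearly balanced (3,700 vs. 3,665 images), while the controlled subset is much smaller than the uncontrolled one (320 vs. 7,045), which is worth keeping in mind when choosing evaluation splits.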

Files

Steps to reproduce

The dataset collected in this study consists of 7,365 digital images of colonies of two different bacterial strains: E. coli and K. pneumoniae. These digital images were collected at the Institute of Health Technologies and Preventive Medicine–King Abdulaziz City for Science and Technology (KACST) in the Kingdom of Saudi Arabia. The bacterial strains were initially streaked on MacConkey agar plates and incubated at 37 °C for 24 hours. A single colony was taken with a 10 µl loop and restreaked on a fresh MacConkey agar plate for increased purity. These fresh plates were incubated under the same conditions as the other plates in the dataset. Three different mobile phones were used to collect these photos (iPhone Xs Max, iPhone 11 Pro and iPhone 7). All phone cameras were set at 1080p resolution to obtain higher-quality photos. The horizontal and vertical resolution of all images was 72 dots per inch (dpi). Images were collected in both “controlled” and “uncontrolled” environments to obtain a wider range of image qualities, camera poses, lighting conditions, etc. In the controlled environment, the distance between the camera and plate, the camera pose, the lighting conditions, and the camera itself were fixed during image collection. In the uncontrolled environment, these conditions were not fixed.
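
Because the images come from two shooting environments, one natural use is to train on one condition and test on the other. The sketch below indexes image files by strain and condition and builds such a split. It assumes a hypothetical folder layout of `<root>/<strain>/<condition>/<image>`, with condition folders named `controlled` and `uncontrolled`; the actual folder names in the repository may differ and should be checked against the Excel sheet before use.

```python
from collections import defaultdict
from pathlib import Path

# Assumed image extensions; adjust to match the files actually downloaded.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def index_images(root):
    """Group image paths by (strain, condition).

    Assumes the hypothetical layout <root>/<strain>/<condition>/<image>,
    so the two parent folder names identify each image.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            condition = path.parent.name
            strain = path.parent.parent.name
            groups[(strain, condition)].append(path)
    return dict(groups)


def cross_condition_split(groups, train_condition="uncontrolled",
                          test_condition="controlled"):
    """Return (train, test) lists of (path, strain) pairs split by condition."""
    train, test = [], []
    for (strain, condition), paths in groups.items():
        pairs = [(p, strain) for p in paths]
        if condition == train_condition:
            train.extend(pairs)
        elif condition == test_condition:
            test.extend(pairs)
    return train, test
```

A call such as `cross_condition_split(index_images("dataset_root"))`, where `dataset_root` is a placeholder path, would return the uncontrolled images for training and the controlled images for testing. Swapping the two condition arguments gives the reverse experiment.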

Institutions

Categories

Computer Vision, Mobile Computing, Machine Learning, Laboratory Diagnosis, Clinical Bacteriology, Deep Learning, Explainable Artificial Intelligence

Licence