CDS-Cricket: Image dataset of domestic crickets (Acheta domesticus) for counting and classification by developmental stage
Description
The CDS-Cricket dataset provides images of the domestic cricket (Acheta domesticus) at its three developmental stages: eggs, nymphs, and adults. The images were captured in a controlled environment, which makes the dataset useful for research on deep learning and computer vision models, particularly for insect detection, counting, and classification tasks.
Steps to reproduce
The CDS-Cricket dataset was collected at the Food Industry Engineering Chemistry Laboratory of the Higher Technological Institute of Teziutlán, Puebla. The data were generated using a prototype three-shelf, multi-level incubator designed to separate the developmental stages of the domestic cricket (Acheta domesticus): the lower level for eggs, the middle level for nymphs, and the upper level for adults. This configuration accommodates the insect’s behavior and minimizes the impact of its jumping activity in the advanced stages. The system controls temperature and humidity independently and uses red lighting to reduce stress on the specimens during growth.

For image capture, an embedded system was implemented, consisting of a Raspberry Pi 3 Model B+ board with a 5 MP OV5647 camera module. The camera was mounted horizontally at the top of each level on a metal bracket that can move along the X-axis, which allowed a wider field of view and a better perspective of the specimens. A 90° capture angle was established empirically to maximize the visual quality and sharpness of the individuals. A grid pattern was integrated into the background of the containers as a scale reference, enabling millimeter-scale size estimation of nymphs and adults. During image acquisition, the ambient red light was temporarily replaced with white light to minimize shadows and improve the color consistency required by computer vision models.

Systematic image acquisition was performed with scripts that adjusted the sensor properties according to the developmental stage. The configured parameters were:
- Eggs: shutter (--shutter) 45000, gain (--gain) 3, quality 100, and lens position (--lens-position) 0.6.
- Nymphs and adults: shutter (--shutter) 15000, with gain (3), quality (100), and lens position (0.6) unchanged.
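The stage-specific capture parameters listed above could be assembled into a command line along the following lines. This is only a sketch, not the authors' acquisition script: the binary name `rpicam-still` (called `libcamera-still` on older Raspberry Pi OS releases), the `--quality` and `--output` flags, the output file names, and the helper `build_capture_command` are assumptions made for illustration. Only the numeric values and the `--shutter`, `--gain`, and `--lens-position` flags come from the description.

```python
import shlex

# Values taken from the description above. Shutter is assumed to be in
# microseconds, which is the convention of the Raspberry Pi camera apps.
STAGE_SETTINGS = {
    "eggs":   {"shutter": 45000, "gain": 3, "quality": 100, "lens_position": 0.6},
    "nymphs": {"shutter": 15000, "gain": 3, "quality": 100, "lens_position": 0.6},
    "adults": {"shutter": 15000, "gain": 3, "quality": 100, "lens_position": 0.6},
}


def build_capture_command(stage, output_path, binary="rpicam-still"):
    """Return the argument list for one still capture (hypothetical helper).

    The command is only built, not executed, so this runs on any machine.
    """
    settings = STAGE_SETTINGS[stage]
    return [
        binary,  # assumed binary name
        "--shutter", str(settings["shutter"]),
        "--gain", str(settings["gain"]),
        "--quality", str(settings["quality"]),  # assumed flag for JPEG quality
        "--lens-position", str(settings["lens_position"]),
        "--output", output_path,  # assumed flag and file name
    ]


if __name__ == "__main__":
    for stage in STAGE_SETTINGS:
        print(shlex.join(build_capture_command(stage, f"{stage}_0001.jpg")))
```

On the capture device, the returned list could be passed to `subprocess.run(cmd, check=True)`. Keeping the settings in one table makes the single difference between stages, the longer shutter time for eggs, easy to see.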
Output format: Images were stored in JPEG format with a resolution of 4056 x 3040 pixels.

The workflow for consolidating the dataset followed a three-phase sequence:
- Sample preselection: The images were manually filtered to exclude those with low informational quality, resulting in a final collection of 905 RGB digital images classified into five categories: eggs, nymphs, adults, and two binary combinations (adults with nymphs and eggs with nymphs) that make the analysis more robust in complex scenarios.
- Labeling: The data were annotated using Label Studio.
- Dataset split: An 80:20 split was implemented, allocating 703 samples for training and 204 samples for validation.

This standardized protocol supports the reproducibility of the research and provides a foundation for training deep learning architectures such as YOLOv8 in its nano, small, and medium versions.
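A reproducible 80:20 partition of an image list can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the description does not say how the split was drawn (random or stratified by category), so the seeded random shuffle, the function name `split_dataset`, and the placeholder file names are all hypothetical, and the resulting counts will not match the reported 703 and 204 samples.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle items with a fixed seed and split them into (train, validation).

    Sorting first makes the result independent of the input order, so the
    same seed always yields the same partition.
    """
    shuffled = sorted(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    # Placeholder file names; the real dataset uses its own naming scheme.
    names = [f"img_{i:04d}.jpg" for i in range(905)]
    train, val = split_dataset(names)
    print(len(train), len(val))
```

Fixing the seed lets other researchers regenerate the same training and validation lists from the file names alone. A stratified variant that splits each of the five categories separately would keep the class proportions equal in both subsets.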
Institutions
- Tecnológico Nacional de México, Mexico City, Mexico City