Fish Maw
Description
This dataset supports fish maw authenticity identification across five categories: Annan maw (AN), white croaker fish maw (BM), Douhu maw (DH), Yali maw (YL), and Zhanjiang red-mouth fish maw (ZJCZ). It was built from 2,494 original images, which were divided into training, validation, and test sets in a 7:2:1 ratio. Data augmentation (rotation, brightness adjustment, and Gaussian blur) was then applied to the training set, expanding the dataset to 4,243 images in total.
Steps to reproduce
Data collection was designed to give broad coverage and high-quality samples. Five morphologically similar fish maw categories were selected as research targets: Zhanjiang red-mouth fish maw (ZJCZ), Annan maw (AN), white croaker fish maw (BM), Douhu maw (DH), and Yali maw (YL). A total of 2,494 original images were collected through two channels: field photography at the Shanghao Jiao retail store in Shantou City and at Jiexun Aquaculture Co., Ltd. in Raoping County, supplemented by images retrieved from online resources. Field photographs were taken with a Canon EOS 6D full-frame digital SLR camera at a resolution of 4032×3024 pixels to keep image quality consistent.
The images then went through a preprocessing workflow: quality screening to remove blurred or improperly exposed samples, uniform resizing to 640×640 pixels for YOLO model compatibility, and bounding box annotation on the Roboflow platform. The annotated dataset was randomly divided into training, validation, and test sets in a 7:2:1 ratio.
Finally, data augmentation was applied to the training set only, to address class imbalance and improve model robustness. The augmentations were random rotation within ±15°, brightness adjustment between -25% and +25%, and Gaussian blur with a radius of up to 1.5 pixels. This doubled the training set from 1,749 to 3,498 images, while the validation and test sets (745 images combined) were left unaugmented, bringing the dataset to 4,243 images in total.
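The split and augmentation steps above can be sketched in a few lines of Python. This is a minimal illustration and not the Roboflow pipeline itself. Several things are assumptions: the image ids are placeholders, each training image is assumed to receive exactly one augmented copy (inferred from the growth from 1,749 to 3,498), the parameters are assumed to be drawn uniformly, and the 745 held-out images are assumed to be divided roughly 2:1 between validation and test. The function names are hypothetical.

```python
import random

# Figures reported in the text above.
N_ORIGINAL = 2494
N_TRAIN = 1749            # training images before augmentation
MAX_ROTATION_DEG = 15.0   # rotation range: -15 to +15 degrees
MAX_BRIGHTNESS = 0.25     # brightness change: -25% to +25%
MAX_BLUR_PX = 1.5         # Gaussian blur radius: 0 to 1.5 pixels


def split_ids(ids, n_train, val_share=2 / 3, seed=0):
    """Shuffle ids, take n_train for training, split the rest val:test.

    val_share=2/3 is an assumption; the source gives only the 7:2:1 ratio.
    """
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    train, held_out = shuffled[:n_train], shuffled[n_train:]
    n_val = round(len(held_out) * val_share)
    return train, held_out[:n_val], held_out[n_val:]


def sample_augmentation(rng):
    """Draw one set of augmentation parameters within the stated ranges."""
    return {
        "rotation_deg": rng.uniform(-MAX_ROTATION_DEG, MAX_ROTATION_DEG),
        "brightness_factor": 1.0 + rng.uniform(-MAX_BRIGHTNESS, MAX_BRIGHTNESS),
        "blur_radius_px": rng.uniform(0.0, MAX_BLUR_PX),
    }


def plan_training_set(train_ids, seed=0):
    """Keep every original and add one augmented copy per training image."""
    rng = random.Random(seed)
    plan = [(image_id, None) for image_id in train_ids]
    plan += [(image_id, sample_augmentation(rng)) for image_id in train_ids]
    return plan


image_ids = range(N_ORIGINAL)  # placeholder ids standing in for image files
train, val, test = split_ids(image_ids, N_TRAIN)
training_plan = plan_training_set(train)
total_images = len(training_plan) + len(val) + len(test)

# Validation and test sizes (497 and 248) follow from the assumed 2:1 share.
print(len(train), len(val), len(test), len(training_plan), total_images)
# prints: 1749 497 248 3498 4243
```

The sampled parameters could be applied with an image library such as Pillow, for example `Image.rotate`, `ImageEnhance.Brightness(img).enhance(factor)`, and `ImageFilter.GaussianBlur(radius)`. Because the images carry bounding box annotations, any rotation would also have to be applied to the boxes, which this sketch does not do.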