Augmented COVID-19 CT images with CUTMIX, CUTOUT, and MIXUP Data Augmentation
Description
This dataset consists of 30,000 augmented COVID-19 CT images generated with the CUTMIX (Yun et al., 2019), CUTOUT (DeVries & Taylor, 2017), and MIXUP (Zhang et al., 2018) data augmentation techniques. The source COVID-19 CT images were acquired from multiple publicly available sources (Morozov et al., 2020; Rahimzadeh et al., 2021; Maftouni et al., 2021).
References
DeVries, T., & Taylor, G. W. (2017). Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552.
Maftouni, M., Law, A. C., Shen, B., Zhou, Y., Yazdi, N., & Kong, Z. J. (2021). A Robust Ensemble-Deep Learning Model for COVID-19 Diagnosis based on an Integrated CT Scan Images Database. In Proceedings of the 2021 Industrial and Systems Engineering Conference (pp. 632-637). Institute of Industrial and Systems Engineers.
Morozov, P., Andreychenko, A. E., et al. (2020). MosMedData: Chest CT Scans With COVID19 Related Findings Dataset. arXiv preprint arXiv:2005.06465.
Rahimzadeh, M., Attar, A., & Sakhaei, S. (2021). A fully automated deep learning-based network for detecting COVID-19 from a new and large lung CT scan dataset. Biomedical Signal Processing and Control, 68, 102588.
Yun, S., Han, D., Oh, S. J., Chun, S., Choe, J., & Yoo, Y. (2019). CutMix: Regularization strategy to train strong classifiers with localizable features. arXiv preprint arXiv:1905.04899.
Zhang, H., Cisse, M., Dauphin, Y. N., & Lopez-Paz, D. (2018). Mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412.
Steps to reproduce
1. CUTOUT (DeVries & Taylor, 2017): a 112 × 112 zero-mask is used to cut out a random patch of each image. As in the original implementation, the mask may extend beyond the border of the image, so the shape of the rectangular cut-out region that actually falls inside the image varies randomly in each data augmentation (DA) process.
2. MIXUP (Zhang et al., 2018): the COVID-19 and non-COVID CT images are linearly interpolated using a combination ratio λ ∈ [0, 1] sampled from a beta distribution. The associated labels of the images are interpolated with the same combination ratio in each transformation process.
3. CUTMIX (Yun et al., 2019): a random patch of the image from domain A is removed and replaced with an image patch of a similar shape from domain B. The size of the image patch is determined by a combination ratio λ sampled from a beta distribution. The same λ is used to create the new target labels for the transformed images.
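The three steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the code used to generate the dataset: the function names, the 224 × 224 image size, the one-hot label encoding, and the beta parameter alpha = 1.0 are all hypothetical, since the description does not specify them. In cutmix the patch is taken from a second image, following Yun et al. (2019).

```python
import numpy as np

def rand_box(h, w, box_h, box_w, rng):
    """Centre a box on a random pixel, then clip it to the image bounds."""
    cy, cx = rng.integers(h), rng.integers(w)
    y1, y2 = max(cy - box_h // 2, 0), min(cy + box_h // 2, h)
    x1, x2 = max(cx - box_w // 2, 0), min(cx + box_w // 2, w)
    return y1, y2, x1, x2

def cutout(img, rng, size=112):
    """Zero out a size x size patch; the patch may partly fall outside the image."""
    out = img.copy()
    y1, y2, x1, x2 = rand_box(img.shape[0], img.shape[1], size, size, rng)
    out[y1:y2, x1:x2] = 0
    return out

def mixup(img_a, lab_a, img_b, lab_b, rng, alpha=1.0):
    """Blend two images and their labels with lam ~ Beta(alpha, alpha)."""
    lam = rng.beta(alpha, alpha)  # alpha is an assumed value
    img = lam * img_a + (1 - lam) * img_b
    lab = lam * lab_a + (1 - lam) * lab_b
    return img, lab, lam

def cutmix(img_a, lab_a, img_b, lab_b, rng, alpha=1.0):
    """Paste a patch of img_b into img_a; the patch covers about (1 - lam) of the area."""
    h, w = img_a.shape[:2]
    lam = rng.beta(alpha, alpha)  # alpha is an assumed value
    ratio = np.sqrt(1 - lam)      # side-length ratio of the patch
    y1, y2, x1, x2 = rand_box(h, w, int(h * ratio), int(w * ratio), rng)
    out = img_a.copy()
    out[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    # The sampled lam is reused for the labels, as the description states.
    lab = lam * lab_a + (1 - lam) * lab_b
    return out, lab, lam

# Usage with placeholder data (real inputs would be CT slices).
rng = np.random.default_rng(0)
covid = np.ones((224, 224), dtype=np.float32)       # stand-in COVID-19 slice
non_covid = np.zeros((224, 224), dtype=np.float32)  # stand-in non-COVID slice
y_covid, y_non = np.array([1.0, 0.0]), np.array([0.0, 1.0])

cut_img = cutout(covid, rng)
mix_img, mix_lab, mix_lam = mixup(covid, y_covid, non_covid, y_non, rng)
cm_img, cm_lab, cm_lam = cutmix(covid, y_covid, non_covid, y_non, rng)
```

Because the box is clipped at the image border, the pasted CUTMIX area can be smaller than the sampled λ implies; this sketch keeps the sampled λ for the labels to match the steps above.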