halfFace: A face covering mask dataset of South Asian people

Published: 28 July 2022 | Version 1 | DOI: 10.17632/pk44mkx9vm.1
Contributors:
M Shamim Kaiser,
Mufti Mahmud

Description

halfFace is a dataset of masked and unmasked facial images of South Asian people (166 participants). It combines three datasets, containing full face, upper face, and augmented upper face images, so that it can be used for person identification from both masked and unmasked images and for detecting unmasked persons. The participants were fully informed about how the dataset would be used, and their approval for further use of the data was obtained at the time of collection.

Method

Images were captured in each participant's own environment with the participant's smartphone camera, so the dataset covers a diversity of devices. The authors then collected these images and developed three datasets from them. In dataset-1, only the facial region was kept, using a pretrained CNN model, and the images were not divided into training and testing sets. Masked and unmasked images can be easily identified by the naming convention. In dataset-2, the covered portion of the face was excluded using YOLOv3, so its images contain only the upper part of the face (hair, forehead, eyes, and upper portion of the nose); this dataset was further divided into training and testing parts. Because models require a large amount of data to train, the number of training images was increased through augmentation: adjusting brightness, blur, contrast, and saturation, and adding Gaussian noise and salt and pepper noise. This makes dataset-3 more diverse and noisy, which is essential for building robust models.

Usage Notes

Dataset-1: Full face images
Dataset-2: Upper face images without augmentation (divided into train-test)
Dataset-3: Upper face images with augmentation of the training part; the testing images are the same as in dataset-2

Naming convention:

Folders: P_XXX, where XXX is the identification of the participant.

Images of dataset-1 and dataset-2: pXXX_(0/1)(0/1/2)(m/f)
pXXX is the person identification number.
The first (0/1) means unmasked (0) or masked (1).
(0/1/2) means the image is front facing (0), left facing (1), or right facing (2).
(m/f) denotes the gender of the participant.

Images of dataset-3: pXXX_(0/1)(0/1/2)(m/f)_(b/blur/c/GN/s/SPN)
pXXX, (0/1), (0/1/2), and (m/f) have the same meaning as for dataset-1 and dataset-2.
(b/blur/c/GN/s/SPN) denotes the augmentation applied: change of brightness (b), blur (blur), contrast (c), Gaussian noise (GN), saturation (s), or salt and pepper noise (SPN).
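
The naming convention can be decoded programmatically when building labels for training. The following is a minimal Python sketch, not code shipped with the dataset. The function name parse_halfface_name and the HalfFaceName record are hypothetical, and the sketch assumes that XXX is a run of digits and that any file extension can simply be dropped, neither of which is stated in the description.

```python
import re
from pathlib import PurePath
from typing import NamedTuple, Optional

# Pattern for pXXX_(0/1)(0/1/2)(m/f) with an optional augmentation suffix.
# Assumption: XXX is a run of digits; the description does not give its width.
_NAME_RE = re.compile(
    r"p(?P<pid>\d+)_(?P<mask>[01])(?P<pose>[012])(?P<gender>[mf])"
    r"(?:_(?P<aug>blur|b|c|GN|s|SPN))?"
)

# Code-to-meaning tables taken from the naming convention in the description.
POSES = {"0": "front", "1": "left", "2": "right"}
AUGMENTATIONS = {
    "b": "brightness",
    "blur": "blur",
    "c": "contrast",
    "GN": "gaussian noise",
    "s": "saturation",
    "SPN": "salt and pepper noise",
}


class HalfFaceName(NamedTuple):
    """Hypothetical record holding the fields encoded in an image name."""
    person_id: str
    masked: bool
    pose: str
    gender: str
    augmentation: Optional[str]  # None for dataset-1 and dataset-2 images


def parse_halfface_name(filename: str) -> HalfFaceName:
    """Decode an image name; raise ValueError if it does not fit the pattern."""
    stem = PurePath(filename).stem  # drop directories and the extension, if any
    match = _NAME_RE.fullmatch(stem)
    if match is None:
        raise ValueError(f"name does not follow the convention: {filename!r}")
    aug = match.group("aug")
    return HalfFaceName(
        person_id=match.group("pid"),
        masked=match.group("mask") == "1",
        pose=POSES[match.group("pose")],
        gender=match.group("gender"),
        augmentation=AUGMENTATIONS[aug] if aug is not None else None,
    )
```

For example, parse_halfface_name("p012_10m.jpg") yields a record for person 012 that is masked, front facing, and male, with no augmentation. The person identification number is kept as a string so that any leading zeros are preserved.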

Files

Categories

Computer Vision, Recognition

License