MHCD_GIETV1
Description
This dataset of handwritten Marathi simple characters (bara_kadhi) was created for research purposes through data collection, annotation, bounding-box extraction, thresholding, augmentation, and resizing to images of equal size. It consists of 24,040 annotated character images. The original scanned handwritten characters were resized and normalized to fit in a 128x128 pixel box while preserving their aspect ratio. Four folders are available:
• Original Images: 6,010 images
• Grayscale Images: 6,010 images
• Binary Images: 6,010 images
• Inverted Images: 6,010 images
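The description says each image was fitted into a 128x128 pixel box while preserving its aspect ratio, but does not say how. The following is a minimal sketch of the arithmetic involved, assuming the scaled character is centered on a square canvas and the remainder is padded. The function name `fit_in_box` and the centering choice are hypothetical, not taken from the dataset authors.

```python
def fit_in_box(width, height, box=128):
    """Compute the scaled size and padding offsets needed to fit an image
    of (width, height) inside a box x box canvas without distortion.

    Assumption: the scaled image is centered; the source does not state this.
    Returns (new_width, new_height, pad_left, pad_top).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    longest_side = max(width, height)
    # Scale both sides by the same factor so the aspect ratio is preserved.
    new_width = max(1, round(width * box / longest_side))
    new_height = max(1, round(height * box / longest_side))

    # Split the leftover space evenly (any odd pixel goes to the right/bottom).
    pad_left = (box - new_width) // 2
    pad_top = (box - new_height) // 2
    return new_width, new_height, pad_left, pad_top


# A 300x200 crop is scaled so its longer side is 128, then centered vertically.
print(fit_in_box(300, 200))
```

With an imaging library, the same numbers would drive a resize followed by a paste onto a blank 128x128 canvas.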
Steps to reproduce
The data was collected from people of different age groups, including students from primary schools, high schools, and colleges. Each participant was given an A4 sheet of paper on which to write the simple bara_kadhi in their own handwriting, so that people's handwritten characters can be predicted. The images were then annotated and classes were defined using online annotation tools. The classes and their respective images were then used in MATLAB to extract the annotated images into separate folders. Pre-processing techniques were then applied to resize every image to exactly 128 by 128 pixels for further evaluation. Next, depending on the size of the data, additional augmented images were created, keeping the aspect ratio in mind, for training and prediction with our convolutional neural network. Finally, binarization, grayscale conversion, and inversion were applied to support future research work. This dataset will help research scholars and data scientists who want to dive deep into Marathi character recognition.
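The grayscale, binary, and inverted variants described above can be illustrated with a dependency-free sketch on a tiny pixel grid. It assumes the common ITU-R BT.601 luma weights for grayscale conversion and a fixed threshold of 128 for binarization. The source specifies neither, and the authors' MATLAB pipeline may differ. All function names here are hypothetical.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of 0-255 gray values.
    Assumption: BT.601 luma weights; the source does not name a formula."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def binarize(gray_image, threshold=128):
    """Map each gray value to 255 (at or above threshold) or 0 (below).
    Assumption: a fixed global threshold of 128."""
    return [[255 if v >= threshold else 0 for v in row] for row in gray_image]


def invert(image):
    """Flip intensities so dark pixels become light and vice versa."""
    return [[255 - v for v in row] for row in image]


# A 2x2 example image: white, black, pure red, pure blue.
rgb = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]
gray = to_grayscale(rgb)
binary = binarize(gray)
inverted = invert(binary)
print(gray, binary, inverted)
```

Here the inversion is applied to the binary image. Whether the dataset's Inverted Images folder was derived from the binary or the grayscale images is not stated in the source.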