Multilingual Character Recognition Dataset for Moroccan Official Documents
Description
Printed character datasets in the standard fonts used in Moroccan official documents are not available online under an open-source license, especially for Tifinagh and Arabic. We therefore built a new raw dataset: we collected the most commonly used fonts and, based on them, built six datasets:
- Alphabet: the Latin letters a to z, in lowercase and uppercase
- Digits: the numbers 0 to 9
- Arabic: all Arabic letters
- Tifinagh: all Tifinagh letters
- French special characters: all special characters of the French language, such as “à, é, ç, è…”
- Symbols: symbols such as “?, !, (, )…”
For data augmentation, we generated several samples of each character in the same font.
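The Python sketch below shows one possible way to organize the six subsets and plan the augmentation step, producing several samples per (character, font) pair. The character inventories are assumptions, not the dataset's published lists. They are the 28 basic Arabic letters, the Unicode Tifinagh letters U+2D30 to U+2D67, and a sample of French accented letters and symbols. The font file names and the jitter parameters (small random rotation and scaling) are also hypothetical. The sketch only plans samples. It does not render images and is not the authors' pipeline.

```python
import random
import string
from dataclasses import dataclass

# Hypothetical character inventories for the six subsets (assumptions).
SUBSETS = {
    "alphabet": string.ascii_lowercase + string.ascii_uppercase,  # a-z, A-Z
    "digits": string.digits,  # 0-9
    # 28 basic Arabic letters: U+0627..U+063A without U+0629 (ta marbuta),
    # plus U+0641..U+0648 and U+064A.
    "arabic": "".join(
        chr(c)
        for c in [*range(0x0627, 0x063B), *range(0x0641, 0x0649), 0x064A]
        if c != 0x0629
    ),
    # Tifinagh letters from the Unicode Tifinagh block, U+2D30..U+2D67.
    "tifinagh": "".join(chr(c) for c in range(0x2D30, 0x2D68)),
    "french_special": "àâæçéèêëîïôœùûüÿ",
    "symbols": "?!().,;:-/",
}


@dataclass(frozen=True)
class Sample:
    subset: str
    char: str
    font: str
    index: int    # which variant of this (char, font) pair
    angle: float  # hypothetical rotation jitter, in degrees
    scale: float  # hypothetical scale jitter factor


def plan_samples(subsets, fonts, per_font, seed=0):
    """Plan `per_font` jittered samples for every character in every font."""
    rng = random.Random(seed)  # seeded so the plan is reproducible
    samples = []
    for subset, chars in subsets.items():
        for ch in chars:
            for font in fonts[subset]:  # fonts must support this script
                for i in range(per_font):
                    samples.append(Sample(
                        subset, ch, font, i,
                        angle=rng.uniform(-5.0, 5.0),
                        scale=rng.uniform(0.9, 1.1),
                    ))
    return samples


# Usage: two hypothetical font files per subset, three samples per pair.
fonts = {name: ["font_a.ttf", "font_b.ttf"] for name in SUBSETS}
plan = plan_samples(SUBSETS, fonts, per_font=3)
print(len(plan), plan[0])
```

Keeping the font list per subset reflects the fact that a Latin-only font cannot render Tifinagh or Arabic glyphs, so each script would be paired only with fonts that support it.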