Multilingual Character Recognition Dataset for Moroccan Official Documents

Published: 1 November 2023 | Version 2 | DOI: 10.17632/xp3hrmywfm.2
Contributors:
Ali Benaissa, Abdelkhalek Bahri, Ahmad El Allaoui

Description

Printed-character datasets in standard fonts that cover the characters used in Moroccan official documents are not available online under an open-source license, especially for the Tifinagh and Arabic scripts. We therefore built a new raw dataset. We collected the most commonly used fonts and, based on them, built six datasets: Alphabet (the letters a to z in lowercase and uppercase), Digits (the numbers 0 to 9), Arabic (all Arabic letters), Tifinagh (all Tifinagh letters), French special characters (all special characters of the French language, such as “à, é, ç, è…”), and Symbols (such as “?, !, (, )…”). For data augmentation, we generated more than one sample of each character with the same font.
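
As a rough illustration of how the six groups and the per-font augmentation could be enumerated, here is a minimal Python sketch. It is not the authors' generation code. Only the Alphabet and Digits groups are fully specified by the description. The Arabic group is assumed to be the 28 basic letters, the Tifinagh group is assumed to be the Unicode code points U+2D30 to U+2D67, and the French and Symbols groups contain only the examples quoted above, since the full lists are not given. The names `GROUPS`, `code_point_range`, `sample_plan`, `FontA`, and `FontB` are hypothetical.

```python
import string
from itertools import product


def code_point_range(first, last):
    """Return the characters from code point `first` to `last`, inclusive."""
    return [chr(cp) for cp in range(first, last + 1)]


# Hypothetical reconstruction of the six character groups.
GROUPS = {
    # a-z and A-Z, as stated in the description.
    "alphabet": list(string.ascii_lowercase + string.ascii_uppercase),
    # 0-9, as stated in the description.
    "digits": list(string.digits),
    # Assumption: the 28 basic Arabic letters (alef, beh, teh..ghain,
    # feh..waw, yeh), skipping teh marbuta, tatweel and alef maksura.
    "arabic": (
        ["\u0627", "\u0628"]
        + code_point_range(0x062A, 0x063A)
        + code_point_range(0x0641, 0x0648)
        + ["\u064A"]
    ),
    # Assumption: every letter in the Unicode Tifinagh block, U+2D30..U+2D67.
    "tifinagh": code_point_range(0x2D30, 0x2D67),
    # Only the examples quoted in the description; the full list is not given.
    "french_special": ["à", "é", "ç", "è"],
    "symbols": ["?", "!", "(", ")"],
}


def sample_plan(fonts, variants_per_font):
    """List one (group, character, font, variant) entry per sample to generate."""
    return [
        (group, char, font, variant)
        for group, chars in GROUPS.items()
        for char in chars
        for font, variant in product(fonts, range(variants_per_font))
    ]


# Placeholder font names; the description does not list the fonts used.
plan = sample_plan(fonts=["FontA", "FontB"], variants_per_font=3)
```

Each entry in `plan` names one sample to render, so several variants of the same character in the same font appear as separate entries, mirroring the augmentation described above. Rendering the samples to images would need a separate imaging library and real font files, which this sketch leaves out.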

Institutions

Universite Abdelmalek Essaadi Faculte des Sciences et Techniques de Tanger, Universite Abdelmalek Essaadi

Categories

Computer Science, Optical Character Recognition, Machine Learning, Deep Learning

Licence