Multilingual Character Recognition Dataset for Moroccan Official Documents

Name: Multilingual Character Recognition Dataset for Moroccan Official Documents
Creator: Ali Benaissa
Published: 2023-11-01T17:27:50.909Z
Keywords: Computer Science, Optical Character Recognition, Machine Learning, Deep Learning

Benaissa, Ali; Bahri, Abdelkhalek; El Allaoui, Ahmad

doi:10.17632/xp3hrmywfm.2

Published: 1 November 2023 | Version 2 | DOI: 10.17632/xp3hrmywfm.2

Contributors:

Ali Benaissa, Abdelkhalek Bahri, Ahmad El Allaoui

Description

Printed-character datasets in standard fonts that cover the characters used in Moroccan official documents are not available online under an open-source license, especially for the Tifinagh and Arabic scripts. We therefore built a new raw dataset. We collected the most commonly used fonts and, based on them, built six datasets: Alphabet (the letters a to z in lowercase and uppercase), Digits (the numbers 0 to 9), Arabic (all letters), Tifinagh (all Tifinagh letters), French special characters (all special characters of the French language, such as “à, é, ç, è…”), and Symbols (such as “?, !, (, )…”). As a form of data augmentation, we generated more than one sample of each character with the same font.
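The construction described above (six character inventories, several fonts, more than one sample per character and font) can be sketched as a plain enumeration of rendering jobs. This is only an illustrative sketch, not the authors' pipeline: the names `SUBSETS`, `build_jobs`, `FontA`, and `FontB` are hypothetical, the Arabic and Tifinagh code point ranges are assumptions about what "all letters" covers, and the French and symbol sets contain only the examples the description names, since the rest is elided there.

```python
import string
from itertools import product

# Character inventories for the six subsets named in the description.
# Everything except the Latin letters and digits is an ASSUMPTION made
# for illustration; the published dataset may use different inventories.
SUBSETS = {
    "alphabet": list(string.ascii_lowercase + string.ascii_uppercase),
    "digits": list(string.digits),
    # Assumed: the 28 basic Arabic letters, given as Unicode code points
    # (alef, beh, teh..ghain, feh..waw, yeh).
    "arabic": [chr(cp) for cp in (
        [0x0627, 0x0628]
        + list(range(0x062A, 0x063B))
        + list(range(0x0641, 0x0649))
        + [0x064A]
    )],
    # Assumed: code points U+2D30 to U+2D67 of the Unicode Tifinagh block.
    "tifinagh": [chr(cp) for cp in range(0x2D30, 0x2D68)],
    # Only the examples listed in the description; the full set is elided there.
    "french_special": list("àéçè"),
    "symbols": list("?!()"),
}


def build_jobs(subsets, fonts, variants_per_font):
    """Return one (subset, character, font, variant) tuple per sample to render."""
    jobs = []
    for subset, chars in subsets.items():
        for char, font, variant in product(chars, fonts, range(variants_per_font)):
            jobs.append((subset, char, font, variant))
    return jobs


# Hypothetical font names and variant count, chosen only for the example.
jobs = build_jobs(SUBSETS, fonts=["FontA", "FontB"], variants_per_font=3)
print(len(jobs), "samples to render")
```

Each tuple would then be passed to whatever rendering and augmentation step is used to draw the character in the given font; that step is outside the scope of this sketch.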

Institutions

Universite Abdelmalek Essaadi Faculte des Sciences et Techniques de Tanger
Universite Abdelmalek Essaadi
