Synthetic Arabic Data for Scene Text Recognition

Name: Synthetic Arabic Data for Scene Text Recognition
Creator: NIDDAL IMAM
Published: 2021-09-03T07:54:08.112Z
Keywords: Optical Character Recognition, Arabic Language, Recognition, Twitter

IMAM, NIDDAL

doi:10.17632/gfc32vndz8.2

Published: 3 September 2021 | Version 2 | DOI: 10.17632/gfc32vndz8.2

Contributor:

NIDDAL IMAM

Description

The first dataset consists of 50,000 cropped synthetic images with embedded Arabic text. Its labels were generated from an Arabic word corpus of 15 thousand words. The second dataset was collected from Arabic hashtags on Twitter and contains 100 cropped images. Both datasets were used in our published paper "Detecting Spam Images with Embedded Arabic Text in Twitter".
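
As a rough illustration of how 50,000 labels could be drawn from a corpus of about 15,000 words, the sketch below samples words with replacement. The function name `sample_labels`, the fixed seed, and the four-word stand-in corpus are assumptions made for illustration only; the procedure actually used is described in the paper.

```python
import random


def sample_labels(corpus, n_labels, seed=0):
    """Draw n_labels words from corpus, sampling with replacement.

    Hypothetical helper: sampling with replacement is an assumption,
    chosen because 50,000 labels exceed a 15,000-word corpus.
    """
    rng = random.Random(seed)  # fixed seed keeps the draw repeatable
    return [rng.choice(corpus) for _ in range(n_labels)]


# Stand-in corpus for illustration; the real corpus has about 15,000 words.
corpus = ["مرحبا", "كتاب", "مدرسة", "سلام"]

# Draw a small batch here; the dataset size would correspond to n_labels=50000.
labels = sample_labels(corpus, n_labels=10)
```

Each sampled label would then be rendered as a cropped text image by a generator tool, with the label kept as the ground-truth transcription.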

Steps to reproduce

Please refer to our published paper and to the TextRecognitionDataGenerator repository (https://github.com/Belval/TextRecognitionDataGenerator) for the steps.

Institutions

University of York
