Handwritten Arabic Numerals (0-9) Image Dataset

Published: 20 May 2024 | Version 1 | DOI: 10.17632/5hpkf8v7bg.1
Contributors:
Huzain Azis, Intan Novita Saly

Description

This dataset contains images of handwritten Arabic numerals ranging from 0 to 9. It comprises a total of 9350 samples, with 935 images for each numeral class. The images were collected from various individuals to ensure diversity in handwriting styles.

Key Features:
Classes: 10 (Arabic numerals 0-9)
Total Samples: 9350
Samples per Class: 935
Image Format: Grayscale
Image Size: 28x28 pixels (nominal; check the downloaded files, as the actual size may differ)

Data Collection and Labeling: The dataset was created by collecting handwritten numerals from participants with different handwriting styles. Each image was manually labeled to ensure accurate and consistent annotations. The data collection and labeling process was carried out by one of the authors.

Usage: This dataset is suitable for training and testing machine learning models for handwritten digit recognition. It can be used in applications such as optical character recognition (OCR) systems, pattern recognition, and related fields.

Contributors:
Author 1: Conducted the data collection and labeling process, ensuring accurate and consistent annotations for all samples.
Author 2: Handled the data labeling process.

Acknowledgments: We would like to thank all the participants who contributed their handwritten numerals to this dataset.

License: CC BY-NC 3.0. You are free to adapt, copy, or redistribute the material, provided you attribute it appropriately and do not use it for commercial purposes.
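The stated class balance (10 classes, 935 images each, 9350 in total) can be checked after extraction. The following is a minimal sketch, not part of the published dataset: it assumes the extracted directory contains one folder per numeral named "0" to "9", as described under "Steps to reproduce", and the function names are hypothetical.

```python
from pathlib import Path

# Values taken from the dataset description.
EXPECTED_CLASSES = [str(d) for d in range(10)]
EXPECTED_PER_CLASS = 935


def count_images_per_class(root):
    """Count the files in each numeral folder; a missing folder counts as 0."""
    counts = {}
    for name in EXPECTED_CLASSES:
        class_dir = Path(root) / name  # assumed layout: <root>/<digit>/<image files>
        if class_dir.is_dir():
            counts[name] = sum(1 for p in class_dir.iterdir() if p.is_file())
        else:
            counts[name] = 0
    return counts


def find_count_mismatches(counts, expected=EXPECTED_PER_CLASS):
    """Return only the classes whose file count differs from the expected value."""
    return {name: n for name, n in counts.items() if n != expected}
```

Calling `find_count_mismatches(count_images_per_class("path/to/extracted"))` returns an empty dictionary when every class folder holds exactly 935 files; the path shown is a placeholder.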

Steps to reproduce

1. Download the Dataset: Access the dataset and download the ZIP file containing the images.
2. Extract the Files: Unzip the downloaded file to your desired directory. The directory structure should have separate folders for each numeral class (0-9).
3. Data Preprocessing: Convert the images to grayscale if they are not already in that format. Resize the images to a uniform size, for example 28x28 pixels (use a different size if the source images differ). Normalize the pixel values to the range [0, 1] if required by your machine learning model.
4. Load the Data: Use an image loading library such as OpenCV or PIL, or a deep learning framework (TensorFlow, PyTorch), to load the images into your workspace. Ensure that the labels are correctly assigned based on the folder names.
5. Split the Data: Divide the dataset into training, validation, and test sets. A common split is 70% training, 15% validation, and 15% testing.
6. Train a Model: Use a suitable machine learning model for handwritten digit recognition, such as a Convolutional Neural Network (CNN). Implement the model using a deep learning framework like TensorFlow or PyTorch. Train the model on the training set, validate it on the validation set, and test its performance on the test set.
7. Evaluate the Model: Use metrics such as accuracy, precision, recall, and F1-score to evaluate the performance of your model. Perform error analysis to identify common misclassifications and improve the model accordingly.
8. Reproduce the Results: Document the entire process, including data preprocessing steps, model architecture, hyperparameters, and evaluation metrics, to ensure reproducibility. Share your code and configuration files along with the dataset so that others can replicate your results.
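The preprocessing, loading, splitting, and evaluation steps above can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' code: class folders are assumed to be named "0" to "9", the accepted file suffixes are a guess, all function names are hypothetical, and `load_image` additionally requires Pillow and NumPy to be installed. Model training is left out because it depends on the chosen framework.

```python
import random
from collections import defaultdict
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}  # assumed file types


def list_labeled_paths(root):
    """Return (path, label) pairs, taking each label from its folder name."""
    pairs = []
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir() and class_dir.name.isdigit():
            for path in sorted(class_dir.iterdir()):
                if path.suffix.lower() in IMAGE_SUFFIXES:
                    pairs.append((path, int(class_dir.name)))
    return pairs


def stratified_split(pairs, train=0.70, val=0.15, seed=0):
    """Split class by class so every subset keeps the class balance."""
    by_label = defaultdict(list)
    for path, label in pairs:
        by_label[label].append((path, label))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    splits = {"train": [], "val": [], "test": []}
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_train = int(len(items) * train)
        n_val = int(len(items) * val)
        splits["train"] += items[:n_train]
        splits["val"] += items[n_train:n_train + n_val]
        splits["test"] += items[n_train + n_val:]  # remainder goes to test
    return splits


def load_image(path, size=(28, 28)):
    """Convert to grayscale, resize, and scale pixel values to [0, 1]."""
    import numpy as np  # imported here so the other helpers work without it
    from PIL import Image
    with Image.open(path) as img:
        gray = img.convert("L").resize(size)
        return np.asarray(gray, dtype=np.float32) / 255.0


def per_class_metrics(y_true, y_pred, labels=range(10)):
    """Return overall accuracy and precision, recall, and F1 for each class."""
    metrics = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[c] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy, metrics
```

With 935 images per class, this split yields 654 training, 140 validation, and 141 test images per class, because the fractional remainders are assigned to the test set. Recording the seed alongside the results helps others reproduce the same split.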

Institutions

Universitas Muslim Indonesia

Categories

Computer Vision, Artificial Neural Network, Image Processing, Data Science, Optical Character Recognition, Handwriting Recognition, Machine Learning, Supervised Learning, Feature Extraction, Handwriting, Image Classification, Arabic Language, Convolutional Neural Network, Deep Learning