Handwritten Lontara Numerals (0-9) Image Dataset

Published: 20 May 2024 | Version 1 | DOI: 10.17632/8jh8nrzc55.1
Huzain Azis,


This dataset contains images of handwritten Lontara numerals from 0 to 9. It comprises 10,890 samples in total, with 1,089 images for each numeral class. The images were collected from a range of individuals to capture diversity in handwriting styles.

Key Features:
Classes: 10 (Lontara numerals 0-9)
Total Samples: 10,890
Samples per Class: 1,089
Image Format: Grayscale

Data Collection and Labeling: The dataset was created by collecting handwritten numerals from participants with different handwriting styles. Each image was manually labeled to ensure accurate and consistent annotations. The data collection and labeling were carried out by one of the authors.

Usage: This dataset is suitable for training and testing machine learning models for handwritten numeral recognition. It can be used in applications such as optical character recognition (OCR) systems, pattern recognition, and related fields.

Contributors:
Author 1: Conducted the data collection and labeling, ensuring accurate and consistent annotations for all samples.
Author 2: Handled the data preprocessing, including image normalization and augmentation.
Author 3: Developed the script for data collection and coordinated the overall project.
Author 4: Performed the quality check and validation of the dataset.

Acknowledgments: We would like to thank all the participants who contributed their handwritten numerals to this dataset.

License: CC BY-NC 3.0. You are free to adapt, copy, or redistribute the material, provided that you give appropriate attribution and do not use it for commercial purposes.
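The class counts above (10 classes of 1,089 images, 10,890 in total) can be checked after download. The sketch below uses only the Python standard library and assumes the archive extracts to one sub-folder per class named 0 to 9, with images stored under one of the listed file extensions; the function names and the extension set are hypothetical and not part of the dataset.

```python
from collections import Counter
from pathlib import Path

# Assumptions (not stated by the dataset): one sub-folder per class named "0".."9",
# and images stored with one of these file extensions.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp"}
EXPECTED_CLASSES = [str(d) for d in range(10)]
EXPECTED_PER_CLASS = 1089  # from the dataset description


def count_images_per_class(root):
    """Count the image files in each class sub-folder of `root`."""
    counts = Counter()
    for class_dir in Path(root).iterdir():
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
            )
    return counts


def check_balance(counts):
    """Return a list of mismatches; an empty list means the counts match the description."""
    problems = []
    if sorted(counts) != EXPECTED_CLASSES:
        problems.append(f"unexpected class folders: {sorted(counts)}")
    for label in sorted(counts):
        if counts[label] != EXPECTED_PER_CLASS:
            problems.append(
                f"class {label}: {counts[label]} images, expected {EXPECTED_PER_CLASS}"
            )
    expected_total = EXPECTED_PER_CLASS * len(EXPECTED_CLASSES)  # 10 * 1089 = 10890
    total = sum(counts.values())
    if total != expected_total:
        problems.append(f"total: {total} images, expected {expected_total}")
    return problems
```

For example, `check_balance(count_images_per_class("path/to/extracted_dataset"))` (placeholder path) returns an empty list when the extracted folders match the stated counts; any stray folder, such as an archive metadata directory, is reported as an unexpected class folder.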


Steps to reproduce

1. Download the Dataset: Access the dataset from [Mendeley Data link] and download the ZIP file containing the images.
2. Extract the Files: Unzip the downloaded file to a directory of your choice. The extracted archive should contain a separate folder for each numeral class (0-9).
3. Preprocess the Data: Convert the images to grayscale if they are not already in that format. Resize all images to a uniform size, for example 28x28 pixels (adjust as needed for your model). If your model requires it, normalize the pixel values to the range [0, 1].
4. Load the Data: Use an image loading library such as OpenCV or PIL, or the data utilities of a deep learning framework (TensorFlow, PyTorch), to load the images into your workspace. Make sure each image's label is assigned from the name of the folder that contains it.
5. Split the Data: Divide the dataset into training, validation, and test sets. A common split is 70% training, 15% validation, and 15% testing.
6. Train a Model: Choose a model suited to handwritten digit recognition, such as a Convolutional Neural Network (CNN), and implement it in a deep learning framework such as TensorFlow or PyTorch. Train the model on the training set, use the validation set to monitor training and tune hyperparameters, and measure the final performance on the test set.
7. Evaluate the Model: Use metrics such as accuracy, precision, recall, and F1-score to evaluate your model. Perform error analysis to identify common misclassifications and improve the model accordingly.
8. Reproduce the Results: Document the entire process, including preprocessing steps, model architecture, hyperparameters, and evaluation metrics. Share your code and configuration files along with the dataset so that others can replicate your results.
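Labeling from folder names and the 70/15/15 split can be sketched with the Python standard library alone. This example assumes the class folders are named with their digit (0 to 9) and uses a stratified split, shuffling each class separately rather than the whole dataset at once, so that every numeral keeps roughly the same proportions in each subset. The function names, the file extensions, and the seed value 42 are illustrative choices, not part of the dataset.

```python
import random
from collections import defaultdict
from pathlib import Path

# Assumed image file types; adjust to match the extracted archive.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp"}


def index_dataset(root):
    """Return (path, label) pairs, taking each label from its class folder name."""
    samples = []
    for class_dir in sorted(Path(root).iterdir()):
        # Only digit-named folders ("0".."9") are treated as classes (assumed layout).
        if class_dir.is_dir() and class_dir.name.isdigit():
            label = int(class_dir.name)
            for f in sorted(class_dir.iterdir()):
                if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS:
                    samples.append((f, label))
    return samples


def stratified_split(samples, train_frac=0.70, val_frac=0.15, seed=42):
    """Split (path, label) pairs into train/val/test sets, class by class."""
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)

    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    train, val, test = [], [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_train = round(len(items) * train_frac)
        n_val = round(len(items) * val_frac)
        train.extend(items[:n_train])
        val.extend(items[n_train:n_train + n_val])
        test.extend(items[n_train + n_val:])  # the remainder (about 15%) goes to test
    return train, val, test
```

Calling `stratified_split(index_dataset("path/to/extracted_dataset"))` (placeholder path) yields lists of file paths and labels that can then be loaded and resized with PIL, OpenCV, or a framework data loader. With 1,089 images per class, the rounding above gives 762 training, 163 validation, and 164 test images per class.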
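The evaluation and error-analysis steps can be computed from a confusion matrix without any framework. The sketch below assumes true and predicted classes are integer labels 0 to 9; `compute_metrics` and `most_confused_pairs` are hypothetical helper names. In this sketch a class that is never predicted gets a precision of 0.0. scikit-learn's `sklearn.metrics.classification_report` and `confusion_matrix` provide comparable computations if that library is already in use.

```python
def compute_metrics(y_true, y_pred, num_classes=10):
    """Accuracy, per-class precision/recall/F1 and macro-averaged F1 from integer labels."""
    # confusion[t][p] counts samples of true class t that were predicted as class p
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1

    per_class = {}
    for c in range(num_classes):
        tp = confusion[c][c]
        fp = sum(confusion[t][c] for t in range(num_classes)) - tp  # predicted c, wrongly
        fn = sum(confusion[c]) - tp  # true c, predicted as another class
        precision = tp / (tp + fp) if tp + fp else 0.0  # 0.0 when c is never predicted
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}

    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[c][c] for c in range(num_classes)) / total if total else 0.0
    macro_f1 = sum(m["f1"] for m in per_class.values()) / num_classes
    return {"accuracy": accuracy, "macro_f1": macro_f1,
            "per_class": per_class, "confusion": confusion}


def most_confused_pairs(confusion, top_k=5):
    """Return up to top_k (true, predicted, count) misclassifications, most frequent first."""
    pairs = [
        (confusion[t][p], t, p)
        for t in range(len(confusion))
        for p in range(len(confusion))
        if t != p and confusion[t][p] > 0
    ]
    pairs.sort(reverse=True)
    return [(t, p, n) for n, t, p in pairs[:top_k]]
```

Macro-averaged F1 weights every numeral equally, which matches this dataset's balanced classes; `most_confused_pairs` points to the numeral pairs that are worth inspecting by eye during error analysis.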


Universitas Muslim Indonesia


Computer Vision, Artificial Neural Network, Image Processing, Data Science, Optical Character Recognition, Handwriting Recognition, Machine Learning, Supervised Learning, Feature Extraction, Handwriting, Pattern Recognition, Convolutional Neural Network, Deep Learning