POS Tagging on Handwritten Sindhi Sentences

Name: POS Tagging on Handwritten Sindhi Sentences
Creator: Maria Soomro
Published: 2024-12-30T09:14:33.389Z
Keywords: Computer Vision, Optical Character Recognition, Handwriting Recognition, Annotation, Natural Language Processing, Machine Learning, Artificial Intelligence Programming Language, First Language Use in Language Learning

Soomro, Maria; Khalid, Muhammad

doi:10.17632/phk66sgmp5.1

Published: 30 December 2024 | Version 1 | DOI: 10.17632/phk66sgmp5.1

Description

This dataset consists of high-resolution images of handwritten Sindhi sentences, curated for tasks such as Part-of-Speech (POS) tagging and Named Entity Recognition (NER). It aims to support research and development in natural language processing (NLP) and optical character recognition (OCR) for low-resource languages such as Sindhi.

Key Features:

Language: Sindhi (script-based, with unique linguistic characteristics).
Dataset Size: More than 1,000 labeled images covering diverse handwriting styles.
Annotations: Each image is manually annotated for POS tagging and NER tasks to ensure high accuracy.
Applications: Suitable for training and evaluating machine learning models in NLP, OCR, and language understanding.
Diversity: Includes variations in sentence length, word structure, and handwriting style to mimic real-world scenarios.

Steps to reproduce

We collected this dataset from writers with a variety of handwriting styles, ranging from school students to university students. We asked them to write the sentences as they wished, then captured images of the handwriting and applied preprocessing steps to produce the dataset.

Institutions

University of Sindh
