Chakma Language POS Tagging Dataset

Published: 5 October 2023 | Version 1 | DOI: 10.17632/gc233nkjgk.1
Mushfiqur Rahman, Sanad Bhowmik

The Chakma Language POS Tagging Dataset is a linguistic resource for analyzing the Chakma language. Chakma belongs to the Indo-Aryan language family and is spoken primarily by the Chakma people in the Chittagong Hill Tracts region of Bangladesh, as well as in parts of India and Myanmar. The dataset aims to support research and development in Chakma language processing, particularly Part-of-Speech (POS) tagging.

The dataset has four columns:
Bengali: Sentences and phrases in the Bengali script. In this dataset, Bengali is used to represent Chakma text.
Chakma (Character): Chakma words or characters in their native script. The Chakma script is an abugida used to write the Chakma language.
Bengali (Chakma): A transliteration of the Chakma words or characters into the Bengali script, which makes the Chakma text easier to read and work with for users familiar with Bengali.
Parts of Speech (POS): The POS tag assigned to each Chakma word or character. POS tagging assigns grammatical categories (e.g., noun, verb, adjective) to each word in a text and is a basis for syntactic and semantic analysis.

Usage:
Linguistic Analysis: Researchers and linguists can use the dataset for linguistic analysis, syntactic studies, and documentation of the Chakma language.
Natural Language Processing (NLP): Practitioners can use the dataset to build POS taggers for Chakma, which can in turn support machine translation, sentiment analysis, and other NLP tasks.
Language Preservation: By making linguistic data available for analysis and for building language technologies, the dataset contributes to the preservation and promotion of the Chakma language.
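As a rough illustration of how the four columns might be used for POS tagging, the sketch below parses the table, checks which Unicode script a cell is written in, and builds a most-frequent-tag baseline tagger. It assumes, without confirmation from the dataset description, that the data is available as UTF-8 CSV text with a header row using the column names listed above, and that each row pairs one Chakma word with one tag. The function names are hypothetical. The script check uses the standard Unicode blocks for Chakma (U+11100 to U+1114F) and Bengali (U+0980 to U+09FF).

```python
import csv
import io
from collections import Counter, defaultdict

# Assumed header names, taken from the column descriptions; the actual
# file may label its columns differently.
COLUMNS = ["Bengali", "Chakma (Character)", "Bengali (Chakma)", "Parts of Speech (POS)"]

# Inclusive Unicode block ranges for the two scripts.
CHAKMA_BLOCK = (0x11100, 0x1114F)
BENGALI_BLOCK = (0x0980, 0x09FF)


def load_rows(text):
    """Parse CSV text (with a header row) into dicts keyed by the assumed column names."""
    reader = csv.DictReader(io.StringIO(text))
    return [{col: row[col].strip() for col in COLUMNS} for row in reader]


def in_block(text, block):
    """Return True if `text` is non-empty and every non-space character lies in `block`."""
    lo, hi = block
    chars = [c for c in text if not c.isspace()]
    return bool(chars) and all(lo <= ord(c) <= hi for c in chars)


def train_most_frequent_tag(rows, word_col="Chakma (Character)", tag_col="Parts of Speech (POS)"):
    """Baseline tagger: map each word form to the tag it most often carries.

    Unseen words fall back to the most frequent tag in the whole table.
    Assumes one word and one tag per row (hypothetical layout).
    """
    if not rows:
        raise ValueError("need at least one row to train")
    per_word = defaultdict(Counter)
    for row in rows:
        per_word[row[word_col]][row[tag_col]] += 1
    overall = Counter(row[tag_col] for row in rows)
    default_tag = overall.most_common(1)[0][0]
    lexicon = {word: counts.most_common(1)[0][0] for word, counts in per_word.items()}

    def tag(word):
        return lexicon.get(word, default_tag)

    return tag
```

For example, `tag = train_most_frequent_tag(load_rows(open("chakma_pos.csv", encoding="utf-8").read()))` (file name hypothetical) returns a function that maps a Chakma word to its most frequent tag in the data. A most-frequent-tag baseline is a common reference point to compare against before training sequence models such as HMM or CRF taggers.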

Data Sources: The dataset may have been compiled from various linguistic sources, native speakers, or linguists with expertise in the Chakma language.
Dataset Size: The dataset comprises a total of 1,156 × 4 = 4,624 data points, providing a corpus of Chakma text for linguistic analysis and NLP research.
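Reading the reported size as 1,156 rows by 4 columns (an assumption about the layout, since the description gives only the product), the hypothetical helper below computes the row count, column count, and total number of cells for a table loaded as a list of row dictionaries. It also counts empty cells, a quick sanity check before using the data. The column names are assumed from the column descriptions above.

```python
# Assumed header names; the actual file may differ.
COLUMNS = ["Bengali", "Chakma (Character)", "Bengali (Chakma)", "Parts of Speech (POS)"]


def summarize(rows, columns):
    """Return (n_rows, n_columns, n_cells, n_empty_cells) for a list of row dicts.

    A cell counts as empty if it is missing or contains only whitespace.
    """
    n_rows = len(rows)
    n_cols = len(columns)
    n_empty = sum(1 for row in rows for col in columns if not row.get(col, "").strip())
    return n_rows, n_cols, n_rows * n_cols, n_empty
```

Under the assumed layout, a complete copy of the dataset with no blank cells would give `(1156, 4, 4624, 0)`.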

Natural Language Processing, Machine Learning, Cross-Cultural Linguistics, Linguistic Multidimensional Analysis, Asian Language