Uzbek Medical Entity Benchmark
Description
This dataset introduces UzMedNER, a structured Named Entity Recognition (NER) resource for the Uzbek language in the medical domain. It is designed to support token-level sequence labeling and to facilitate research in low-resource biomedical NLP. The dataset consists of manually annotated Uzbek text in which each token is labeled using a predefined tagset representing medical and related entity types. UzMedNER addresses the lack of domain-specific annotated corpora in Uzbek, standardized NER benchmarks for medical text, and resources for training sequence labeling models in low-resource settings.
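To illustrate what token-level labeling looks like in practice, the following is a minimal Python sketch that groups per-token tags into entity spans. It rests on assumptions not stated in the description: that the tagset follows a BIO scheme (B- for the first token of an entity, I- for continuation tokens, O for everything else), and that a tag such as DISEASE exists. The function name bio_to_spans, the tag name, and the example sentence are hypothetical and are not taken from the dataset.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group token-level BIO tags into (entity_text, entity_type) pairs.

    Assumes a BIO tagging scheme; the actual UzMedNER tagset may differ.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity, closing any open one.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type continues the open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity,
            # closes the open entity; the token itself is not kept.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None

    # Flush an entity that runs to the end of the sentence.
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Hypothetical example sentence and tags (not drawn from the dataset).
tokens = ["Bemorda", "qandli", "diabet", "aniqlandi"]
tags = ["O", "B-DISEASE", "I-DISEASE", "O"]
print(bio_to_spans(tokens, tags))
```

If the released files use a different scheme (for example IO or BIOES), only the prefix handling inside the loop would need to change.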