Uzbek Medical Entity Benchmark

Published: 15 April 2026 | Version 3 | DOI: 10.17632/77jwmshbcp.3

Description

This dataset introduces UzMedNER, a structured Named Entity Recognition (NER) resource for the Uzbek language in the medical domain. It is designed to support token-level sequence labeling tasks and to facilitate research in low-resource biomedical NLP. The dataset consists of manually annotated Uzbek text in which each token is labeled using a predefined tagset representing medical and related entity types. UzMedNER addresses the lack of domain-specific annotated corpora in Uzbek, of standardized NER benchmarks for medical text, and of resources for training sequence labeling models in low-resource settings.
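As a rough illustration of what token-level labeling means in practice, the sketch below groups per-token tags into entity spans. It assumes a BIO-style scheme (B- for the first token of an entity, I- for continuation tokens, O for everything else), which this record does not confirm. The tag name DISEASE, the example sentence, and the helper bio_to_spans are all hypothetical and are not taken from the dataset files.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity text, entity type) pairs.

    Assumes a BIO scheme; the actual UzMedNER tagset may differ.
    An I- tag that does not continue an open entity of the same type
    is treated like O and dropped.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts: close any entity that is still open.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # O tag or a stray I- tag: close any open entity.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens = []
            current_type = None

    # Close an entity that runs to the end of the sentence.
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Illustrative sentence and tags only; not drawn from UzMedNER.
tokens = ["Bemorga", "qandli", "diabet", "tashxisi", "qo'yildi"]
tags = ["O", "B-DISEASE", "I-DISEASE", "O", "O"]
spans = bio_to_spans(tokens, tags)
print(spans)
```

If the released files use a different scheme (for example IO or BIOES) or different entity names, only the prefix handling in the loop would need to change.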

Files

Categories

Consultation in Healthcare

Licence