Synthetic Indian Clinical Notes for Natural Language Processing Research
Description
This dataset contains 10,000 synthetic clinical notes designed to reflect the linguistic and structural characteristics of clinical documentation in India. The notes were generated with a custom Python script that fills templates and semantic patterns derived from real-world examples of Indian clinical documentation. The dataset is intended for researchers and developers working on Natural Language Processing (NLP) tasks in the medical domain, particularly in the Indian context. It can be used to train and evaluate language models, to develop information extraction systems, and to explore the semantic nuances of Indian clinical text. The notes cover five medical specialties: cardiology, neurology, pediatrics, general medicine, and orthopaedics. All notes are of medium length, balancing detail and conciseness.
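To illustrate what template-based note generation of this kind can look like, here is a minimal sketch. It is not the attached script: the templates, slot values, abbreviations, and the function names `generate_note` and `generate_dataset` are all hypothetical and chosen only for illustration. Only the five specialties are taken from the description above.

```python
import random

# The five specialties come from the dataset description.
SPECIALTIES = ["cardiology", "neurology", "pediatrics",
               "general medicine", "orthopaedics"]

# Everything below is invented for illustration; the real script's
# templates and vocabulary are not shown in this description.
TEMPLATES = [
    "{age}-year-old {sex} seen in {specialty} OPD with c/o {complaint} "
    "for {duration}. Adv: {advice}.",
    "Pt. {age}/{sex_short}, came to {specialty} with {complaint} "
    "since {duration}. Plan: {advice}.",
]
COMPLAINTS = ["fever", "chest pain", "headache", "knee pain", "breathlessness"]
DURATIONS = ["2 days", "1 week", "3 months"]
ADVICE = ["review after 1 week", "routine blood tests", "rest and fluids"]


def generate_note(rng):
    """Fill one randomly chosen template with randomly chosen slot values."""
    specialty = rng.choice(SPECIALTIES)
    sex = rng.choice(["male", "female"])
    # Keep ages plausible for pediatrics; other specialties get adults.
    age = rng.randint(1, 17) if specialty == "pediatrics" else rng.randint(18, 90)
    values = {
        "age": age,
        "sex": sex,
        "sex_short": sex[0].upper(),
        "specialty": specialty,
        "complaint": rng.choice(COMPLAINTS),
        "duration": rng.choice(DURATIONS),
        "advice": rng.choice(ADVICE),
    }
    # str.format ignores keyword arguments a template does not use.
    text = rng.choice(TEMPLATES).format(**values)
    return {"specialty": specialty, "age": age, "text": text}


def generate_dataset(n, seed=0):
    """Generate n notes; a fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    return [generate_note(rng) for _ in range(n)]


if __name__ == "__main__":
    for note in generate_dataset(3, seed=42):
        print(note["specialty"], "|", note["text"])
```

Passing a seeded `random.Random` instance rather than using the module-level functions keeps each run reproducible, which matters when a synthetic dataset is meant to be regenerated by others.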
Files
Steps to reproduce
Run the attached Python script.
Institutions
- Punjab Technical University, Punjab, Kapurthala Town