Medical abbreviation and sense - Cardiology and General Medicine
Aim: The aim of my research is to understand how abbreviations are used in electronic discharge summaries, and to use this understanding to construct features for the automated detection, expansion, and disambiguation of abbreviations.
Data source: The original data are Malaysian electronic discharge summaries written in English (as a second language). They comprise Cardiology discharge summaries written by senior doctors (100 series), Cardiology discharge summaries written by junior doctors (200 series), and General Medicine discharge summaries written by junior doctors (300 series). The electronic discharge summaries were annotated for abbreviations and their senses.
Data: The data provided are an extracted list of abbreviations and their senses. For each entry, the list gives the series, the file name (representing an individual discharge summary), the abbreviation ID, the abbreviation, and its sense.
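As an illustration of how the extracted list might be loaded and grouped, here is a minimal Python sketch. It assumes a CSV export with the hypothetical column names `series`, `file_name`, `abbreviation_id`, `abbreviation`, and `sense`, and that the `series` column holds "100", "200", or "300". The sample rows are made up for illustration and are not taken from the dataset. The sketch counts entries per author group and lists abbreviations that carry more than one sense, which are the candidates for disambiguation.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical mapping from series label to (specialty, author seniority),
# following the 100/200/300 series described in the data source.
SERIES_INFO = {
    "100": ("Cardiology", "senior"),
    "200": ("Cardiology", "junior"),
    "300": ("General Medicine", "junior"),
}

def load_rows(text):
    """Parse a CSV export; the column names are assumptions, not the dataset's."""
    return list(csv.DictReader(io.StringIO(text)))

def count_by_group(rows):
    """Count abbreviation entries per (specialty, seniority) group."""
    counts = Counter()
    for row in rows:
        counts[SERIES_INFO[row["series"].strip()]] += 1
    return counts

def sense_inventory(rows):
    """Collect the set of distinct senses observed for each abbreviation."""
    inventory = defaultdict(set)
    for row in rows:
        inventory[row["abbreviation"]].add(row["sense"])
    return inventory

# Made-up rows for illustration only.
SAMPLE = """series,file_name,abbreviation_id,abbreviation,sense
100,file_a,T1,DM2,Diabetes Mellitus Type 2
200,file_b,T1,PE,Pulmonary Embolism
300,file_c,T4,PE,Physical Examination
"""

rows = load_rows(SAMPLE)
counts = count_by_group(rows)
inventory = sense_inventory(rows)
# Abbreviations with more than one sense need disambiguation.
ambiguous = sorted(a for a, senses in inventory.items() if len(senses) > 1)
print(counts)
print(ambiguous)
```

A set per abbreviation is used because only the distinct senses matter for identifying ambiguity; replacing it with a `Counter` would also keep sense frequencies.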
Steps to reproduce
1. The data were derived from annotated electronic discharge summaries. A total of 1,102 discharge summaries were annotated by 11 recent medical graduates over three months using a web-based, open-source annotation tool called Brat Rapid Annotation Tool (BRAT). Their task was to 1) detect and highlight the abbreviations found in the discharge summaries, and 2) give the sense (meaning or expansion) of each abbreviation found.
2. The lists of abbreviations and their senses were exported into Microsoft Excel and cleaned. Senses with the same meaning but written in different ways were merged. For example, the sense of the abbreviation "DM2" may be written as "Type 2 Diabetes Mellitus", "Diabetes Mellitus Type 2", or "Diabetes Mellitus Type II"; all of these were standardized to "Diabetes Mellitus Type 2". The choice of standardized sense followed the standard clinical terminology SNOMED CT, using the SNOMED CT browser (25). The senses were also checked for misspellings, and spellings were standardized to UK English for historical reasons.
3. The abbreviations and their senses were all converted to upper case so that aggregation was case-insensitive. Although this approach prevented us from analyzing patterns of capitalization, the decision was based on the annotators' feedback that abbreviations, including acronyms, were capitalized inconsistently. This observation was checked on several samples from the 200 series (before data cleaning), which confirmed the inconsistent use of capitalization.
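Steps 2 and 3 can be sketched in Python as a lookup of variant senses followed by upper-casing and counting. This is a minimal sketch under stated assumptions: the variant table `CANONICAL_SENSES` is a hypothetical, hand-built dictionary (in the dataset the canonical forms were chosen with reference to SNOMED CT; this code does not query SNOMED CT), and the helper names are illustrative. Matching on a case-folded, whitespace-normalized key lets variants be recognized regardless of how the annotator capitalized them.

```python
from collections import Counter

# Hypothetical variant table: keys are case-folded sense variants, values are
# the canonical sense. This sketch does not consult SNOMED CT.
CANONICAL_SENSES = {
    "type 2 diabetes mellitus": "Diabetes Mellitus Type 2",
    "diabetes mellitus type ii": "Diabetes Mellitus Type 2",
    "diabetes mellitus type 2": "Diabetes Mellitus Type 2",
}

def standardize_sense(sense):
    """Step 2: map a free-text sense to its canonical form.

    Unknown senses are returned with whitespace normalized but otherwise unchanged.
    """
    normalized = " ".join(sense.split())
    return CANONICAL_SENSES.get(normalized.casefold(), normalized)

def aggregate(pairs):
    """Step 3: upper-case abbreviation and sense, then count each pair."""
    counts = Counter()
    for abbreviation, sense in pairs:
        key = (abbreviation.strip().upper(), standardize_sense(sense).upper())
        counts[key] += 1
    return counts

# Variant spellings of DM2 taken from the example in step 2,
# with inconsistent capitalization of the abbreviation itself.
annotations = [
    ("DM2", "Type 2 Diabetes Mellitus"),
    ("dm2", "Diabetes Mellitus Type II"),
    ("Dm2", "diabetes mellitus type 2"),
]
print(aggregate(annotations))  # Counter({('DM2', 'DIABETES MELLITUS TYPE 2'): 3})
```

Standardizing before upper-casing keeps the canonical sense in its readable form until the final aggregation step; spelling corrections (for example, US to UK English) could be added as further entries in the same table.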