Is There Variability in Scoring of Student Surgical OSCE Performance Based on Examiner Experience and Expertise?
Scripted videos of simulated student performance in an OSCE at two standards (clear pass and borderline) were prospectively scored by a range of clinical assessors, who awarded a global score on each of two different rating scales. Results were analysed by examiner experience and content expertise.
Steps to reproduce
Participants entered demographic details and then watched two different scenarios, each comprising a three-minute video and a two-minute audio clip of a scripted actor playing a final year medical student taking a history from a simulated patient (also an actor) and then answering questions related to the clinical scenario (also scripted). The two scenarios differed in the standard displayed: the student portrayed either a “clear pass” (scenario one, s1) or a “borderline” (scenario two, s2) performance in eliciting the history and relevance of a common surgical presentation (rectal bleeding). Three experienced examiners with responsibility for the student curriculum, training in assessment, and over 25 years of post-graduate experience in surgery and surgical education assessment provided scores for each scenario, which served as the designated “Gold Standard”.
At the completion of each scenario, participants were asked to score student performance by global rating using two scales commonly employed in the OSCE setting. The first was a global five-point ordinal ‘descriptor’ scale rating the student as one of: excellent, very good, clear pass, borderline or clear fail. The second was a continuous linear letter-grade scale from A+ to No grade, with each increment representing a 5% change in score. The letter scale did not have a description of performance at each grade, but failing grades were indicated as E, F or G.
Surgical specialists were those in defined surgical training posts or qualified surgical consultants. Subspecialists were surgeons with at least five years of surgical training and a sub-speciality interest in general surgery. Data were also analysed according to grade of employment (non-consultant hospital doctor versus consultant).
Study respondents were given no advice on standards or on how to grade students, nor were they provided with specific training on marking or frames of reference before assessing the two filmed OSCE scenarios.
Hence content expertise was not confounded by external advice on standardisation but reflected the respondents’ personal views on standard setting. To reduce bias, it was not disclosed that the medical student in the videos was an actor until ratings were completed. Had participants been told upfront that the videos featured an actor, they could have inferred that the videos were scripted, which may have led them to consider what response the researchers were trying to elicit. The aim was not to stimulate particular participant responses but for participants to react as they would to the material presented, as if it were a medical student assessment.
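For readers working with the ratings, the following is a minimal sketch of how the five-point descriptor scale could be coded as ordinal ranks and compared with a gold-standard rating by examiner group. It is not the authors' analysis: the function names, the group labels, the choice of the median as a summary and the example ratings are all assumptions made for illustration, and the gold-standard value is passed in as a parameter rather than taken from the study.

```python
from statistics import median

# Five-point descriptor scale as listed in the text, ordered worst to best.
DESCRIPTOR_SCALE = ["clear fail", "borderline", "clear pass", "very good", "excellent"]
RANK = {label: i for i, label in enumerate(DESCRIPTOR_SCALE)}


def deviation_from_gold(rating, gold):
    """Signed distance in scale steps; positive means more lenient than gold."""
    return RANK[rating] - RANK[gold]


def median_deviation_by_group(responses, gold):
    """Summarise (group, rating) pairs for one scenario as a median deviation per group."""
    by_group = {}
    for group, rating in responses:
        by_group.setdefault(group, []).append(deviation_from_gold(rating, gold))
    return {group: median(devs) for group, devs in by_group.items()}


# Hypothetical ratings for illustration only; these are NOT data from the study.
example = [
    ("consultant", "borderline"),
    ("consultant", "clear pass"),
    ("consultant", "borderline"),
    ("non-consultant", "clear pass"),
    ("non-consultant", "very good"),
    ("non-consultant", "clear pass"),
]

# The gold-standard rating is an assumed input, not a value reported in the text.
print(median_deviation_by_group(example, gold="borderline"))
```

A signed deviation keeps the direction of disagreement visible, so a group that tends to mark more leniently than the gold standard can be distinguished from one that marks more harshly. The same approach would apply to the letter-grade scale once its grades are mapped to their 5% increments.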