Battle of the Bots: Solving Clinical Cases in Osteoarticular Infections with Large Language Models
Published: 2 May 2025 | Version 1 | DOI: 10.17632/79tbbm7v24.1
Contributor:
Fabio Borgonovo
Description
This repository contains the three core datasets underpinning our LLM evaluation study in infectious disease decision‐making:
Multiple‐Choice Answers: Model responses mapped to predefined, guideline‐based answer keys for each clinical question.
Raw LLM Outputs: Unedited textual answers generated by all 15 tested language models.
Likert‐Scale Ratings: Explanation‐quality scores assigned by two blinded, board‐certified reviewers, including consensus‐resolved discrepancies and interrater reliability statistics.
Together, these files enable full replication of our accuracy and explanation‐quality analyses.
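As a rough illustration of the two analyses named above, the sketch below scores multiple-choice responses against an answer key and computes an agreement statistic for two reviewers' Likert ratings. Everything in it is an assumption made for illustration: the function names `accuracy` and `cohen_kappa`, the list-based inputs, and the sample values are hypothetical and are not taken from the dataset files. The description does not say which interrater reliability statistic was reported, so unweighted Cohen's kappa is used here only as one common choice; a weighted kappa or an intraclass correlation may be more appropriate for ordinal ratings.

```python
from collections import Counter


def accuracy(responses, answer_key):
    """Fraction of questions where the model's choice matches the key."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    if not answer_key:
        raise ValueError("answer_key must not be empty")
    correct = sum(r == k for r, k in zip(responses, answer_key))
    return correct / len(answer_key)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same number of items")
    n = len(rater_a)
    if n == 0:
        raise ValueError("ratings must not be empty")
    # Observed agreement: share of items given the same score by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: from each rater's marginal score frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical score; agreement is perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical values for illustration only; not taken from the dataset.
    model_answers = ["A", "B", "C", "D"]
    key = ["A", "B", "D", "D"]
    print(accuracy(model_answers, key))

    reviewer_1 = [5, 4, 4, 3, 5, 2]
    reviewer_2 = [5, 4, 3, 3, 5, 2]
    print(cohen_kappa(reviewer_1, reviewer_2))
```

To apply this to the actual files, the column layout would first need to be checked against the downloaded data, since the file structure is not described here.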
Institutions
Mayo Clinic Rochester
Categories
Large Language Model