Battle of the Bots: Solving Clinical Cases in Osteoarticular Infections with Large Language Models

Published: 2 May 2025 | Version 1 | DOI: 10.17632/79tbbm7v24.1
Contributor:
Fabio Borgonovo

Description

This repository contains the three core datasets underpinning our LLM evaluation study in infectious disease decision‐making:

Multiple‐Choice Answers: Model responses mapped to predefined, guideline‐based answer keys for each clinical question.
Raw LLM Outputs: Unedited textual answers generated by all 15 tested language models.
Likert‐Scale Ratings: Explanation‐quality scores assigned by two blinded, board‐certified reviewers, including consensus‐resolved discrepancies and interrater reliability statistics.

Together, these files enable full replication of our accuracy and explanation‐quality analyses.
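For readers who want to re-derive the headline numbers, the following is a minimal sketch of the two kinds of calculation the description mentions: accuracy against an answer key and agreement between two reviewers. It is an illustration under assumptions, not the study's own analysis code. The function names, the in-memory list inputs, and the choice of unweighted Cohen's kappa as the agreement statistic are all hypothetical; this record does not state which interrater statistic was used or how the files are laid out.

```python
from collections import Counter


def accuracy(responses, answer_key):
    """Fraction of model responses that match the answer key (hypothetical helper)."""
    if not answer_key or len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must be non-empty and equal in length")
    correct = sum(r == k for r, k in zip(responses, answer_key))
    return correct / len(answer_key)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.

    Assumption: the dataset's reliability statistic may be a different
    measure (for example a weighted kappa); this is only one common choice.
    """
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the identical score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal score frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical score for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Made-up values for illustration only; not taken from the dataset.
    print(accuracy(["A", "B", "C", "D"], ["A", "B", "D", "D"]))
    print(cohen_kappa([1, 1, 2, 2], [1, 2, 1, 2]))
```

In practice the per-question answers and the two reviewers' Likert scores would first be read from the files in this repository. Because Likert scores are ordinal, a weighted agreement statistic may be a better fit than the unweighted version sketched here.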

Files

Institutions

Mayo Clinic Rochester

Categories

Large Language Model

Licence