Evaluating the Accuracy of Large Language Models in Predicting ICD-10 and CPT Billing Codes for Outpatient Dermatology Notes

Published: 28 February 2025 | Version 1 | DOI: 10.17632/npwtthtn62.1
Contributors:
Ryan Chen, William Nahm, Goranit Sakunchotpanit, Daniel Nguyen

Description

1. Supplemental Figure I: A PDF compilation of test cases 1-8, including their ground truth ICD-10 and CPT codes. Each test case represents a dermatology patient encounter, detailing history, examination findings, assessment, and management plan. A corresponding table lists the associated ICD-10, CPT, and J codes for billing purposes.

2. Supplemental Figure II: A heatmap visualizing model accuracy across the test cases. The color scale ranges from blue (highest accuracy) to red (lowest accuracy). Each row represents a model-case combination, while the columns correspond to different grading schemas. "+P" indicates scoring with punishment, while "-P" represents scoring without punishment.

3. Supplemental Figure III: A graphical representation of under- and overprediction frequencies for CPT codes generated by each model. Overpredictions refer to CPT codes exceeding the board-certified dermatologist's ground truth, while underpredictions indicate missing or incomplete CPT code assignments. The figure displays total predictions across eight cases over five trials, with percentages calculated accordingly.
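The under- and overprediction tally described for Supplemental Figure III could be computed roughly as in the sketch below. This is only an illustration under stated assumptions, not the authors' scoring code: the function names `classify_prediction` and `prediction_rates`, the rule that a response counts as an overprediction when it contains any CPT code beyond the ground truth (and as an underprediction when any ground-truth code is missing), and the placeholder codes in the usage note are all hypothetical.

```python
from collections import Counter


def classify_prediction(predicted, truth):
    """Return (over, under) flags for a single model response.

    Assumption: codes are compared as multisets, so a repeated code
    predicted more times than it appears in the ground truth counts
    as an overprediction.
    """
    pred, true = Counter(predicted), Counter(truth)
    over = bool(pred - true)   # codes predicted beyond the ground truth
    under = bool(true - pred)  # ground-truth codes missing from the prediction
    return over, under


def prediction_rates(trials):
    """Summarize over/underprediction as percentages of all trials.

    `trials` is a list of (predicted_codes, truth_codes) pairs, for
    example one pair per case per trial for a given model.
    """
    total = len(trials)
    flags = [classify_prediction(p, t) for p, t in trials]
    over = sum(1 for o, _ in flags if o)
    under = sum(1 for _, u in flags if u)
    return {
        "over_pct": 100 * over / total,
        "under_pct": 100 * under / total,
    }
```

With eight cases and five trials, `trials` would hold 40 pairs per model. A call such as `prediction_rates([(["A", "B"], ["A"]), (["A"], ["A", "B"])])` uses made-up placeholder codes rather than values from the dataset. Note that under this rule a single response can be flagged as both an over- and an underprediction.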

Institutions

Brigham and Women's Hospital

Categories

Dermatology, Billing, Chatbot, Large Language Model

Licence