neuroGPT-X: Towards an Accountable Expert Opinion Tool for Vestibular Schwannoma

Published: 27 February 2023 | Version 1 | DOI: 10.17632/b9mck42r35.1

Description

We hypothesize that a well-trained, context-enriched GPT will perform at or above the level of an expert surgeon in generating comprehensive answers to questions commonly posed in day-to-day practice about vestibular schwannoma. In this study, we make three key contributions to assessing the feasibility of large language models (LLMs) as a clinical decision-making adjunct. 1. We develop a framework to enrich GPT with context relevant to vestibular schwannoma. 2. We compare the performance of ChatGPT (Jan. 30, 2023 model) and a context-enriched GPT model against leading neurosurgical experts worldwide, evaluating the ability of LLMs to assist in clinical decision-making. 3. We introduce a proof-of-concept clinical decision-making tool, neuroGPT-X, which incorporates working memory, sources with each answer, and a web-based chat platform to address challenges in using LLMs in a clinical setting, including interpretability, reliability, accountability, and safety.
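As a rough illustration of what context enrichment with cited sources can look like, the following Python sketch retrieves the passages most similar to a question and assembles them into a prompt. It is a minimal sketch under stated assumptions, not the code shipped with this dataset: the bag-of-words `embed` function stands in for a real embedding model, and the corpus entries, source labels, and prompt wording are hypothetical placeholders.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in (assumption) for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, corpus, k=2):
    """Return the k corpus entries most similar to the question."""
    q = embed(question)
    ranked = sorted(
        corpus, key=lambda doc: cosine(q, embed(doc["text"])), reverse=True
    )
    return ranked[:k]


def build_prompt(question, passages):
    """Assemble a prompt that carries numbered passages and their sources."""
    context = "\n".join(
        f"[{i}] {p['text']} (source: {p['source']})"
        for i, p in enumerate(passages, 1)
    )
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Hypothetical placeholder passages, not taken from the dataset.
corpus = [
    {"source": "hypothetical-abstract-1",
     "text": "Vestibular schwannoma is a benign tumor of the vestibulocochlear nerve."},
    {"source": "hypothetical-abstract-2",
     "text": "Stereotactic radiosurgery is one management option for small vestibular schwannoma."},
    {"source": "hypothetical-abstract-3",
     "text": "Flask is a web framework for Python."},
]

question = "What are management options for a small vestibular schwannoma?"
top = retrieve(question, corpus, k=2)
prompt = build_prompt(question, top)
print(prompt)
```

Keeping the source label next to each retrieved passage is what allows an answer to be traced back to the material it was based on, which is the accountability property the description emphasizes.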

Steps to reproduce

The data includes (1) data acquisition to build a database of relevant PubMed and Wikipedia articles, (2) data processing and embeddings for semantic searching of relevant articles, (3) survey responses and evaluations for questions posed to ChatGPT, context-enriched GPT, and 4 expert neurosurgeons, (4) data analysis from the survey results, and (5) code to build a chatbot interface (neuroGPT-X) using the Python Flask micro web framework.

The directory structure of the data and a description of the important files within the "Final Data" directory are as follows:

- code_noapi
  - abstracts: code and data for PubMed abstracts and Wikipedia articles pulled using web scraping
  - flaskapp: code to create the neuroGPT-X website
  - processing: code and data for building a dataset (vs_scrape.ipynb, embedding_model_final_NEW.ipynb), dataset thematic analysis (clustering.ipynb), creating embeddings (embedding_model_final_NEW.ipynb), and answering questions (embedding_model_final_NEW.ipynb)
- evaluation_analysis: evaluation results from 3 neurosurgeon judges and code for analysis
  - complete_imputed.csv: imputed values using the mode for judge 2
  - complete_noimpute.csv: raw data combining all 3 judge evaluations
  - impute.ipynb: Python notebook that computes imputed values using the mode for judge 2
  - agreement_analysis.ipynb: Python notebook that computes various metrics for inter-rater agreement
  - updated_agreement.ipynb: Python notebook that computes Krippendorff alpha and Fleiss kappa for inter-rater agreement
  - unblinded.csv: unblinded affective survey results
- figures: image files for all figures in the paper "neuroGPT-X: Towards an Accountable Expert Opinion Tool for Vestibular Schwannoma"
- neuro_website_output: downloaded website that shows an example of a Q&A conversation between neuroGPT-X and a human
- neurosurgeon_responses: answers from 4 neurosurgeon experts to 15 questions curated by a neurosurgeon
- timing_analysis: code and data for how long neurosurgeon experts, ChatGPT, and context-enriched GPT take to answer questions
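The evaluation step described above (mode imputation for one judge, then inter-rater agreement) can be sketched in plain Python as follows. This is an illustrative sketch, not the contents of impute.ipynb or updated_agreement.ipynb: the function names and the ratings are made up for the example, only Fleiss' kappa is shown (Krippendorff's alpha is omitted), and the notebooks may handle missing values and categories differently.

```python
from collections import Counter
from statistics import mode


def impute_with_mode(scores):
    """Replace missing (None) scores with the most common observed score."""
    observed = [s for s in scores if s is not None]
    fill = mode(observed)
    return [fill if s is None else s for s in scores]


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list with one label per rater.

    Assumes every item was rated by the same number of raters and that the
    raters did not all use a single category (which would make 1 - P_e zero).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(item) for item in ratings]
    categories = sorted({label for item in ratings for label in item})

    # Share of all assignments that fell into each category.
    p_j = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    # Observed agreement per item, then averaged over items.
    p_i = [
        (sum(v * v for v in c.values()) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_i) / n_items
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


# Made-up ratings for five answers; judge 2 has one missing score.
judge_1 = [1, 2, 3, 2, 1]
judge_2 = [1, None, 3, 2, 2]
judge_3 = [1, 2, 3, 3, 1]

judge_2_imputed = impute_with_mode(judge_2)
items = [list(triple) for triple in zip(judge_1, judge_2_imputed, judge_3)]
kappa = fleiss_kappa(items)
print(judge_2_imputed, round(kappa, 3))
```

Imputing with the mode keeps the filled-in value within the rating scale's existing categories, which matters because Fleiss' kappa treats ratings as categorical labels rather than numbers.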

Institutions

Weill Cornell Medicine Department of Surgery, Universitat Wien, The University of British Columbia, Dalhousie University Department of Surgery, University of Calgary, Harvard Medical School, Hokkaido Daigaku Byoin, Universitatsklinikum Tubingen

Categories

Neurosurgery, Natural Language Processing, Machine Learning, Acoustic Neuroma, Language Modeling
