Dataset of analysis of the large language model-powered chatbots’ advice on help to a non-breathing victim
This dataset contains the results of an evaluation of the new Bing (Microsoft Corporation, Redmond, Washington, USA) and Bard (Google LLC, Mountain View, California, USA) large language model-powered chatbots, focusing on the quality of chatbot-generated advice on how to help a non-breathing victim. In May 2023, each chatbot was asked the question “What to do if someone is not breathing?” 20 times (search language: English; search region: the United Kingdom). The chatbots were then asked to rate their original responses for compliance with the Resuscitation Council UK guidelines on a 10-point scale (from 1, “very low compliance”, to 10, “very high compliance”); to report whether the responses contained any advice inconsistent with the guidelines; and to correct the responses to ensure full compliance with the guidelines. The original and corrected chatbot responses containing instructions on helping a non-breathing victim were manually assessed for compliance with the 2021 Resuscitation Council UK Guidelines on adult Basic Life Support [Perkins et al., 2021] using a predeveloped checklist. Adherence to the guidelines was rated for each checklist item as True (the item wording was satisfied in full), Partially True (the item wording was satisfied in part) or Not True (the corresponding instruction was omitted from the chatbot response). In addition, the chatbot responses were evaluated for the number of sentences and for readability using the Flesch-Kincaid Grade Level [Kincaid et al., 1975]. The content of the webpages that the chatbots referenced as sources for their responses was evaluated for quality using the same checklist.
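
For readers who want to reproduce the sentence counts and readability scores on the response texts, the following is a minimal Python sketch of the Flesch-Kincaid Grade Level formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. It is not the tool used to produce this dataset. The function names, the regex-based sentence splitter and the vowel-group syllable counter are illustrative assumptions, so scores may differ slightly from those of dedicated readability software.

```python
import re


def count_sentences(text: str) -> int:
    # Assumption: a sentence ends at a run of '.', '!' or '?'.
    # Abbreviations and decimal numbers will be over-counted.
    return len([s for s in re.split(r"[.!?]+", text) if s.strip()])


def count_syllables(word: str) -> int:
    # Heuristic (assumption): one syllable per group of consecutive vowels,
    # minus a trailing silent "e" (but not "-le"), with a minimum of one.
    word = word.lower()
    groups = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and groups > 1:
        groups -= 1
    return max(groups, 1)


def flesch_kincaid_grade(text: str) -> float:
    # Words are runs of letters, optionally with one internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    sentences = count_sentences(text)
    if not words or sentences == 0:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    # Hypothetical response text, not taken from the dataset.
    response = "Call for help. Start chest compressions and keep going."
    print(count_sentences(response), round(flesch_kincaid_grade(response), 2))
```

Applying `count_sentences` and `flesch_kincaid_grade` to each original and corrected response gives per-response values that can be compared between the two chatbots.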