GPTRC - GPT Responses Corpus

Published: 20 September 2023 | Version 1 | DOI: 10.17632/wfmj9bknd7.1
, Ashutosh Kumar Dubey,


Since the emergence of generative pre-trained transformer models, many researchers have shown significant interest in studying and improving artificially generated content. However, because few large datasets are available, many studies struggle to assess the accuracy of their results. The GPTRC dataset aims to address this issue by providing researchers with a substantial collection of responses generated by a GPT model.

The dataset comprises 50,134 responses to questions posed by human users. The responses were produced with OpenAI's publicly accessible GPT-3.5-turbo model, which powered ChatGPT at the time of publication. The questions were sourced from across the internet and from various search engines. They span a wide spectrum, from everyday inquiries to intricate puzzles, without focusing on any particular industry or subject. This diversity makes the dataset a valuable resource for researchers investigating the efficiency and accuracy of contemporary state-of-the-art artificial intelligence models. Among other possible uses, the dataset can also serve to compare how well different models generate responses when given the same information.

Beyond the questions and responses, each record includes metadata intended to help researchers filter or organize the dataset to suit their needs. Its key attributes are response time and thinking time. Response time is the total time the model took to receive the query and produce the complete response, whereas thinking time is the time the model spent processing the query before it began generating the response. Together with the query and the response, these two attributes give researchers what they need to conduct comprehensive conversation analysis.
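The relationship between the two timing attributes can be illustrated with a short Python sketch. It assumes the records are available as a CSV file with columns named `question`, `response`, `response_time`, and `thinking_time`; these column names, the file format, and the helpers `Record`, `load_records`, and `filter_by_thinking_time` are hypothetical and not taken from the dataset itself. Because response time covers the whole interval and thinking time covers only the part before generation started, their difference is taken here as the time spent generating the response.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical column names: the actual GPTRC file layout may differ.
QUESTION_COL = "question"
RESPONSE_COL = "response"
RESPONSE_TIME_COL = "response_time"
THINKING_TIME_COL = "thinking_time"


@dataclass
class Record:
    question: str
    response: str
    response_time: float  # total time from receiving the query to the complete response (unit assumed)
    thinking_time: float  # time spent processing the query before generation began (same unit)

    @property
    def generation_time(self) -> float:
        # Assumes both timings start at the same moment, so the remainder
        # of the response time is the time spent generating the response.
        return self.response_time - self.thinking_time


def load_records(path: str) -> List[Record]:
    """Read records from a CSV file that uses the assumed column names above."""
    with open(path, newline="", encoding="utf-8") as f:
        return [
            Record(
                question=row[QUESTION_COL],
                response=row[RESPONSE_COL],
                response_time=float(row[RESPONSE_TIME_COL]),
                thinking_time=float(row[THINKING_TIME_COL]),
            )
            for row in csv.DictReader(f)
        ]


def filter_by_thinking_time(records: Iterable[Record], max_thinking: float) -> List[Record]:
    """Keep only the records whose thinking time does not exceed max_thinking."""
    return [r for r in records if r.thinking_time <= max_thinking]
```

Keeping `generation_time` as a derived property rather than a stored field avoids duplicating information that the two recorded attributes already determine; the subtraction is only meaningful if the dataset measures both timings from the same starting point, which is an assumption here.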



Artificial Intelligence Applications, Information Searching, Multilingualism, User Experience Evaluation, Natural-Language Understanding