QnA Academic Chatbot
Description
The dataset is derived from university academic handbooks covering academic policies, administrative procedures, and student services, making it well suited to building generative chatbots for academic services. It supports fine-tuning large language models (such as GPT or BERT) to understand and generate responses in an academic context. Educational institutions can use this dataset to develop responsive and efficient self-service systems for students and staff.

Data acquisition. The contents of the Academic Guidebook were summarized into 18 question-answer (Q&A) pairs. Each pair was then paraphrased as follows:
- Each question (Q) yields 10 paraphrases, for 11 question variants in total (the original sentence plus 10 paraphrases).
- Each answer (A) yields 5 paraphrases, for 6 answer variants in total (the original sentence plus 5 paraphrases).

The dataset is converted to CSV format, structured as Q&A pairs and multi-turn conversation scenarios.
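The paraphrase expansion described above can be sketched as follows. This is a minimal illustration, not the actual generation pipeline: the column names (question, answer), the sample sentences, and the choice to cross every question variant with every answer variant are assumptions; the dataset's real pairing scheme may differ.

```python
import io
import itertools
import pandas as pd

# One original Q&A pair with its paraphrases (illustrative placeholders,
# not records from the actual dataset).
question_variants = ["How do I register for courses?"] + [
    f"Course-registration question paraphrase {i}" for i in range(1, 11)
]  # 1 original + 10 paraphrases = 11 question variants
answer_variants = ["Register through the academic portal."] + [
    f"Answer paraphrase {i}" for i in range(1, 6)
]  # 1 original + 5 paraphrases = 6 answer variants

# One plausible expansion: cross every question variant with every
# answer variant to form Q&A rows (an assumption, not the documented scheme).
rows = [
    {"question": q, "answer": a}
    for q, a in itertools.product(question_variants, answer_variants)
]
df = pd.DataFrame(rows)

# Serialize to CSV in the Q&A-pair structure (written to a buffer here).
buffer = io.StringIO()
df.to_csv(buffer, index=False)

print(len(df))  # 11 * 6 = 66 rows for a single original Q&A pair
```

Under this scheme, each of the 18 original pairs would expand to 66 CSV rows; if the dataset instead pairs each question variant with a single answer variant, the row counts differ accordingly.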
Files
Steps to reproduce
This dataset is a static collection of chat interactions between users and an academic information chatbot, recorded during a pilot phase in early 2025. Since the chatbot system used for data collection is no longer active, full reproduction of the original collection process is not possible. However, researchers can use, analyze, or extend the dataset as follows:

1. Access the dataset. Download it from the public repository: https://github.com/bayusetiaji/ChatQnA. Ensure that all files are present and match the documented structure.

2. Understand the dataset structure. Review the included documentation (e.g., README, data dictionary) to understand the format and meaning of each field. The dataset is provided in CSV format, with each entry containing:
   a) session_id: unique identifier for the conversation session
   b) timestamp: date and time of the interaction
   c) user_query: message input by the user
   d) chatbot_response: response generated by the chatbot

3. Load the dataset. Use standard data processing tools (e.g., Python with pandas, JSON readers) to load and analyze the data. Example (Python): df = pd.read_csv("chatbot_dataset.csv")

4. Reproduce analysis (optional). If the dataset includes derived data or preprocessing steps, follow the provided scripts or notebooks to reproduce transformations or visualizations. Scripts (if available) are shared under the /scripts directory.

5. Extend or reuse the dataset. Researchers may use this dataset for training dialogue systems, testing NLP models, or analyzing user intent patterns. For new data collection, a similar chatbot interface must be developed and deployed independently.
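The loading step above can be extended to reconstruct multi-turn conversations from the four documented fields. The sketch below uses illustrative in-memory sample rows rather than the real file, and assumes timestamps are parseable by pandas; only the column names come from the dataset documentation.

```python
import io
import pandas as pd

# Minimal sample in the documented four-column format; the values are
# illustrative placeholders, not records from the actual dataset.
sample_csv = io.StringIO(
    "session_id,timestamp,user_query,chatbot_response\n"
    "s1,2025-01-10 09:00:00,How do I register?,Use the academic portal.\n"
    "s1,2025-01-10 09:01:00,What is the deadline?,Registration closes in week 2.\n"
    "s2,2025-01-11 14:30:00,Where is the library?,In the main building.\n"
)
df = pd.read_csv(sample_csv, parse_dates=["timestamp"])

# Reconstruct multi-turn conversations: group rows by session and
# order each session's turns chronologically.
conversations = {}
for session_id, group in df.groupby("session_id"):
    group = group.sort_values("timestamp")
    conversations[session_id] = list(
        zip(group["user_query"], group["chatbot_response"])
    )

print(len(conversations["s1"]))  # session s1 contains 2 turns
```

Grouping by session_id before sorting keeps each conversation's turn order intact even if the CSV rows are shuffled, which is the usual precondition for dialogue-model training on multi-turn data.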