DriVQA: A Gaze-Based Dataset for Visual Question Answering in Driving Scenarios
Description
DriVQA is a novel dataset that combines gaze plots and heatmaps with visual question-answering (VQA) data from participants who were presented with driving scenarios. VQA has been proposed as part of the solution to trustworthiness and interpretability in vehicle autonomy, and it is increasingly being explored in autonomous driving as a way to deepen understanding of the environment from visual inputs and to enable more intelligent decision-making by the vehicle.

Collected using the Tobii Pro X3-120 eye-tracking device, DriVQA provides a comprehensive mapping of where participants direct their gaze when shown images of driving scenes, together with related questions and answers from every participant. Each scenario comprises five key elements: an image of a driving situation, the associated questions, the participants' answers, gaze plots, and heatmaps. A gaze plot marks the exact points of focus and their sequence on the driving image, with the size of each point indicating the duration of attention; a heatmap shows the density of gaze points and their durations across different areas of the scene.

DriVQA is being used to study the subjectivity inherent in VQA. Its detailed gaze-tracking data offers a unique perspective on how individuals perceive and interpret visual scenes, making it a valuable resource for training VQA models that rely on human-like attention: the gaze data has the potential to guide a model toward the regions of an image most relevant to a given question, much as a human focuses on key areas of a driving scene. Beyond VQA, the dataset supports research on human cognition and behaviour in dynamic, real-world scenarios, and its gaze plots and heatmaps can be reused for attention analysis, human-computer interaction studies, autonomous driving, driver assistance systems, and cognitive science research, making it a versatile resource for both academic and industrial purposes. By improving visual understanding and interaction, DriVQA has the potential to drive advances in the safety and intelligence of driving systems.
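As a minimal sketch of the idea described above, the Python snippet below converts a heatmap image into a coarse grid of attention weights that could re-weight the spatial features of a VQA image encoder. The file name, the pooling grid size, and the encoder usage at the end are illustrative assumptions, not part of the published dataset tooling.

import numpy as np
from PIL import Image

def gaze_prior(heatmap_path: str, grid: int = 7) -> np.ndarray:
    """Convert a gaze heatmap image into a normalized (grid x grid)
    map of attention weights over image regions."""
    hm = np.asarray(Image.open(heatmap_path).convert("L"), dtype=np.float32)
    h, w = hm.shape
    cells = np.zeros((grid, grid), dtype=np.float32)
    # Average-pool the heatmap into coarse cells.
    for i in range(grid):
        for j in range(grid):
            patch = hm[i * h // grid:(i + 1) * h // grid,
                       j * w // grid:(j + 1) * w // grid]
            cells[i, j] = patch.mean()
    # Normalize so the weights form a probability map over regions.
    cells /= cells.sum() + 1e-8
    return cells

# Hypothetical usage with a VQA image encoder producing a
# (grid, grid, d) feature map:
#   weights = gaze_prior("P01_scene3.png")          # file name assumed
#   pooled = (features * weights[..., None]).sum(axis=(0, 1))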
Files
Steps to reproduce
A comprehensive list of everything available in the dataset:
i. Empty questionnaire (Questions.pdf): contains the images and the list of questions that the participants were asked.
ii. Participants' answers (Responses.xlsx): contains the responses from every participant, each in a separate sheet labelled with the respective participant ID.
iii. Gaze plots (folder 'Gaze Plot'): each gaze plot marks the exact points of focus and their sequence on the driving image, with the size of each point indicating the duration of attention. Provided for every participant in both the Test 1 and Test 2 folders.
iv. Heatmaps (folder 'Heatmap'): available in both the Test 1 and Test 2 folders; heatmaps show the density of gaze points and their durations across different areas of the scene.
v. Driving experience (Experience.pdf): contains the years of driving experience for every participant, with their corresponding participant IDs.
vi. Images used, with camera information (folder 'Images'): contains all the images used in the experiment, organised into folders by camera (e.g., the 'Back camera' folder holds 4 images).
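A minimal Python loading sketch, assuming the layout described in the list above. The participant ID "P01", the test-folder naming ("Test 1"), and the .png extension for gaze plots and heatmaps are illustrative assumptions; adjust them to match the actual files.

from pathlib import Path
import pandas as pd
from PIL import Image

root = Path("DriVQA")  # assumed root folder of the downloaded dataset

# Participant answers: one sheet per participant ID.
responses = pd.read_excel(root / "Responses.xlsx", sheet_name=None)
answers_p01 = responses["P01"]  # sheet name assumed to be the participant ID

# Gaze plots and heatmaps for one test session.
gaze_plots = sorted((root / "Test 1" / "Gaze Plot").glob("*.png"))
heatmaps = sorted((root / "Test 1" / "Heatmap").glob("*.png"))

# Stimulus images, grouped by camera.
back_camera = [Image.open(p) for p in (root / "Images" / "Back camera").glob("*")]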
Institutions
Categories
Funding
Science Foundation Ireland
18/CRT/6049