Students' suspicious behaviors detection dataset for AI-powered online exam proctoring

Published: 8 July 2025 | Version 1 | DOI: 10.17632/39xs8th543.1
Contributors:

Description

Our research hypothesizes that student cheating during online exams can be accurately detected through multimodal analysis of visual behavioral cues captured via standard webcams. By combining facial movements, hand gestures, gaze tracking, head pose, and phone interaction data, AI-based proctoring systems can identify dishonest behavior. To validate this, we developed this dataset, specifically designed to support the training, testing, and benchmarking of machine learning models for automated and scalable online exam proctoring.

What the Data Shows
The dataset consists of 5,500 structured records, each representing a snapshot of a student's behavior during an online exam. Each record includes 38 attributes (37 behavioral features plus a binary label) extracted using computer vision techniques and classified into two categories [see Table 1]:
• Cheating behavior (label = 1)
• Non-cheating behavior (label = 0)
The class distribution is nearly balanced, with 2,619 cheating and 2,881 non-cheating instances, making the dataset suitable for supervised binary classification tasks.
The recorded features fall under the following categories:
• Face Detection: face presence, count, bounding box, and key landmarks.
• Hand Tracking: hand count, positions, and object-interaction status.
• Head Pose Estimation: pitch, yaw, and roll angles indicating head orientation.
• Mobile Phone Detection: phone presence, location, and detection confidence.
• Eye Gaze Tracking: gaze direction, screen focus, gaze points, and pupil positions.

How the Data Was Gathered
Data were collected in a controlled, simulated online exam environment using a standard webcam and processed with computer vision modules. The system used:
• MediaPipe for real-time face and hand tracking.
• OpenCV for image processing and frame analysis.
• Custom models for gaze estimation, head pose, and mobile phone detection.

Notable Findings
Machine learning models such as Random Forest and XGBoost achieved high precision and recall on this dataset. Notably:
• Hand-object interactions and phone presence are key indicators of cheating.
• Head pose deviations and off-screen gaze also suggest suspicious behavior.
• Combining multiple behavioral cues improves detection accuracy over single-modality approaches.

How the Data Can Be Interpreted and Used
This dataset is designed for researchers, developers, and educators aiming to:
• Build AI-powered online proctoring systems
• Develop behavior recognition models for academic monitoring
• Benchmark cheating detection techniques in machine learning and computer vision
• Explore the ethical implications of surveillance technologies in education
Each record is fully anonymized and contains no raw images or personal identifiers, making it safe for public research use. The structured numerical format ensures compatibility with standard machine learning libraries and tools; a minimal baseline sketch follows below.
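To illustrate the benchmarking use described above, the following is a minimal sketch of a supervised baseline with scikit-learn. The CSV file name and the label column name are illustrative assumptions, not confirmed by the dataset files; adjust them to match the published file and the attribute list in Table 1.

# Minimal baseline sketch (not the authors' original code).
# Assumes the dataset CSV holds 37 numeric feature columns plus a binary
# "label" column (1 = cheating, 0 = non-cheating); the file name is hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("student_behaviors.csv")        # hypothetical file name
X = df.drop(columns=["label"])                   # 37 behavioral features
y = df["label"]                                  # 1 = cheating, 0 = non-cheating

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))

An XGBoost model (e.g., xgboost.XGBClassifier) can be swapped in the same way, and single-modality feature subsets can be compared against the full multimodal feature set.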

Files

Steps to reproduce

To support AI-driven online exam proctoring, we developed a reproducible, multimodal dataset capturing students' behavioral cues under simulated examination conditions. The entire workflow was implemented with Python-based open-source libraries and operated through a webcam interface. The key stages and protocols are outlined below; a simplified code sketch of the frame-processing pipeline follows the list.

1. Data Acquisition Setup
• Webcam: standard HD (720p) webcam.
• Frame rate: 20 frames per second.
• Software interface: custom web application built with Django for live video capture during simulated exam sessions.
• Environment: controlled indoor setting with consistent ambient lighting.

2. Participants and Simulation Protocol
• Participants: 30+ volunteers (balanced by age and gender) engaged in exam simulations lasting several minutes.
• Behavioral scripts:
  o Normal: typing, reading, and brief glances at the screen.
  o Suspicious: phone usage, gaze aversion, whispering, looking off-screen, or another person entering the frame.
• Ethical compliance:
  o Informed consent was obtained from all participants.
  o Privacy safeguards:
    - No raw images or video footage were stored.
    - Only anonymized numerical and categorical features were retained.
    - Facial regions were blurred or masked during post-processing.

3. Image Preprocessing
• Tools: Python, OpenCV.
• Steps:
  o Frame resizing and normalization.
  o BGR-to-RGB conversion (as needed).
  o Face alignment and noise reduction to ensure reliable feature extraction.

4. Feature Extraction Modules
Each captured frame was processed through five core modules using Python libraries and pre-trained models [refer to Table 2]:
(i) Face Detection
(ii) Hand Tracking
(iii) Head Pose Estimation
(iv) Mobile Phone Detection
(v) Eye Gaze Tracking

5. Data Structuring and Storage
• Each video frame was transformed into a structured tabular record (CSV format).
• Each row contains 37 features combining outputs from all modules.
• Behavioral labels (Non-Cheating or Cheating) were manually annotated based on observation logs and session context.
• The dataset was validated with Pandas for consistency, completeness, and labeling accuracy, and selected samples were manually reviewed.

6. Tools and Reproducibility
• Software stack:
  o Python, OpenCV, MediaPipe, PyTorch, Dlib.
  o YOLOv5 or YOLOv7 for object detection.
• Hardware: standard laptop with 8 GB RAM and an integrated HD webcam.
• Reproduction steps:
  (a) Set up a similar webcam-based environment using Python and the listed libraries.
  (b) Replicate the modular architecture for feature extraction.
  (c) Follow the real-time frame-processing pipeline.
  (d) Save structured outputs in CSV format.
  (e) Manually annotate behaviors as needed.
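The following is a simplified sketch of the real-time frame-to-CSV pipeline described in steps 1 to 5, using OpenCV and the MediaPipe face-detection and hands solutions. It is not the authors' original implementation: it writes only a handful of illustrative columns (face count, first-face bounding box, hand count), whereas the released dataset spans 37 features from all five modules, and the column names here are assumptions.

# Simplified frame-processing sketch (illustrative only). Captures webcam
# frames, converts them from BGR to RGB, runs MediaPipe face detection and
# hand tracking, and appends one structured row per frame. Column names and
# the output file name are hypothetical.
import csv
import cv2
import mediapipe as mp

face_detector = mp.solutions.face_detection.FaceDetection(min_detection_confidence=0.5)
hand_tracker = mp.solutions.hands.Hands(max_num_hands=2, min_detection_confidence=0.5)

cap = cv2.VideoCapture(0)                              # standard HD webcam
with open("features.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["face_count", "face_x", "face_y", "face_w", "face_h",
                     "hand_count", "label"])
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # MediaPipe expects RGB input
        faces = face_detector.process(rgb)
        hands = hand_tracker.process(rgb)

        face_count = len(faces.detections) if faces.detections else 0
        hand_count = len(hands.multi_hand_landmarks) if hands.multi_hand_landmarks else 0
        if faces.detections:                           # relative bounding box of first face
            box = faces.detections[0].location_data.relative_bounding_box
            face_box = [box.xmin, box.ymin, box.width, box.height]
        else:
            face_box = [0.0, 0.0, 0.0, 0.0]

        writer.writerow([face_count, *face_box, hand_count, 0])  # label annotated later
cap.release()

Head pose, gaze, and phone-detection features would be appended to each row by their respective modules (e.g., a YOLO-based detector for phones), and the placeholder label column replaced during manual annotation.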

Institutions

Chittagong University of Engineering and Technology, Jahangirnagar University

Categories

Artificial Intelligence Applications, Online Learning, Academic Assessment, Cheater Detection, Classification (Machine Learning), Online Education, Student Behavior, Extreme Gradient Boosting

Funding

University Grants Commission of Bangladesh

UGC PhD Fellowship 2022-2023

Licence