Broken Access Control Detection Dataset (BAC-ML-1M)

Published: 25 April 2025| Version 2 | DOI: 10.17632/vvr4w36bn6.2
Contributors:
,
,
,
,

Description

The dataset was built manually through Python script simulations which tracked access control activities on three security compromised web applications named DVWA (Damn Vulnerable Web Application), WebGoat and OWASP Juice Shop. Building an extensive dataset served as the main goal because researchers needed it for training and evaluating machine learning systems that detect Broken Access Control (BAC) attacks in real-time. The script conducted valid access requests and counterfeit attempts to circumvent access controls through automated procedures. The script collected user requests while tagging them by anticipated permission results, vulnerability classification (IDOR, Forced Browsing), and simulated monitoring detection outputs. The established labelling system enables researchers to conduct supervised as well as unsupervised learning experiments in cybersecurity fields. The final version contains 1 million records, which include the following fields: 1. User roles and session metadata 2. Requested resources and access methods 3. Access outcomes (expected vs. granted) 4. Attack payloads and vulnerability types 5. Anomaly and risk scores 6. Binary attack detection labels This dataset supports: 1. The development and benchmarking of intrusion detection and prevention systems serve as the main functionalities of this dataset. 2. Evaluation of real-time access control enforcement techniques 3. Organizations can use Role-based access violation profiling combined with behavioural analytics for their systems. 4. Security education, red team simulation, and vulnerability research 5. The testing of anomaly detection systems, along with access pattern deviation systems, utilizes benchmarking as a method

Files

Steps to reproduce

Reproduce the creation of the Broken Access Control detection dataset: 1. Install DVWA, WebGoat, and OWASP Juice Shop in a controlled environment (Docker, VMs). 2. Simulate Multiple User Roles and Exploit tried: Run Python scripts that provide different user roles (i.e., admin, user, guest) and try to exploit broken access control through URL manipulation, session fixation, IDOR… 3. Log the HTTP requests and responses, including session token, URL, status code, success, and failure of each exploit attempt using tools such as Burp Suite, OWASP ZAP. 4. Cleaning Collected Data: Collect data, remove sensitive information, and classify interactions as ‘exploited’ or ‘non-exploited’ based on the success of the attack. 5. Publish to Machine: Make the data available for a new machine to absorb and use in the meantime.

Institutions

  • Symbiosis International University Symbiosis Institute of Technology

Categories

Computer Science, Cybersecurity

Licence