RTPB2026: RedTeam Prompt Base 2026
Description
The RedTeam Prompt Base 2026 (RTPB 2026) is a specialized corpus for red-teaming exercises and security evaluations of large language models (LLMs). Its primary goal is to give researchers and developers a structured dataset of adversarial prompts, covering jailbreak techniques, alignment-evasion tactics, and role manipulation, with which to stress-test models and strengthen their security mechanisms.
The data is sourced through a hybrid approach: prompts collected from public repositories and forums dedicated to vulnerability discovery, combined with synthetic and manually written prompts created internally to test zero-day attack vectors and highly contextual scenarios.
Each entry in the dataset is a JSON object containing:
- a unique string identifier;
- the exact adversarial prompt text;
- an array of categorization types, such as roleplay or logic bypass;
- an array of target models;
- the source URL or internal origin;
- an ISO 8601 timestamp recording when the entry was logged.
The dataset is intended strictly for robustness evaluation (measuring attack success rates), guardrail development (training classifiers), and safety fine-tuning (for example, RLHF or DPO). Because of the nature of red teaming, RTPB 2026 contains inherently unsafe material, including explicit language, offensive content, and instructions for bypassing ethical constraints. It must be used exclusively for defensive security research aimed at standardizing vulnerability evaluation, not to encourage the exploitation of artificial intelligence systems.
Acknowledgment: This work was carried out within the framework of the Spain Living Lab project, funded by the European Union, NextGenerationEU, through the Recovery, Transformation and Resilience Plan (PRTR), Component 16, as part of the Territorial Networks for Technological Specialization Program.
The project is coordinated by the Canary Islands Agency for Research, Innovation and Information Society of the Regional Ministry of Universities, Science, Innovation and Culture of the Government of the Canary Islands.
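The entry structure given in the description can be checked mechanically before records are merged. The following Python sketch is a minimal validator under stated assumptions: the description names the six fields but not their JSON keys, so the keys `id`, `prompt`, `types`, `target_models`, `source`, and `timestamp` are hypothetical, as is the function name `validate_entry`. Placeholder strings stand in for real prompt text.

```python
import json
from datetime import datetime

# Hypothetical JSON keys for the six fields named in the description.
REQUIRED_FIELDS = {
    "id": str,              # unique string identifier
    "prompt": str,          # exact adversarial prompt text
    "types": list,          # categorization types, e.g. "roleplay"
    "target_models": list,  # models the prompt was aimed at
    "source": str,          # source URL or internal origin
    "timestamp": str,       # ISO 8601 logging time
}


def validate_entry(entry: dict) -> list:
    """Return a list of problems in one entry; an empty list means it passed."""
    problems = []
    for key, expected_type in REQUIRED_FIELDS.items():
        if key not in entry:
            problems.append(f"missing field: {key}")
        elif not isinstance(entry[key], expected_type):
            problems.append(f"{key} should be of type {expected_type.__name__}")

    # Both array fields must hold strings only.
    for key in ("types", "target_models"):
        value = entry.get(key)
        if isinstance(value, list) and not all(isinstance(v, str) for v in value):
            problems.append(f"{key} must contain only strings")

    # Only the extended ISO 8601 forms accepted by datetime.fromisoformat are checked;
    # a trailing "Z" is rewritten to "+00:00" so older Python versions accept it.
    timestamp = entry.get("timestamp")
    if isinstance(timestamp, str):
        try:
            datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
        except ValueError:
            problems.append("timestamp is not valid ISO 8601")
    return problems


if __name__ == "__main__":
    sample = json.loads(
        '{"id": "example-0001", "prompt": "<adversarial prompt text>",'
        ' "types": ["roleplay"], "target_models": ["<model name>"],'
        ' "source": "internal", "timestamp": "2026-01-15T10:30:00Z"}'
    )
    print(validate_entry(sample))  # []
```

Returning a list of problems rather than raising on the first one makes it easy to report every defect in a batch of collected entries at once.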
Steps to reproduce
To reproduce this dataset, first collect community-discovered adversarial inputs by systematically scraping public forums, open-source repositories, and communities dedicated to AI red teaming, such as specialized subreddits for jailbreaks. For each collected entry, extract the raw prompt text, identify the intended target model, and record the source URL. Supplement this open-source data with internal prompts, crafted manually or generated synthetically, that target specific alignment vulnerabilities or contextual scenarios.
Next, standardize the raw prompts by formatting each one into the required JSON structure: generate a unique hash identifier for the entry, assign categorization tags based on the attack strategy (for example, roleplay or logic bypass), and add an ISO 8601 timestamp marking the exact moment the entry was logged. Finally, compile all the formatted JSON objects into a single file to complete the dataset.
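The standardization and compilation steps can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: the hashing input (SHA-256 over the prompt text), the JSON key names, the output format (one JSON array in one file), and the helper names `make_entry` and `compile_dataset` are all assumptions. Collection is not shown; the sketch starts from raw prompts that have already been gathered, and placeholder strings stand in for real prompt text.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_entry(prompt: str, types: list, target_models: list, source: str) -> dict:
    """Format one raw prompt into a JSON-ready record (hypothetical key names)."""
    # Assumption: the ID is the SHA-256 hex digest of the prompt text, so the same
    # prompt collected twice receives the same ID and can be deduplicated.
    entry_id = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return {
        "id": entry_id,
        "prompt": prompt,
        "types": sorted(set(types)),           # attack-strategy tags, deduplicated
        "target_models": list(target_models),
        "source": source,                      # source URL or "internal"
        # ISO 8601 timestamp in UTC, e.g. "2026-01-15T10:30:00+00:00"
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }


def compile_dataset(entries: list, path: str) -> int:
    """Write unique entries (first occurrence per ID) to one JSON file; return the count."""
    unique = {}
    for entry in entries:
        unique.setdefault(entry["id"], entry)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(list(unique.values()), fh, ensure_ascii=False, indent=2)
    return len(unique)


if __name__ == "__main__":
    raw = [
        ("<collected prompt A>", ["roleplay"], ["<model name>"], "<source URL>"),
        ("<collected prompt A>", ["roleplay"], ["<model name>"], "<source URL>"),
        ("<internal prompt B>", ["logic bypass"], ["<model name>"], "internal"),
    ]
    entries = [make_entry(*row) for row in raw]
    print(compile_dataset(entries, "rtpb2026_sketch.json"))  # 2
```

Hashing only the prompt text collapses duplicate prompts into one record, but it also drops the second source URL when the same prompt is found in two places; hashing the prompt together with its source would keep both, at the cost of weaker deduplication.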
Institutions
- Universidad de La Laguna, Canary Islands, San Cristóbal de La Laguna