Data extraction dataset for a systematic literature review on prompt-based attacks and defenses in Large Language Models

Name: Data extraction dataset for a systematic literature review on prompt-based attacks and defenses in Large Language Models
Creator: Victória Guimarães
Published: 2026-05-21T00:45:34.409Z
Keywords: Artificial Intelligence, Information Security, Systematic Review, Large Language Model

Guimarães, Victória; da Costa Medeiros, Débora; Reis, Leandro; Mesquita, Hugo; Rocha, Thiago

doi:10.17632/xm8ntk7cgd.1

Published: 21 May 2026 | Version 1 | DOI: 10.17632/xm8ntk7cgd.1

Contributors: Victória Guimarães, Débora da Costa Medeiros, Leandro Reis, Hugo Mesquita, Thiago Rocha

Description

This dataset supports a systematic literature review (SLR) on prompt-based attacks and defenses in Large Language Models (LLMs), following Kitchenham's guidelines and the PRISMA reporting framework. The dataset covers 89 primary studies published between 2023 and February 2026, retrieved from ArXiv, IEEE Xplore, and Scopus using the search string:

("large language model*" OR "LLMs") AND ("prompt injection" OR "jailbreak attack*" OR "prompt leak*" OR "malicious prompt*") AND (vulnerability OR mitigation)

Files included:
- data_extraction.csv: primary extraction spreadsheet with 89 studies and 31 fields covering attack techniques, defense strategies, threat models, evaluation metrics, and reproducibility indicators.
- review_organization.csv: reviewer assignment sheet tracking Quality Assessment and Data Extraction completion per study and per reviewer.
- data_extration_explore.ipynb: Jupyter notebook with exploratory data analysis of the extraction data.
- README.md: full description of all fields and methodology.

Steps to reproduce

1. Download data_extraction.csv and open it with any spreadsheet editor or with pandas (Python).
2. Each row corresponds to one primary study. Columns are described in README.md.
3. The Jupyter notebook data_extration_explore.ipynb reproduces the exploratory analyses. Run it with Python 3 and the libraries pandas, numpy, matplotlib, and seaborn.
4. The file review_organization.csv documents reviewer assignments and completion status for the quality assessment and data extraction tasks.
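
As a quick sanity check after downloading, the sketch below loads data_extraction.csv and compares its shape with the 89 studies and 31 fields stated in the description. It is only an illustration: the helper names load_extraction and check_shape are hypothetical, and it assumes the file is comma-separated, UTF-8 encoded, and has a single header row, none of which is confirmed here (README.md is the authority on the actual layout). It uses the standard-library csv module so it runs without extra dependencies; with pandas, pd.read_csv("data_extraction.csv") would be the equivalent first step.

```python
import csv
from pathlib import Path

# Counts taken from the dataset description.
EXPECTED_STUDIES = 89
EXPECTED_FIELDS = 31


def load_extraction(path):
    """Read the extraction sheet and return (column_names, rows).

    Assumes a comma-separated, UTF-8 file with one header row
    (an assumption; check README.md for the real layout).
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)  # one dict per primary study
        columns = list(reader.fieldnames or [])
    return columns, rows


def check_shape(columns, rows,
                n_rows=EXPECTED_STUDIES, n_cols=EXPECTED_FIELDS):
    """Return a list of mismatch messages; empty if the shape matches."""
    problems = []
    if len(rows) != n_rows:
        problems.append(f"expected {n_rows} rows, found {len(rows)}")
    if len(columns) != n_cols:
        problems.append(f"expected {n_cols} columns, found {len(columns)}")
    return problems


if __name__ == "__main__":
    csv_path = Path("data_extraction.csv")
    if csv_path.exists():
        columns, rows = load_extraction(csv_path)
        issues = check_shape(columns, rows)
        print(issues if issues else "shape matches the description")
        print(columns)  # compare against the field list in README.md
    else:
        print(f"{csv_path} not found; download it first")
```

If the check reports a mismatch, the likeliest causes under these assumptions are a different delimiter or encoding, or extra header rows, rather than missing studies.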
