URL extraction results: testing on-premise LLMs on selected exploits

Name: URL extraction results: testing on-premise LLMs on selected exploits
Creator: Bartłomiej Płonkowski
Published: 2025-04-23T10:28:29.082Z
Keywords: Cybersecurity, Large Language Model

Płonkowski, Bartłomiej

doi:10.17632/gbtjt9nk36.2

URL extraction results: testing on-premise LLMs on selected exploits

Published: 23 April 2025| Version 2 | DOI: 10.17632/gbtjt9nk36.2

Contributor:

Bartłomiej Płonkowski

Description

This dataset presents URL extraction results from testing three on-premise Large Language Models (LLMs): Llama 3.1: 8B, Qwen 2.5 Coder: 7B Instruct, and Codestral: 22B. Each model has its directory containing subdirectories organized by CVE names, which in turn contain text files named after corresponding exploit IDs from ExploitDB or PacketStorm. These text files represent each model's output when tasked with extracting URLs that would be accessed during the execution of the exploit code, enabling direct reference to the original exploit samples.

Files

Steps to reproduce

This dataset was generated in September 2024, using the Ollama Python library on a system with 32GB of VRAM. Each of the three LLMs (Llama 3.1: 8B, Qwen 2.5 Coder: 7B Instruct, and Codestral:22B) was prompted to identify URLs that would be accessed during the execution of selected exploits code, with responses saved in an organized directory structure. All models were run with default parameters, without any custom configuration or fine-tuning.

Institutions

NASK Instytut Badawczy

URL extraction results: testing on-premise LLMs on selected exploits

Description

Files

Steps to reproduce

Institutions

Categories

Related Links

Licence