Replication Package: An Empirical Investigation of Large Language Models for Automated Software Compliance Testing: Evidence from Web Accessibility

Name: Replication Package: An Empirical Investigation of Large Language Models for Automated Software Compliance Testing: Evidence from Web Accessibility
Creator: Juan-Miguel López-Gil
Published: 2026-06-03T19:20:33.190Z
Keywords: Computer Science, Software, Artificial Intelligence Applications

López-Gil, Juan-Miguel; Pereira, Juanan

doi:10.17632/bwsyv45b9m.1

Published: 3 June 2026 | Version 1 | DOI: 10.17632/bwsyv45b9m.1

Description

This replication package accompanies the paper "An Empirical Investigation of Large Language Models for Automated Software Compliance Testing: Evidence from Web Accessibility", published in Information and Software Technology.

Contents:
- Test specifications for 384 W3C ACT test cases across 39 WCAG 2 AA success criteria
- Evaluation scripts for seven Large Language Models (deepseek-reasoner, deepseek-chat, groq_qwen-qwq-32b, groq_deepseek-r1-distill-llama-70b, gemini-2.0-flash-thinking, groq_qwen-2.5-coder-32b, gemini-2.0-flash)
- Three prompting strategies (Query 000, 110, 111) with five replications each
- Raw LLM outputs for all 40,320 evaluations
- Analysis code and derived datasets
- Traditional tool evaluation results (axe-core, Lighthouse, Pa11y)
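The figure of 40,320 evaluations is consistent with a fully crossed design: 384 test cases, seven models, three prompting strategies, and five replications. The sketch below enumerates that grid as a sanity check. It assumes every combination was run exactly once; the constant and function names are illustrative and are not taken from the package's scripts.

```python
from itertools import product

# Model identifiers as listed in the package description.
MODELS = [
    "deepseek-reasoner",
    "deepseek-chat",
    "groq_qwen-qwq-32b",
    "groq_deepseek-r1-distill-llama-70b",
    "gemini-2.0-flash-thinking",
    "groq_qwen-2.5-coder-32b",
    "gemini-2.0-flash",
]
STRATEGIES = ["000", "110", "111"]  # prompting strategies (Query 000, 110, 111)
N_TEST_CASES = 384                  # W3C ACT test cases
N_REPLICATIONS = 5                  # replications per strategy


def evaluation_grid():
    """Yield (test_case, model, strategy, replication) for a fully crossed design.

    Assumption: every test case is evaluated by every model under every
    strategy in every replication. Test cases are numbered 1..384 here
    purely for illustration.
    """
    return product(
        range(1, N_TEST_CASES + 1),
        MODELS,
        STRATEGIES,
        range(1, N_REPLICATIONS + 1),
    )


total = sum(1 for _ in evaluation_grid())
print(total)  # 40320
```

A count of raw output records that differs from this total would point to missing or duplicated runs.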
