A Unified Multi-Source Dataset of Software Requirements for Automated Validation Using LLMs

Published: 18 May 2026 | Version 1 | DOI: 10.17632/g4nh7vcfyb.1
Contributors:

Description

This dataset offers a structured collection of software requirements to support research in requirements engineering, natural language processing (NLP), and automated requirement classification. It contains 8,854 natural-language requirement statements gathered from 165 software projects spanning 317 distinct application domains, ranging from event management platforms and real estate applications to dispute management and scheduling systems.

Each requirement is annotated with a classification label, enabling supervised learning, empirical studies, and comparative analyses. The labeling scheme covers 25 requirement categories. High-level classes include Security Requirements (SR) and Non-Security Requirements (NSR). The scheme also spans functional and non-functional perspectives, with major groups such as Functional Requirements (FR), Non-Functional Requirements (NFR), and Non-Software Requirements (NSR). More granular categories include usability, availability, performance efficiency, maintainability, security, and interface-related requirements. These fine-grained annotations support the development of automated techniques for requirement identification, classification, and quality assessment.

The dataset is suited to tasks such as requirement classification, text mining, information extraction, and machine learning model training in requirements engineering. Its diversity of application domains and its rich labeling scheme make it useful for benchmarking models and for NLP-based software engineering experiments. Ultimately, the dataset aims to promote reproducible research and to advance intelligent tools for requirement analysis, validation, and management.
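Because each statement carries a single class label, a natural first step for benchmarking is a label-stratified train/test split, so that less frequent categories appear in both partitions. The sketch below assumes a hypothetical CSV export with `requirement` and `label` columns. The actual file names, column headers, and label strings of the published files may differ, and the sample rows are invented for illustration, not taken from the dataset.

```python
import csv
import io
import random
from collections import Counter, defaultdict

# Invented example rows (NOT from the dataset). The "requirement"/"label"
# column names are a hypothetical layout; adjust them to the real files.
SAMPLE_CSV = """requirement,label
The system shall allow organizers to create events.,FR
The system shall let users schedule appointments.,FR
The system shall list available properties by city.,FR
The system shall record each dispute with a case number.,FR
Search results shall be returned within two seconds.,NFR
The service shall be available 99.9 percent of the time.,NFR
New users shall complete registration without assistance.,NFR
Modules shall be replaceable without changing other modules.,NFR
Passwords shall be stored using a salted hash.,SR
Only administrators shall be able to delete user accounts.,SR
"""


def load_rows(handle):
    """Read (text, label) pairs from a CSV file-like object."""
    reader = csv.DictReader(handle)
    return [(row["requirement"].strip(), row["label"].strip()) for row in reader]


def stratified_split(rows, test_fraction=0.2, seed=42):
    """Split rows per label so each label keeps roughly its share in both parts."""
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        if len(group) < 2:
            n_test = 0  # a singleton label cannot be in both parts; keep it for training
        else:
            # At least one test example, but always leave one for training.
            n_test = min(len(group) - 1, max(1, round(len(group) * test_fraction)))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


rows = load_rows(io.StringIO(SAMPLE_CSV))
train, test = stratified_split(rows, test_fraction=0.25)
print("train:", Counter(label for _, label in train))
print("test:", Counter(label for _, label in test))
```

Stratifying within each label, rather than shuffling the whole file at once, keeps each category's share roughly constant across partitions, which matters when fine-grained categories may be unevenly represented.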

Categories

Artificial Intelligence, Software Engineering

Licence