Empirical Data Package for "Quality Assessment of Software Requirements Using Artificial Intelligence Methods: A Systematic Literature Review"
Description
This dataset, referred to as the "Empirical Data Package", contains all systematically extracted data for the following study: E. Wolf, A. Trendowicz, J. Siebert, "Quality Assessment of Software Requirements Using Artificial Intelligence Methods: A Systematic Literature Review".

Research objective: The study investigates how AI techniques and models are applied to assess and improve the quality of software requirements. It addresses three research questions:
- RQ1: Which AI methods and models are used for the automated assessment of requirements quality?
- RQ2: Which quality aspects and associated metrics are considered for assessment?
- RQ3: Which datasets are used to create, evaluate, and adjust requirements quality assessment models?

Dataset contents:
- Screening logs, including the title-abstract review, snowballing results, and the final set of 26 primary studies.
- Demographic information, such as type of contribution and research context (industry vs. academia).
- Quality assessment data per study, including the applied criteria and scoring.
- Extracted mappings for each study: AI methods, evaluation metrics, quality aspects, purposes of AI application, targeted RE phases, and datasets used.
- Data organized per research question (RQ1–RQ3) in Excel worksheets with grouping and content tabs.

Notable findings included in the dataset:
- Requirements quality aspects from multiple sources can largely be mapped to the INVEST framework.
- AI models primarily focus on detecting issues; few provide actionable improvements.
- AI methods rarely cover all INVEST aspects, and some quality criteria are neglected.
- The field exhibits heterogeneity in datasets, labeling strategies, and evaluation metrics, which poses challenges for reproducibility, generalization, and adoption in practice.
- Recent AI approaches, including GenAI models, offer opportunities for explanations and recommendations but require careful adaptation by RE practitioners.
Purpose: The dataset supports transparency, reproducibility, and further analysis of AI-based methods for software requirements quality assessment and can serve as a foundation for future benchmark datasets and standardized evaluation frameworks.
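Since the extracted mappings are organized per research question in tabular worksheets, they lend themselves to programmatic querying. The following is a minimal sketch of such a query; the column names, study IDs, and example rows are illustrative assumptions, not values from the actual Empirical Data Package (plain Python dicts stand in for the Excel tabs):

```python
# Hypothetical rows mimicking an RQ1 extraction table
# (study IDs, methods, and categories are invented for illustration).
rq1_rows = [
    {"study": "S01", "ai_method": "BERT classifier", "category": "DL"},
    {"study": "S02", "ai_method": "rule-based NLP", "category": "NLP"},
    {"study": "S03", "ai_method": "SVM", "category": "ML"},
]

def studies_by_category(rows, category):
    """Return the IDs of studies that apply an AI method of the given category."""
    return [row["study"] for row in rows if row["category"] == category]

print(studies_by_category(rq1_rows, "DL"))  # -> ['S01']
```

In the actual package, the same kind of query can be run against the RQ1–RQ3 content worksheets after loading them from Excel.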
Files
Steps to reproduce
The dataset in this Empirical Data Package was generated following a systematic workflow based on the guidelines for performing systematic literature reviews in software engineering (Kitchenham, 2007). The steps are described in chronological order:

1. Automated Literature Search
We executed structured queries on SCOPUS and ScienceDirect, covering multiple digital libraries (IEEE Xplore, SpringerLink, ACM, ACL, AIS). Keywords were derived from the research questions (RQs), and constraints included publication type, date range (Jan 2019 – Mar 2025), language (English), and subject area (Computer Science).

2. Deduplication
From the 353 initially retrieved publications, duplicates were removed, resulting in 334 unique studies.

3. Application of Inclusion Criteria (IC1–IC6)
The inclusion criteria covered publication period, peer-review status, primary research, language, length (>3 pages), and relevance to the RQs. Title, abstract, and full-text screening sequentially reduced the set to 24 studies eligible for quality assessment.

4. Quality Assessment
The selected studies were evaluated using adapted SLR quality criteria. Each paper was scored on all applicable criteria, and the studies were classified as being of high, medium, or low quality. All studies above the 40% threshold were retained.

5. Snowballing
Forward and backward snowballing was performed on the 24 selected studies to identify additional relevant publications. All snowball-identified papers underwent the same screening and quality assessment process as the initial set. Two additional papers passed these steps, resulting in a final corpus of 26 studies.

6. Demographic Analysis
We collected study metadata, such as the type of empirical study and the research context (industry vs. academia), to support interpretation and reproducibility.

7. Data Extraction
Data were systematically extracted from each study into tabular form, including:
- Named quality aspects and their definitions
- Basis for the quality aspects (standards/guidelines)
- Purpose of the quality aspects (assess, explain, predict, improve) and the RE phases of application (elicitation, analysis, modeling, validation, management)
- Evaluation metrics for each quality aspect
- Datasets used, with information on size, structure, origin, availability, and labeling
- AI methods (NLP, ML, DL) applied and their effectiveness for the study objectives

8. Dataset Organization
The Excel-based dataset contains grouping worksheets (overviews of sub-tabs) and content worksheets corresponding to each research question (RQ1–RQ3). This structure allows traceability from the extracted data back to the original studies.

9. Tools and Instruments
- SCOPUS and ScienceDirect for the automated search and forward snowballing
- Google Scholar for backward snowballing
- Excel for manual screening, quality assessment, and data extraction
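The quality gate described in the Quality Assessment step can be sketched as follows. Only the 40% retention threshold is stated in the methodology; the per-criterion scoring scale and the high/medium/low band boundaries below are assumptions for illustration:

```python
def quality_score(scores):
    """Percentage of the maximum achievable score over applicable criteria.

    `scores` maps each applicable criterion to a value in [0, 1]
    (e.g., 0 = not met, 0.5 = partially met, 1 = fully met) --
    this scale is an assumption, not taken from the study.
    """
    if not scores:
        return 0.0
    return 100.0 * sum(scores.values()) / len(scores)

def classify(pct):
    """Band boundaries (70/40) are illustrative; only the 40%
    retention cut-off comes from the described methodology."""
    if pct >= 70:
        return "high"
    if pct >= 40:
        return "medium"
    return "low"

# Hypothetical study scored on four applicable criteria.
study = {"QA1": 1, "QA2": 0.5, "QA3": 1, "QA4": 0}
pct = quality_score(study)  # 62.5
print(classify(pct))        # -> medium (above the 40% threshold, so retained)
```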
Institutions
- Fraunhofer-Institut für Experimentelles Software Engineering