Textual Requirements Dataset for NLP-Based Software Requirement Analysis

Published: 21 July 2025 | Version 1 | DOI: 10.17632/x8yt85yfws.1
Contributor:
Sholiq Sholiq

Description

This dataset contains 68 examples of textual requirements collected from various software application modules, including cooperative, convenience store, and mini-hospital applications. Each entry includes the application name, module, process function, and a textual description of the system requirements. These requirements describe user interaction scenarios with the system in real-world operational contexts, such as patient registration, cash transactions, savings reports, member management, and sales. This dataset is useful for research and development in the fields of Natural Language Processing (NLP) and software engineering. It can be used for training NLP models, analyzing requirements quality, and testing requirements documentation automation tools. Furthermore, this dataset is suitable as a case study in requirements analysis learning and a benchmark for various academic experiments.

Steps to reproduce

Dataset Development Methodology

This dataset was developed using a four-phase process that reflects common practices in software requirements analysis: elicitation, transformation, documentation, and extraction.

In the first phase, requirements elicitation, system analysts worked with users and domain experts to collect user stories, which are short narratives describing what users want from the system. For example: “As a warehouse clerk, I want to order goods from suppliers to maintain stock levels.” Data was gathered through interviews with staff (cashiers, warehouse clerks), observations of daily tasks to uncover hidden needs, and workshops to validate and prioritize user input.

The second phase involved transforming these informal stories into formal requirements. Analysts outlined the sequence of user actions (e.g., selecting menus), defined system responses (e.g., displaying forms), and mapped out logical flows, including conditions (e.g., “if valid, save; else, return to form”). Each requirement was written in a structured English paragraph: starting with user initiation, followed by system behavior, conditions, and ending with the final system action. This format helped ensure clarity for both technical and non-technical stakeholders.

The third phase was documentation in Software Requirements Specification (SRS) documents. These were organized by module (e.g., “Login,” “Cash Transactions”) and included technical details for developers while maintaining a readable, narrative style. A consistent structure ensured clarity and helped stakeholders navigate the system’s functional scope.

The final phase was manual extraction. After documentation was complete, researchers reviewed the SRS to extract finalized requirements. A total of 68 requirements were selected from three systems, each representing a functional module. The text was kept intact to preserve linguistic and structural authenticity.

The resulting dataset is a table with three columns: application name, module name, and the textual requirement in paragraph form. This approach reflects real-world documentation and enables reuse in software engineering, information systems, and computational linguistics. The methodology ensures both technical accuracy and contextual relevance.
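
As a rough sketch of how the three-column table might be read for analysis, the following uses Python's standard `csv` module. The column headers (`application`, `module`, `requirement`), the CSV layout, and the two sample rows are assumptions made for illustration only; the file format and headers of the published dataset may differ, and the sample text is not taken from it.

```python
import csv
import io
from collections import Counter

# In-memory stand-in for the dataset file. The rows are invented
# placeholders, not entries from the published dataset.
SAMPLE = io.StringIO(
    "application,module,requirement\n"
    "Cooperative,Login,\"The user opens the login menu. "
    "The system displays the login form.\"\n"
    "Mini-hospital,Patient Registration,\"The user selects patient "
    "registration. The system displays the registration form.\"\n"
)


def load_requirements(handle):
    """Read rows into dictionaries keyed by the (assumed) column headers."""
    return list(csv.DictReader(handle))


rows = load_requirements(SAMPLE)

# Count how many requirements belong to each application.
per_application = Counter(row["application"] for row in rows)
for name, count in per_application.items():
    print(f"{name}: {count} requirement(s)")
```

With a real CSV export, a file opened via `open(path, newline="", encoding="utf-8")` could be passed to `load_requirements` in place of the in-memory sample.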

Institutions

  • Institut Teknologi Sepuluh Nopember

Categories

Computer Science

Licence