Published: 7 June 2024 | Version 1 | DOI: 10.17632/4ysx9fyzv4.1
Sonali Sonali, Srinivasarao Thamada


This dataset aims to support the requirements engineering phase of software development, specifically the early analysis of Functional Requirements (FR) and Non-Functional Requirements (NFR), with a view to identifying cross-cutting concerns and improving modularity. The dataset contains a collection of software requirements expressed in natural language. Each requirement is categorized as either FR or NFR. Optionally, NFRs can be further subcategorized by specific quality aspects such as security, performance, and usability. The dataset highlights how certain requirements affect multiple modules or functionalities within a system, and it can be used to train Natural Language Processing (NLP) tools to automatically analyze requirements annotated as FR or NFR.

Sources for the dataset are existing software requirements repositories such as PROMISE, together with open-source project documentation containing FRs and NFRs. By processing existing datasets and open software project requirement documents, we obtained a dataset of 6118 requirements, of which 3964 are functional and 2154 are non-functional. However, creating a comprehensive dataset that covers a wide range of software systems and requirements is challenging, because natural language requirements can be ambiguous.
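The class counts stated above imply an imbalanced label distribution, which matters when the dataset is used to train a classifier. The following is a minimal sketch of that arithmetic; the variable and function names are illustrative assumptions, and only the three counts come from the description.

```python
# Counts taken from the dataset description.
TOTAL = 6118
FUNCTIONAL = 3964
NON_FUNCTIONAL = 2154

# Sanity check: the two classes should account for every requirement.
assert FUNCTIONAL + NON_FUNCTIONAL == TOTAL


def share(count: int, total: int) -> float:
    """Return the fraction of the dataset that `count` represents."""
    return count / total


fr_share = share(FUNCTIONAL, TOTAL)
nfr_share = share(NON_FUNCTIONAL, TOTAL)

# Accuracy of a trivial classifier that always predicts the larger class.
# Any trained model should be compared against this figure.
majority_baseline = max(fr_share, nfr_share)

print(f"FR: {fr_share:.1%}, NFR: {nfr_share:.1%}")
# prints "FR: 64.8%, NFR: 35.2%"
```

Because roughly two thirds of the entries are functional, plain accuracy can look better than it is; per-class precision and recall are usually a more informative way to evaluate a model on data like this.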


Steps to reproduce

From the PURE repository, we extracted requirements text mainly from the requirement specifications of web-based and mobile-based software systems. We also processed software requirement specification documents available online. Building this dataset involves two key steps: manually sifting through Software Requirements Specifications (SRS) to identify individual requirements, and then assigning each one a label indicating whether it is functional or non-functional. Some SRS documents clearly outline the requirements, making them easy to extract. However, not all documents explicitly state the requirement type. When this happens, we carefully examine the requirements themselves to assign the appropriate label (FR or NFR). By processing natural language requirements from the PURE dataset and from open software requirement documents available online, we obtained a dataset containing 6118 requirements, of which 3964 are functional and 2154 are non-functional. Each entry in the dataset consists of two parts: the requirement text description and a category label that defines its nature (functional or non-functional).
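The two-part entry structure described above can be read and validated as sketched below. This is only an illustration: the CSV layout, the column names `requirement` and `label`, the label strings `FR` and `NFR`, and the sample rows are all assumptions made for the example, not details confirmed by the dataset files.

```python
import csv
import io
from collections import Counter

# Hypothetical label strings; the published file may encode them differently.
VALID_LABELS = {"FR", "NFR"}


def load_requirements(handle):
    """Read (text, label) pairs from a CSV file object.

    Assumes a header row with the hypothetical columns
    'requirement' and 'label'.
    """
    entries = []
    for row in csv.DictReader(handle):
        text = row["requirement"].strip()
        label = row["label"].strip().upper()
        if label not in VALID_LABELS:
            raise ValueError(f"unexpected label {label!r} for: {text!r}")
        entries.append((text, label))
    return entries


# Invented sample rows, used only to make the sketch runnable.
SAMPLE = io.StringIO(
    "requirement,label\n"
    '"The system shall allow a user to reset their password.",FR\n'
    '"The system shall respond to search queries within 2 seconds.",NFR\n'
    '"The system shall export reports as PDF files.",FR\n'
)

entries = load_requirements(SAMPLE)
label_counts = Counter(label for _, label in entries)
print(label_counts["FR"], label_counts["NFR"])
```

Run against the full dataset, the same counting step would be expected to report 3964 functional and 2154 non-functional entries, which gives a quick check that the file was read completely.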


GITAM University, Bharati Vidyapeeth University


Requirement Engineering, Natural Language Processing, Machine Learning, Software Development, Requirements Management