Software Requirements Specification (SRS) documents set out the requirements and expectations of software development projects. Developers use this textual information as a guideline for identifying the various requirements. The content of an SRS document is concentrated around software requirement-specific entities, which can be used to improve software-based information retrieval systems. In this paper, we consider Named Entity Recognition (NER) over SRS documents and classify the software requirement-specific entities using machine learning (ML) based methods. Existing NER methods are typically restricted to four basic entity categories that are not pertinent to the software requirement domain. We define ten software requirement-based entities that cover the essence of SRS documents. To predict the appropriate entity tags, we first create a limited feature set and experiment with various NER models, including ML-based probabilistic models and deep learning (DL) models. We conduct a detailed numerical study to evaluate the effectiveness of the NER models over the SRS datasets. NER models with a basic feature set demonstrate promising performance, with a macro-averaged F1 score of 81%. Our analysis shows that the entities can be extracted with high accuracy from SRS documents and can then be used for various practical purposes, such as extractive summarization of large SRS documents and effective grouping of requirements.
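For readers unfamiliar with the metric, the macro-averaged F1 score mentioned above can be illustrated with a minimal sketch. It assumes token-level scoring (the abstract does not say whether evaluation is per token or per entity span), and the tag names `ACTOR` and `ACTION` and the function `macro_f1` are hypothetical, chosen only for illustration; they are not the ten entities defined in the paper.

```python
from collections import defaultdict


def macro_f1(gold, pred):
    """Unweighted mean of per-tag F1 over every tag seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("cannot score empty sequences")

    tp = defaultdict(int)  # true positives per tag
    fp = defaultdict(int)  # false positives per tag
    fn = defaultdict(int)  # false negatives per tag
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted tag p where it was wrong
            fn[g] += 1  # missed the true tag g

    scores = []
    for tag in set(gold) | set(pred):
        # F1 = 2TP / (2TP + FP + FN); the denominator is nonzero
        # because each tag occurs at least once in gold or pred.
        scores.append(2 * tp[tag] / (2 * tp[tag] + fp[tag] + fn[tag]))
    return sum(scores) / len(scores)


# Hypothetical tags for illustration only.
gold = ["ACTOR", "ACTION", "O", "O"]
pred = ["ACTOR", "O", "O", "O"]
score = macro_f1(gold, pred)
```

Because every tag contributes equally to the mean, a rare entity type that the model handles poorly lowers the macro score as much as a frequent one would, which is why the metric suits tag sets with uneven frequencies.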
Article ID: 2021S21
Publisher: Canadian Artificial Intelligence Association