A Framework for Semantic Textual Similarity Integration with Requirements and System Models

Beilstein, John R.

Advisor

Rebentisch, Eric S.

Terms of use

In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/

Abstract

Modern engineering projects can involve highly complex systems with hundreds or even thousands of requirements. Organizing and managing these requirements is a task that falls to Systems Engineers (SEs) and Requirements Engineers (REs). This thesis seeks to better understand how natural language processing (NLP) can assist SEs and REs by identifying relationships and interactions between requirements. It presents an algorithm that analyzes a requirements dataset and assigns requirements to the components defined in a system model. This system model represents an early concept design and consists of high-level components and the connections or relationships between them. Components are defined by attributes such as names, descriptions, and synonyms. The algorithm uses semantic textual similarity (STS) between requirements and these component attributes to estimate which components of a system are affected by which requirement(s). The algorithm also uses STS to attempt to identify direct relationships between individual requirement statements. Additionally, it attempts to identify indirect relationships between requirements by finding requirements whose influences on system model components overlap. The initial results are promising: the algorithm identifies requirement-to-requirement pairings with high STS scores, and it also identifies multiple requirement statements that have high STS scores with overlapping parts of the system model. This information could help REs and SEs better understand how different requirement statements directly or indirectly relate to and influence one another. This framework is an early proof of concept, and more research is needed to understand its scalability.
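
The abstract does not say which STS model the thesis uses, so the following is only a minimal sketch of the overall flow under stated assumptions. A bag-of-words cosine similarity with a small stopword list stands in for a real STS model; the function names (`assign_components`, `direct_pairs`, `indirect_pairs`), the stopword list, the requirement texts, and the component attributes are all hypothetical. Scores from this stand-in are not comparable to the thesis's STS scores, even though the example also uses a 0.4 threshold.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical stopword list; dropping these keeps boilerplate such as "the ... shall"
# from dominating the bag-of-words scores.
STOPWORDS = {"a", "an", "and", "be", "for", "in", "of", "shall", "that", "the", "to"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def similarity(a: str, b: str) -> float:
    """Cosine similarity of word-count vectors: a crude stand-in for an STS model."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(count * vb[token] for token, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def assign_components(requirements, components, threshold):
    """Map each requirement ID to the components whose best-matching attribute
    (name, description, or any synonym) scores at or above the threshold."""
    assignments = {}
    for rid, text in requirements.items():
        matched = set()
        for name, attrs in components.items():
            candidates = [name, attrs.get("description", "")] + attrs.get("synonyms", [])
            if max(similarity(text, c) for c in candidates) >= threshold:
                matched.add(name)
        assignments[rid] = matched
    return assignments


def direct_pairs(requirements, threshold):
    """Requirement pairs whose texts are similar to each other."""
    return [
        (a, b)
        for a, b in combinations(sorted(requirements), 2)
        if similarity(requirements[a], requirements[b]) >= threshold
    ]


def indirect_pairs(assignments):
    """Requirement pairs that were assigned at least one component in common."""
    pairs = {}
    for a, b in combinations(sorted(assignments), 2):
        shared = assignments[a] & assignments[b]
        if shared:
            pairs[(a, b)] = shared
    return pairs


# Hypothetical example data.
requirements = {
    "R1": "The battery shall supply power for 8 hours",
    "R2": "The battery shall supply backup power",
    "R3": "The display shall show battery charge level",
}
components = {
    "battery": {"description": "rechargeable power storage", "synonyms": ["power cell"]},
    "display": {"description": "screen that shows status to the user", "synonyms": ["screen"]},
}

assignments = assign_components(requirements, components, threshold=0.4)
print(assignments)                      # R1, R2 -> battery; R3 -> battery, display
print(direct_pairs(requirements, 0.4))  # [('R1', 'R2')]
print(indirect_pairs(assignments))      # every pair shares "battery"
```

In this toy example, R3 is not textually similar enough to R1 or R2 to form a direct pair, yet all three requirements are assigned to the battery component, which is the kind of indirect relationship the abstract describes.
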
Although it has not been optimized, the proposed algorithm reaches F1 scores of 0.59 for matching requirements to individual components of the system model. While these F1 scores are not ideal, they suggest the technique could be refined to yield better results. It is also worth noting that some of the matches between requirements and the system model would likely be impossible to categorize without human intuition and engineering judgment, which makes them very challenging classifications for the algorithm. The algorithm achieves an overall precision between 0.94 and 1.00 for matching requirements to individual components of the system model at semantic textual similarity thresholds at or above 0.40.
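
The abstract reports F1 and precision but not recall. Because F1 is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R), a near-perfect precision paired with a modest F1 at the same threshold would imply low recall, i.e. many true requirement-to-component matches are missed. The sketch below computes micro-averaged metrics over (requirement, component) pairs; the averaging scheme, the `evaluate` function, and the data are assumptions for illustration, not the thesis's evaluation protocol or results.

```python
def evaluate(predicted: dict[str, set[str]], gold: dict[str, set[str]]) -> tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over (requirement, component) pairs."""
    pred_pairs = {(r, c) for r, comps in predicted.items() for c in comps}
    gold_pairs = {(r, c) for r, comps in gold.items() for c in comps}
    true_positives = len(pred_pairs & gold_pairs)
    precision = true_positives / len(pred_pairs) if pred_pairs else 0.0
    recall = true_positives / len(gold_pairs) if gold_pairs else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical data: every prediction is correct, but three of five true matches are missed.
predicted = {"R1": {"battery"}, "R3": {"display"}}
gold = {"R1": {"battery"}, "R2": {"battery", "charger"}, "R3": {"display"}, "R4": {"enclosure"}}

precision, recall, f1 = evaluate(predicted, gold)
print(precision, recall, round(f1, 3))  # 1.0 0.4 0.571
```

Here precision is perfect while F1 stays near 0.57, showing how a conservative threshold can trade recall for precision.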

Date issued

2024-02

URI

https://hdl.handle.net/1721.1/153990

Department

System Design and Management Program.

Publisher

Massachusetts Institute of Technology

Collections

Graduate Theses