Accelerating the Design Process Through Natural Language Processing-based Idea Filtering
Author(s)
Edwards, Kristen M.
Advisor
Ahmed, Faez
Abstract
This thesis explores the use of natural language processing (NLP) to accelerate the design process in various domains by automating idea filtering. During the design of products or programs, a bottleneck often arises when experts must filter through an exorbitant number of ideas in search of those that are most innovative, creative, relevant, or strongest along any number of other subjective characteristics. We observe this bottleneck when filtering entrepreneurial ideas for innovation, when filtering early-stage design concepts for creativity and usefulness, and when filtering literature for relevance toward policy-informing evidence syntheses. Motivated by this common challenge, my research explores the use of NLP to accelerate design through automated idea filtering.

My team and I first investigate whether machine learning can predict expert-derived creativity assessments of design ideas from more accessible non-expert survey results. We demonstrate the ability of machine learning models to predict design metrics from the design itself together with textual survey information. Our results show that incorporating NLP improves prediction across design metrics, and that clear distinctions exist in the predictability of certain metrics.

We then explore the effectiveness of NLP for accelerating literature screening when designing evidence-based policies and programs. In this research, we introduce transformer models for idea filtering and evaluation; transformer-based models have produced state-of-the-art results in NLP tasks such as language translation, question answering, reading comprehension, and sentiment analysis. Our results show that the fine-tunable transformer-based models achieve the highest text classification accuracy, 79%, thereby accurately evaluating and filtering our textual dataset. Furthermore, we observe that model accuracy improves with training data size, with diminishing marginal returns. These findings can inform decisions about the trade-off between model accuracy and manual labeling effort, increasing efficiency.

Having demonstrated the effectiveness of NLP for accelerating literature screening, we next aimed to decrease the level of effort required of expert reviewers to generate training data. Training an idea-filtering model requires a labeled dataset of ideas; however, obtaining labeled data is a challenge for engineering and design applications, because experts are expensive and expert-labeled datasets are therefore expensive as well. This motivated us to explore ways to decrease the amount of training data needed by using active learning (AL), the idea that a machine learning algorithm can perform better with less training data if it is allowed to choose the data from which it learns. We find that data selection techniques that incorporate active learning yield higher F1 scores, a more balanced training set, and fewer necessary labeled training instances. These results suggest that active learning is effective in decreasing expert level of effort for NLP-based idea filtering with highly imbalanced data.

Ultimately, we find that NLP can accelerate design processes in various domains by automating idea filtering and by further decreasing the level of effort required of human experts.
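The interplay between model-based screening and active learning summarized above can be illustrated with a short, self-contained sketch. The code below is not the thesis's implementation: it substitutes a TF-IDF plus logistic regression classifier from scikit-learn for a fine-tuned transformer so the example stays brief and runnable, and all names and parameters (`active_learning_loop`, `QUERY_SIZE`, `N_ROUNDS`, the `pool_labels` "oracle") are hypothetical placeholders. It shows pool-based active learning with uncertainty sampling: the model is retrained each round, the unlabeled pool is scored, and the items the model is least certain about are sent to the expert for labeling.

```python
# Minimal sketch of pool-based active learning with uncertainty sampling for
# binary idea/literature screening. NOT the thesis implementation: a TF-IDF +
# logistic regression classifier stands in for a fine-tuned transformer, but
# the loop structure is the same idea at a smaller scale: train, score the
# unlabeled pool, and query the most uncertain items for expert labeling.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

QUERY_SIZE = 10  # hypothetical: items sent to the expert per round
N_ROUNDS = 5     # hypothetical: number of labeling rounds


def uncertainty_sampling(model, X_pool, k):
    """Return indices of the k pool items closest to the decision boundary."""
    proba = model.predict_proba(X_pool)[:, 1]
    return np.argsort(np.abs(proba - 0.5))[:k]


def active_learning_loop(labeled_texts, labels, pool_texts, pool_labels,
                         test_texts, test_labels):
    """pool_labels stands in for the expert reviewer answering each query."""
    labeled_texts, labels = list(labeled_texts), list(labels)
    pool_texts, pool_labels = list(pool_texts), list(pool_labels)

    vectorizer = TfidfVectorizer(max_features=5000)
    vectorizer.fit(labeled_texts + pool_texts)

    model = None
    for round_idx in range(N_ROUNDS):
        model = LogisticRegression(class_weight="balanced", max_iter=1000)
        model.fit(vectorizer.transform(labeled_texts), labels)

        # F1 reflects performance on imbalanced screening data better than
        # raw accuracy, which is why it is the metric of interest here.
        preds = model.predict(vectorizer.transform(test_texts))
        print(f"round {round_idx}: F1 = {f1_score(test_labels, preds):.3f}")

        if not pool_texts:
            break
        query = uncertainty_sampling(model,
                                     vectorizer.transform(pool_texts),
                                     min(QUERY_SIZE, len(pool_texts)))

        # "Ask the expert": move the queried items into the labeled set.
        for i in sorted(query, reverse=True):
            labeled_texts.append(pool_texts.pop(i))
            labels.append(pool_labels.pop(i))
    return model
```

Uncertainty sampling is only one possible query strategy; the key point is that the algorithm, rather than random selection, decides which instances merit an expert's labeling effort, which is what allows a smaller and better-balanced training set.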
Date issued
2022-09
Department
Massachusetts Institute of Technology. Department of Mechanical Engineering
Publisher
Massachusetts Institute of Technology