AI-powered Data Mining for the Development of Sustainable Concrete Materials
Author(s)
Duan, Yifei
Advisor
Olivetti, Elsa A.
Abstract
Data mining has become essential to contemporary industrial and scientific research, playing a pivotal role in uncovering insights from large-scale industrial datasets and literature collections. The sustainable transition of the concrete industry, a major contributor to global CO₂ emissions, demands both operational optimization and scientific innovation. This thesis presents comprehensive data mining frameworks for both industrial and literature data to support the development of more sustainable concrete materials. Focusing on concrete manufacturing, we develop AI-powered methodologies tailored to real-world industrial data and complex scientific literature. For industrial data mining, we propose incorporating interpretability and realistic engineering design scenarios to improve the reliability of both predictive and prescriptive modeling of concrete mixes containing supplementary cementitious materials (SCMs). A domain-informed amortized Gaussian process is shown to achieve superior scientific consistency in predicting time-varying compressive strength, and a shallow multi-layer perceptron (MLP) in predicting the time-invariant slump and air content. These explainable surrogate property models are then applied to mix design optimization under a variety of realistic scenarios that differ in engineering design requirements and in SCM costs and densities. The importance of a comprehensive property constraint set is demonstrated by comparison against a baseline that constrains only 28-day strength, which yields unreasonable values for the other properties. The need to differentiate realistic scenarios is also highlighted by the differences among the optimized mixes and their production costs and climate impacts. Higher design strength, higher design slump, lower design air content, higher SCM density, and higher SCM unit cost can drive up production costs.
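As a minimal sketch of constrained mix design with surrogate property models, the Python code below enumerates a coarse grid of candidate mixes and keeps the cheapest one that satisfies every property requirement. Everything here is an illustrative assumption rather than the thesis's implementation: the closed-form `strength`, `slump`, and `air` functions are hypothetical toy stand-ins for the trained Gaussian process and MLP, the unit costs and requirements are made-up placeholders, and exhaustive grid search replaces whatever optimizer the thesis actually uses. Passing `full=False` mimics a baseline that checks only 28-day strength.

```python
from itertools import product

# Hypothetical unit costs in $/kg (illustrative placeholders, not data from the thesis).
UNIT_COST = {"cement": 0.15, "scm": 0.08, "water": 0.001}


def candidate_mixes(binders=(300, 350, 400), scm_fracs=(0.0, 0.2, 0.4), wbs=(0.40, 0.45, 0.50)):
    """Enumerate mixes (kg per m^3) over binder content, SCM replacement fraction and w/b ratio."""
    for b, f, wb in product(binders, scm_fracs, wbs):
        yield {"cement": b * (1 - f), "scm": b * f, "water": b * wb}


# Toy surrogate property models standing in for the trained GP / MLP (hypothetical formulas).
def strength(mix, age_days):
    b = mix["cement"] + mix["scm"]
    wb, f = mix["water"] / b, mix["scm"] / b
    s28 = 100 * (1 - wb) - 25 * f  # MPa at 28 days
    # Strength grows with age; the factor equals 1 at 28 days.
    return s28 * (age_days / (age_days + 4)) / (28 / 32)


def slump(mix):
    b = mix["cement"] + mix["scm"]
    return 600 * mix["water"] / b - 190 + 0.2 * (b - 300)  # mm


def air(mix):
    b = mix["cement"] + mix["scm"]
    return 1.5 + 2.0 * mix["scm"] / b  # percent


def cost(mix):
    """Production cost: constituent mass times unit cost, summed over constituents."""
    return sum(mass * UNIT_COST[name] for name, mass in mix.items())


def meets(mix, req, full=True):
    """Check design requirements; full=False keeps only the 28-day strength constraint."""
    if strength(mix, 28) < req["f28_min"]:
        return False
    if not full:
        return True
    return (req["slump"][0] <= slump(mix) <= req["slump"][1]
            and req["air"][0] <= air(mix) <= req["air"][1])


def optimize(req, full=True):
    """Cheapest grid mix satisfying the requirements, or None if nothing is feasible."""
    feasible = [m for m in candidate_mixes() if meets(m, req, full)]
    return min(feasible, key=cost, default=None)


# Hypothetical design requirements: MPa, mm, percent.
REQ = {"f28_min": 48.0, "slump": (75.0, 125.0), "air": (1.0, 2.0)}
best = optimize(REQ)
baseline = optimize(REQ, full=False)
print("full constraint set:", best, round(cost(best), 3))
print("strength-only:", baseline, round(slump(baseline), 1), round(air(baseline), 2))
```

With these toy numbers, the strength-only baseline selects a cheaper mix whose slump and air content fall outside the required ranges, which mirrors the qualitative argument for a comprehensive constraint set; the specific values say nothing about real concrete.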
Although stratification patterns are observed in the production costs of optimized mixes across scenarios, the mix-wise climate impacts are not clearly stratified, indicating that substantial emission reductions can be achieved without significantly increasing costs, regardless of the scenario. For literature mining, we develop a novel method that fine-tunes lightweight large language models (LLMs), specifically Pythia-2.8B, with multiple-choice instructions. Because the multifaceted linguistic complexity of communication within the domain makes the conventional named-entity-recognition approach infeasible, the new method instead achieves high information-inference accuracy in a time-, cost-, and computation-efficient manner, outperforming the GPT-3.5 in-context learning baseline by over 20%. A knowledge graph is constructed from the literature-mined data, offering insights to promote alternative material substitution strategies in concrete production, since the current commercial SCMs are not comprehensively sustainable in the longer term. Statistical summary and temporal trend analyses provide both static and dynamic insights into the research landscape. Although SCMs remain a research hotspot, the results reveal a systematic shift in recent studies from commercial SCMs to other materials: geopolymer and fine aggregate studies have surged in the recent period, while clinker feedstock and filler studies have declined. A node similarity metric is modified to develop a model-free link prediction algorithm, enhanced with random graph perturbation for robustness and uncertainty quantification. Through link prediction, the currently underexplored application of lime-pozzolan cement emerges as a potentially promising future research direction.
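Model-free link prediction with perturbation-based uncertainty can be sketched as follows, assuming an undirected graph stored as adjacency sets. The thesis's modified node similarity metric and its exact perturbation scheme are not given in this abstract, so the sketch substitutes the standard resource-allocation index and a simple perturbation that drops each edge independently with probability `drop_frac`. The mean score over perturbed graphs ranks candidate links, and the standard deviation serves as a rough uncertainty estimate. The functions `build_adj`, `resource_allocation`, and `predict_links`, and the node labels and edges, are hypothetical.

```python
import random
from statistics import mean, pstdev


def build_adj(nodes, edges):
    """Undirected adjacency sets for a (possibly perturbed) edge list."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def resource_allocation(adj, u, v):
    """Resource-allocation index: sum of 1/deg(z) over common neighbours z of u and v."""
    return sum((1.0 / len(adj[z]) for z in adj[u] & adj[v]), 0.0)


def predict_links(nodes, edges, n_runs=200, drop_frac=0.1, seed=0):
    """Return {(u, v): (mean score, std)} for every unlinked pair over perturbed graphs.

    Each run drops every edge independently with probability drop_frac, rebuilds the
    graph, and rescores all candidate pairs; the spread across runs is a simple
    uncertainty estimate.
    """
    rng = random.Random(seed)
    existing = {frozenset(e) for e in edges}
    pairs = [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]
             if frozenset((u, v)) not in existing]
    samples = {p: [] for p in pairs}
    for _ in range(n_runs):
        kept = [e for e in edges if rng.random() >= drop_frac]
        adj = build_adj(nodes, kept)
        for u, v in pairs:
            samples[(u, v)].append(resource_allocation(adj, u, v))
    return {p: (mean(s), pstdev(s)) for p, s in samples.items()}


# Toy graph with hypothetical node labels (not the thesis's knowledge graph).
NODES = ["A", "B", "C", "D", "E"]
EDGES = [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "E")]
SCORES = predict_links(NODES, EDGES)
for pair, (mu, sd) in sorted(SCORES.items(), key=lambda kv: kv[1][0], reverse=True):
    print(pair, f"{mu:.3f} +/- {sd:.3f}")
```

A score that rests on a single common neighbour drops to zero whenever one of its supporting edges is removed, so it shows a larger spread relative to its mean than a score supported by several independent neighbours.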
Date issued
2025-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Institute for Data, Systems, and Society
Publisher
Massachusetts Institute of Technology