MOBLLM: Model Building LLMs via Symbolic Regression and Experimental Design
Author(s)
Binbas, Berkin
Advisor
Englund, Dirk
Abstract
Large language models (LLMs) have recently entered everyday use and are already applied extensively to a wide range of tasks. They have been shown to handle increasingly complex tasks, including those that require formal or mathematical reasoning at human or superhuman levels. In particular, their in-context learning capabilities, the domain-specific knowledge they acquire from their vast pretraining corpus, and their fine-tunability for specific tasks have drawn considerable attention and research. However, the application of LLMs to the frontiers of scientific research remains an underexplored direction. In this work, we investigate how LLMs can be leveraged to aid in building compact mathematical models and in experimental design. Specifically, we propose a framework that uses LLMs as a guide to concurrently handle the experimental design and symbolic regression tasks for data obtained from 1) a black box 1D function and 2) a black box physical system. We propose further modifications to our base framework and perform experiments to analyze how it performs under different experiment variants and across different LLM tiers. Our experiments reveal that while larger models (around 70B parameters) do not always achieve better downstream performance than smaller models (around 8B parameters), they are able to utilize the given information and/or physical context when designing experiments and proposing symbolic expressions, and perform better than random-design baselines. We also observe that natural language constraints do not consistently improve symbolic regression accuracy. These results underscore both the challenges and the potential of integrating LLM agents into the scientific discovery process, particularly as proposers of experiments and symbolic expressions.
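For readers unfamiliar with the setting, the loop described in the abstract (choose an experiment, query the black box, propose a symbolic expression) can be illustrated with a minimal sketch. This is not the thesis's implementation: both LLM roles are replaced by simple stand-ins (a uniform random design, which corresponds only to the kind of random-design baseline the abstract mentions, and a least-squares selection over a tiny fixed candidate set). The hidden function, the candidate library, and all names below (`black_box`, `CANDIDATES`, `propose_experiment`, `propose_expression`, `run_loop`) are hypothetical and chosen only for illustration.

```python
import math
import random

def black_box(x):
    # Hypothetical hidden 1D function (an assumption, not from the thesis).
    return 2.0 * x * x + 1.0

# Hypothetical candidate library: each form is y = a * basis(x) + b.
CANDIDATES = {
    "a*x + b": lambda x: x,
    "a*x**2 + b": lambda x: x * x,
    "a*sin(x) + b": math.sin,
}

def fit_affine(us, ys):
    """Least-squares fit of y = a*u + b; returns (a, b, mse)."""
    n = len(us)
    mean_u = sum(us) / n
    mean_y = sum(ys) / n
    var_u = sum((u - mean_u) ** 2 for u in us)
    cov = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
    a = cov / var_u if var_u > 0 else 0.0
    b = mean_y - a * mean_u
    mse = sum((a * u + b - y) ** 2 for u, y in zip(us, ys)) / n
    return a, b, mse

def propose_experiment(history, rng, low=-3.0, high=3.0):
    # Stand-in for an LLM experiment proposer: uniform random design.
    # An LLM-guided version would condition on `history` instead.
    return rng.uniform(low, high)

def propose_expression(history):
    # Stand-in for an LLM expression proposer: pick the candidate
    # form with the lowest mean squared error on the data so far.
    xs = [x for x, _ in history]
    ys = [y for _, y in history]
    best = None
    for name, basis in CANDIDATES.items():
        a, b, mse = fit_affine([basis(x) for x in xs], ys)
        if best is None or mse < best[3]:
            best = (name, a, b, mse)
    return best

def run_loop(n_rounds=12, seed=0):
    rng = random.Random(seed)
    history = []   # list of (x, y) observations
    model = None
    for _ in range(n_rounds):
        x = propose_experiment(history, rng)   # experimental design step
        history.append((x, black_box(x)))      # query the black box
        model = propose_expression(history)    # symbolic regression step
    return model

if __name__ == "__main__":
    name, a, b, mse = run_loop()
    print(f"{name}  a={a:.3f}  b={b:.3f}  mse={mse:.2e}")
```

In an LLM-guided variant, the two `propose_*` stand-ins would be replaced by model calls that receive the observation history (and, for a physical system, any available context) as part of the prompt; how the thesis structures those prompts is not stated in this record.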
Date issued
2025-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology