Show simple item record

dc.contributor.advisor: Englund, Dirk
dc.contributor.author: Binbas, Berkin
dc.date.accessioned: 2025-08-27T14:30:27Z
dc.date.available: 2025-08-27T14:30:27Z
dc.date.issued: 2025-05
dc.date.submitted: 2025-06-23T14:01:01.821Z
dc.identifier.uri: https://hdl.handle.net/1721.1/162509
dc.description.abstract: Large language models (LLMs) have recently entered everyday use and have been applied extensively to a wide range of tasks. They have been shown to carry out increasingly complex tasks, including those that require formal/mathematical reasoning at human or superhuman levels. In particular, their in-context learning capabilities, the domain-specific knowledge acquired from their vast pretraining corpora, and their fine-tunability for specific tasks have drawn considerable attention and research to the field. However, the application of LLMs at the frontiers of scientific research remains underexplored. In this work, we investigate how LLMs can be leveraged to aid in building compact mathematical models and in experimental design. Specifically, we propose a framework that uses LLMs as a guide to concurrently handle the experimental design and symbolic regression tasks for data obtained from (1) a black-box 1D function and (2) a black-box physical system. We propose further modifications to our base framework and perform experiments to analyze how it performs under different experiment variants and across different LLM tiers. Our experiments reveal that while larger models (around 70B parameters) do not always achieve better downstream performance than smaller models (around 8B parameters), they are able to use the given information and/or physical context when designing experiments and proposing symbolic expressions, and they outperform random-design baselines. We also observe that natural language constraints do not consistently improve symbolic regression accuracy. These results underscore both the challenges and the potential of integrating LLM agents into the scientific discovery process, particularly as proposers of experiments and symbolic expressions.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: MOBLLM: Model Building LLMs via Symbolic Regression and Experimental Design
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.orcid: 0009-0009-3237-6022
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Electrical Engineering and Computer Science

