MIT Libraries logoDSpace@MIT

MIT
View Item 
  • DSpace@MIT Home
  • MIT Open Access Articles
  • MIT Open Access Articles
  • View Item
  • DSpace@MIT Home
  • MIT Open Access Articles
  • MIT Open Access Articles
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

SymbolFit: Automatic Parametric Modeling with Symbolic Regression

Author(s)
Tsoi, Ho F.; Rankin, Dylan; Caillol, Cecile; Cranmer, Miles; Dasu, Sridhara; Duarte, Javier; Harris, Philip; Lipeles, Elliot; Loncar, Vladimir; ... Show more Show less
Thumbnail
Download41781_2025_Article_140.pdf (7.791Mb)
Publisher with Creative Commons License

Publisher with Creative Commons License

Creative Commons Attribution

Terms of use
Creative Commons Attribution https://creativecommons.org/licenses/by/4.0/
Metadata
Show full item record
Abstract
We introduce SymbolFit (API:  https://github.com/hftsoi/symbolfit ), a framework that automates parametric modeling by using symbolic regression to perform a machine-search for functions that fit the data while simultaneously providing uncertainty estimates in a single run. Traditionally, constructing a parametric model to accurately describe binned data has been a manual and iterative process, requiring an adequate functional form to be determined before the fit can be performed. The main challenge arises when the appropriate functional forms cannot be derived from first principles, especially when there is no underlying true closed-form function for the distribution. In this work, we develop a framework that automates and streamlines the process by utilizing symbolic regression, a machine learning technique that explores a vast space of candidate functions without requiring a predefined functional form because the functional form itself is treated as a trainable parameter, making the process far more efficient and effortless than traditional regression methods. We demonstrate the framework in high-energy physics experiments at the CERN Large Hadron Collider (LHC) using five real proton-proton collision datasets from new physics searches, including background modeling in resonance searches for high-mass dijet, trijet, paired-dijet, diphoton, and dimuon events. We show that our framework can flexibly and efficiently generate a wide range of candidate functions that fit a nontrivial distribution well using a simple fit configuration that varies only by random seed, and that the same fit configuration, which defines a vast function space, can also be applied to distributions of different shapes, whereas achieving a comparable result with traditional methods would have required extensive manual effort.
Date issued
2025-07-01
URI
https://hdl.handle.net/1721.1/162760
Department
Massachusetts Institute of Technology. Department of Physics
Journal
Computing and Software for Big Science
Publisher
Springer International Publishing
Citation
Tsoi, H.F., Rankin, D., Caillol, C. et al. SymbolFit: Automatic Parametric Modeling with Symbolic Regression. Comput Softw Big Sci 9, 12 (2025).
Version: Final published version

Collections
  • MIT Open Access Articles

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

My Account

Login

Statistics

OA StatisticsStatistics by CountryStatistics by Department
MIT Libraries
PrivacyPermissionsAccessibilityContact us
MIT
Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.