Show simple item record

dc.contributor.author: Tsoi, Ho F.
dc.contributor.author: Rankin, Dylan
dc.contributor.author: Caillol, Cecile
dc.contributor.author: Cranmer, Miles
dc.contributor.author: Dasu, Sridhara
dc.contributor.author: Duarte, Javier
dc.contributor.author: Harris, Philip
dc.contributor.author: Lipeles, Elliot
dc.contributor.author: Loncar, Vladimir
dc.date.accessioned: 2025-09-19T17:03:45Z
dc.date.available: 2025-09-19T17:03:45Z
dc.date.issued: 2025-07-01
dc.identifier.uri: https://hdl.handle.net/1721.1/162760
dc.description.abstract: We introduce SymbolFit (API: https://github.com/hftsoi/symbolfit), a framework that automates parametric modeling by using symbolic regression to perform a machine-search for functions that fit the data while simultaneously providing uncertainty estimates in a single run. Traditionally, constructing a parametric model to accurately describe binned data has been a manual and iterative process, requiring an adequate functional form to be determined before the fit can be performed. The main challenge arises when the appropriate functional forms cannot be derived from first principles, especially when there is no underlying true closed-form function for the distribution. In this work, we develop a framework that automates and streamlines the process by utilizing symbolic regression, a machine learning technique that explores a vast space of candidate functions without requiring a predefined functional form because the functional form itself is treated as a trainable parameter, making the process far more efficient and effortless than traditional regression methods. We demonstrate the framework in high-energy physics experiments at the CERN Large Hadron Collider (LHC) using five real proton-proton collision datasets from new physics searches, including background modeling in resonance searches for high-mass dijet, trijet, paired-dijet, diphoton, and dimuon events. We show that our framework can flexibly and efficiently generate a wide range of candidate functions that fit a nontrivial distribution well using a simple fit configuration that varies only by random seed, and that the same fit configuration, which defines a vast function space, can also be applied to distributions of different shapes, whereas achieving a comparable result with traditional methods would have required extensive manual effort. [en_US]
dc.publisher: Springer International Publishing [en_US]
dc.relation.isversionof: https://doi.org/10.1007/s41781-025-00140-9 [en_US]
dc.rights: Creative Commons Attribution [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en_US]
dc.source: Springer International Publishing [en_US]
dc.title: SymbolFit: Automatic Parametric Modeling with Symbolic Regression [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Tsoi, H.F., Rankin, D., Caillol, C. et al. SymbolFit: Automatic Parametric Modeling with Symbolic Regression. Comput Softw Big Sci 9, 12 (2025). [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Physics [en_US]
dc.relation.journal: Computing and Software for Big Science [en_US]
dc.identifier.mitlicense: PUBLISHER_CC
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dc.date.updated: 2025-07-18T15:35:12Z
dc.language.rfc3066: en
dc.rights.holder: The Author(s)
dspace.embargo.terms: N
dspace.date.submission: 2025-07-18T15:35:12Z
mit.journal.volume: 9 [en_US]
mit.journal.issue: 12 [en_US]
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed [en_US]

