MIT Libraries: DSpace@MIT

Advances in Symbolic Regression: From Generalized Formulation to Density Estimation and Inverse Problem

Author(s)
Tohme, Tony
Advisor
Youcef-Toumi, Kamal
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
In this thesis, we explore the field of Symbolic Regression (SR), a middle ground between simple linear regression and complex inscrutable black box regressors such as neural networks. In essence, SR searches the space of mathematical expressions to find a model that best captures the relationship between inputs and outputs of a given dataset. While SR has not gained mainstream popularity due to its computational intricacy and reliance on heuristics, its potential for generating explicit, concise, and interpretable mathematical models deserves further attention. This work presents a series of advancements in Symbolic Regression, extending its applicability and demonstrating its potential across diverse domains and problem settings. Initially, we introduce GSR, a Generalized Symbolic Regression method that redefines the traditional SR optimization problem to discover analytical mappings from the input space to a transformed output space. The proposed GSR approach achieves promising performance compared to existing SR methods across established benchmark datasets, as well as a more challenging dataset introduced in this study, called SymSet. Next, we delve into the task of recovering underlying partial differential equations (PDEs) from data through the use of the adjoint method. We begin by considering a family of parameterized PDEs encompassing linear, nonlinear, and spatial derivative candidate terms. We then formulate a PDE-constrained optimization problem aimed at minimizing the error of the PDE solution from data, and elegantly derive the corresponding adjoint equations. We showcase the efficacy of the proposed approach in selecting the appropriate candidate terms, thereby discovering the governing PDEs from data. We also compare its performance with a commonly employed method for PDE discovery. Furthermore, we introduce MESSY Estimation, a Maximum-Entropy based Stochastic and Symbolic densitY estimation method. 
The proposed approach infers probability density functions symbolically from samples by leveraging the Maximum Entropy Distribution (MED) principle. We uncover three key contributions: (i) the Lagrange multipliers, inherent in the MED ansatz, can be efficiently computed by simply solving a linear system of equations, (ii) the density recovery task is enhanced through matching more unconventional low-order (symbolic) moments, rather than necessarily matching higher-order (raw) moments, and (iii) the proposed symbolic density estimation framework leads to increased interpretability and better conditioning. Finally, we introduce ISR, an Invertible Symbolic Regression approach, which bridges the concepts of SR and invertible maps. Specifically, ISR seamlessly combines the principles of Invertible Neural Networks (INNs) and Equation Learner (EQL), a neural network-based symbolic architecture for function learning. Demonstrating its versatility, ISR also serves as a symbolic normalizing flow for density estimation tasks. Additionally, we showcase its applicability in solving inverse problems, including a benchmark inverse kinematics problem, and notably, a geoacoustic inversion problem in oceanography aimed at inferring posterior distributions of underlying seabed parameters from acoustic signals. The diverse findings of this thesis not only contribute to advancing the field of Symbolic Regression, but also underscore its versatility and potential across various domains. A shift to explicit symbolic models, as demonstrated in this thesis, could unveil hidden patterns within the plethora of datasets available today, offering new insights and directions in the evolving field of machine learning and data analysis.
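As an illustration of contribution (i) of the MESSY part of the abstract, the sketch below shows one well-known way in which the multipliers of a maximum-entropy density f(x) ∝ exp(Σ_j λ_j H_j(x)) can be obtained from a linear system. It relies on the integration-by-parts identity E[H_i''(x)] = -Σ_j λ_j E[H_i'(x) H_j'(x)], which holds when f is smooth and vanishes at the boundaries of its support. This is an assumption made for illustration and is not necessarily the derivation used in the thesis. The function name `fit_med_multipliers`, the basis (x, x²), and the Gaussian test data are all hypothetical choices.

```python
import numpy as np

def fit_med_multipliers(samples, dbasis, d2basis):
    """Estimate lambda in f(x) ∝ exp(sum_j lambda_j * H_j(x)) from 1-D samples.

    dbasis(x)  -> array of shape (n_basis, n_samples) with H_j'(x)
    d2basis(x) -> array of shape (n_basis, n_samples) with H_j''(x)
    (Hypothetical helper; assumes f is smooth and decays at the boundaries.)
    """
    x = np.asarray(samples, dtype=float)
    d1 = dbasis(x)
    d2 = d2basis(x)
    A = d1 @ d1.T / x.size   # A_ij = sample estimate of E[H_i' H_j']
    b = d2.mean(axis=1)      # b_i  = sample estimate of E[H_i'']
    return np.linalg.solve(A, -b)  # solve A @ lam = -b

# Illustrative basis H(x) = (x, x^2), for which the MED is a Gaussian.
def dbasis(x):
    return np.stack([np.ones_like(x), 2.0 * x])

def d2basis(x):
    return np.stack([np.zeros_like(x), np.full_like(x, 2.0)])

rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0
samples = rng.normal(mu, sigma, size=200_000)

lam = fit_med_multipliers(samples, dbasis, d2basis)
# For a Gaussian, the exact multipliers are mu/sigma^2 and -1/(2 sigma^2).
expected = np.array([mu / sigma**2, -1.0 / (2.0 * sigma**2)])
print("estimated:", lam)
print("expected: ", expected)
```

With the basis (x, x²) the recovered multipliers should approach μ/σ² and -1/(2σ²) as the sample size grows, which gives a quick sanity check before trying other basis functions.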
Date issued
2024-05
URI
https://hdl.handle.net/1721.1/155864
Department
Massachusetts Institute of Technology. Department of Mechanical Engineering; Massachusetts Institute of Technology. Center for Computational Science and Engineering
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses
