Inference from Limited Observations in Statistical, Dynamical, and Functional Problems
Author(s)
Stepaniants, George
Advisor
Rigollet, Philippe
Dunkel, Jörn
Abstract
Observational data in physics and the life sciences comes in many varieties. Broadly, we can divide datasets into cross-sectional data, which record a set of observations at a given time; dynamical data, which follow how observations change in time; and functional data, which record data points over a spatial (and possibly temporal) domain. In each setting, prior knowledge of statistical, dynamical systems, and physical theory allows us to constrain the inferences and predictions we make from observational data. This domain knowledge becomes of paramount importance when the data we observe is limited due to missing labels, small sample sizes, unobserved variables, and noise corruption.
This thesis explores several problems in physics and the life sciences where the interplay of domain knowledge with statistical theory and machine learning allows us to make inferences from such limited data. We begin in Part I by studying the problem of feature matching, or dataset alignment, which arises frequently when combining untargeted (unlabeled) biological datasets with low sample sizes. Leveraging the fast numerical methods of optimal transport, we develop an algorithm that gives a state-of-the-art solution to this alignment problem with optimal statistical guarantees. In Part II we study the problem of interpolating the dynamics of point clouds (e.g., cells, particles) given only a few sparse snapshot recordings. We show how tools from spline interpolation coupled with optimal transport give efficient algorithms that return smooth, dynamically plausible interpolations. Part III studies how dynamical equations of motion can be learned from time series recordings of dynamical systems when only partial observations of these systems are captured in time. Here we develop fast routines for gradient optimization and novel tools for model comparison to learn such physically interpretable models from incomplete time series data. Finally, in Part IV we address the problem of surrogate modeling, translating expensive solvers of partial differential equations (PDEs) for physics simulations into fast and easily trainable machine learning algorithms. For linear PDEs, our prior knowledge of PDE theory and the statistical theory of kernel methods allows us to learn their Green’s functions, offering more efficient ways to simulate physical systems.
Date issued
2024-05
Department
Massachusetts Institute of Technology. Department of Mathematics; Massachusetts Institute of Technology. Institute for Data, Systems, and Society
Publisher
Massachusetts Institute of Technology