dc.contributor.advisor: Rigollet, Philippe
dc.contributor.advisor: Dunkel, Jörn
dc.contributor.author: Stepaniants, George
dc.date.accessioned: 2024-08-01T19:03:48Z
dc.date.available: 2024-08-01T19:03:48Z
dc.date.issued: 2024-05
dc.date.submitted: 2024-05-15T16:20:54.418Z
dc.identifier.uri: https://hdl.handle.net/1721.1/155887
dc.description.abstract: Observational data in physics and the life sciences comes in many varieties. Broadly, we can divide datasets into cross-sectional data, which record a set of observations at a given time; dynamical data, which follow how observations change in time; and functional data, which observe data points over a space (and possibly time) domain. In each setting, prior knowledge of statistical, dynamical systems, and physical theory allows us to constrain the inferences and predictions we make from observational data. This domain knowledge becomes of paramount importance when the data we observe is limited due to missing labels, small sample sizes, unobserved variables, and noise corruption. This thesis explores several problems in physics and the life sciences where the interplay of domain knowledge with statistical theory and machine learning allows us to make inferences from such limited data. We begin in Part I by studying the problem of feature matching, or dataset alignment, which arises frequently when combining untargeted (unlabeled) biological datasets with low sample sizes. Leveraging the fast numerical methods of optimal transport, we develop an algorithm that gives a state-of-the-art solution to this alignment problem with optimal statistical guarantees. In Part II we study the problem of interpolating the dynamics of point clouds (e.g., cells, particles) given only a few sparse snapshot recordings. We show how tools from spline interpolation coupled with optimal transport give efficient algorithms returning smooth, dynamically plausible interpolations. Part III of our thesis studies how dynamical equations of motion can be learned from time series recordings of dynamical systems when only partial observations of these systems are captured in time. Here we develop fast routines for gradient optimization and novel tools for model comparison to learn such physically interpretable models from incomplete time series data. Finally, in Part IV we address the problem of surrogate modeling, translating expensive solvers of partial differential equations for physics simulations into fast and easily trainable machine learning algorithms. For linear PDEs, our prior knowledge of PDE theory and the statistical theory of kernel methods allows us to learn the Green’s functions of various linear PDEs, offering more efficient ways to simulate physical systems.
dc.publisher: Massachusetts Institute of Technology
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: Inference from Limited Observations in Statistical, Dynamical, and Functional Problems
dc.type: Thesis
dc.description.degree: Ph.D.
dc.contributor.department: Massachusetts Institute of Technology. Department of Mathematics
dc.contributor.department: Massachusetts Institute of Technology. Institute for Data, Systems, and Society
dc.identifier.orcid: https://orcid.org/0000-0002-7834-7536
mit.thesis.degree: Doctoral
thesis.degree.name: Doctor of Philosophy

