Inference from Limited Observations in Statistical, Dynamical, and Functional Problems

Stepaniants, George

dc.contributor.advisor	Rigollet, Philippe
dc.contributor.advisor	Dunkel, Jörn
dc.contributor.author	Stepaniants, George
dc.date.accessioned	2024-08-01T19:03:48Z
dc.date.available	2024-08-01T19:03:48Z
dc.date.issued	2024-05
dc.date.submitted	2024-05-15T16:20:54.418Z
dc.identifier.uri	https://hdl.handle.net/1721.1/155887
dc.description.abstract	Observational data in physics and the life sciences comes in many varieties. Broadly, we can divide datasets into cross-sectional data, which record a set of observations at a given time; dynamical data, which follow how observations change in time; and functional data, which observe data points over a space (and possibly time) domain. In each setting, prior knowledge of statistical, dynamical systems, and physical theory allows us to constrain the inferences and predictions we make from observational data. This domain knowledge becomes of paramount importance when the data we observe is limited due to missing labels, small sample sizes, unobserved variables, and noise corruption. This thesis explores several problems in physics and the life sciences where the interplay of domain knowledge with statistical theory and machine learning allows us to make inferences from such limited data. We begin in Part I by studying the problem of feature matching, or dataset alignment, which arises frequently when combining untargeted (unlabeled) biological datasets with low sample sizes. Leveraging the fast numerical methods of optimal transport, we develop an algorithm that gives a state-of-the-art solution to this alignment problem with optimal statistical guarantees. In Part II we study the problem of interpolating the dynamics of point clouds (e.g., cells, particles) given only a few sparse snapshot recordings. We show how tools from spline interpolation coupled with optimal transport give efficient algorithms returning smooth, dynamically plausible interpolations. Part III of our thesis studies how dynamical equations of motion can be learned from time series recordings of dynamical systems when only partial observations of these systems are captured in time. Here we develop fast routines for gradient optimization and novel tools for model comparison to learn such physically interpretable models from incomplete time series data.
Finally, in Part IV we address the problem of surrogate modeling, translating expensive solvers of partial differential equations for physics simulations into fast and easily trainable machine learning algorithms. For linear PDEs, our prior knowledge of PDE theory and the statistical theory of kernel methods allows us to learn the Green’s functions of various linear PDEs, offering more efficient ways to simulate physical systems.
dc.publisher	Massachusetts Institute of Technology
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
dc.rights	Copyright retained by author(s)
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title	Inference from Limited Observations in Statistical, Dynamical, and Functional Problems
dc.type	Thesis
dc.description.degree	Ph.D.
dc.contributor.department	Massachusetts Institute of Technology. Department of Mathematics
dc.contributor.department	Massachusetts Institute of Technology. Institute for Data, Systems, and Society
dc.identifier.orcid	https://orcid.org/0000-0002-7834-7536
mit.thesis.degree	Doctoral
thesis.degree.name	Doctor of Philosophy

Files in this item

Name: stepaniants-gstepan-phd-math-2 ...
Size: 29.74Mb
Format: PDF
Description: Thesis PDF

This item appears in the following Collection(s)

Doctoral Theses
