Machine Learning for Causal Estimation

Author(s)

Quintas-Martínez, Víctor M.

Advisor

Newey, Whitney

Chernozhukov, Victor

Terms of use

In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/

Abstract

The intersection of causal inference and machine learning (ML) has given rise to powerful tools for tackling complex empirical questions, especially in high-dimensional or highly nonlinear settings where traditional methods often fall short. This thesis develops and analyzes novel ML-based methods for estimating causal effects, with a focus on flexibility, robustness, and valid statistical inference.

The first chapter addresses the challenge of regularization and model selection bias that arises when ML is used to estimate nuisance parameters. We propose a new framework for automatic debiased machine learning (DML), which we term Riesz regression. This approach constructs debiased estimating equations without requiring explicit characterizations of the debiasing terms, allowing for seamless integration with any ML algorithm. We extend the framework to generalized regressions, including high-dimensional generalized linear models (GLMs). To illustrate its practical value, we apply Riesz regression to a study of discrimination in lending, showing how neural networks can be leveraged for automatic debiasing. Monte Carlo simulations demonstrate that our method frequently outperforms conventional inverse propensity weighting approaches.

The second chapter introduces a new method for causal change attribution, which quantifies how different causal mechanisms contribute to shifts in the distribution of an outcome variable over time or across groups. Building on a given causal model, our approach combines regression and re-weighting to identify and estimate the relevant counterfactual quantities. Our methodology is multiply robust, meaning it remains valid even when some components of the model are misspecified. We establish consistency and asymptotic normality. Moreover, we show how our algorithm can be embedded into popular attribution frameworks such as Shapley values, which then inherit its statistical guarantees. Simulation studies confirm the excellent performance of our method, and we demonstrate its utility through an applied case study.

The third chapter tackles a common challenge in applied work: estimating and conducting inference on many related causal parameters, such as causal effects of many treatments or on multiple outcomes. We derive uniform error bounds and construct valid simultaneous confidence bands for collections of average treatment effects (ATEs) estimated via DML. Our framework accommodates both finite sets and continua of functionals, and leverages strong Gaussian approximation results to account for dependence across estimates. This enables rigorous simultaneous inference with control over familywise error rates.

Together, these contributions advance the state of the art in machine learning for causal estimation by unifying flexible modeling with rigorous inferential theory. The methods developed are broadly applicable to problems in economics, public policy, healthcare, and beyond, where understanding causal relationships in complex, data-rich environments is essential. This thesis emphasizes practical applicability while maintaining strong theoretical guarantees, equipping researchers with tools to make credible, data-driven causal claims.

JEL: C14, C21, C45
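As a rough illustration of the automatic debiasing idea described for the first chapter, the sketch below estimates an ATE without writing down the propensity-score weights explicitly. It is not the thesis's implementation. It assumes a simple linear dictionary (the hypothetical `basis` function) in place of a neural network, learns the debiasing weights by minimizing the Riesz loss E[α(X)²] − 2E[α(1, Z) − α(0, Z)] in closed form, and uses two-fold cross-fitting on simulated data. All function names and the data-generating process are illustrative assumptions.

```python
import numpy as np

def basis(d, z):
    # Hypothetical dictionary b(d, z): intercept, treatment, covariates, interactions.
    d = np.asarray(d, dtype=float).reshape(-1, 1)
    return np.hstack([np.ones_like(d), d, z, d * z])

def riesz_linear(d, z, ridge=1e-6):
    # Minimize mean((b'rho)^2) - 2 * mean(b(1,z) - b(0,z))' rho over rho.
    # First-order condition: (G + ridge * I) rho = M, with G = mean(b b').
    n = len(d)
    B = basis(d, z)
    M = (basis(np.ones(n), z) - basis(np.zeros(n), z)).mean(axis=0)
    G = B.T @ B / n
    return np.linalg.solve(G + ridge * np.eye(G.shape[0]), M)

def debiased_ate(y, d, z, n_folds=2, seed=0):
    # Cross-fitted debiased estimate: plug-in effect plus Riesz-weighted residual.
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    psi = np.empty(n)
    for k in range(n_folds):
        tr, te = folds != k, folds == k
        # Nuisance 1: outcome regression (least squares on the same dictionary).
        beta, *_ = np.linalg.lstsq(basis(d[tr], z[tr]), y[tr], rcond=None)
        # Nuisance 2: debiasing weights, learned from the Riesz loss.
        rho = riesz_linear(d[tr], z[tr])
        m = int(te.sum())
        g1 = basis(np.ones(m), z[te]) @ beta
        g0 = basis(np.zeros(m), z[te]) @ beta
        b_te = basis(d[te], z[te])
        psi[te] = g1 - g0 + (b_te @ rho) * (y[te] - b_te @ beta)
    return psi.mean(), psi.std(ddof=1) / np.sqrt(n)

# Simulated example (assumed design): the true ATE is 2.
rng = np.random.default_rng(1)
n = 4000
z = rng.normal(size=(n, 2))
p = 1.0 / (1.0 + np.exp(-0.5 * z[:, 0]))          # treatment probability
d = (rng.uniform(size=n) < p).astype(float)
y = 2.0 * d + z[:, 0] + 0.5 * z[:, 1] + rng.normal(size=n)

est, se = debiased_ate(y, d, z)
print(f"ATE estimate: {est:.3f} (s.e. {se:.3f})")
```

The only functional-specific ingredient in `riesz_linear` is the difference `basis(1, z) - basis(0, z)`; swapping in another linear functional of the regression would change that line and leave the rest of the procedure as is.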
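The simultaneous confidence bands mentioned for the third chapter can be illustrated with a Gaussian multiplier bootstrap, a standard way of approximating the distribution of the maximum t-statistic across several estimates. The sketch below is an assumption-laden illustration and not the thesis's procedure: it takes an n × J matrix `psi` whose column means are the J estimates (for example, per-observation debiased scores for J treatment effects), and the demonstration uses random placeholder scores in place of real ones.

```python
import numpy as np

def simultaneous_band(psi, level=0.95, n_boot=2000, seed=0):
    # psi: (n, J) array of per-observation scores; column means are the estimates.
    n, _ = psi.shape
    theta = psi.mean(axis=0)
    centered = psi - theta
    se = centered.std(axis=0, ddof=1) / np.sqrt(n)
    # Gaussian multipliers perturb the centered scores; the same draw is used
    # for every column, which preserves dependence across the J estimates.
    xi = np.random.default_rng(seed).normal(size=(n_boot, n))
    t_boot = (xi @ centered) / n / se              # (n_boot, J) bootstrap t-stats
    # Critical value: quantile of the maximum absolute t-statistic.
    crit = np.quantile(np.abs(t_boot).max(axis=1), level)
    return theta - crit * se, theta + crit * se, crit

# Placeholder scores for J = 5 parameters (purely illustrative values).
rng = np.random.default_rng(2)
psi_demo = rng.normal(loc=[0.0, 0.5, 1.0, 1.5, 2.0], scale=1.0, size=(500, 5))
lower, upper, crit = simultaneous_band(psi_demo)
print("sup-t critical value:", round(float(crit), 3))
```

Because the critical value comes from the maximum over all J statistics, it exceeds the pointwise normal value of about 1.96 whenever J > 1, which is what makes the resulting band valid for all parameters jointly.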

Date issued

2025-05

URI

https://hdl.handle.net/1721.1/162111

Department

Massachusetts Institute of Technology. Department of Economics

Publisher

Massachusetts Institute of Technology

Collections

Doctoral Theses