Causal Inference Under Privacy Constraints
Author(s)
Yao, Leon
Advisor
Eckles, Dean
Abstract
Causal inference is an important tool for learning the effects of interventions in observational or experimental settings. It is widely used in fields such as epidemiology, economics, and political science to estimate quantities like the average treatment effect of a medical procedure or the individual treatment effect of a personalized ad campaign. In commercial applications, the era of big data allows companies to increase their experiment volume, incentivizing them, in turn, to collect more user data. On one hand, large volumes of data are necessary to train generative models like ChatGPT. On the other hand, companies' increasing use of user data has drawn heavy criticism and consumer backlash, raising legitimate concerns about privacy and consent. As concerns over user data safety and privacy grow, rules and regulations like GDPR change what kinds of data companies and researchers can acquire and how they can analyze it. The need to perform causal inference under a range of privacy constraints has opened new areas of research at the intersection of causal inference and privacy. In my thesis, I explore three paradigms for protecting user data (data minimization, differential privacy, and synthetic data) and how to perform causal inference under these new privacy regimes.
Date issued
2025-02
Department
Massachusetts Institute of Technology. Institute for Data, Systems, and Society
Publisher
Massachusetts Institute of Technology