Causal Graph Summarization
Author(s)
Zeng, Anna
Advisor
Cafarella, Michael
Abstract
Causal inference is critical for scientific progress, especially in social sciences like public health and education. However, analysts often have access only to partial data, which may lead to erroneous conclusions if critical confounding biases are not accounted for. To account for these biases, they rely on (often unavailable or incomplete) domain knowledge to identify attributes to include in causal analysis; this knowledge is typically specified manually, and tediously, in the form of a causal DAG.
Given state-of-the-art methods, analysts might automatically gather and causally organize a much more comprehensive set of attributes to include in their analysis. At best, however, such tools provide large, nearly complete causal graphs that are difficult to comprehend, let alone verify for use in causal analysis tasks. As these graphs get bigger and denser with the growth of automated causal discovery methods, domain experts will struggle to comprehend, interpret, and correct them for practical applications. Existing graph summarization methods developed in other domains, such as graphics, social networking, and mapping, are not guaranteed to produce a summary graph that is eligible for use in causal analysis tasks; some even introduce spurious causal relationships that lead to erroneous conclusions if used in causal analysis.
We hypothesize that causality-specific graph summarization algorithms could surmount these challenges. To demonstrate this, we introduce CAMBA, a prototype causal graph summarization algorithm that efficiently generates high-quality causal graph summaries that are interpretable and usable for causal inference. In this thesis, we formalize the Causal DAG Summarization problem, identify a causal information metric, extend causal inference foundations to summary graphs, identify graph summarization techniques that can preserve this causal information, propose a range of possible causality-specific graph summarization optimizations, and evaluate these methods on a range of causal analysis scenarios.
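The concern that generic graph summarization can introduce spurious causal relationships can be illustrated with a small sketch. This is not CAMBA and is not taken from the thesis; it is a hypothetical toy example, assuming a summary is formed by naive node contraction (merging a group of nodes into one supernode). The four-node graph and the helper names has_path and contract are invented for illustration.

```python
from collections import defaultdict


def has_path(edges, src, dst):
    """Return True if a directed path leads from src to dst."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adj[node])
    return False


def contract(edges, group, label):
    """Merge every node in `group` into one supernode named `label`."""
    def rename(n):
        return label if n in group else n

    merged = {(rename(u), rename(v)) for u, v in edges}
    # Drop self-loops created by edges that were internal to the group.
    return {(u, v) for u, v in merged if u != v}


# Hypothetical causal DAG with two unrelated mechanisms: A -> B and C -> D.
original = {("A", "B"), ("C", "D")}

# Naively merge B and C, as a generic (non-causal) summarizer might.
summary = contract(original, {"B", "C"}, "BC")

path_in_original = has_path(original, "A", "D")
path_in_summary = has_path(summary, "A", "D")
print(path_in_original, path_in_summary)  # False True
```

In the original graph A has no directed path to D, yet the contracted summary contains A -> BC -> D, which a reader could misinterpret as A causally influencing D. A causality-aware summarizer would need to either avoid such merges or define summary-graph semantics under which this path is not read as a causal claim.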
Date issued
2023-06
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology