A Topology-Guided Diffusion Process for Synthetic Tabular Data Generation
Author(s)
Cheng, Emily
DownloadThesis PDF (2.231Mb)
Advisor
Farias, Vivek F.
Terms of use
Metadata
Show full item recordAbstract
Synthesizing realistic tabular data is crucial for any analytical application, including policy evaluation related to household energy use. However, detailed household-level consumption data, necessary for such evaluation, are scare at fine geographic scales, as public surveys like the U.S. Residential Energy Consumption Survey (RECS) provide too few observations. We address this gap by developing a topology-guided diffusion-based generative model that produces realistic synthetic household data, and our approach handles two key challenges in this setting: (1) mixed continuous and discrete features and (2) strong hierarchical dependencies among variables. To handle categorical features, we build upon recent advancements in discrete diffusion, particularly TabDDPM [1] and TabDiff [2], which discretize the diffusion process through noise transition matrices, effectively extending diffusion methods to discrete tabular domains. To address hierarchical dependence, we include (1) a structure-aware noise schedule that injects noise from the leaves to the root along an approximate Chow–Liu tree constructed from the variables and (ii) a masked self-attention denoiser that aligns with the same graphical structure. Extensive experiments show that our structured diffusion model outperforms the baseline TabDiff on data with tree-like dependencies, due to the inductive bias from our structure-aware noise schedule. On data that only approximately follows a tree, such as the RECS dataset, our model maintains competitive performance, only slightly outperforming standard diffusion methods. These results highlight the potential for future work to further optimize the tradeoff between structural approximation and estimation accuracy and for future work beyond the energy domain.
Date issued
2025-05Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer SciencePublisher
Massachusetts Institute of Technology