dc.contributor.author | Huot, Mathieu | |
dc.contributor.author | Ghavami, Matin | |
dc.contributor.author | Lew, Alexander K. | |
dc.contributor.author | Schaechtle, Ulrich | |
dc.contributor.author | Freer, Cameron E. | |
dc.contributor.author | Shelby, Zane | |
dc.contributor.author | Rinard, Martin C. | |
dc.contributor.author | Saad, Feras A. | |
dc.contributor.author | Mansinghka, Vikash K. | |
dc.date.accessioned | 2024-07-08T18:59:10Z | |
dc.date.available | 2024-07-08T18:59:10Z | |
dc.date.issued | 2024-06-20 | |
dc.identifier.issn | 2475-1421 | |
dc.identifier.uri | https://hdl.handle.net/1721.1/155514 | |
dc.description.abstract | We present GenSQL, a probabilistic programming system for querying probabilistic generative models of database tables. By augmenting SQL with only a few key primitives for querying probabilistic models, GenSQL enables complex Bayesian inference workflows to be concisely implemented. GenSQL’s query planner rests on a unified programmatic interface for interacting with probabilistic models of tabular data. This enables using models written in a variety of probabilistic programming languages tailored to specific workflows. Probabilistic models may be automatically learned via probabilistic program synthesis, hand-designed, or a combination of both. We formalize GenSQL using a novel type system and denotational semantics; together, these enable us to establish proofs that precisely characterize its soundness guarantees. We evaluate our system on two case studies, anomaly detection in clinical trials and conditional synthetic data generation for a virtual wet lab, and show that GenSQL captures the complexity of the data much more accurately than GLM and CTGAN baselines. We show that GenSQL’s declarative syntax is more concise and less error-prone than several alternatives. Finally, GenSQL delivers a 1.7-6.8x speedup over its closest competitor on a representative benchmark set, and runs in comparable time to hand-written code, in part due to its reusable optimizations and code specialization. | en_US |
dc.publisher | Association for Computing Machinery | en_US |
dc.relation.isversionof | 10.1145/3656409 | en_US |
dc.rights | Creative Commons Attribution | en_US |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | en_US |
dc.source | Association for Computing Machinery | en_US |
dc.title | GenSQL: A Probabilistic Programming System for Querying Generative Models of Database Tables | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Huot, Mathieu, Ghavami, Matin, Lew, Alexander K., Schaechtle, Ulrich, Freer, Cameron E. et al. 2024. "GenSQL: A Probabilistic Programming System for Querying Generative Models of Database Tables." Proceedings of the ACM on Programming Languages, 8 (PLDI). | |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
dc.contributor.department | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory | |
dc.relation.journal | Proceedings of the ACM on Programming Languages | en_US |
dc.identifier.mitlicense | PUBLISHER_CC | |
dc.eprint.version | Final published version | en_US |
dc.type.uri | http://purl.org/eprint/type/JournalArticle | en_US |
eprint.status | http://purl.org/eprint/status/PeerReviewed | en_US |
dc.date.updated | 2024-07-01T07:58:54Z | |
dc.language.rfc3066 | en | |
dc.rights.holder | The author(s) | |
dspace.date.submission | 2024-07-01T07:58:55Z | |
mit.journal.volume | 8 | en_US |
mit.journal.issue | PLDI | en_US |
mit.license | PUBLISHER_CC | |
mit.metadata.status | Authority Work and Publication Information Needed | en_US |