dc.contributor.author: Huot, Mathieu
dc.contributor.author: Ghavami, Matin
dc.contributor.author: Lew, Alexander K.
dc.contributor.author: Schaechtle, Ulrich
dc.contributor.author: Freer, Cameron E.
dc.contributor.author: Shelby, Zane
dc.contributor.author: Rinard, Martin C.
dc.contributor.author: Saad, Feras A.
dc.contributor.author: Mansinghka, Vikash K.
dc.date.accessioned: 2024-07-08T18:59:10Z
dc.date.available: 2024-07-08T18:59:10Z
dc.date.issued: 2024-06-20
dc.identifier.issn: 2475-1421
dc.identifier.uri: https://hdl.handle.net/1721.1/155514
dc.description.abstract: We present GenSQL, a probabilistic programming system for querying probabilistic generative models of database tables. By augmenting SQL with only a few key primitives for querying probabilistic models, GenSQL enables complex Bayesian inference workflows to be concisely implemented. GenSQL’s query planner rests on a unified programmatic interface for interacting with probabilistic models of tabular data. This enables using models written in a variety of probabilistic programming languages tailored to specific workflows. Probabilistic models may be automatically learned via probabilistic program synthesis, hand-designed, or a combination of both. We formalize GenSQL using a novel type system and denotational semantics; together, these enable us to establish proofs that precisely characterize its soundness guarantees. We evaluate our system on two case studies, anomaly detection in clinical trials and conditional synthetic data generation for a virtual wet lab, and show that GenSQL captures the complexity of the data much more accurately than GLM and CTGAN baselines. We show that GenSQL’s declarative syntax is more concise and less error-prone than several alternatives. Finally, GenSQL delivers a 1.7-6.8x speedup compared to its closest competitor on a representative benchmark set, and runs in comparable time to hand-written code, in part due to its reusable optimizations and code specialization. [en_US]
dc.publisher: Association for Computing Machinery [en_US]
dc.relation.isversionof: 10.1145/3656409 [en_US]
dc.rights: Creative Commons Attribution [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en_US]
dc.source: Association for Computing Machinery [en_US]
dc.title: GenSQL: A Probabilistic Programming System for Querying Generative Models of Database Tables [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Huot, Mathieu, Ghavami, Matin, Lew, Alexander K., Schaechtle, Ulrich, Freer, Cameron E. et al. 2024. "GenSQL: A Probabilistic Programming System for Querying Generative Models of Database Tables." Proceedings of the ACM on Programming Languages, 8 (PLDI).
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.relation.journal: Proceedings of the ACM on Programming Languages [en_US]
dc.identifier.mitlicense: PUBLISHER_CC
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dc.date.updated: 2024-07-01T07:58:54Z
dc.language.rfc3066: en
dc.rights.holder: The author(s)
dspace.date.submission: 2024-07-01T07:58:55Z
mit.journal.volume: 8 [en_US]
mit.journal.issue: PLDI [en_US]
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed [en_US]