
dc.contributor.author: Ngom, Amadou Latyr
dc.contributor.author: Kraska, Tim
dc.date.accessioned: 2024-07-09T16:27:07Z
dc.date.available: 2024-07-09T16:27:07Z
dc.date.issued: 2024-06-09
dc.identifier.isbn: 979-8-4007-0680-6
dc.identifier.uri: https://hdl.handle.net/1721.1/155537
dc.description: aiDM ’24, June 14, 2024, Santiago, AA, Chile (en_US)
dc.description.abstract: Translating between the SQL dialects of different systems is important for migration and federated query processing. Existing approaches rely on hand-crafted translation rules, which tend to be incomplete and hard to maintain, especially as the number of dialects to translate increases. Thus, dialect translation remains a largely unsolved problem. To address this issue, we introduce Mallet, a system that leverages Large Language Models (LLMs) to automate the generation of SQL-to-SQL translation rules, namely schema conversion, automated UDF generation, extension selection, and expression composition. Once the rules are generated, they are infinitely reusable on new workloads without putting the LLM on the critical path of query execution. Mallet enhances the accuracy of the LLMs by (1) performing retrieval augmented generation (RAG) over system documentation and human expertise, (2) subjecting the rules to empirical validation using the actual SQL systems to detect hallucinations, and (3) automatically creating accurate few-shot learning instances. Contributors, without knowing the system's code, can improve Mallet by providing natural-language expertise for RAG. (en_US)
dc.publisher: ACM (en_US)
dc.relation.isversionof: 10.1145/3663742.3663973 (en_US)
dc.rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. (en_US)
dc.source: Association for Computing Machinery (en_US)
dc.title: Mallet: SQL Dialect Translation with LLM Rule Generation (en_US)
dc.type: Article (en_US)
dc.identifier.citation: Ngom, Amadou Latyr and Kraska, Tim. 2024. "Mallet: SQL Dialect Translation with LLM Rule Generation."
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.identifier.mitlicense: PUBLISHER_CC
dc.eprint.version: Final published version (en_US)
dc.type.uri: http://purl.org/eprint/type/ConferencePaper (en_US)
eprint.status: http://purl.org/eprint/status/NonPeerReviewed (en_US)
dc.date.updated: 2024-07-01T08:00:43Z
dc.language.rfc3066: en
dc.rights.holder: The author(s)
dspace.date.submission: 2024-07-01T08:00:44Z
mit.license: PUBLISHER_POLICY
mit.metadata.status: Authority Work and Publication Information Needed (en_US)

