Mallet: SQL Dialect Translation with LLM Rule Generation
Author(s)
Ngom, Amadou Latyr; Kraska, Tim
Publisher Policy
Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
Abstract
Translating between the SQL dialects of different systems is important for migration and federated query processing. Existing approaches rely on hand-crafted translation rules, which tend to be incomplete and hard to maintain, especially as the number of dialects to translate increases. Thus, dialect translation remains a largely unsolved problem.
To address this issue, we introduce Mallet, a system that leverages Large Language Models (LLMs) to automate the generation of SQL-to-SQL translation rules, namely schema conversion, automated UDF generation, extension selection, and expression composition. Once the rules are generated, they are infinitely reusable on new workloads without putting the LLM on the critical path of query execution. Mallet enhances the accuracy of the LLMs by (1) performing retrieval augmented generation (RAG) over system documentation and human expertise, (2) subjecting the rules to empirical validation using the actual SQL systems to detect hallucinations, and (3) automatically creating accurate few-shot learning instances. Contributors, without knowing the system's code, can improve Mallet by providing natural-language expertise for RAG.
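The generate-then-validate idea in the abstract can be illustrated with a minimal sketch. This is not Mallet's implementation: the `Rule` class, the `validate` helper, the regex-based rule format, and the SQL Server to SQLite example are all assumptions chosen for illustration, and the two candidate rules are hard-coded stand-ins for LLM output. The sketch only shows how running a candidate rule against the actual target system can expose a hallucinated function name.

```python
import re
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rewrite rule: a regex pattern and its replacement text."""
    pattern: str
    replacement: str

    def apply(self, sql: str) -> str:
        # Rewrite every match of the pattern in the source query.
        return re.sub(self.pattern, self.replacement, sql, flags=re.IGNORECASE)


def validate(rule: Rule, source_sql: str, expected_rows: list) -> bool:
    """Run the translated query on the target system and compare the result."""
    translated = rule.apply(source_sql)
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(translated).fetchall() == expected_rows
    except sqlite3.Error:
        # The target system rejected the query, e.g. an unknown function.
        return False
    finally:
        conn.close()


# Two candidate rules for translating SQL Server's LEN() to SQLite,
# hard-coded here as stand-ins for LLM-generated candidates.
candidates = [
    Rule(r"\bLEN\(", "CHAR_LEN("),  # hallucinated: SQLite has no CHAR_LEN
    Rule(r"\bLEN\(", "LENGTH("),    # valid: SQLite provides LENGTH
]

source_sql = "SELECT LEN('mallet')"
expected = [(6,)]

# Keep only the rules that survive execution on the real target system.
accepted = [rule for rule in candidates if validate(rule, source_sql, expected)]
```

Once a rule is in `accepted`, calling `accepted[0].apply(...)` on a new query is a plain string rewrite, which reflects the abstract's point that validated rules can be reused without an LLM call at query time.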
Description
aiDM ’24, June 14, 2024, Santiago, AA, Chile
Date issued
2024-06-09
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Publisher
ACM
Citation
Ngom, Amadou Latyr and Kraska, Tim. 2024. "Mallet: SQL Dialect Translation with LLM Rule Generation."
Version: Final published version
ISBN
979-8-4007-0680-6