dc.contributor.author | Ngom, Amadou Latyr | |
dc.contributor.author | Kraska, Tim | |
dc.date.accessioned | 2024-07-09T16:27:07Z | |
dc.date.available | 2024-07-09T16:27:07Z | |
dc.date.issued | 2024-06-09 | |
dc.identifier.isbn | 979-8-4007-0680-6 | |
dc.identifier.uri | https://hdl.handle.net/1721.1/155537 | |
dc.description | aiDM ’24, June 14, 2024, Santiago, Chile | en_US |
dc.description.abstract | Translating between the SQL dialects of different systems is important for migration and federated query processing. Existing approaches rely on hand-crafted translation rules, which tend to be incomplete and hard to maintain, especially as the number of dialects to translate increases. Thus, dialect translation remains a largely unsolved problem.
To address this issue, we introduce Mallet, a system that leverages Large Language Models (LLMs) to automate the generation of SQL-to-SQL translation rules, namely schema conversion, automated UDF generation, extension selection, and expression composition. Once the rules are generated, they are infinitely reusable on new workloads without putting the LLM on the critical path of query execution. Mallet enhances the accuracy of the LLMs by (1) performing retrieval-augmented generation (RAG) over system documentation and human expertise, (2) subjecting the rules to empirical validation using the actual SQL systems to detect hallucinations, and (3) automatically creating accurate few-shot learning instances. Contributors, without knowing the system's code, can improve Mallet by providing natural-language expertise for RAG. | en_US |
dc.publisher | ACM | en_US |
dc.relation.isversionof | 10.1145/3663742.3663973 | en_US |
dc.rights | Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. | en_US |
dc.source | Association for Computing Machinery | en_US |
dc.title | Mallet: SQL Dialect Translation with LLM Rule Generation | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Ngom, Amadou Latyr and Kraska, Tim. 2024. "Mallet: SQL Dialect Translation with LLM Rule Generation." | |
dc.contributor.department | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory | |
dc.identifier.mitlicense | PUBLISHER_CC | |
dc.eprint.version | Final published version | en_US |
dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US |
eprint.status | http://purl.org/eprint/status/NonPeerReviewed | en_US |
dc.date.updated | 2024-07-01T08:00:43Z | |
dc.language.rfc3066 | en | |
dc.rights.holder | The author(s) | |
dspace.date.submission | 2024-07-01T08:00:44Z | |
mit.license | PUBLISHER_POLICY | |
mit.metadata.status | Authority Work and Publication Information Needed | en_US |