Injection of Domain-Specific Knowledge for Enterprise Text-to-SQL

Author(s)
Choi, Justin J.
Advisor
Stonebraker, Michael R.
Terms of use
In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
This work examines the current state of using large language models (LLMs) to solve Text-to-SQL tasks on enterprise databases. Benchmarks built on publicly available datasets do not fully capture the difficulty and complexity of this task in a real-world enterprise setting. This study examines the critical steps needed to work with enterprise data, as well as the use of knowledge injection to enhance the performance of LLMs on Text-to-SQL tasks. We begin by evaluating the baseline performance of LLMs on enterprise databases, revealing that a predominant source of failure is a lack of domain-specific knowledge. To improve performance, we explore knowledge injection: the process of incorporating internal and external knowledge. Internal knowledge consists of database-specific information such as join logic, while external knowledge refers to institutional acronyms or group names. We present a hybrid retrieval pipeline that combines embedding-based and text-based search with LLM-guided ranking to supply models with relevant external knowledge during Text-to-SQL generation. We evaluate the impact of knowledge injection by testing the performance of LLMs on the table retrieval task after augmenting them with appropriate external knowledge. Using BEAVER, an enterprise-level Text-to-SQL benchmark, we demonstrate that knowledge injection significantly improves accuracy on table retrieval. Our findings highlight the importance of domain-specific knowledge injection and retrieval augmentation in bringing LLMs closer to deployment in enterprise-grade database systems, and they identify common failure modes that occur when executing enterprise Text-to-SQL.
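The hybrid retrieval idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the glossary entries, the function names (`embed`, `retrieve`, `identity_rerank`, `build_prompt`), the weighting parameter `alpha`, and the bag-of-words stand-in for an embedding model are all assumptions made for illustration. The LLM-guided ranking step is represented by a pluggable callable that simply returns its input.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical glossary of external knowledge (illustrative, not from the thesis).
GLOSSARY: Dict[str, str] = {
    "EECS": "Department of Electrical Engineering and Computer Science",
    "FY": "Fiscal year, the twelve-month budgeting period",
    "PI": "Principal investigator, the lead researcher on a grant",
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    """Assumed stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def keyword_hit(question: str, term: str) -> float:
    """Text-based search: 1.0 if the term appears as a token in the question."""
    return 1.0 if term.lower() in tokenize(question) else 0.0


def identity_rerank(question: str, candidates: List[str]) -> List[str]:
    """Placeholder for LLM-guided ranking; a real system would prompt a model."""
    return candidates


def retrieve(
    question: str,
    glossary: Dict[str, str],
    rerank: Callable[[str, List[str]], List[str]] = identity_rerank,
    k: int = 2,
    alpha: float = 0.5,
) -> List[str]:
    """Blend embedding and keyword scores, keep the top k, then rerank."""
    q_vec = embed(question)
    scored = []
    for term, definition in glossary.items():
        semantic = cosine(q_vec, embed(f"{term} {definition}"))
        lexical = keyword_hit(question, term)
        scored.append((alpha * semantic + (1 - alpha) * lexical, term))
    ranked = sorted(scored, reverse=True)
    candidates = [term for score, term in ranked if score > 0][:k]
    return rerank(question, candidates)


def build_prompt(question: str, terms: List[str], glossary: Dict[str, str]) -> str:
    """Prepend the retrieved definitions to the question for the downstream LLM."""
    notes = "\n".join(f"- {t}: {glossary[t]}" for t in terms)
    return f"Domain knowledge:\n{notes}\n\nQuestion: {question}"


if __name__ == "__main__":
    q = "How many PI grants did EECS receive?"
    print(build_prompt(q, retrieve(q, GLOSSARY), GLOSSARY))
```

In this sketch the two search signals are blended with a fixed weight before reranking; whether the thesis fuses scores this way, or ranks each candidate list separately, is not stated in the abstract.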
Date issued
2025-05
URI
https://hdl.handle.net/1721.1/162742
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
