Show simple item record

dc.contributor.advisorStonebraker, Michael R.
dc.contributor.authorChoi, Justin J.
dc.date.accessioned2025-09-18T14:30:08Z
dc.date.available2025-09-18T14:30:08Z
dc.date.issued2025-05
dc.date.submitted2025-06-23T14:01:39.177Z
dc.identifier.urihttps://hdl.handle.net/1721.1/162742
dc.description.abstractThis work examines the current state of using large language models (LLMs) to solve Text-to-SQL tasks on databases in an enterprise setting. Benchmarks on publicly available datasets do not fully capture the difficulty and complexity of this task in a real-world, enterprise setting. This study examines the critical steps needed to work with enterprise data as well as using knowledge-injection to enhance the performance of LLMs on Text-to-SQL tasks. We begin by evaluating the baseline performance of LLMs on enterprise databases, revealing that a predominant source of failure stems from a lack of domain-specific knowledge. To improve performance, we explore knowledge-injection: the process of incorporating internal and external knowledge. Internal knowledge consists of database-specific information such as join logic, while external knowledge refers to institutional acronyms or group names. We present a hybrid retrieval pipeline that combines embedding and text based searching with LLM-guided ranking to supply models with relevant external knowledge during Text-to-SQL generation. We evaluate the impact of the knowledge-injection by testing the performance of LLMs on the table retrieval task after being augmented with appropriate external knowledge. We demonstrate that knowledge-injection significantly improves accuracy on table retrieval using BEAVER: an enterprise-level Text-to-SQL benchmark. Our findings highlight the importance of domain-specific knowledge-injection and retrieval augmentation in bringing LLMs closer to deployment in enterprise-grade database systems, as well as common failure modes that occur when executing enterprise Text-to-SQL.
dc.publisherMassachusetts Institute of Technology
dc.rightsIn Copyright - Educational Use Permitted
dc.rightsCopyright retained by author(s)
dc.rights.urihttps://rightsstatements.org/page/InC-EDU/1.0/
dc.titleInjection of Domain-Specific Knowledge for EnterpriseText-to-SQL
dc.typeThesis
dc.description.degreeM.Eng.
dc.contributor.departmentMassachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degreeMaster
thesis.degree.nameMaster of Engineering in Electrical Engineering and Computer Science


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record