Policy-Based Access Control in Federated Clinical Question Answering

Author(s)

Chen, Alice

Advisor

Kagal, Lalana

Terms of use

In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/

Abstract

Retrieval-augmented generation (RAG) has recently broadened the versatility of large language models in answering domain-specific questions from dynamic external knowledge bases, and it has shown particular promise in clinical settings. However, because patient medical data is sensitive, retrieval over it often must be federated across a decentralized network of hospitals, each maintaining its own internal databases and access control policies. Applying standard RAG to clinical question answering is further complicated by the lack of an interface through which hospital resource owners can regulate and restrict access to sensitive clinical documents during retrieval, a capability that is essential for deploying such a system in practice. We propose to use federated RAG to infer clinical trends across distributed medical records while adding authorization mechanisms during retrieval to protect patient data. Specifically, we propose (i) user identity authentication administered through a trusted federation of per-hospital OpenID Connect servers, (ii) a framework for integrating policy-based access control (PBAC) mechanisms at flexible granularity into a federated RAG system, restricting medical data access based on user role attributes, and (iii) ClinicalTrendQA, a novel dataset for evaluating how well models synthesize clinical trends grounded in decentralized patient EHR information. To evaluate how well our PBAC authorization framework prevents information leakage during retrieval, we additionally present a three-hospital federated case study and demonstrate that issuing the same ClinicalTrendQA query under user profiles with different access privileges yields the expected reduction in retrieved EHR information. We also analyze how this retrieval loss affects end-to-end response quality, comparing against insecure federated and centralized RAG baselines.
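The abstract describes each hospital enforcing its own access policy during federated retrieval. The Python sketch below illustrates one way that could look; it is not the thesis's implementation. It assumes the user has already been authenticated (for example by a hospital's OpenID Connect server) and that their role attributes are available as a plain claims dictionary. `Record`, `HospitalNode`, `federated_retrieve`, the two example rules, and the keyword-overlap scoring (a stand-in for embedding similarity) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical record type: one EHR snippet held by a single hospital.
@dataclass(frozen=True)
class Record:
    hospital: str
    category: str  # e.g. "diagnosis", "medication"
    text: str


# Hypothetical policy rule: a predicate over (user claims, record).
Rule = Callable[[dict, Record], bool]


@dataclass
class HospitalNode:
    name: str
    records: list
    rules: list

    def permits(self, claims: dict, rec: Record) -> bool:
        # Default deny: a record is visible only if some rule allows it.
        return any(rule(claims, rec) for rule in self.rules)

    def retrieve(self, claims: dict, query: str, k: int) -> list:
        terms = set(query.lower().split())
        scored = []
        for rec in self.records:
            if not self.permits(claims, rec):
                continue  # enforced locally: denied records never leave the node
            # Keyword overlap stands in for vector similarity in this sketch.
            score = len(terms & set(rec.text.lower().split()))
            if score > 0:
                scored.append((score, rec))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def federated_retrieve(nodes, claims: dict, query: str, k: int = 3) -> list:
    # Each node filters and ranks on its own; the aggregator only merges.
    merged = []
    for node in nodes:
        merged.extend(node.retrieve(claims, query, k))
    merged.sort(key=lambda pair: pair[0], reverse=True)
    return [rec for _, rec in merged[:k]]


# Hypothetical example rules, shared by every hospital here for brevity.
def clinician_own_hospital(claims: dict, rec: Record) -> bool:
    return claims.get("role") == "clinician" and claims.get("hospital") == rec.hospital


def researcher_diagnoses_only(claims: dict, rec: Record) -> bool:
    return claims.get("role") == "researcher" and rec.category == "diagnosis"


RULES = [clinician_own_hospital, researcher_diagnoses_only]

nodes = [
    HospitalNode("A", [Record("A", "diagnosis", "type 2 diabetes diagnosed age 54"),
                       Record("A", "medication", "metformin prescribed for diabetes")], RULES),
    HospitalNode("B", [Record("B", "diagnosis", "diabetes with hypertension"),
                       Record("B", "medication", "insulin started for diabetes")], RULES),
    HospitalNode("C", [Record("C", "diagnosis", "asthma exacerbation")], RULES),
]

query = "diabetes medication trends"
researcher_view = federated_retrieve(nodes, {"role": "researcher", "hospital": "A"}, query, k=10)
clinician_view = federated_retrieve(nodes, {"role": "clinician", "hospital": "A"}, query, k=10)
visitor_view = federated_retrieve(nodes, {"role": "visitor"}, query, k=10)
```

Under these assumptions, the same query returns only diagnosis records from hospitals A and B for the researcher, only hospital A's records for the clinician at A, and nothing for an unrecognized role. This is a small-scale analogue of the per-profile information reduction that the case study measures. Checking the policy inside each node, rather than at the aggregator, keeps denied records from ever leaving the hospital that owns them.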

Date issued

2024-05

URI

https://hdl.handle.net/1721.1/156813

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Collections

Graduate Theses