Inherent Bias in Electronic Health Records: A Scoping Review of Sources of Bias
Author(s)
Perets, Oriel; Stagno, Emanuela; Ben Yehuda, Eyal; McNichol, Megan; Celi, Leo; Rappoport, Nadav; Dorotic, Matilda
Publisher with Creative Commons License
Creative Commons Attribution
Terms of use
Abstract
Biases inherent in electronic health records (EHRs), a common data source for training medical AI models, may exacerbate health inequities and hinder the adoption of ethical, responsible AI in healthcare. These biases originate from various sources, including implicit clinician biases, data collection and labeling practices, medical devices, and tools used for data processing. Such biases undermine data reliability, influence clinical decisions, and worsen healthcare disparities. When EHR data is used to develop data-driven solutions, biases can further propagate, creating systems that perpetuate inequities. This scoping review categorizes the primary sources of bias in EHRs. We conducted a literature search on PubMed and Web of Science (January 19, 2023) for English-language studies published between 2016 and 2023, following the PRISMA methodology. From 430 initial papers, 27 duplicates were removed, and 403 studies were screened for eligibility. After title, abstract, and full-text reviews, 116 articles were included in the final analysis. Existing studies often focus on isolated biases in EHRs but lack a comprehensive taxonomy. To address this gap, we propose a systematic classification framework encompassing six key sources of bias: (a) biases from prior clinical trials; (b) data-related biases, such as missing or incomplete information; (c) implicit clinician bias; (d) referral and admission bias; (e) diagnosis or risk disparity biases; and (f) biases in medical devices and algorithms. This taxonomy, outlined in Table 1, provides a foundation for evaluating and addressing these issues. While machine learning has transformative potential in healthcare, its effectiveness depends on the integrity of its inputs. Current evidence predominantly addresses data-related biases, with less attention to human or device-related biases, which are often anecdotal or underexplored. 
For example, racial biases in EHRs are well documented, but biases related to gender, sexual orientation, and social factors remain less studied. Compounding biases from these diverse sources can significantly impact AI recommendations, clinical decisions, and patient outcomes. Our review underscores the prevalence of data, human, and machine biases in healthcare and their role in amplifying disparities. To mitigate these challenges, we recommend adopting a "bias-in-mind" approach when designing data-driven solutions, along with developing safeguards and generating more empirical evidence on bias impacts. This holistic understanding is essential for ensuring equitable and reliable AI applications in healthcare.
Date issued
2024-08-05
Department
Institute for Medical Engineering and Science
Journal
ACM Transactions on Intelligent Systems and Technology
Publisher
ACM
Citation
Oriel Perets, Emanuela Stagno, Eyal Ben Yehuda, Megan McNichol, Leo Anthony Celi, Nadav Rappoport, and Matilda Dorotic. 2025. Inherent Bias in Electronic Health Records: A Scoping Review of Sources of Bias. ACM Trans. Intell. Syst. Technol. Just Accepted (August 2025).
Version: Final published version
ISSN
2157-6904