Inherent Bias in Electronic Health Records: A Scoping Review of Sources of Bias
Author(s)
Perets, Oriel; Stagno, Emanuela; Ben Yehuda, Eyal; McNichol, Megan; Celi, Leo; Rappoport, Nadav; Dorotic, Matilda
Publisher with Creative Commons License
Creative Commons Attribution
Terms of use
Abstract
Biases inherent in electronic health records (EHRs), a common data source for training medical AI models, may exacerbate health inequities and hinder the adoption of ethical, responsible AI in healthcare. These biases originate from various sources, including implicit clinician biases, data collection and labeling practices, medical devices, and tools used for data processing. Such biases undermine data reliability, influence clinical decisions, and worsen healthcare disparities. When EHR data is used to develop data-driven solutions, biases can further propagate, creating systems that perpetuate inequities. This scoping review categorizes the primary sources of bias in EHRs. We conducted a literature search on PubMed and Web of Science (January 19, 2023) for English-language studies published between 2016 and 2023, following the PRISMA methodology. From 430 initial papers, 27 duplicates were removed, and 403 studies were screened for eligibility. After title, abstract, and full-text reviews, 116 articles were included in the final analysis. Existing studies often focus on isolated biases in EHRs but lack a comprehensive taxonomy. To address this gap, we propose a systematic classification framework encompassing six key sources of bias: (a) biases from prior clinical trials; (b) data-related biases, such as missing or incomplete information; (c) implicit clinician bias; (d) referral and admission bias; (e) diagnosis or risk disparity biases; and (f) biases in medical devices and algorithms. This taxonomy, outlined in Table 1, provides a foundation for evaluating and addressing these issues. While machine learning has transformative potential in healthcare, its effectiveness depends on the integrity of its inputs. Current evidence predominantly addresses data-related biases, with less attention to human or device-related biases, which are often anecdotal or underexplored. 
For example, racial biases in EHRs are well documented, but biases related to gender, sexual orientation, and social factors remain less studied. Compounding biases from these diverse sources can significantly impact AI recommendations, clinical decisions, and patient outcomes. Our review underscores the prevalence of data, human, and machine biases in healthcare and their role in amplifying disparities. To mitigate these challenges, we recommend adopting a "bias-in-mind" approach when designing data-driven solutions, along with developing safeguards and generating more empirical evidence on bias impacts. This holistic understanding is essential for ensuring equitable and reliable AI applications in healthcare.
Date issued
2024-08-05
Department
Institute for Medical Engineering and Science
Journal
ACM Transactions on Intelligent Systems and Technology
Publisher
ACM
Citation
Oriel Perets, Emanuela Stagno, Eyal Ben Yehuda, Megan McNichol, Leo Anthony Celi, Nadav Rappoport, and Matilda Dorotic. 2025. Inherent Bias in Electronic Health Records: A Scoping Review of Sources of Bias. ACM Trans. Intell. Syst. Technol. Just Accepted (August 2025).
Version: Final published version
ISSN
2157-6904