I. Background and Objectives for the Systematic Review
Healthcare algorithms are frequently used to guide clinical decision making both at the point of care and as part of resource allocation and healthcare management. For the purposes of this review, algorithms are defined as mathematical formulas and models that combine different variables or factors to inform a calculation or an estimate – frequently an estimate of risk. Algorithms are often incorporated into healthcare decision tools, such as clinical guidelines, pathways, clinical decision support programs in electronic health records (EHRs), and operational systems used by health systems and payers. End-users, such as healthcare providers, integrated delivery systems, payers, and consumers, use algorithms for at least six broad purposes: screening; risk prediction; diagnosis; prognosis; treatment planning; and allocation of resources. While algorithms have long been derived from traditional statistical techniques, such as regression analysis, their use in predictive analyses is increasingly fueled by artificial intelligence techniques, including machine learning.
Healthcare algorithms and algorithm-informed healthcare decision tools commonly include clinical and sociodemographic variables and measures of healthcare utilization. Race and ethnicity are often used as input variables and influence clinical decision-making and patient outcomes.1-3 Because race and ethnicity are socially constructed, their inclusion as variables within healthcare algorithms may lead to unknown or unwanted effects, including the potential for exacerbation and/or perpetuation of health and healthcare disparities.4,5
For the purposes of this project, we define disparities as differences between racial/ethnic populations in measures of health and healthcare such as burden of disease, health outcomes, and quality of care, after taking into account factors such as clinical needs and patient preferences. Disparities that are driven by and contribute to broad imbalances in power, justice, social structures, or resources are considered inequities.6,7
A central rationale for including race/ethnicity in healthcare algorithms and decision tools has been that doing so could increase diagnostic or predictive accuracy by capturing racial/ethnic differences in genetic predispositions that affect clinical outcomes. However, race/ethnicity is a poor proxy for genetic predisposition, as there is typically greater genetic variation within groups classified as the same race or ethnicity than between them.8-10 Numerous purported racial/ethnic genetic predisposing differences regarding muscle mass, pain sensitivity, lung function, and similar biomarkers have been debunked. Mounting research seeks to detail non-biological root causes of observed differences in health, including structural racism, chronic discrimination more generally, and social determinants of health (SDOH).11-14 Furthermore, racial/ethnic categories lack specificity and sensitivity even when self-identified. Similarly, exclusive categories are inaccurate for multi-racial and multi-ethnic individuals. Additionally, standard definitions of these categories, such as the taxonomy developed by the Office of Management and Budget, are not uniformly used.15,16
Other variables used in algorithms and decision tools may also contribute to health disparities and exacerbate inequities. For example, an algorithm used to allocate access to disease management support programs was found to include prior use of healthcare services as a surrogate for disease severity. This led to stark racial disparities in program use because previous use of resources (i.e., healthcare utilization) itself reflected barriers to care and existing healthcare inequity, and therefore did not accurately represent the need for services.5
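The proxy problem described above can be illustrated with a toy calculation. The sketch below is not the algorithm from the cited study; the cohort, the `Patient` fields, and the utilization multipliers are all hypothetical, chosen only so that two groups have identical need while one group records less utilization because of access barriers.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    group: str
    need: int       # true illness burden (hypothetical units)
    prior_use: int  # recorded prior healthcare utilization (hypothetical units)


# Hypothetical cohort: both groups have the same distribution of need,
# but group B's utilization is lower for the same need (assumed access barriers).
patients = (
    [Patient("A", n, n * 10) for n in range(1, 6)]
    + [Patient("B", n, n * 5) for n in range(1, 6)]
)


def enroll_top(cohort, score, k):
    """Return the k patients with the highest value of score(patient)."""
    return sorted(cohort, key=score, reverse=True)[:k]


def count_group(cohort, group):
    return sum(1 for p in cohort if p.group == group)


by_use = enroll_top(patients, lambda p: p.prior_use, 4)   # proxy: utilization
by_need = enroll_top(patients, lambda p: p.need, 4)       # target: true need

print("Group B enrolled when ranking by prior use:", count_group(by_use, "B"))
print("Group B enrolled when ranking by need:", count_group(by_need, "B"))
```

In this constructed cohort, ranking on utilization enrolls fewer group B patients than ranking on need, even though the two groups are equally ill. Neither score uses group membership as an input.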
Developers of healthcare algorithms and algorithm-informed decision tools often justify the inclusion of racial/ethnic variables by citing observational studies or post-hoc analyses of trial data that demonstrate differences in characteristics or outcomes among different racial/ethnic groups. These studies may be small and unrepresentative, serve to reinforce misconceptions, or assign race/ethnicity as a contributing cause when other factors may be causative, confounding, or modifying the effects of race.17,18 A robust example in the published literature examines a “race-correction” coefficient used to estimate glomerular filtration rate (eGFR) for Black patients, a key indicator in diagnosing and treating kidney disease. Recent studies have modeled the effect of removing the race-based coefficient19-21 and concluded that Black patients may be more likely to receive needed kidney transplants without the use of race-correction. However, controversy around this issue remains,22-24 as the evidence base lacks prospective trials comparing differing approaches to assessing kidney disease and subsequent need for treatments including transplant. Accordingly, a task force was convened by the National Kidney Foundation and the American Society of Nephrology to address this topic. In September 2021, the task force released its final report recommending discontinuation of the race variable25 in calculating eGFR.
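To make the mechanics of a race-correction coefficient concrete, the sketch below assumes the 2009 CKD-EPI creatinine equation. This protocol does not name a specific eGFR formula, so that choice is an assumption for illustration; the 2009 equation multiplied the estimate by 1.159 for patients recorded as Black. The function name and the example patient values are hypothetical.

```python
def egfr_ckd_epi_2009(scr_mg_dl: float, age: int, female: bool, black: bool) -> float:
    """Estimate GFR (mL/min/1.73 m^2) with the 2009 CKD-EPI creatinine equation.

    Assumed here for illustration only; the protocol does not specify a formula.
    """
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = 141 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.209 * 0.993 ** age
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159  # the race-correction coefficient
    return egfr


# Hypothetical patient: identical laboratory value, age, and sex;
# only the recorded race differs between the two calls.
with_coeff = egfr_ckd_epi_2009(1.5, 60, female=False, black=True)
without_coeff = egfr_ckd_epi_2009(1.5, 60, female=False, black=False)

print(f"eGFR with race coefficient:    {with_coeff:.1f}")
print(f"eGFR without race coefficient: {without_coeff:.1f}")
```

Because the coefficient is a fixed multiplier, the same serum creatinine yields a higher estimated kidney function when the race term is applied. A higher estimate can delay the point at which a patient crosses an eGFR-based threshold for referral or treatment.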
Evidence gaps similar to those that preceded the creation of the task force on eGFR are likely for other healthcare algorithms and algorithm-informed decision tools that include race/ethnicity, with few studies comparing the effects of alternative strategies. Moreover, little is currently known about how healthcare algorithms and algorithm-informed decision tools that do not explicitly include variables based on race/ethnicity may nevertheless exacerbate or perpetuate racial/ethnic health and healthcare disparities.
Purpose of the Systematic Review
This report will examine the mechanisms by which healthcare algorithms and algorithm-informed decision tools exacerbate or perpetuate racial or ethnic health and healthcare disparities in access to care, quality of care, and health outcomes. Specific goals include:
- Describe the ecosystem of healthcare algorithms that incorporate race/ethnicity and their dissemination and uptake into healthcare decision tools for screening, risk prediction, diagnosis, prognosis, treatment, and resource allocation.
- Characterize the effects of healthcare algorithms and algorithm-informed decision tools that exacerbate or perpetuate racial/ethnic disparities in healthcare and health outcomes. This includes algorithms and decision tools that explicitly use race/ethnicity, as well as those that use other variables that lead to racial/ethnic disparities.
- Identify healthcare algorithms and algorithm-informed decision tools that are currently in development or in use and that incorporate race/ethnicity or other variables that may impact racial/ethnic disparities, but that have not yet been studied sufficiently to assess their effects on racial/ethnic health and healthcare disparities.
- Examine strategies to mitigate racial/ethnic bias in the development and use of algorithms and algorithm-informed decision tools, including but not limited to: elimination of variables based on race/ethnicity; use of non-racial/ethnic variables to address effects of structural racism and SDOH; increasing the representativeness of data sets used to develop algorithms; and approaches used during validation and/or implementation of algorithms to identify and address racial/ethnic bias.
- Explore contextual concerns including: the roles of healthcare algorithm developers and end-users; available or emerging guidance on preventing racial/ethnic bias during development of algorithms; stakeholder awareness of and perspectives on potentially biased algorithms and decision tools; and incentives and barriers that affect how stakeholders use, evaluate, and de-implement algorithm-informed decision tools.
II. Key Questions
Two key questions (KQs) specify the scope of the systematic review portion.
KQ 1. What is the effect of healthcare algorithms and algorithm-informed decision tools on racial/ethnic differences in access to care, quality of care, and health outcomes?
KQ 1 characterizes evidence of healthcare algorithms or tools that may affect racial/ethnic disparities in health and healthcare. An algorithm might create, perpetuate, or exacerbate differences in outcomes; or mitigate a preexisting health or healthcare disparity; or may have no effect. We will classify observed differences as disparities (differences that exist after accounting for clinical needs and patient preferences), or inequities (driven by and contributing to broad imbalances in power, justice, social structures, or resources), as appropriate. Studies of algorithms or tools that did not look for an effect on racial/ethnic differences will be excluded.
KQ 2. What is the effect of interventions, models of interventions, or other approaches to mitigate racial/ethnic bias in the development, validation, dissemination, and implementation of healthcare algorithms and algorithm-informed decision tools?
- Datasets: What is the effect of interventions, models of interventions, or approaches to mitigate racial/ethnic bias in datasets used for development and validation of algorithms?
- Algorithms/Tools: What is the effect of interventions, models of interventions, or approaches to mitigate racial/ethnic bias produced by algorithms/tools or their dissemination and implementation?
KQ 2 focuses on strategies to mitigate racial/ethnic bias produced by healthcare algorithms or algorithm-informed decision tools. The focal point for mitigation could be datasets used to develop or train algorithms, components and constructs included in algorithms, or processes of developing, validating, implementing, disseminating, or adapting algorithms. KQ 2 will include algorithms that were redesigned to mitigate bias after they were previously associated with contributing to a disparity. We will identify and describe strategies to address potential bias, and review any evidence of the effect on racial/ethnic disparities. To identify these resources, we will search published and grey literature, examine responses to the public Request for Information (RFI),26 and query the Technical Expert Panel (TEP) and our subject matter experts (SMEs).
Table 1 presents criteria that will guide study inclusion and assessment of outcomes, organized according to the PICOTS (Population, Interventions, Comparator, Outcomes, Timing, and Setting) framework.
- Population: Patients whose healthcare could be affected by algorithms and algorithm-informed decision tools (e.g., clinical guidelines, pathways, clinical decision support programs in EHRs, operational systems used by health systems and payers).
- Interventions: KQ 1: Algorithms and algorithm-informed decision tools that have been, or are currently being used for screening, risk prediction, diagnosis, prognosis, treatment, or resource allocation. They do not have to explicitly use race/ethnicity variables as inputs.
- Comparators: KQ 1: Appropriate comparators include:
- Outcomes: Outcomes must be reported by race or ethnicity. Quality of care
- Timing: No minimum follow-up
- Setting: Studies conducted in populations outside the United States will be excluded for KQ 1.
In addition to the KQs listed above, this review will also address four Contextual Questions (CQs):
CQ 1: How widespread is the inclusion of variables based on race/ethnicity in healthcare algorithms and algorithm-informed decision tools?
- What types of algorithms and algorithm-informed decision tools used in healthcare include variables based on race/ethnicity? How widely are they used?
- Who develops algorithms and algorithm-informed decision tools used in healthcare that might include variables based on race/ethnicity?
- Who are the end-users of these algorithms and algorithm-informed decision tools used in healthcare? What incentives and barriers are there to implementation or de-implementation?
- What patient populations are included?
- What clinical conditions, processes of care, and healthcare settings are included?
CQ 2: What are existing and emerging national or international standards or guidance for how algorithms and algorithm-informed decision tools should be developed, validated, implemented, and updated to avoid introducing bias that could lead to health and healthcare disparities?
- Within these standards or guidance, what are the recommendations about the use of variables or datasets that include race/ethnicity to develop or validate algorithms?
- What are the recommendations about variables used or sought in place of race/ethnicity (e.g., genetic markers and biomarkers, SDOH, the experience of individual and structural racism), including standards or guidance for how to define and collect data on these variables, and their impact on exacerbating or mitigating bias?
- What are the recommendations for identifying and addressing other types of variables that could introduce bias leading to disparities, such as measures of healthcare use or SDOH?
- What are the recommendations regarding transparency or disclosure of information related to algorithm development, validation, use, and outcomes?
CQ 3: To what extent are patients, providers (e.g., clinicians, hospitals, health systems), payers (e.g., insurers, employers), and policymakers (e.g., healthcare and insurance regulators, state Medicaid directors) aware of the inclusion of variables based on race/ethnicity in healthcare algorithms and algorithm-informed decision tools?
- Is there evidence of how these types of algorithms and tools might contribute to biases in provider and payer perceptions of affected populations and their clinical care?
CQ 4: Select a sample of approximately 5-10 healthcare algorithms and algorithm-informed decision tools that have the potential to impact racial/ethnic disparities in access to care, quality of care, or health outcomes and are not included in KQs 1 or 2. For each tool, describe the type of tool, its purpose (e.g., screening, risk prediction, diagnosis, etc.), its developer and intended end-users, affected patient population, clinical condition or process of care, healthcare setting, and information on outcomes, if available. The intent of this question is to consider the use of healthcare algorithms and algorithm-informed decision tools that may be perpetuating racial/ethnic bias but have not been previously linked to disparities in health or healthcare.
- If race/ethnicity is included as a variable, how is it defined? Are definitions consistent with available standards, guidance, or important considerations identified in CQ 2?
- For healthcare algorithms and algorithm-informed decision tools that include other variables in place of or associated with race/ethnicity, how were these other variables defined? Are these definitions consistent with available standards, guidance, or important considerations as identified in CQ 2? Were racial/ethnic variables considered during initial development or validation?
- For each healthcare algorithm and algorithm-informed decision tool, what methods were used for development and validation? What evidence, evidence quality, data sources, and study populations were used for development and validation?
- Are development and validation methods consistent with available standards, guidance, and strategies to mitigate bias and reduce the potential of healthcare algorithms or algorithm-informed decision tools to contribute to health disparities?
- What approaches and practices are there to implement, adapt, or update each healthcare algorithm or algorithm-informed decision tool?
We will address CQs 1-3 in the review’s Discussion section, referring to evidence discovered during the review process. The framework and methods for CQ 4 are discussed in the following sections.
III. Analytic Frameworks
The KQs will be addressed by a systematic review of published studies and grey literature. Figure 1 presents a draft analytic framework that displays the interaction between the major components of the evidence base, organized according to the PICOTS model.
We developed a separate analytic framework for CQ 4, presented in Figure 2. The algorithm/decision tool development-to-clinical implementation lifecycle involves multiple steps, each of which has the potential for the introduction of bias. The conceptual model in Figure 2 will guide our analysis and help describe and summarize the mechanisms through which bias can be introduced and result in disparities in access, quality, and health outcomes. The framework is informed by the Sociotechnical Model for Studying Health Information Technology in Complex Adaptive Healthcare Systems27 and the conceptual model for biases in healthcare proposed by Rajkomar.28
Biases can be introduced at any step in the algorithm development-to-implementation process. Figure 2 organizes this process into two major steps: algorithm development (Figure 2a) and algorithm translation, dissemination, and implementation (Figure 2b). Table 2 details potential biases that can be introduced during the algorithm development phase.
Table 2 phases: Data Selection and Management; Model Training / Development; Validation / Evaluation.
Biases can be introduced de novo during dissemination and implementation, or carried over from the development phase. Dissemination focuses on the spreading of knowledge and evidence by passively informing audiences, whereas implementation is a more active initiative that focuses on integrating and incorporating guidance into clinical workflow, often with technological support. We outline three points at which bias can be newly introduced during this phase (red boxes, Figure 2b): translation, which is the process of operationalizing algorithms into decision tools or clinical processes such as clinical practice guidelines, pathways, and payer or health system protocols, and can result in bias through overgeneralization or extrapolation of guidance to populations in which the tool was not validated or tested; interaction, which can result in bias when a clinician is presented with guidance in the course of care but chooses not to act; and implicit/explicit bias, which might occur when a clinician makes a determination on behalf of a mixed-race patient regarding which race-category to document in an EHR. Use of consumer-facing health information technology (HIT) may result in additional biases, such as design and language choices that do not account for differences in healthcare literacy, numeracy, and language. Furthermore, bias can result when an algorithm or decision tool is not updated as the evidence base evolves or changes (not shown in Figure 2).
The method by which healthcare algorithms and algorithm-informed decision tools are disseminated and implemented provides additional opportunities for introduction of bias. We have organized dissemination and implementation methods into tiers, each based on the potential impact on outcomes. Standard Dissemination is defined as a non-HIT-supported method for providing guidance to clinicians. Standard dissemination requires a clinician to be aware of the existence of guidance, understand the guidance and patient applicability, and understand how to integrate guidance into care. Systems-Level Dissemination is defined as the use of HIT to reach clinicians, such as through a cloud-based clinical pathways library, and has a potentially larger impact on outcomes than standard dissemination, as the use of HIT may increase the number of clinicians who use the algorithm or decision tool. Systems-Level Implementation is defined as the translation and integration of algorithms and decision tools into clinical workflow, to display guidance at the right time, through the right system, to the right person, and in the right format to have the greatest likelihood to impact patient care and outcomes.
Biases introduced during algorithm development may also be amplified, such as when a decision tool is incorporated in an EHR, or added to, such as when clinicians interpret guidance through implicit or explicit biases. The magnitude and impact of biases depend on the dissemination and implementation method (Figure 2b, light blue arrows) as well as the interaction between the clinician user, dissemination and implementation method, and patient (dark blue arrows).
To inform CQ 4, we will identify 5-10 healthcare algorithms or algorithm-informed decision tools that are not evaluated in the studies included in KQs 1 or 2, to examine their potential impact on health and healthcare disparities. In selecting the tools, we will consider a variety of patient populations, clinical conditions, types of decision tools, settings, and end-users. We may also prioritize tools, in part, by considering disease prevalence and burden, and conditions for which racial/ethnic disparities in healthcare and/or health outcomes are well-documented.
In order to address the potential effects of these algorithms or decision tools, we will examine the health outcomes and process outcomes delineated in Table 1. Additionally, we will describe development and validation methods, and report data when available on sensitivity, specificity, and similar measures. We will also document whether algorithm and decision tool developers explicitly considered potential bias (e.g., by examining algorithm performance by race/ethnicity), or used any strategies that might help to mitigate bias. Finally, we will describe key components of dissemination and implementation strategies used by developers and end-users, and consider the effects of these dynamics on disparities.
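Examining algorithm performance by race/ethnicity, as mentioned above, amounts to computing sensitivity and specificity separately within each group. The following is a minimal sketch of that calculation; the record layout, the group labels, and the data are hypothetical and do not come from any study.

```python
from collections import defaultdict


def performance_by_group(records):
    """Compute sensitivity and specificity per group.

    records: iterable of (group, predicted_positive, actually_positive) tuples.
    A metric is None when its denominator is zero for that group.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, predicted, actual in records:
        if predicted and actual:
            counts[group]["tp"] += 1
        elif predicted and not actual:
            counts[group]["fp"] += 1
        elif not predicted and actual:
            counts[group]["fn"] += 1
        else:
            counts[group]["tn"] += 1

    results = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        results[group] = {
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
        }
    return results


# Hypothetical validation records: (group, predicted, actual)
records = [
    ("A", True, True), ("A", True, True), ("A", False, True),
    ("A", False, False), ("A", True, False), ("A", False, False),
    ("B", True, True), ("B", False, True), ("B", False, True),
    ("B", False, False), ("B", False, False),
]

for group, metrics in performance_by_group(records).items():
    print(group, metrics)
```

A gap in sensitivity between groups, as in this constructed data, would mean the tool misses true cases more often in one group. This is the kind of signal a developer's stratified analysis is meant to surface.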
IV. Methods for Key Questions
Criteria for Study Inclusion and Exclusion
As suggested in the Agency for Healthcare Research and Quality (AHRQ) EPC Methods Guide for Comparative Effectiveness Reviews, we list the inclusion criteria in several categories: publication type, study design, intervention characteristics, setting, and outcome data.
- We will not include abstracts or meeting presentations because they do not include sufficient details about experimental methods to permit an evaluation of study design and conduct; they may also contain only a subset of measured outcomes.29,30 Additionally, it is not uncommon for abstracts that are published as part of conference proceedings to have inconsistencies when compared with the final study publication or to describe studies that are never published as full articles.31-34
- We will include studies published from 2011 to the present. Earlier articles are unlikely to reflect current algorithms.
- To avoid double-counting patients, when several reports of overlapping patients are available, we will only include outcome data from the report with the largest number of patients. We will include data from a smaller publication when it reports data on different racial/ethnic group(s), or an included outcome that was not provided by the largest report, or if it reports longer follow-up data for an outcome.
- The timeframe for this review does not permit translation of non-English language articles.
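The rule above for handling overlapping reports can be sketched as a small selection routine. This is one possible reading of the rule, not a procedure specified by the protocol: the `Report` structure, the keying of data by (racial/ethnic group, outcome), and the use of follow-up length in months are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Report:
    report_id: str
    n_patients: int
    data: dict  # (racial/ethnic group, outcome) -> follow-up in months


def select_outcome_data(reports):
    """Choose which report supplies each (group, outcome) data point.

    The largest report is the default source. A smaller report is used only
    for a group or outcome the largest report lacks, or when it reports
    longer follow-up for that outcome.
    """
    primary = max(reports, key=lambda r: r.n_patients)
    selected = {key: (primary.report_id, months) for key, months in primary.data.items()}
    for report in reports:
        if report is primary:
            continue
        for key, months in report.data.items():
            if key not in selected or months > selected[key][1]:
                selected[key] = (report.report_id, months)
    return selected


# Hypothetical overlapping reports from the same cohort.
reports = [
    Report("large", 1000, {("Black", "mortality"): 12, ("White", "mortality"): 12}),
    Report("small", 300, {("Black", "mortality"): 24,
                          ("Hispanic", "mortality"): 12,
                          ("White", "mortality"): 6}),
]

for key, source in select_outcome_data(reports).items():
    print(key, "->", source)
```

Each (group, outcome) pair is drawn from exactly one report, so no patient contributes twice to the same estimate.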
Study Design Criteria
- We will only include empirical studies; thus, we will exclude reviews, letters, guidelines, position statements, and commentaries. We will use systematic reviews only to identify empirical studies, as a supplement to the full literature search (described below in the Literature Search Strategy).
- We will consider any study design with a relevant comparison as described in Table 1.
- We will include studies with prospective or retrospective data analysis, or that modeled potential outcomes.
- To be considered an “algorithm”, a mathematical formula or model must combine different variables or factors to produce a numerical score or a scaled ranking, or populate a classification scheme that can be used to guide healthcare decisions. To be considered an “algorithm-based decision support tool”, a clinical guideline, pathway, clinical decision support intervention in an EHR, or an operational system used by health systems and payers must be informed by an algorithm as defined above.
- For KQ 1, the algorithm must have been applied to a patient/participant population other than the derivation population. We will exclude newly developed algorithms that have been evaluated only in a derivation population.
- Any study conducted in a clinical or non-clinical site as described in Table 1.
- For KQ 1, a study must have evaluated whether an algorithm has an effect on a racial/ethnic difference. We do not require that reported effect sizes be statistically significant, or that a study controls for possible confounders (confounding will be addressed in our narrative appraisal of the evidence).
- For both KQs, a study must have reported data in one of three outcome categories (access to care, quality of care, and health outcomes).
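Several of the criteria listed in this section are mechanical enough to express as a screening checklist. The sketch below is a hypothetical encoding, not the review team's screening form: the `Study` fields and the reason strings are invented for illustration, and judgment-based criteria (such as whether a comparison is relevant) are deliberately left out.

```python
from dataclasses import dataclass

OUTCOME_CATEGORIES = {"access to care", "quality of care", "health outcomes"}
EXCLUDED_TYPES = {
    "abstract", "meeting presentation", "review", "letter",
    "guideline", "position statement", "commentary",
}


@dataclass
class Study:
    year: int
    language: str
    publication_type: str   # e.g., "empirical study", "abstract", "review"
    outcome_categories: set
    key_question: int       # 1 or 2
    evaluated_beyond_derivation: bool = True
    assessed_racial_ethnic_effect: bool = True


def exclusion_reasons(study: Study) -> list:
    """Return the criteria a study fails; an empty list means none failed."""
    reasons = []
    if study.year < 2011:
        reasons.append("published before 2011")
    if study.language.lower() != "english":
        reasons.append("not in English")
    if study.publication_type.lower() in EXCLUDED_TYPES:
        reasons.append("excluded publication type")
    if not (study.outcome_categories & OUTCOME_CATEGORIES):
        reasons.append("no outcome in an included category")
    if study.key_question == 1:
        if not study.evaluated_beyond_derivation:
            reasons.append("evaluated only in derivation population")
        if not study.assessed_racial_ethnic_effect:
            reasons.append("did not assess effect on racial/ethnic differences")
    return reasons


# Hypothetical records
eligible = Study(2019, "English", "empirical study", {"quality of care"}, 1)
ineligible = Study(2009, "French", "abstract", set(), 1,
                   evaluated_beyond_derivation=False)

print(exclusion_reasons(eligible))
print(exclusion_reasons(ineligible))
```

Returning every failed criterion, rather than stopping at the first, keeps a record of exclusion reasons that two screeners can compare during consensus discussion.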
Literature Search Strategies for Identification of Relevant Studies to Answer the Key Questions
Literature searches will be performed by Medical Librarians at the EPC Information Center, and will follow established systematic review protocols. We will search the following databases using controlled vocabulary and text words: EMBASE and Medline (via EMBASE.com), PubMed (in process citations, to capture items not yet indexed in Medline), and The Cochrane Library. The search strategy will include controlled vocabulary terms (e.g., MeSH or Emtree), along with free-text words, related to race, ethnicity, algorithms, disparities, and inequities. These searches will utilize a hedge to remove conference abstracts, editorials, letters, and news items; however, some of these items may be retained in the final search to help inform the contextual questions. The searches will be independently peer reviewed by a librarian using the PRESS Checklist. The proposed search strategy for EMBASE and Medline (via EMBASE.com) is included in Appendix A. We will also review submissions to AHRQ’s Supplemental Evidence and Data (SEADs) portal to identify studies meeting inclusion criteria.
We will conduct a grey literature search of the following resources: Association for Computing Machinery Digital Archives, medRxiv and bioRxiv Preprint Servers, ClinicalTrials.gov, and the web sites of relevant organizations (e.g., AHRQ, American Actuarial Association, American Hospital Association Institute for Diversity and Health Equity, American Medical Informatics Association, Centers for Disease Control and Prevention, Consumer Financial Protection Bureau, Healthcare Information and Management Systems Society, Food and Drug Administration, Health Resources and Services Administration, National Institute of Standards and Technology, Office of the National Coordinator for Health Information Technology, Observational Health Data Sciences and Informatics, and others as recommended by the SMEs and TEP). Hand searches of published systematic reviews will be used to identify any studies missed by searches and Scopus may also be used to identify related publications through citation tracking.
Literature screening will be performed using DistillerSR (Evidence Partners, Ottawa, Ontario, Canada). Literature search results will initially be screened for relevance. Relevant abstracts will be screened against the inclusion and exclusion criteria in duplicate. Studies that appear to meet the inclusion criteria will be retrieved in full and screened again in duplicate against the inclusion and exclusion criteria. All disagreements will be resolved by consensus discussion between the two original screeners. The literature searches will be updated during the Public Comment process, before finalization of the review.
Data Abstraction and Data Management
Data will be abstracted using Microsoft Word and Excel. Elements to be abstracted include: general study characteristics (e.g., study design, setting, enrolled number of patients, length of follow-up); patient characteristics (e.g., age, sex, race/ethnicity, clinical condition); intervention details (e.g., type of algorithm/tool, intent of algorithm/tool, input variables used, datasets used for development and validation); developer (e.g., vendor or institution); intended user (e.g., physician, nurse, administration, population health program); and outcome data.
Assessment of Methodological Risk of Bias and Data Synthesis
Traditional tools that assess methodological risk of bias of individual studies may be limited in their ability to assess methodological bias related to racial/ethnic equity. Therefore, after examining the specific types of studies that will be included, we will identify specific criteria for assessing risk of bias, drawing on commonly used risk-of-bias tools. Criteria will be derived from applicable items in existing assessment tools and discussion with SMEs and the TEP. We may also examine or pilot the use of emerging supplements for equity-based appraisal.
We will complete a synthesis of the evidence that considers and addresses study designs, characteristics of the evidence, and themes that are relevant to stakeholders. For KQ 1, we will synthesize the evidence with a focus on three potential results of algorithms: exacerbation or introduction of differences in outcomes; mitigation of existing disparities; or no discernible effect related to race/ethnicity. Characteristics of algorithms and features of their development, validation, implementation, and dissemination will be analyzed to identify associations between these factors and potential bias. For KQ 2, our synthesis will focus on the varying types of mitigation strategies that are identified. We will analyze the extent of current research on different approaches, examine and classify their key features, and review evidence of their effectiveness when available. Concordance tables may be developed to summarize the interventions and approaches identified for mitigation of bias.
V. Methods for Contextual Questions
CQs 1-3
In addition to the literature searches conducted to address the KQs, we will conduct supplemental searches, if necessary, to identify studies, standards, frameworks, white papers, and other relevant resources that address CQs 1, 2, and 3. We will also draw on responses to the RFI26 and discussions with the SMEs, TEP, and Key Informants (KIs) to inform our analysis of these CQs.
CQ 4
To address CQ 4, we will identify and select sample algorithms, abstract relevant data, and appraise key features of each algorithm. These processes are described below.
Algorithm sample identification and selection.
We will employ four distinct approaches for identifying algorithm samples. Figure 3 depicts the flow and organization for these activities. First, we will identify conditions with the highest disease burden and/or extreme racial/ethnic disparities in outcomes, by examining available sources such as CDC mortality and morbidity reports and AHRQ’s National Healthcare Quality and Disparities Reports.35,36 We will then review the findings of the searches for the KQs, and perform supplemental searches as needed to identify algorithms and studies relevant to these conditions. We will also conduct searches in the grey literature and examine websites from major healthcare systems (e.g., US News Honor Roll, Association of American Medical Colleges Council of Teaching Hospitals, non-academic healthcare systems) to identify algorithms potentially in use but not yet published in peer-reviewed sources. Second, we will review our discussions with KIs, SMEs, and the TEP related to specific examples of algorithms that they would recommend for inclusion. Selected experts may be contacted for follow-up when needed. Third, we will review the responses to the RFI26 and the public posting of the KQs. Lastly, we will query select vendors with whom we have established relationships or connections to identify critical or high-use algorithms.
Results from each of the algorithm selection approaches will be collated and duplicates removed. We will construct a database of algorithms from this pool and will add key data, such as: type of algorithm, intent of algorithm, developer/vendor, intended user, patient population, clinical condition, setting, and anticipated evidence base (e.g., citations). We will use an iterative, consensus-driven approach to select the final 5-10 examples. Finally, we will identify relevant and representative study exemplars, by study type (e.g., development, validation, implementation, comparative effectiveness) for each algorithm or decision tool in the sample.
For each algorithm we will abstract technical specifications such as input variables used, datasets used for development and validation, and types of outcomes produced. We will also include, when available, details about processes used for development and validation, and outcome data. Finally, we will document, when possible, information about the extent of use in clinical practice; dissemination and implementation activities (e.g., incorporated in a guideline or EHR); and years in use or since publication. Additional variables may be included depending on findings. These data will be combined with the information described above that will be collected during algorithm selection.
Each sample algorithm will be evaluated qualitatively or quantitatively, as feasible, to determine the likelihood of contributing to racial/ethnic disparities in outcomes. We will review existing evaluation tools, identify emerging standards, and work with our SMEs, TEP, KIs, and other stakeholders to identify gaps and deficiencies related to assessing racial/ethnic bias in algorithms. Appraisal addenda may be developed to address these gaps.
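The protocol does not specify a quantitative method. As one hypothetical example of what such an evaluation might look like, the sketch below compares the false-negative rate of a binary risk flag across groups. The function name, the input layout, and the data are assumptions made for illustration, and this is only one of many possible checks.

```python
from collections import defaultdict


def false_negative_rate_by_group(rows):
    """Share of patients with the outcome whom the algorithm did not flag.

    rows: iterable of (group, flagged, had_outcome) tuples, where `flagged`
    is the algorithm's binary output and `had_outcome` is the observed result.
    """
    positives = defaultdict(int)  # patients who had the outcome, per group
    missed = defaultdict(int)     # of those, patients not flagged
    for group, flagged, had_outcome in rows:
        if had_outcome:
            positives[group] += 1
            if not flagged:
                missed[group] += 1
    return {group: missed[group] / n for group, n in positives.items()}


# Synthetic, made-up data for two hypothetical groups.
rows = [
    ("A", True, True), ("A", True, True), ("A", True, True), ("A", False, True),
    ("B", True, True), ("B", True, True), ("B", False, True), ("B", False, True),
    ("A", False, False), ("B", True, False),
]
rates = false_negative_rate_by_group(rows)
```

A gap between groups on a measure like this would not by itself show that an algorithm contributes to disparities, but it is the kind of signal a quantitative appraisal could flag for closer review.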
Descriptive data for each algorithm will be summarized using evidence tables and other traditional approaches; we may develop concordance tables and additional visualization tools as needed to aid in communication of results.
- Vyas DA, Eisenstein LG, Jones DS. Hidden in plain sight - reconsidering the use of race correction in clinical algorithms. N Engl J Med. 2020 Aug 27;383(9):874-82. doi.org/10.1056/NEJMms2004740. PMID: 32853499.
- Cerdeña JP, Plaisime MV, Tsai J. From race-based to race-conscious medicine: how anti-racist uprisings call us to act. Lancet. 2020 Oct 10;396(10257):1125-8. doi.org/10.1016/S0140-6736(20)32076-6. PMID: 33038972.
- Schmidt IM, Waikar SS. Separate and unequal: race-based algorithms and implications for nephrology. J Am Soc Nephrol. 2021 Mar;32(3):529-33. doi.org/10.1681/ASN.2020081175. PMID: 33510038.
- Eneanya ND, Yang W, Reese PP. Reconsidering the consequences of using race to estimate kidney function. JAMA. 2019 Jul 9;322(2):113-4. doi.org/10.1001/jama.2019.5774. PMID: 31169890.
- Obermeyer Z, Powers B, Vogeli C, et al. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019 Oct 25;366(6464):447-53. doi.org/10.1126/science.aax2342. PMID: 31649194.
- Braveman P. Health disparities and health equity: concepts and measurement. Annu Rev Public Health. 2006;27:167-94. doi.org/10.1146/annurev.publhealth.27.021405.102103. PMID: 16533114.
- Committee on Understanding and Eliminating Racial and Ethnic Disparities in Health Care. Smedley BD, Stith AY, Nelson AR, editor(s). Unequal treatment: confronting racial and ethnic disparities in health care. Washington (DC): National Academy of Sciences; 2003. 781 p.
- Hu AC, Chapman LW, Mesinkovska NA. The efficacy and use of finasteride in women: a systematic review. Int J Dermatol. 2019 Jul;58(7):759-76. doi.org/10.1111/ijd.14370. PMID: 30604525.
- Roberts D. Fatal invention: how science, politics, and big business re-create race in the twenty-first century. New York (NY): The New Press; 2011. 400 p.
- Yudell M, Roberts D, DeSalle R, et al. Science and society. Taking race out of human genetics. Science. 2016 Feb 5;351(6273):564-5. doi.org/10.1126/science.aac4951. PMID: 26912690.
- Smedley BD. The lived experience of race and its health consequences. Am J Public Health. 2012 May;102(5):933-5. doi.org/10.2105/AJPH.2011.300643. PMID: 22420805.
- Williams DR, Lawrence JA, Davis BA. Racism and health: evidence and needed research. Annu Rev Public Health. 2019 Apr 1;40:105-25. doi.org/10.1146/annurev-publhealth-040218-043750. PMID: 30601726.
- Bailey ZD, Krieger N, Agénor M, et al. Structural racism and health inequities in the USA: evidence and interventions. Lancet. 2017 Apr 8;389(10077):1453-63. doi.org/10.1016/S0140-6736(17)30569-X. PMID: 28402827.
- Paradies Y, Ben J, Denson N, et al. Racism as a determinant of health: a systematic review and meta-analysis. PLoS ONE. 2015;10(9):e0138511. doi.org/10.1371/journal.pone.0138511. PMID: 26398658.
- Race, ethnicity, and language data: standardization for health care quality improvement. AHRQ publication no. 10-0058-EF. [internet]. Rockville (MD): Agency for Healthcare Research and Quality; 2010 Mar [updated 2018 Apr 01]; [accessed 2011 May 16].
- Klinger EV, Carlini SV, Gonzalez I, et al. Accuracy of race, ethnicity, and language preference in an electronic health record. J Gen Intern Med. 2015 Jun;30(6):719-23. doi.org/10.1007/s11606-014-3102-8. PMID: 25527336.
- Kowalsky RH, Rondini AC, Platt SL. The case for removing race from the American Academy of Pediatrics clinical practice guideline for urinary tract infection in infants and young children with fever. JAMA Pediatr. 2020 Mar 1;174(3):229-30. doi.org/10.1001/jamapediatrics.2019.5242. PMID: 31930353.
- Vyas DA, Jones DS, Meadows AR, et al. Challenging the use of race in the vaginal birth after cesarean section calculator. Womens Health Issues. 2019 May-Jun;29(3):201-4. doi.org/10.1016/j.whi.2019.04.007. PMID: 31072754.
- Ahmed S, Nutt CT, Eneanya ND, et al. Examining the potential impact of race multiplier utilization in estimated glomerular filtration rate calculation on African-American care outcomes. J Gen Intern Med. 2021 Feb;36(2):464-71. doi.org/10.1007/s11606-020-06280-5. PMID: 33063202.
- Diao JA, Wu GJ, Taylor HA, et al. Clinical implications of removing race from estimates of kidney function. JAMA. 2021 Jan 12;325(2):184-6. doi.org/10.1001/jama.2020.22124. PMID: 33263721.
- Inker LA, Couture SJ, Tighiouart H, et al. A new panel-estimated GFR, including β2-microglobulin and β-trace protein and not including race, developed in a diverse population. Am J Kidney Dis. 2020 Dec 7;S0272-6386(20):31126-4. doi.org/10.1053/j.ajkd.2020.11.005. PMID: 33301877.
- Norris KC, Eneanya ND, Boulware LE. Removal of race from estimates of kidney function: first, do no harm. JAMA. 2021 Jan 12;325(2):135-7. doi.org/10.1001/jama.2020.23373. PMID: 33263722.
- Levey AS, Titan SM, Powe NR, et al. Kidney disease, race, and GFR estimation. Clin J Am Soc Nephrol. 2020 Aug 7;15(8):1203-12. doi.org/10.2215/CJN.12791019. PMID: 32393465.
- Powe NR. Black kidney function matters: use or misuse of race? JAMA. 2020 Aug 25;324(8):737-8. doi.org/10.1001/jama.2020.13378. PMID: 32761164.
- Delgado C, Baweja M, Crews D, et al. A unifying approach for GFR estimation: recommendations of the NKF-ASN task force on reassessing the inclusion of race in diagnosing kidney disease. J Am Soc Nephrol. 2021 Sep 23;ASN.2021070988. Online ahead of print. doi.org/10.1681/ASN.2021070988. PMID: 34556489.
- AHRQ seeking input to inform new systematic review, use of clinical algorithms that have the potential to introduce racial/ethnic bias into healthcare delivery. [internet]. Rockville (MD): Effective Health Care Program, Agency for Healthcare Research and Quality; 2021 Feb [updated 2021 Apr 01]; [accessed 2022 Jan 03]. [1 p].
- Sittig DF, Singh H. A new socio-technical model for studying health information technology in complex adaptive healthcare systems. In: Patel V, Kannampallil T, Kaufman D, editor(s). Cognitive Informatics for Biomedicine. Health Informatics. Springer International Publishing; 2015. p. 59-80. doi.org/10.1007/978-3-319-17272-9_4.
- Rajkomar A, Hardt M, Howell MD, et al. Ensuring fairness in machine learning to advance health equity. Ann Intern Med. 2018 Dec 18;169(12):866-72. doi.org/10.7326/M18-1990. PMID: 30508424.
- Chalmers I, Adams M, Dickersin K, et al. A cohort study of summary reports of controlled trials. JAMA. 1990 Mar 9;263(10):1401-5. PMID: 2304219.
- Neinstein LS. A review of Society for Adolescent Medicine abstracts and Journal of Adolescent Health Care articles. J Adolesc Health Care. 1987 Mar;8(2):198-203. doi.org/10.1016/0197-0070(87)90265-8. PMID: 3818406.
- Dundar Y, Dodd S, Williamson P, et al. Case study of the comparison of data from conference abstracts and full-text articles in health technology assessment of rapidly evolving technologies: does it make a difference? Int J Technol Assess Health Care. 2006 Jul;22(3):288-94. doi.org/10.1017/s0266462306051166. PMID: 16984055.
- De Bellefeuille C, Morrison CA, Tannock IF. The fate of abstracts submitted to a cancer meeting: factors which influence presentation and subsequent publication. Ann Oncol. 1992 Mar;3(3):187-91. doi.org/10.1093/oxfordjournals.annonc.a058147. PMID: 1586615.
- Scherer RW, Meerpohl JJ, Pfeifer N, et al. Full publication of results initially presented in abstracts. In: Cochrane Library [Cochrane methodology review]. Issue 11. John Wiley & Sons, Inc.; 2018 Nov 20 [accessed 2020 Aug 14]. doi.org/10.1002/14651858.MR000005.pub4. PMID: 30480762.
- Yentis SM, Campbell FA, Lerman J. Publication of abstracts presented at anaesthesia meetings. Can J Anaesth. 1993 Jul;40(7):632-4. doi.org/10.1007/bf03009700. PMID: 8403137.
- National Healthcare Quality and Disparities Reports. [internet]. Rockville (MD): Agency for Healthcare Research and Quality; 2013 May [updated 2021 Dec 01]; [accessed 2022 Jan 15]. [2 p].
- 2021 National Healthcare Quality and Disparities Report [AHRQ Publication 21(22)-0054-EF]. Rockville (MD): Agency for Healthcare Research and Quality (AHRQ); 2021 Dec. 316 p.
VI. Summary of Protocol Amendments
If we need to amend this protocol, we will give the date of each amendment, describe the change, and give the rationale in this section. Changes will not be incorporated into the protocol. Example table below:
VII. Review of Key Questions
AHRQ posted the KQs on the AHRQ Effective Health Care Website for public comment in November 2021. The EPC finalized the KQs after reviewing the public comments and considering input from KIs and the TEP. This input is intended to ensure that the KQs are clear, specific and relevant.
Three organizations and two individuals offered public comments. The organizations that offered feedback were the American Academy of Family Medicine, National Patient Advocate Foundation, and the AI Healthcare Foundation. The comments were uniformly supportive of the goals of the project and commended AHRQ for undertaking this effort. One theme that emerged from the comments was the need to explicitly consider transparency of algorithms when examining standards or guidance for mitigating bias. We have added subquestion d to CQ 2 to address the role of transparency and disclosure when evaluating algorithms.
The National Patient Advocate Foundation advised on the importance of considering costs and burdens to patients that are associated with algorithms and algorithm-informed decision tools. We include direct costs to patients as an outcome of interest and will emphasize the importance of acknowledging and measuring non-clinical patient-centered outcomes, such as quality of life, when evaluating the impact of healthcare algorithms. The American Academy of Family Medicine suggested that our research should include greater emphasis on processes through which algorithms are broadly disseminated to end-users. We agree that dissemination is a critical consideration, and address it in KQ 2 and KQ 2b. Additionally, our approach to CQ 4 will explore dissemination strategies explicitly, as described above in Section 3 on the Analytic Frameworks. Finally, the AI Healthcare Foundation shared several useful resources and provided numerous citations of relevant publications that will be helpful to this review.
VIII. Key Informants
Key Informants are the end-users of research; they can include patients and caregivers, practicing clinicians, relevant professional and consumer organizations, purchasers of health care, and others with experience in making health care decisions. Within the EPC program, the Key Informant role is to provide input into decisional dilemmas and help focus the project scope on KQs that will inform health care decisions. The EPC solicits input from Key Informants when developing questions for the systematic review or when identifying high-priority research gaps and needed new research. Key Informants are not involved in analyzing the evidence or writing the report. They do not review the report, except as given the opportunity to do so through the peer or public review mechanism.
Key Informants must disclose any financial conflicts of interest greater than $5,000 and any other relevant business or professional conflicts of interest. Because of their role as end-users of the report rather than as participants in conducting the research and analysis, individuals are invited to serve as Key Informants and those who present with potential conflicts may be retained. The AHRQ Task Order Officer (TOO) and the EPC work to balance, manage, or mitigate any potential conflicts of interest identified.
IX. Technical Experts
Technical Experts constitute a multi-disciplinary group of clinical, content, and methodological experts who provide input in defining populations, interventions, comparisons, or outcomes and identify particular studies or databases to search. The Technical Expert Panel is selected to provide broad expertise and perspectives specific to the topic under development. Divergent and conflicting opinions are common and perceived as healthy scientific discourse that fosters a thoughtful, relevant systematic review. Therefore, study questions, design, and methodological approaches do not necessarily represent the views of individual technical and content experts. Technical Experts provide information to the EPC to identify literature search strategies and suggest approaches to specific issues as requested by the EPC. Technical Experts do not perform analysis of any kind, nor do they contribute to the writing of the report.
Members of the TEP must disclose any financial conflicts of interest greater than $5,000 and any other relevant business or professional conflicts of interest. Because of their unique clinical or content expertise, individuals are invited to serve as Technical Experts and those who present with potential conflicts may be retained. The AHRQ TOO and the EPC work to balance, manage, or mitigate any potential conflicts of interest identified.
X. Peer Reviewers
Peer reviewers without prior knowledge of the contents of the report are invited to provide written comments on the draft report based on their clinical, content, or methodological expertise. The EPC considers all peer review comments on the draft report in preparing the final report. Peer reviewers do not participate in writing or editing of the final report or other products. The final report does not necessarily represent the views of individual reviewers.
The EPC will complete a disposition of all peer review comments. The disposition of comments for systematic reviews and technical briefs will be published 3 months after publication of the evidence report.
Potential peer reviewers must disclose any financial conflicts of interest greater than $5,000 and any other relevant business or professional conflicts of interest. Invited peer reviewers with any financial conflict of interest greater than $5,000 will be disqualified from peer review. Peer reviewers who disclose potential business or professional conflicts of interest can submit comments on draft reports through the public comment mechanism.
XI. EPC Team Disclosures
EPC core team members must disclose any financial conflicts of interest greater than $1,000 and any other relevant business or professional conflicts of interest. Direct financial conflicts of interest that cumulatively total more than $1,000 will usually disqualify an EPC core team investigator.
XII. Role of the Funder
This project was funded by Task Order No: 75Q80121F32005. The AHRQ Task Order Officer reviewed the EPC response to contract deliverables for adherence to contract requirements and quality. The authors of this report are responsible for its content. Statements in the report should not be construed as endorsement by either the Agency for Healthcare Research and Quality or the U.S. Department of Health and Human Services.
This protocol will be registered in the international prospective register of systematic reviews (PROSPERO).
Embase.com Strategy (combines MEDLINE and Embase): 1/1/2011 – 1/6/2022
* = truncation
/exp = explode to include all terms in the tree
/mj = limit to terms indexed as major concepts
/de = search term without exploding
:ti = search in the title field
:kw = search in the author keywords field
:ab = search in the abstract field
NEAR/# = search the terms within # of each other in any order
NEXT/# = search terms within # of each other in the specified order
1 'ancestry group'/exp OR 'ethnic group'/exp OR 'ethnic or racial aspects'/de OR 'ethnicity'/mj OR 'race'/de OR race:ti OR racial*:ti OR 'ethnic group*':ti OR ethnicit*:ti
2 'multiracial person'/exp OR 'asian american'/exp OR 'black person'/exp OR 'african american'/exp OR 'hispanic'/exp OR 'alaska native'/exp OR 'american indian'/exp OR 'pacific islander'/exp OR (((arab OR asian OR african OR indian* OR indigenous) NEXT/3 american*):ti,ab,kw) OR ((native NEAR/2 (American* OR Alaskan*)):ti,ab,kw) OR (((black OR brown) NEXT/2 (person* OR people OR patient* OR American*)):ti,ab,kw) OR black:ti,ab,kw OR hispanic*:ti,ab,kw OR latino*:ti,ab,kw OR latina*:ti,ab,kw OR latinx:ti,ab,kw OR (pacific NEXT/2 islander*):ti,ab,kw OR 'non caucasian*':ti,ab,kw OR noncaucasian*:ti,ab,kw OR 'non white*':ti,ab,kw OR nonwhite*:ti,ab,kw OR ((mexican* NEAR/5 (america* OR us OR usa)):ti,ab,kw) OR (mixed NEAR/2 (ethnic* OR race*)):ti,ab,kw OR Multiracial:ti,ab,kw OR Multi-racial:ti,ab,kw OR biracial:ti,ab,kw OR multiethnic*:ti,ab,kw OR multi-ethnic*:ti,ab,kw OR (multiple NEXT/1 (ethnic* OR race*)):ti,ab,kw OR bipoc:ti,ab,kw OR ((ethnic* OR race* OR racial) NEXT/1 group*):ti,ab,kw OR ((ethnic* OR race* OR racial) NEAR/2 ('sub group*' OR subgroup*)):ti,ab,kw
3 1 OR 2
4 'algorithm'/exp OR 'algorithm bias'/exp OR algorithm*:ti,ab,kw
5 'artificial intelligence'/exp OR 'computer model'/exp OR 'machine learning'/exp OR 'computer prediction'/exp OR 'data mining'/exp OR 'artificial neural network'/exp OR 'computer assisted diagnosis'/de OR 'computer analysis'/exp OR 'statistical model'/exp OR 'information processing'/mj OR ((artificial NEXT/2 intelligence):ti,ab,kw) OR (((computer OR machine OR deep) NEXT/2 (learning OR predict*)):ti,ab,kw) OR ((neural NEXT/2 network*):ti,ab,kw) OR ((data NEXT/2 (mine OR mined OR mining)):ti,ab,kw) OR ((dataset* OR 'data set*' OR model OR models) NEAR/5 (train OR training OR mitigat* OR bias*)):ti,ab,kw OR 'training data':ti,ab,kw
6 'calculation'/exp/mj OR 'rating scale'/exp/mj OR 'model'/mj OR 'disease model'/exp/mj OR 'scoring system'/exp/mj OR 'prediction and forecasting'/exp/mj OR scale:ti,kw OR scales:ti,kw OR instrument*:ti,kw OR index*:ti,kw OR indices:ti,kw OR measure*:ti,kw OR metric*:ti,kw OR calculat*:ti,kw OR score*:ti,kw OR formula:ti,kw OR formulas:ti,kw OR variable*:ti,kw OR coefficient*:ti,kw OR 'co-efficient*':ti,kw OR equation*:ti,kw OR proxy:ti,ab,kw OR proxies:ti,ab,kw OR tool*:ti,kw OR ((correction NEXT/2 factor*):ti,ab,kw) OR ((data NEXT/2 driven):ti,ab,kw) OR ((big NEXT/2 data):ti,ab,kw) OR ((predict* NEXT/2 (model* OR analytic*)):ti,ab,kw)
7 4 OR 5 OR 6
8 'bias'/de OR 'prejudice'/exp OR 'health disparity'/exp OR 'health care disparity'/exp OR 'disparity'/exp OR 'health equity'/exp OR 'race difference'/exp OR 'racism'/exp OR 'ethnic difference'/exp OR equity:ti,ab,kw OR disparit*:ti,ab,kw OR discrimination:ti,kw OR bias*:ti,ab,kw OR unequal*:ti,ab,kw OR inequal*:ti,ab,kw OR inequit*:ti,ab,kw OR disproportionat*:ti,ab,kw OR prejudice*:ti,ab,kw OR imbalance*:ti,ab,kw OR fairness:ti,ab,kw OR underserved:ti,ab,kw OR ((under NEXT/2 served):ti,ab,kw) OR marginalized:ti,ab,kw OR (((race* OR racial* OR ethnic* OR ancestries OR ancestry) NEAR/5 (differen* OR discrimination*)):ti,ab,kw) OR racism:ti,ab,kw OR racist:ti,ab,kw OR reclassif*:ti,ab,kw OR misestimat*:ti,ab,kw OR misrepresent*:ti,ab,kw OR "less likely":ti,ab OR "more likely":ti,ab OR ((with OR without) NEXT/3 (race OR ethnic* OR racial)):ti,ab OR (compared NEAR/6 (white OR whites OR Caucasian*)):ti,ab OR (underrepresent* OR overrepresent*):ti,ab
9 3 AND 7 AND 8
10 (('algorithm'/exp OR 'algorithm bias'/exp OR algorithm*:ti,kw) AND ('race'/de OR 'race difference'/exp OR 'racism'/exp OR race:ti,kw OR racial*:ti,kw OR ethnicity:ti,kw)) OR (Algorithm* NEAR/10 (race OR racial* OR ethnic* OR racis*))
11 9 OR 10
12 11 NOT ('book'/de OR 'case report'/de OR 'conference paper'/exp OR 'editorial'/de OR 'letter'/de OR (book OR chapter OR conference OR editorial OR letter):it OR [conference abstract]/lim OR [conference paper]/lim OR [conference review]/lim OR [editorial]/lim OR [letter]/lim OR (abstract OR annual OR conference OR congress OR meeting OR proceedings OR sessions OR symposium):nc OR ((book NOT series) OR 'conference proceeding'):pt OR ('case report' OR comment* OR editorial OR letter OR news):ti OR ((protocol AND (study OR trial)) NOT ('therapy protocol*' OR 'treatment protocol*')):ti)
13 12 NOT (([animals]/lim NOT [humans]/lim) OR ((animal OR animals OR canine* OR dog OR dogs OR feline OR hamster* OR lamb OR lambs OR mice OR monkey OR monkeys OR mouse OR murine OR pig OR piglet* OR pigs OR porcine OR primate* OR rabbit* OR horse OR horses OR rat OR rats OR rodent* OR sheep* OR swine OR veterinar*) NOT (human* OR patient*)):ti)
14 13 AND ('united states'/exp OR 'united states' OR usa OR american*)
15 14 AND ([english]/lim AND [2011-2022]/py)
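The way the numbered lines combine (lines 3, 7, 9, and 11 through 15) is ordinary set algebra over the records each line retrieves. A minimal sketch, assuming each line's hits are available as a set of record identifiers; the function name, parameter names, and identifiers below are hypothetical, and Embase.com performs these combinations itself:

```python
def combine_search_lines(hits, excluded_pub_types, animal_only,
                         us_related, english_2011_2022):
    """Reproduce the Boolean structure of search lines 3, 7, 9, and 11-15.

    hits: dict mapping a search line number (1, 2, 4, 5, 6, 8, 10) to the set
    of record IDs that line retrieves. The remaining arguments are the sets
    matched by the exclusion and limit clauses in lines 12-15.
    """
    line3 = hits[1] | hits[2]                # race/ethnicity concepts
    line7 = hits[4] | hits[5] | hits[6]      # algorithm and model concepts
    line9 = line3 & line7 & hits[8]          # all three concept groups required
    line11 = line9 | hits[10]                # plus the targeted algorithm-and-race line
    line12 = line11 - excluded_pub_types     # NOT books, letters, conference items, etc.
    line13 = line12 - animal_only            # NOT animal-only records
    line14 = line13 & us_related             # AND United States terms
    line15 = line14 & english_2011_2022      # AND English language, 2011-2022
    return line15


# Toy record IDs, made up for illustration.
toy_hits = {1: {1, 2, 3}, 2: {3, 4}, 4: {1, 4}, 5: {2}, 6: {5},
            8: {1, 2, 4, 6}, 10: {7, 8}}
final = combine_search_lines(toy_hits, excluded_pub_types={2}, animal_only={8},
                             us_related={1, 7, 9}, english_2011_2022={1, 4, 7})
```

Note that line 10 is joined with OR after the three-concept intersection, so records matching the targeted algorithm-and-race clause are retained even when they miss one of the broader concept groups.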