Assessing the Predictive Validity of Strength of Evidence Grades: A Meta-Epidemiological Study

White Paper Sep 14, 2015

Structured Abstract


We sought to determine the predictive validity of the U.S. Evidence-based Practice Center (EPC) approach to GRADE (Grading of Recommendations Assessment, Development and Evaluation) by examining how reliably it can predict the likelihood that treatment effects remain stable as new studies emerge.


Based on 37 Cochrane reports with outcomes graded as high strength of evidence (SOE), we prepared 160 documents using portions of these bodies of evidence in chronological order. We randomly assigned these documents, which represented different levels of SOE, to professional systematic reviewers from seven academic centers in Austria, Canada, and the United States, who dually graded the SOE using guidance for the EPC program. For each of the 160 documents, we determined whether effect estimates remained stable as subsequent studies were added to the evidence base. For each grade of SOE, we compared the observed proportion of stable estimates with the expected proportion from an international survey. To determine the predictive validity, we used the Hosmer-Lemeshow test to assess calibration and the C (concordance) index to assess discrimination.
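The calibration comparison described above can be illustrated with a minimal sketch. It assumes each graded document is reduced to a pair (grade, stable) and that each grade has an expected probability of stability. The function name `hosmer_lemeshow_by_grade`, the grade labels, and all numbers are hypothetical placeholders, not values from the report or the survey, and the sketch computes only a Hosmer-Lemeshow-style statistic rather than reproducing the authors' analysis.

```python
from collections import defaultdict


def hosmer_lemeshow_by_grade(records, expected):
    """Hosmer-Lemeshow-style calibration statistic with grades as groups.

    records:  iterable of (grade, stable) pairs; stable is True or False.
    expected: dict mapping grade -> expected probability of stability,
              each strictly between 0 and 1.
    Returns (statistic, table), where table maps each grade to
    (n, observed_stable, expected_stable).
    """
    counts = defaultdict(lambda: [0, 0])  # grade -> [n, observed stable]
    for grade, stable in records:
        counts[grade][0] += 1
        counts[grade][1] += int(stable)

    statistic = 0.0
    table = {}
    for grade, (n, observed) in counts.items():
        p = expected[grade]
        expected_stable = n * p
        # Squared gap between observed and expected counts, scaled by the
        # binomial variance n * p * (1 - p) of the group.
        statistic += (observed - expected_stable) ** 2 / (n * p * (1 - p))
        table[grade] = (n, observed, expected_stable)
    return statistic, table


# Hypothetical data: 10 documents per grade, 8 stable in each.
records = ([("high", True)] * 8 + [("high", False)] * 2
           + [("low", True)] * 8 + [("low", False)] * 2)
# Hypothetical expected proportions (NOT the survey values).
expected = {"high": 0.8, "low": 0.5}

statistic, table = hosmer_lemeshow_by_grade(records, expected)
print(statistic, table)
```

In this toy input the high grade matches its expected proportion and contributes nothing, while the low grade is more stable than expected and drives the statistic. A larger statistic indicates poorer calibration; judging significance would require comparing it against a chi-square distribution.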


Overall, the predictive validity of the EPC approach to GRADE for the stability of effect estimates was limited. Except for moderate SOE, the expected and observed proportions of stable effect estimates differed considerably. Estimates graded as high SOE were less likely to remain stable than producers and users of systematic reviews expected. By contrast, estimates graded as low or insufficient SOE were substantially more likely to remain stable than expected. In this sample, the EPC approach to GRADE could not reliably predict the likelihood that individual bodies of evidence remain stable as new evidence becomes available. Depending on the definition of stability used, C-indices ranged from 0.56 (95% CI, 0.47 to 0.66) to 0.58 (95% CI, 0.50 to 0.67), indicating low discriminatory ability.
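To show what a C-index near 0.5 means, here is a minimal sketch of the concordance calculation. It assumes SOE grades are mapped to ordinal scores (insufficient = 0, low = 1, moderate = 2, high = 3), which is an illustrative choice rather than the report's stated method. The function name `c_index` and the data are hypothetical.

```python
def c_index(scores, outcomes):
    """Concordance index for a binary outcome.

    scores:   numeric predictions (here, ordinal SOE grades).
    outcomes: booleans, True if the estimate remained stable.
    Over all (stable, unstable) pairs, returns the share in which the
    stable document has the higher score; tied scores count as half.
    """
    stable_scores = [s for s, y in zip(scores, outcomes) if y]
    unstable_scores = [s for s, y in zip(scores, outcomes) if not y]
    pairs = len(stable_scores) * len(unstable_scores)
    if pairs == 0:
        raise ValueError("need at least one stable and one unstable document")

    concordant = 0.0
    for s_stable in stable_scores:
        for s_unstable in unstable_scores:
            if s_stable > s_unstable:
                concordant += 1.0
            elif s_stable == s_unstable:
                concordant += 0.5
    return concordant / pairs


# Hypothetical grades (0 = insufficient ... 3 = high) and stability outcomes.
grades = [3, 3, 2, 1, 0, 1]
stable = [True, False, True, True, False, False]

c = c_index(grades, stable)
print(round(c, 3))
```

A value of 0.5 means the grades rank stable and unstable estimates no better than chance, and 1.0 means perfect separation, so values of 0.56 to 0.58 sit close to the chance level.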


The limited predictive validity of the EPC approach to GRADE seems to reflect a mismatch between expected and observed changes in treatment effects as bodies of evidence advance from insufficient to high SOE. In addition, many low or insufficient grades appear to be too strict.

Project Timeline

Testing the Predictive Validity of Strength of Evidence Grades

Oct 29, 2013
Topic Initiated
Oct 29, 2013
Mar 31, 2015
Sep 14, 2015
White Paper
Page last reviewed January 2019
Page originally created November 2017

Internet Citation: White Paper: Assessing the Predictive Validity of Strength of Evidence Grades: A Meta-Epidemiological Study. Content last reviewed January 2019. Effective Health Care Program, Agency for Healthcare Research and Quality, Rockville, MD.