Effective Health Care Program

Assessing the Predictive Validity of Strength of Evidence Grades: A Meta-Epidemiological Study

White Paper

Structured Abstract

Background

We sought to determine the predictive validity of the U.S. Evidence-based Practice Center (EPC) approach to GRADE (Grading of Recommendations Assessment, Development and Evaluation) by examining how reliably it can predict the likelihood that treatment effects remain stable as new studies emerge.

Methods

Starting from 37 Cochrane reports with outcomes graded as high strength of evidence (SOE), we prepared 160 documents, each containing a portion of one of these bodies of evidence in chronological order. We randomly assigned these documents, which represented different levels of SOE, to professional systematic reviewers from seven academic centers in Austria, Canada, and the United States, who dually graded the SOE using guidance for the EPC program. For each of the 160 documents, we determined whether estimates remained stable as subsequent studies were added to the evidence base. For each grade of SOE, we compared the observed proportion of stable estimates with the expected proportion from an international survey. To determine the predictive validity, we used the Hosmer-Lemeshow test to assess calibration and the C (concordance) index to assess discrimination.
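The two measures named above can be illustrated with a minimal Python sketch. It is not the analysis code used in the report: the function names, the grouping by SOE grade, and all numbers in the demo are hypothetical and chosen only to show the arithmetic. The calibration statistic below is the Hosmer-Lemeshow chi-square sum computed over predefined groups; turning it into a p-value would additionally require a chi-square distribution with the appropriate degrees of freedom, which is omitted here.

```python
from itertools import product


def c_index(scores, outcomes):
    """Concordance (C) index for a binary outcome.

    scores   -- predicted probability (or any ordinal score) that an
                estimate remains stable
    outcomes -- 1 if the estimate actually remained stable, else 0

    Returns the share of (stable, unstable) pairs in which the stable
    estimate received the higher score; tied scores count as 0.5.
    """
    stable = [s for s, y in zip(scores, outcomes) if y == 1]
    unstable = [s for s, y in zip(scores, outcomes) if y == 0]
    if not stable or not unstable:
        raise ValueError("need at least one stable and one unstable estimate")
    concordant = 0.0
    for s, u in product(stable, unstable):
        if s > u:
            concordant += 1.0
        elif s == u:
            concordant += 0.5
    return concordant / (len(stable) * len(unstable))


def hosmer_lemeshow_statistic(groups):
    """Hosmer-Lemeshow chi-square statistic over predefined groups.

    groups -- iterable of (n, observed_stable, expected_proportion),
              with 0 < expected_proportion < 1 for every group.
    Sums (O - E)^2 / (n * p * (1 - p)) across groups.
    """
    statistic = 0.0
    for n, observed, p in groups:
        expected = n * p
        statistic += (observed - expected) ** 2 / (expected * (1.0 - p))
    return statistic


if __name__ == "__main__":
    # Hypothetical illustration only: these are NOT the study's data.
    demo_scores = [0.9, 0.9, 0.7, 0.7, 0.4, 0.4, 0.2, 0.2]
    demo_outcomes = [1, 0, 1, 1, 1, 0, 0, 1]
    print("C-index:", c_index(demo_scores, demo_outcomes))

    # One tuple per hypothetical grade: (documents, stable, expected share).
    demo_groups = [(40, 30, 0.9), (40, 28, 0.7), (40, 24, 0.4), (40, 20, 0.2)]
    print("HL statistic:", hosmer_lemeshow_statistic(demo_groups))
```

In this framing, a C-index of 0.5 means the grades separate stable from unstable estimates no better than chance and 1.0 means perfect separation, while a large calibration statistic signals that observed proportions of stable estimates depart from the expected ones.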

Results

Overall, the predictive validity of the EPC approach to GRADE for the stability of effect estimates was limited. Except for moderate SOE, the expected and observed proportions of stable effect estimates differed considerably. Estimates graded as high SOE were less likely to remain stable than expected by producers and users of systematic reviews. By contrast, estimates graded as low or insufficient SOE were substantially more likely to remain stable than expected. In this sample, the EPC approach to GRADE could not reliably predict the likelihood that individual bodies of evidence remain stable as new evidence becomes available. Depending on the definition used, C-indices ranged between 0.56 (95% CI, 0.47 to 0.66) and 0.58 (95% CI, 0.50 to 0.67), indicating low discriminatory ability (a C-index of 0.5 corresponds to chance).

Conclusions

The limited predictive validity of the EPC approach to GRADE seems to reflect a mismatch between expected and observed changes in treatment effects as bodies of evidence advance from insufficient to high SOE. In addition, many low or insufficient grades appear to be too strict.