Strength of evidence (SOE) assessments, and the grades used to convey them, succinctly describe to stakeholders the findings and conclusions of a systematic review. The GRADE (Grading of Recommendations Assessment, Development and Evaluation) approach has become an internationally used standard for grading the SOE of a body of evidence in systematic reviews. Critics of the GRADE approach argue that the method is too strict and that systematic reviewers tend toward grades of low, very low (in GRADE terminology), or insufficient (in the AHRQ EPC guidance for GRADE) to be “on the safe side,” particularly for outcomes that could be viewed as controversial.
To date, the predictive validity of the GRADE approach has not been tested. Predictive validity refers, in general terms, to the degree to which a score predicts an outcome on a criterion measure. In this project, we used predictive validity to refer to the degree to which the GRADE approach reliably predicts the stability of an estimate of effect. We defined stability as “similarity in the magnitude of the summary effect as evidence accrues over time.”
This methods project had three main objectives:
- To assess the predictive validity of each of the four SOE grades from the GRADE approach.
- To assess the predictive validity of the entire GRADE approach.
- To determine the likelihood that the first available study of a body of evidence provides a substantially different estimate than the pooled effect of the entire body of evidence.
We tested whether SOE grade, treated as a categorical variable, reliably predicts the likelihood that an effect estimate remains stable over time. The project consisted of two main phases:
Phase 1: Comparison of Effects as Evidence Evolves From Single Trials to High-Quality Bodies of Evidence
We assembled a pool of published meta-analyses that researchers had previously graded as high SOE. Using cumulative meta-analyses, we graded the SOE of subsets of this evidence at earlier points in time. We used z-scores to compare the effect estimates of the subsets (which we term the "gradeable effect") with the pooled effect estimate of the published meta-analysis (which we term the "true effect") to assess the stability of the effect estimate over time. Across this large sample of meta-analyses, the outcome of interest was the proportion of stable results over time for each grade of SOE. We expected a sample size of 120 ratings to provide at least 80% power to compare the stability of all four SOE grades. We randomly allocated grading exercises to investigators; each grading exercise was performed dually and independently.
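The cumulative meta-analysis and z-score comparison described above can be sketched as follows. This is an illustrative sketch only: the fixed-effect inverse-variance pooling, the |z| < 1.96 stability threshold, and all numeric inputs are assumptions for demonstration, not details taken from the project's protocol.

```python
import math

def pooled_effect(effects, variances):
    """Fixed-effect inverse-variance pooling of effect estimates
    (e.g., log risk ratios). Returns the pooled estimate and its variance."""
    weights = [1.0 / v for v in variances]
    est = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    var = 1.0 / sum(weights)
    return est, var

def is_stable(gradeable_effect, gradeable_var, true_effect, true_var, z_crit=1.96):
    """Compare a cumulative ('gradeable') effect with the final pooled ('true')
    effect via a z-score; |z| below the threshold counts as stable."""
    z = (gradeable_effect - true_effect) / math.sqrt(gradeable_var + true_var)
    return abs(z) < z_crit

# Fabricated data for demonstration: log risk ratios and variances for
# five trials, in chronological order of publication.
effects = [-0.45, -0.30, -0.38, -0.33, -0.35]
variances = [0.04, 0.02, 0.03, 0.01, 0.02]

# The "true effect" is the pooled estimate of the full body of evidence.
true_est, true_var = pooled_effect(effects, variances)

# Cumulative meta-analysis: re-pool after each successive trial and check
# whether the interim estimate is stable relative to the final one.
for k in range(1, len(effects) + 1):
    est_k, var_k = pooled_effect(effects[:k], variances[:k])
    print(k, round(est_k, 3), is_stable(est_k, var_k, true_est, true_var))
```

Under these assumptions, each row shows how many trials are pooled, the interim estimate, and whether it would be judged stable against the final pooled effect.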
Phase 2: Assessing the Predictive Validity of Strength of Evidence Grades: A Meta-Epidemiological Study
The goal of this phase was to address the three objectives of this methods project. To address the first objective, we used data gathered in Phase 1 to calculate the proportion of stable results for each grade of SOE. We then tested the correlation between the probabilities reported in a web-based survey of the Austrian Cochrane Branch (expected stability) and the proportion of stable summary effects in the sample of meta-analyses (observed stability). To address the second objective, we combined the predictive validities across SOE grades to calculate the predictive validity of the entire tool. To address the third objective, we calculated z-scores between the true effect and the effect of the first study.
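The expected-versus-observed comparison for the first objective amounts to correlating two short vectors of proportions, one per SOE grade. A minimal sketch follows; the numbers are hypothetical placeholders for the four GRADE categories, not the study's actual survey or stability results.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical values for the four SOE grades (high, moderate, low, very low).
expected = [0.85, 0.70, 0.50, 0.30]   # survey-elicited probabilities of stability
observed = [0.75, 0.60, 0.55, 0.40]   # observed proportions of stable summary effects

print(round(pearson_r(expected, observed), 3))
```

With only four grade-level data points, such a correlation is necessarily coarse; it summarizes agreement in ranking and spread rather than supporting a formal inference.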
Gartlehner G, Dobrescu A, Evans TS, et al. The predictive validity of quality of evidence grades for the stability of effect estimates was low: a meta-epidemiological study. J Clin Epidemiol. 2015 Sep 3 [Epub ahead of print]. DOI: 10.1016/j.jclinepi.2015.08.018. PMID: 26342443.