Abstract – Final – Oct. 29, 2013

Testing the Predictive Validity of Strength of Evidence Grades

Background

Strength of evidence (SOE) assessments, and the grades used to convey them, succinctly describe to stakeholders the findings and conclusions of systematic reviews. The GRADE (Grading of Recommendations Assessment, Development and Evaluation) approach has become an internationally used standard for grading the SOE of a body of evidence in systematic reviews. Critics of the GRADE approach argue that the method is too strict and that systematic reviewers tend toward low, very low (in the context of GRADE), or insufficient (in the context of the AHRQ EPC guidance for GRADE) grades to be “on the safe side,” particularly for outcomes that could be viewed as controversial.

To date, the predictive validity of the GRADE approach has not been tested. Predictive validity refers, in general terms, to the degree to which a score predicts an outcome on a criterion measure. In the case of this project, we are using predictive validity to refer to the degree to which the GRADE approach reliably predicts the stability of an estimate of effect. We define stability as “similarity in the magnitude of the summary effect as evidence accrues over time.”

Objectives

This methods project has three main objectives:

  1. To assess the predictive validity of each of the four SOE grades from the GRADE approach.
  2. To assess the predictive validity of the entire GRADE approach.
  3. To determine the likelihood that the first available study in a body of evidence provides a substantially different estimate from the pooled effect of the entire body of evidence.

Approach

We will test whether SOE grade, as a categorical variable, reliably predicts the likelihood that an effect estimate remains stable over time. The project consists of two main phases:

Phase 1

We will assemble a pool of published meta-analyses that researchers had previously graded as high SOE. Using cumulative meta-analyses, we will grade the SOE of subsets of this evidence at different points in the past. We will use z-scores to compare the effect estimates of the subsets (which we term the “gradeable effect”) with the pooled effect estimate of the published meta-analysis (which we term the “true effect”) to assess the stability of the effect estimate over time. Across this sample of meta-analyses, the outcome of interest is the proportion of stable results over time for each grade of SOE. We expect a sample size of 120 ratings to be sufficient to achieve at least 80% power to compare the stability of all four SOE grades. We will randomly allocate grading exercises to investigators. Each grading exercise will be done dually and independently.
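As a rough sketch of this stability check, the following Python snippet compares a hypothetical gradeable effect (from an early subset of the evidence) with a hypothetical true effect (from the full published meta-analysis) using a z-score. The effect estimates, standard errors, and the 1.96 cutoff are illustrative assumptions, not the project's specified values, and the simple comparison ignores the dependence between the subset and the full pooled estimate.

  import math

  def stability_z(gradeable_effect, gradeable_se, true_effect, true_se):
      # z-score for the difference between the subset ("gradeable") estimate
      # and the pooled estimate of the full body of evidence ("true" effect)
      return (gradeable_effect - true_effect) / math.sqrt(gradeable_se**2 + true_se**2)

  # Hypothetical log odds ratios and standard errors, for illustration only
  gradeable_effect, gradeable_se = -0.45, 0.18   # cumulative subset at an earlier point in time
  true_effect, true_se = -0.38, 0.07             # full published meta-analysis

  z = stability_z(gradeable_effect, gradeable_se, true_effect, true_se)
  stable = abs(z) < 1.96                         # assumed threshold, roughly a two-sided 5% level
  print(f"z = {z:.2f}, stable = {stable}")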

Phase 2

The goal of this phase is to address the three objectives of this methods project. To address the first objective, we will use data gathered in Phase 1 to calculate the proportion of stable results for each grade of SOE. We will then test the correlation between the probabilities reported in a web-based survey of the Austrian Cochrane Branch (expected stability) and the proportion of stable summary effects in our sample of meta-analyses (observed stability). To address the second objective, we will combine the predictive validities across SOE grades to calculate the predictive validity of the entire tool. To address the third objective, we will calculate z-scores between the true effect and the effect estimate of the first available study.
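A minimal sketch of the first-objective calculation, assuming hypothetical grading results, hypothetical survey probabilities, and a Spearman rank correlation (the abstract does not name the correlation statistic), might look like this:

  from collections import defaultdict
  from scipy.stats import spearmanr

  # Hypothetical (grade, stable?) pairs from the Phase 1 grading exercises
  ratings = [("high", True), ("high", True), ("moderate", True), ("moderate", False),
             ("low", True), ("low", False), ("insufficient", False), ("high", True)]

  # Observed stability: proportion of stable results for each SOE grade
  counts = defaultdict(lambda: [0, 0])            # grade -> [stable count, total count]
  for grade, stable in ratings:
      counts[grade][0] += int(stable)
      counts[grade][1] += 1
  observed = {grade: s / n for grade, (s, n) in counts.items()}

  # Expected stability: hypothetical probabilities elicited in the web-based survey
  expected = {"high": 0.85, "moderate": 0.65, "low": 0.45, "insufficient": 0.25}

  grades = ["high", "moderate", "low", "insufficient"]
  rho, p = spearmanr([expected[g] for g in grades],
                     [observed.get(g, 0.0) for g in grades])
  print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")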
