Effective Health Care Program

Comprehensive Overview of Methods and Reporting of Meta-Analyses of Test Accuracy

Research Report


Note: This is one of four related projects designed to document the current standards and methods used in the meta-analysis of diagnostic tests, validate newly proposed methods, develop new statistical methods for meta-analyses of diagnostic tests, and incorporate these insights into computer software that will be available to all Evidence-based Practice Centers (EPCs) and others conducting reviews of diagnostic tests.

Structured Abstract


Background

Medical tests play a critical role in disease screening, diagnosis, and prediction of future outcomes. Meta-analyses of diagnostic or predictive test accuracy are increasingly performed and the relevant methods are continuously evolving.


Methods

We identified systematic reviews including quantitative synthesis (meta-analysis) of test accuracy for diagnostic or predictive medical tests through MEDLINE searches (1966 to December 2009) and perusal of reference lists of eligible articles and relevant reviews. We extracted information on topics and test types covered, methods for literature synthesis and quality assessment, availability of data, and statistical analyses performed.


Results

Our searches retrieved 1,225 potentially eligible reviews, of which 760 (published from 1987 to 2009) were eligible for inclusion. Eligible reviews included a median of 18 primary studies and typically examined a single index test against a single reference standard. The number of reviews published per calendar year increased over time (P < 0.001). The most common clinical areas were oncology (25 percent) and cardiovascular disease (21 percent); the most common test categories were imaging (44 percent) and biomarker tests (28 percent). Most meta-analyses searched multiple electronic databases to identify eligible studies (62 percent used at least one electronic database in addition to MEDLINE; P for trend over time < 0.001). Over time, there was a striking increase in the proportion of systematic reviews that reported assessing verification bias (P for trend < 0.001), spectrum bias (P for trend = 0.007), blinding (P for trend < 0.001), prospective study design (P for trend < 0.001), or consecutive patient recruitment (P for trend < 0.001). These improvements were associated with the reported use of quality-item checklists to guide assessment of methodological quality. In statistical analyses, sensitivity (77 percent), specificity (74 percent), and diagnostic or predictive odds ratios (34 percent) were the most commonly used metrics. Heterogeneity tests were used in 58 percent of meta-analyses, and subgroup or regression analyses in 57 percent. Random effects models were employed in 57 percent of the reviews, and increasingly over time (P for trend < 0.001). Theoretically motivated methods that model sensitivity and specificity simultaneously, while accounting for between-study heterogeneity, were used in a minority of reviews (11 percent), but increasingly over time (P for trend < 0.001).
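For readers unfamiliar with the metrics named above, the following is a minimal sketch of how sensitivity, specificity, and the diagnostic odds ratio are computed from a single 2x2 table, and how log odds ratios might be pooled with a DerSimonian-Laird random effects model. It is an illustration only, not the method of any reviewed meta-analysis: the function names and the counts in the usage note are hypothetical, the 0.5 continuity correction is one common convention among several, and this univariate pooling is not one of the methods that model sensitivity and specificity simultaneously.

```python
import math


def accuracy_metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity, DOR, variance of log DOR) for one study.

    tp, fp, fn, tn are the cells of a 2x2 table of index test result
    against the reference standard. Assumes tp + fn > 0 and tn + fp > 0.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Assumption: add 0.5 to every cell only when some cell is zero,
    # so that the odds ratio and its variance are defined.
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    dor = (tp * tn) / (fp * fn)
    var_log_dor = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return sensitivity, specificity, dor, var_log_dor


def dersimonian_laird(effects, variances):
    """Pool study effects (e.g. log DORs) with a random effects model.

    Returns (pooled effect, standard error, tau^2, Cochran's Q).
    """
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    # Method-of-moments estimate of between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1 / sum(re_weights))
    return pooled, se, tau2, q
```

As a usage sketch, calling `accuracy_metrics` on each study's table, taking `math.log` of each odds ratio, and passing the log odds ratios with their variances to `dersimonian_laird` yields a pooled log odds ratio that can be exponentiated back to the odds ratio scale. Cochran's Q, returned alongside, is the statistic underlying the usual heterogeneity test.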


Conclusions

Meta-analyses of diagnostic or predictive tests are increasingly performed. Over time, there have been substantial improvements in the literature review, quality assessment, and statistical analysis methods employed. Much of the improvement in quality assessment is associated with the use of quality-item checklists. Advanced statistical methods have been increasingly adopted, but their use remains limited.