Effective Health Care Program

An Empirical Assessment of Bivariate Methods for Meta-Analysis of Test Accuracy

Research Report

Notice: This is one of four related projects designed to document the current standards and methods used in the meta-analysis of diagnostic tests, validate newly proposed methods, develop new statistical methods to perform meta-analyses of diagnostic tests, and incorporate these insights into computer software that will be available to all Evidence-based Practice Centers (EPCs) and others conducting reviews of diagnostic tests.

Structured Abstract

Background

Meta-analyses of sensitivity and specificity pairs reported from diagnostic test accuracy studies employ a variety of statistical models for estimating mean performance and performance across different test thresholds. The impact of these alternative models on conclusions in applied settings has not been studied systematically.

Methods

We constructed a database of PubMed-indexed meta-analyses (1987–2003) from which 2×2 tables for each included primary study could be readily extracted. We evaluated the following methods for meta-analysis of sensitivity and specificity: fixed and random effects univariate meta-analyses using inverse variance methods; univariate random effects meta-analyses with maximum likelihood (ML), using either a normal approximation or the exact binomial likelihood to describe within-study variability; and bivariate random effects meta-analyses, again using either a normal approximation or the exact binomial likelihood to describe within-study variability. The bivariate model using the exact binomial likelihood was also fit using a fully Bayesian approach. We constructed summary receiver operating characteristic (SROC) curves using the Moses-Littenberg fixed effects method (weighted and unweighted) and the Rutter-Gatsonis hierarchical SROC (HSROC) method. We also obtained alternative SROC curves corresponding to different underlying regression models [logit-true positive rate (TPR) over logit-false positive rate (FPR); logit-FPR over logit-TPR; difference of the logit-TPR and logit-FPR over their sum; and major axis regression of logit-TPR over logit-FPR].
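As an illustration of the Moses-Littenberg approach named above, the following is a minimal Python sketch, not the code used in the report. It regresses D (the difference of logit-TPR and logit-FPR) on S (their sum) and back-transforms the fitted line to ROC space. The function names, the 0.5 continuity correction added to every cell, the inverse-variance weights for the weighted fit, and the 2×2 counts are all assumptions made for this example.

```python
import math

# Hypothetical 2x2 tables (TP, FN, FP, TN); illustrative values only,
# not data from the report.
TABLES = [
    (45, 5, 10, 90),
    (30, 10, 5, 95),
    (60, 15, 20, 80),
    (25, 5, 8, 62),
    (80, 20, 12, 138),
]


def moses_littenberg(tables, weighted=False):
    """Fit D = a + b * S by (weighted) least squares and return (a, b).

    D = logit(TPR) - logit(FPR) is the log diagnostic odds ratio and
    S = logit(TPR) + logit(FPR) is a proxy for the test threshold.
    Assumption: 0.5 is added to every cell of every table.
    """
    d_vals, s_vals, weights = [], [], []
    for tp, fn, fp, tn in tables:
        tp, fn, fp, tn = (x + 0.5 for x in (tp, fn, fp, tn))
        logit_tpr = math.log(tp / fn)
        logit_fpr = math.log(fp / tn)
        d_vals.append(logit_tpr - logit_fpr)
        s_vals.append(logit_tpr + logit_fpr)
        # Assumption: the weighted fit uses the inverse variance of D.
        var_d = 1 / tp + 1 / fn + 1 / fp + 1 / tn
        weights.append(1.0 / var_d if weighted else 1.0)

    w_sum = sum(weights)
    s_bar = sum(w * s for w, s in zip(weights, s_vals)) / w_sum
    d_bar = sum(w * d for w, d in zip(weights, d_vals)) / w_sum
    sxy = sum(w * (s - s_bar) * (d - d_bar)
              for w, s, d in zip(weights, s_vals, d_vals))
    sxx = sum(w * (s - s_bar) ** 2 for w, s in zip(weights, s_vals))
    b = sxy / sxx
    a = d_bar - b * s_bar
    return a, b


def sroc_tpr(fpr, a, b):
    """Return the TPR on the fitted SROC curve at a given FPR.

    Solving D = a + b * S for logit(TPR) gives
    logit(TPR) = (a + (1 + b) * logit(FPR)) / (1 - b).
    """
    logit_fpr = math.log(fpr / (1 - fpr))
    logit_tpr = (a + (1 + b) * logit_fpr) / (1 - b)
    return 1 / (1 + math.exp(-logit_tpr))


for label, flag in (("unweighted", False), ("weighted", True)):
    a, b = moses_littenberg(TABLES, weighted=flag)
    print(label, "TPR at FPR = 0.10:", round(sroc_tpr(0.10, a, b), 3))
```

The other regression models listed above (logit-TPR over logit-FPR, logit-FPR over logit-TPR, major axis regression) would change which quantity is treated as the response, which is one way to see why the resulting summary lines can differ.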

Results

We reanalyzed 308 meta-analyses of test performance. Fixed effects univariate analyses produced estimates with narrower confidence intervals compared to random effects methods. Methods using the normal approximation (both univariate and bivariate, inverse variance and ML) produced estimates of summary sensitivity and specificity closer to 0.5 and smaller standard errors compared to methods using the exact binomial likelihood. Point estimates from univariate and bivariate random effects meta-analyses were similar when performing pairwise (univariate vs. bivariate) comparisons, regardless of the estimation method (inverse variance, ML with normal approximation, or ML with the exact binomial likelihood for estimation). Fitting the bivariate model using ML and fully Bayesian methods produced almost identical point estimates of summary sensitivity and specificity; however, Bayesian results indicated additional uncertainty around summary estimates. The correlation of sensitivity and specificity across studies was imprecisely estimated by all bivariate methods. The SROC curves produced by the Moses-Littenberg and Rutter-Gatsonis models were similar in most examples. Alternative parameterizations of the HSROC regression resulted in markedly different summary lines in a third of the meta-analyses; this depends to a large extent on the estimated covariance between sensitivity and specificity in the bivariate model. Our results are generally in agreement with published simulation studies and the theoretically expected behavior of meta-analytic estimators.
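The first finding above, that fixed effects intervals are narrower, can be seen in a small Python sketch of univariate inverse variance pooling on the logit scale. This is an illustration under stated assumptions rather than the report's implementation: the random effects fit uses the DerSimonian-Laird moment estimator of the between-study variance (the abstract says only "inverse variance methods"), a 0.5 continuity correction is applied to every study, and the counts and function names are hypothetical.

```python
import math

# Hypothetical (TP, FN) counts among diseased participants;
# illustrative values only, not data from the report.
PAIRS = [(45, 5), (30, 10), (60, 15), (25, 5), (80, 20)]


def expit(x):
    """Inverse of the logit transform."""
    return 1 / (1 + math.exp(-x))


def pool_logit_sensitivity(pairs):
    """Pool logit sensitivity with fixed and random effects weights.

    Uses the normal approximation: each study contributes
    y = log(TP / FN) with within-study variance 1/TP + 1/FN
    (after adding 0.5 to both cells, an assumption of this sketch).
    Requires at least two studies.
    """
    y, v = [], []
    for tp, fn in pairs:
        tp_c, fn_c = tp + 0.5, fn + 0.5
        y.append(math.log(tp_c / fn_c))
        v.append(1 / tp_c + 1 / fn_c)

    # Fixed effects: weights are inverse within-study variances.
    w = [1 / vi for vi in v]
    w_sum = sum(w)
    mu_fixed = sum(wi * yi for wi, yi in zip(w, y)) / w_sum
    se_fixed = math.sqrt(1 / w_sum)

    # DerSimonian-Laird estimate of the between-study variance tau^2.
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, y))
    c = w_sum - sum(wi ** 2 for wi in w) / w_sum
    tau2 = max(0.0, (q - (len(y) - 1)) / c)

    # Random effects: tau^2 is added to every within-study variance.
    w_re = [1 / (vi + tau2) for vi in v]
    w_re_sum = sum(w_re)
    mu_random = sum(wi * yi for wi, yi in zip(w_re, y)) / w_re_sum
    se_random = math.sqrt(1 / w_re_sum)

    return {
        "tau2": tau2,
        "sens_fixed": expit(mu_fixed),
        "ci_fixed": (expit(mu_fixed - 1.96 * se_fixed),
                     expit(mu_fixed + 1.96 * se_fixed)),
        "se_fixed": se_fixed,
        "sens_random": expit(mu_random),
        "ci_random": (expit(mu_random - 1.96 * se_random),
                      expit(mu_random + 1.96 * se_random)),
        "se_random": se_random,
    }


result = pool_logit_sensitivity(PAIRS)
print("fixed: ", result["sens_fixed"], result["ci_fixed"])
print("random:", result["sens_random"], result["ci_random"])
```

Because the estimated between-study variance is never negative, the random effects standard error in this sketch is always at least as large as the fixed effects one, so the random effects interval cannot be narrower on the logit scale.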

Conclusions

Bivariate models are better motivated theoretically than univariate analyses and allow estimation of the correlation between sensitivity and specificity. Bayesian methods fully quantify uncertainty, and their ability to incorporate external evidence may be particularly useful for parameters that are poorly estimated in the bivariate model. Alternative SROC curves provide useful global summaries of test performance.