Effective Health Care Program

Development and Implementation of Secure Statistical Analyses on Distributed Databases


Topic Abstract

Background: In 2007, the DEcIDE Network began evaluating two prototypes of distributed research networks. The general goal of these initiatives is to assess the advantages and limitations of using distributed database networks to support population-based research. Potential uses of distributed networks include studies of the clinical effectiveness, comparative effectiveness, and safety of medical products and interventions.

In developing the two prototypes, it became clear that new methodological approaches are needed to fully implement a distributed network while maintaining the highest levels of data confidentiality and privacy. The current project therefore aims to develop, test, and validate a method for secure statistical analysis on distributed health databases.

Successful implementation of secure statistical analysis within a distributed database model will overcome one of the largest hurdles in fully developing such a model: the current need to combine data for multivariate analyses. Removing this need will save the time and resources now spent developing data-sharing agreements and will greatly reduce patient confidentiality and proprietary concerns.

Objectives: To develop methods and electronic tools (e.g., SAS macros) that enable multivariate analyses to be conducted across multiple datasets that are stored separately. The project will focus initially on linear and logistic regression.
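
To illustrate why linear regression can, in principle, be fitted without combining patient-level data, the following is a minimal sketch in Python. It is not the project's method or its SAS macros. It assumes each site computes only the aggregate matrices X'X and X'y locally and shares those sums with a coordinator, which then solves the normal equations. The function names (site_summaries, combine, simulate_site) and the simulated data are hypothetical, and the sketch omits any secure protocol for exchanging the summaries. Logistic regression would need an iterative variant of the same idea and is not shown.

```python
import random


def site_summaries(X, y):
    """Run locally at one site: return X'X and X'y (with an intercept column).

    Only these aggregate sums leave the site, never the individual rows.
    """
    p = len(X[0]) + 1
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for row, yi in zip(X, y):
        r = [1.0] + list(row)
        for i in range(p):
            xty[i] += r[i] * yi
            for j in range(p):
                xtx[i][j] += r[i] * r[j]
    return xtx, xty


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def combine(summaries):
    """Run at the coordinator: add the site summaries and solve for beta."""
    p = len(summaries[0][1])
    xtx = [[sum(s[0][i][j] for s in summaries) for j in range(p)]
           for i in range(p)]
    xty = [sum(s[1][i] for s in summaries) for i in range(p)]
    return solve(xtx, xty)


def simulate_site(n, rng):
    """Hypothetical simulated data: y = 2 + 1.5*x1 - 0.5*x2 + noise."""
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    y = [2.0 + 1.5 * a - 0.5 * b + rng.gauss(0, 0.1) for a, b in X]
    return X, y


rng = random.Random(0)
sites = [simulate_site(n, rng) for n in (200, 300, 250)]

# Distributed fit: each site contributes only its summary matrices.
distributed_beta = combine([site_summaries(X, y) for X, y in sites])

# Reference fit on the pooled rows, used here only to check the result.
pooled_X = [row for X, _ in sites for row in X]
pooled_y = [v for _, y in sites for v in y]
pooled_beta = combine([site_summaries(pooled_X, pooled_y)])

print(distributed_beta)
print(pooled_beta)
```

Because X'X and X'y are sums over rows, adding the per-site matrices gives the same normal equations as the pooled data, so the two fits should agree up to floating-point rounding. Comparing a distributed fit against a pooled fit on simulated data is also one simple way to check accuracy.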

Methods: This work is intended to enable remote execution of multivariate statistical analysis within the context of a multi-institutional distributed research network. The research project will use simulated data and/or data from previously published studies, including both electronic medical records and administrative claims data. Thorough testing of the macros will be conducted to ensure the accuracy of the results.

Expected Outputs: Two scientific reports will be prepared, including a comprehensive implementation manual aimed at master's-level statistical analysts. In addition, computer code (i.e., SAS macros) will be made publicly available.