Effective Health Care Program

Options for Developing a Repository of Expired Patient Registries

Research Report

Authors

Richard E. Gliklich, M.D.
Michelle B. Leavy, M.P.H.
Laura Khurana
Daniel Levy, M.S.
Daniel M. Campion, M.B.A.

Abstract

Objectives. Patient registries have received significant attention in recent years and are being used for a variety of purposes. In discussions for a previous Agency for Healthcare Research and Quality (AHRQ) report, stakeholders suggested that a voluntary repository of expired (i.e., closed) patient registries might be a useful tool for ensuring that registry databases remain accessible for future research. The concept of a repository of expired registries raises several questions related to the feasibility, value, and potential cost of such a repository, as well as issues related to research ethics, governance, data access and use, patient privacy, technical requirements, legal considerations, and incentives for donating data to the repository. The purpose of this project is to examine these questions, explore options, and provide actionable information to AHRQ for developing a repository of expired patient registries, should it be determined that such a repository would be both feasible and valuable.

Data Sources. Not applicable.

Methods. Background research was conducted using literature reviews, Internet searches, and discussions with stakeholders and other relevant organizations. This research focused on identifying relevant, existing repositories of clinical study data and publications that discussed the major issues involved in setting up such a repository. Stakeholder engagement also was a key component of this project. Stakeholder perspectives were gathered through discussions and an in-person meeting. The in-person stakeholder meeting included individuals representing academia and research, government and funding agencies, industry, health care providers and provider organizations, patient/consumer organizations, payers, journal editors, and legal and patient privacy experts. The objective of the meeting was to provide a forum for stakeholders to discuss major issues related to creating a repository of expired registries and to assess the feasibility and potential value of such a repository.

Results. Background research and stakeholder feedback clearly demonstrated that there is interest in the research community in a repository of expired registries, and that such a repository is feasible from operational, legal, ethical, and technical standpoints. Stakeholders noted the potential value of a repository of expired registries and identified key objectives of such a system. Despite the interest in this concept, stakeholders did not see a current business case for the repository, making it unlikely that a private entity would develop such a repository on its own and suggesting that the repository would need to be developed and supported with government funding. Stakeholders described a range of possible options for developing and maintaining the repository, which were classified along a spectrum anchored by two models: a basic data archive model at one end and a data archive with research support model at the other.

Conclusions. The report concludes that there is a clear interest among stakeholders in a repository of expired registries, but the lack of incentives for registry owners to donate their data is a critical barrier to a successful program.

Author affiliations:

Richard E. Gliklich, M.D.1
Michelle B. Leavy, M.P.H.1
Laura Khurana1
Daniel Levy, M.S.1
Daniel M. Campion, M.B.A.1

1Outcome DEcIDE Center, Cambridge, MA

Executive Summary

Patient registries have received significant attention in recent years and are being used for a variety of purposes, including accruing additional evidence on the effectiveness of new interventions in particular populations, understanding practice patterns and adherence to guidelines, examining patient outcomes, supporting safety surveillance initiatives, and demonstrating value for reimbursement. Given the large amount of funding currently devoted to patient registries and the significant contributions (data and effort) from patients and healthcare providers to these registries, it is important to ensure that the broader societal value that can be derived from registry databases is maximized and ideally not lost when the registry expires. A registry may expire for any number of reasons. For example, the registry may fulfill its purpose, exhaust its funding, or find that its objectives become less scientifically relevant due to changes in treatment patterns (e.g., the product under study is superseded by a new treatment). In discussions for a previous Agency for Healthcare Research and Quality (AHRQ) report, stakeholders suggested that a voluntary repository of expired patient registries might be a useful tool for ensuring that registry databases remain accessible for future research. The concept of a repository of expired registries raises several questions related to the feasibility, value, and potential cost of such a repository, as well as issues related to research ethics, governance, data access and use, patient privacy, technical requirements, legal considerations, and incentives for donating data to the repository.

The purpose of this project is to examine these questions, explore options, and provide actionable information to AHRQ for developing a repository of expired patient registries, should it be determined that such a repository would be both feasible and valuable. Background research was conducted using literature reviews, Internet searches, and discussions with stakeholders and other relevant organizations. This research focused on identifying relevant, existing repositories of clinical study data and publications that discussed the major issues involved in setting up such a repository. Several existing repositories of clinical study data were identified. The National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Central Data Repository, the National Database for Autism Research (NDAR), and the National Heart, Lung and Blood Institute (NHLBI) Biologic Specimen and Data Repository are profiled in this paper because they represent models that are particularly relevant to AHRQ in terms of scale, focus, operational approach, and technical structure.

Stakeholder engagement was a key component of this project. Stakeholder perspectives were gathered through discussions and an in-person meeting. The in-person stakeholder meeting included individuals representing academia and research, government and funding agencies, industry, health care providers and provider organizations, patient/consumer organizations, payers, journal editors, and legal and patient privacy experts. The objective of the meeting was to provide a forum for stakeholders to discuss major issues related to creating a repository of expired registries and to assess the feasibility and potential value of such a repository.

Topics covered in the meeting included governance, patient privacy, research ethics, data access and use, technical requirements, and legal considerations. Sound, transparent governance policies and procedures that address data access rules, data ownership, and incentives for donating data were identified as a critical component of a potential repository. Stakeholders also suggested that the governing body include a broad group of stakeholders. To protect patient privacy, the repository could store only de-identified data; however, this would limit the usefulness of the data for linkage projects. Alternatively, the repository could store patient identifiers, provided that the informed consents supporting the original data collection allowed for this use. An institutional certification model could be used to determine whether data could be donated and under what conditions donated data could be used for new research. Regarding data access and use, stakeholders noted that data access requests would need to be accompanied by institutional review board approval for the proposed research project and recommended that the repository store information on the original study and how the data were collected to ensure that researchers can draw valid conclusions from the data. The repository would also need to store some level of metadata to allow for searching of the repository data assets. Various technology models could be used, and the selection of the model and the related technology requirements would be driven by resource considerations.

Regarding legal considerations, stakeholders noted that numerous Federal and state laws potentially are relevant. Additional legal analysis completed for this project found that the critical factors are whether the data will be individually identifiable, how data will be maintained and used, whether the uses of the data will include research, and the sources of the data. This paper discusses Federal laws related to government agency access to and oversight of federally supported projects and the information produced under such projects; the Health Insurance Portability and Accountability Act Privacy, Security, and Breach Notification Rules; the Common Rule; “Part 2” regulations related to alcohol and substance abuse treatment information; the Privacy Act of 1974; the Federal Information Security Management Act; Federal laws related to information collection (e.g., the Paperwork Reduction Act); the Freedom of Information Act; and state confidentiality laws. Examples of ongoing Federal repositories underscore that these potential legal issues can be managed successfully.

Background research and stakeholder feedback clearly demonstrated that there is interest in the research community in a repository of expired registries, and that such a repository is feasible from operational, legal, ethical, and technical standpoints. Stakeholders noted the potential value of a repository of expired registries and identified key objectives of such a system. Despite the interest in this concept, stakeholders did not see a current business case for the repository, making it unlikely that a private entity would develop such a repository on its own and suggesting that the repository would need to be developed and supported with government funding. If AHRQ were to pursue the development of a repository, the Agency could present a clear rationale based on the importance of data sharing for the efficient use of limited health research resources and the Agency’s continuing efforts to encourage the use of high-quality registries for clinical research.

Stakeholders described a range of possible options for developing and maintaining the repository, which were classified along a spectrum anchored by two models: a basic data archive model at one end and a data archive with research support model at the other. The data archive model represents a low-budget, basic approach to data archiving, while the data archive with research support model represents a high-budget, sophisticated approach. Many intermediate models, which would use the basic approach and incorporate one or more features from the sophisticated approach, are possible. This paper examines the strengths and limitations of the models, discusses possible incentives for donating data under each model, and considers the estimated costs for developing and operating each model.

The report concludes that there is a clear interest among stakeholders in a repository of expired registries, but the lack of incentives for registry owners to donate their data is a critical barrier to a successful program. AHRQ may be able to require donation of data from registries funded by the Agency, but few incentives exist to encourage other registry sponsors to contribute their data. Prior to investing resources in developing a sophisticated repository, it may be necessary to work with stakeholders to identify donation incentives to ensure that the repository has sufficient data to support research projects. Alternatively, AHRQ may pursue an incremental approach to developing the repository, in which the initial repository follows the basic data archive model and additional features are added as the repository grows and its use requirements are better understood. Further discussions with both stakeholders and registry owners are highly recommended to help AHRQ further define requirements, priorities, incentive structures, and funding options in planning next steps.

Introduction

Patient registries have received significant attention in recent years. A patient registry is defined as “an organized system that uses observational study methods to collect uniform data (clinical and other) to evaluate specified outcomes for a population defined by a particular disease, condition, or exposure, and that serves one or more predetermined scientific, clinical, or policy purposes.”1 Common purposes for patient registries include evaluating the safety, effectiveness, or quality of medical treatments, products, and services and studying the natural history of diseases. Some registries are developed and maintained solely to assist in care delivery, coordination, and quality improvement, but many serve broader research purposes.

When properly designed and conducted, patient registries can provide unique insights into real-world clinical practice, effectiveness, safety, and quality. As health care stakeholders have increasingly recognized the value of real-world evidence, interest in developing registries has grown. For example, private and public sector payers for health care are potentially interested in registries as a means to accrue additional evidence on the effectiveness of new interventions in particular populations.2 The Centers for Medicare and Medicaid Services (CMS) Coverage with Evidence Development (CED) is an example of a program that will sometimes require a registry for this purpose.3 Medical and patient associations are turning to registries as a means of understanding practice patterns, adherence to guidelines, and patient outcomes.4 Academia, industry, and government groups are using registries for a variety of purposes, including understanding disease and treatments, conducting safety surveillance, and demonstrating value for reimbursement.1

In the past few years, registries have also received significant new attention as potential sources of data for comparative effectiveness research (CER). Reports from the Institute of Medicine (IOM) and the Congressional Budget Office (CBO) in 2007 cited the importance of patient registries in developing comparative effectiveness evidence. The IOM identified registries as an important potential data source for CER.5 The CBO report noted, “Registries collect additional information that is typically not contained in claims records, such as measures of health status or test results. In the United States, a number of registries—established or managed by various entities, including medical specialty societies and product manufacturers—have been used to help determine the clinical effectiveness and cost effectiveness of various products or services.”6 A 2009 white paper, generated by the Brookings Institution, described multiple categories of methods for CER including systematic reviews of existing research, including meta-analysis; decision modeling with or without cost information; retrospective analysis of existing clinical or administrative data, including natural experiments; prospective non-experimental studies, including registries that observe patterns of care and outcomes but do not assign patients to specific groups; and experimental studies, including randomized clinical trials, in which patients or groups of patients are assigned to alternative treatments, practices, or policies. It further noted that all of these categories are important, that there have been important advances in methods that improve the validity and design and, in particular, that there has been considerable progress in the design and use of clinical registries.7

The increased interest in using registries for CER has resulted in new funding opportunities. For example, the National Institutes of Health (NIH) call for Challenge Grants as a first disbursement of appropriated funding under the comparative effectiveness component of the American Recovery and Reinvestment Act of 2009 (ARRA) specifically called for a number of registries. These included such diverse areas of CER as treatments for chronic childhood arthritis, musculoskeletal and skin disease, fibromyalgia, cancer primary prevention, cancer screening, cleft palate, diabetes, medical implants, atrial fibrillation, autism spectrum disorders, rare diseases, and intervention versus nonintervention in management of asymptomatic vascular abnormalities.8 Several other grant programs using ARRA funding, such as those managed by the Agency for Healthcare Research and Quality (AHRQ), included patient registries as potential study options, and AHRQ awarded grants to new registry-based projects. A 2009 report from AcademyHealth examined the volume and range of costs of recent CER.9 Using a combination of interviews with researchers and organizations and reviews of two research listing databases, the authors concluded that prospective cohort studies, registry studies, and database studies comprise the largest block of CER activities (54 percent) and that among observational studies 23 percent are registry studies.

Similarly, the U.S. Food and Drug Administration (FDA) uses registries in assessing the safety and effectiveness of drugs and devices when considering applications from manufacturers, as well as for monitoring products post-approval. For example, since 2005, the FDA Center for Devices and Radiological Health has called for some 120 post-approval studies, many of which use new or existing registries to study the real-world effectiveness of specific devices in community practice.10

Rationale

The significant investment in patient registries, and particularly the investment of public funds to create new registries, raises questions about what happens to registry data when a registry expires (i.e., data collection ends and the registry closes). A registry may expire for any number of reasons. For example, the registry may fulfill its purpose, exhaust its funding, or find that its objectives become less scientifically relevant due to changes in treatment patterns (e.g., the product under study is superseded by a new treatment). A registry may collect a large amount of data on a broad patient population, sometimes over several years. The registry investigators may publish findings related to the registry objectives, but the data may also serve other purposes once the registry expires. In discussions for a previous AHRQ report, stakeholders suggested that a voluntary repository of expired patient registries might be a useful tool for ensuring that registry databases remain accessible for future research. For example, the data may be useful for new or supplementary analyses (to avoid duplication of effort), to inform protocol development for future studies, or for comparing, linking, or combining with other data to answer new research, policy, or public health questions. In 2010, AHRQ funded the development and piloting of a new Registry of Patient Registries (RoPR). The RoPR will be a searchable, central listing of patient registries in the United States, with the primary goal of enabling interested parties to identify registries in a particular area to promote collaboration, reduce redundancy, and improve transparency. Secondary objectives include encouraging and facilitating the use of common data elements and definitions, providing a central repository of searchable registry findings, and serving as a recruitment tool for researchers and patients interested in participating in patient registries. Once the RoPR is created, registry visibility will increase, and researchers may become interested in obtaining data from expired registries for new uses.

Given the large amount of funding currently used for the development and operation of patient registries and the significant contributions (data and effort) from patients and health care providers to these registries, it is important to ensure that the broader societal value that can be derived from registry databases is maximized and ideally not lost when the registry expires. In discussions for a previous AHRQ report, stakeholders suggested that a voluntary repository of patient registries might be a useful tool for ensuring that registry data are not lost when a registry ends.11

The concept of a repository of expired registries raises several questions. First, should there be a repository of expired registries? Is there value in developing a repository of expired patient registries, and is it feasible to develop such a repository? How would the repository address issues related to research ethics, governance, data access and use, patient privacy, technical requirements, and legal considerations? Assuming that a repository is both feasible and valuable, what are the requirements for the repository? What models might AHRQ use to develop and maintain a repository of expired registries? What would motivate researchers to donate their registries to a repository of expired registries? Lastly, what are the costs of setting up and maintaining a repository of expired registries?

Project Objectives

The objectives of this project are to examine questions related to creating a repository of expired patient registries, assess the overall value and feasibility of such a repository, and explore options for developing the repository. A key component of this project is engagement with stakeholders, including Federal partners, funding agencies, industry sponsors, researchers, health care providers, payers, and patients, to ensure that their views are considered and incorporated. The goal of this paper is to provide AHRQ with actionable information for developing a data repository, should it be determined that such a repository will be both feasible and valuable. This information is intended to be relevant to the needs of the Medicare, Medicaid, and other Federal health care programs and to reflect the overall goals of the Effective Health Care program.

The paper begins by describing existing repositories of clinical study data and discussing how they have addressed the key issues related to governance, research ethics, patient privacy, data access and use, technical requirements, and legal considerations. The paper then discusses stakeholder input related to these issues and summarizes the conclusions related to feasibility and value. Next, the paper examines the potential legal requirements for a repository. The sections on stakeholder perspectives and legal issues cover many of the same topics and often reach the same conclusion. They are presented separately here, though, because they examine the issues from different viewpoints. The stakeholder perspectives section highlights the major concerns and suggestions of the stakeholder representatives, while the legal issues section draws on analysis of existing laws and regulations to draw conclusions as to what is legally feasible. Lastly, the paper makes recommendations for moving forward with a repository of expired registries and explores two models or approaches for such a project. Each proposal includes information on how registries would be identified for inclusion; what incentives would be available for patient registries to donate data; what information would be included; how information would be verified; how information would be provided to researchers; and the cost considerations. The paper assumes that the repository of expired registries would be located within the United States and hold data from studies conducted within the United States. While many registries are international in scope, the legal and ethical issues involved in donating and storing data from those registries in the repository are beyond the scope of this paper.

Methods

Information for this project was gathered through literature reviews, Internet searches, discussions, and a large in-person stakeholder meeting. The literature reviews and Internet searches focused on identifying relevant, existing repositories of clinical study data and any publications that discussed the major issues involved in setting up such a repository. PubMed was used to identify relevant literature. Search engines, such as Google and Google Scholar, were used to identify existing repositories of clinical study data. In addition, an extensive search of the NIH Web site was conducted to identify relevant repositories and other documents. In some cases, discussions were held with representatives of the NIH repositories cited. The findings from the background research and discussions are summarized in the “Background Research” and “Stakeholder Perspectives on Major Issues” sections of this paper.

The in-person stakeholder meeting was held on January 18, 2011, at the AHRQ Conference Center. Participants included 49 individuals representing academia and research, government and Federal funding agencies, industry, health care providers and provider organizations, patient/consumer organizations, payers, journal editors, and legal and patient privacy experts. The breakdown of participants by stakeholder group is depicted in Figure 1.

Figure 1. Meeting participants by stakeholder group.

This pie chart shows the distribution of meeting participants by stakeholder group. The largest stakeholder group was government and federal funding agencies, with 14 participants. Next was academia and research with 10 participants, followed by health care providers and provider organizations with 9 participants, industry with 6 participants, and legal and privacy experts with 4 participants. Patient/consumer organizations, payers, and journal editors each had 2 stakeholders participate in the meeting. The total number of participants in the meeting was 49.
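
As a quick arithmetic check on the counts described for Figure 1, the short sketch below tallies the participants by stakeholder group and derives each group's share of the total. The counts come from the figure description; the percentages are computed here for illustration and are not reported in the original.

```python
# Participant counts by stakeholder group, as described for Figure 1.
participants = {
    "Government and Federal funding agencies": 14,
    "Academia and research": 10,
    "Health care providers and provider organizations": 9,
    "Industry": 6,
    "Legal and privacy experts": 4,
    "Patient/consumer organizations": 2,
    "Payers": 2,
    "Journal editors": 2,
}

# Total attendance; this should match the 49 participants stated in the text.
total = sum(participants.values())

# Share of each group as a percentage of all participants (derived, not from the source).
shares = {group: round(100 * n / total, 1) for group, n in participants.items()}

for group, n in sorted(participants.items(), key=lambda item: -item[1]):
    print(f"{group}: {n} ({shares[group]}%)")
print(f"Total: {total}")
```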

The purpose of the meeting was to provide a forum for stakeholders to discuss the feasibility and potential value of creating a voluntary data repository for archiving expired patient registries. The primary objectives were to (1) discuss issues related to governance, patient privacy, research ethics, data access and use, and technical setup for a repository of expired registries, and (2) assess the feasibility and potential value of such a repository. Outcome DEcIDE Center staff moderated the meeting. The findings from the meeting are incorporated into the “Stakeholder Perspectives on Major Issues” and “Proposals for a Repository of Expired Registries” sections of this paper.

Background Research

Several existing repositories of clinical study data were identified through the literature review, Internet searches, and discussions with stakeholders. Of the identified repositories, three are profiled here as representative models that are particularly relevant to AHRQ because of their scale, focus, operational approach, and technical set-up. The repositories profiled here are the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Central Data Repository, the National Database for Autism Research (NDAR), and the National Heart, Lung and Blood Institute (NHLBI) Biologic Specimen and Data Repository.

Other repositories were identified and are not profiled because they do not represent relevant models for this project. For example, the Virtual International Stroke Trials Archive (VISTA) is an international repository that houses data from completed stroke trials. Investigators from each study in the repository serve on the VISTA governing board, and the original study investigators are active participants in new research using their data. While this model has been successful, it is not highly scalable and depends on sustained investigator interest. The Inter-university Consortium for Political and Social Research (ICPSR) project, based at the University of Michigan, aims to facilitate social sciences research by storing and facilitating access to research datasets. The repository holds a large number of datasets in a range of social science areas and has detailed policies on data donation, curation, and access. While much of the data is not health-related and therefore subject to different legal and ethical concerns, this project may be a useful reference for technical considerations and governance issues, should AHRQ decide to develop a repository of expired patient registries. Biorepositories and archives that focus primarily on storing biologic and genetic specimens are also not profiled, because the legal, ethical, and operational issues involved in those repositories are, in many ways, different from those involved in storing clinical data.

NIDDK Central Data Repository

Overview

In 2003, the NIDDK established Central Data, Biosample, and Genetic Repositories to house samples and clinical data collected during completed and ongoing clinical studies funded by NIDDK.12 In creating these repositories, NIDDK aimed to “increase the utility of NIDDK sponsored research by providing access to the samples and data to a wider research community than those research groups involved in the studies.”13 As of March 2011, datasets from 28 clinical studies are stored in the Central Data Repository (CDR), and most are from large, multi-site, clinical studies funded by the NIDDK. Forty-nine articles have been published in peer-reviewed journals using data from the CDR.14 The CDR staff receives data access requests about 50 times a year, and 188 external requests for data have been processed.15

The CDR serves as the archive for clinical data, maintains the linkages between clinical data and specimens housed by the Biosample and Genetic repositories, and manages requests for samples of those specimens. The CDR’s Web site provides information to the public about available datasets (see Table 1) and facilitates data requests and submissions.

Governance and Oversight

NIDDK contracts the management of each Repository to a separate contractor. The CDR is managed by RTI International. Access to CDR data is regulated by a group of reviewers and an arbiter from the NIDDK whose mission is to review data requests for scientific appropriateness and to ensure that the data requests are consistent with the patient consents given at the time of the original studies. The reviewers and arbiter are not necessarily the same individuals for each request, but are always NIDDK employees selected by the NIDDK project officers.15 The reviewers offer their opinions based on the data request application and accompanying institutional review board (IRB) approval (or an IRB determination that the research is exempt from such approval), and the arbiter makes the final approval decision and notifies the contractor of that decision.

For data of significant value, the terms of NIDDK clinical research funding require that the researcher donate the data to the Repository at the close of the study; these terms are incorporated into the Notice of Grant Award. Some studies not funded by NIDDK have expressed interest in donating their data and biosamples to the CDR; in these cases, the CDR has executed a Material Transfer Agreement with the donor and amended the agreement as necessary if only data are being donated.16

Patient Privacy

The CDR accepts and houses only data that have been de-identified according to the Health Insurance Portability and Accountability Act (HIPAA) requirements for limited datasets (LDS).17 When submitted data were not properly de-identified or consented, the CDR has returned them to the study managers or data coordinating centers. Datasets accepted by the CDR have study-specific patient IDs associated with them. For samples, the CDR adds a three-digit prefix code specific to each study and to each site within a study. The result is a CDR ID that is unique across all of the samples housed in the Repositories.16
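
The prefixing idea can be illustrated with a minimal sketch. It is not the CDR's actual implementation: the class name, the sequential assignment of prefixes, and the hyphen-separated ID format are all assumptions made for illustration. The only elements taken from the text are that a three-digit prefix is specific to each study and site, and that prepending it to a study-specific ID yields an identifier that is unique across the repository.

```python
from itertools import count


class PrefixRegistry:
    """Hypothetical helper: one three-digit prefix per (study, site) pair."""

    def __init__(self):
        self._prefixes = {}      # (study, site) -> "001", "002", ...
        self._next = count(1)    # assumed: prefixes are handed out sequentially

    def prefix_for(self, study, site):
        """Return the existing prefix for this study/site, or assign a new one."""
        key = (study, site)
        if key not in self._prefixes:
            n = next(self._next)
            if n > 999:
                raise ValueError("three-digit prefix space exhausted")
            self._prefixes[key] = f"{n:03d}"
        return self._prefixes[key]

    def repository_id(self, study, site, local_id):
        """Combine the prefix with the study-specific ID (format is an assumption)."""
        return f"{self.prefix_for(study, site)}-{local_id}"


registry = PrefixRegistry()
# The same study-specific ID from two different studies no longer collides.
id_a = registry.repository_id("STUDY_A", "site1", "0042")
id_b = registry.repository_id("STUDY_B", "site1", "0042")
print(id_a, id_b)
```

Because the prefix is tied to the study and site rather than to the patient, two studies that happen to reuse the same local ID still receive distinct repository-wide identifiers.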

Research Ethics

The CDR has established and documented procedures for processing and storing donated data, which include assessing the completeness, utility, and integrity of the data (including recreating analyses of published results); assigning and managing site IDs for the sites that contributed data to the dataset; and extracting metadata about the dataset for display on the public CDR Web site.18

Before data are released from the CDR for use in other research, the data request is reviewed to ensure consistency with the informed consent given by the patient at the time of the original study. To facilitate this, the CDR has built a consent database that is available to the NIDDK staff reviewing data requests.15 A separate bioethics review is available if there are remaining questions after review of the original informed consents; thus far, such a review has not been needed.16

Data Access and Use

To ensure that researchers understand the strengths and limitations of the data available in the CDR and can draw appropriate conclusions from their findings, metadata on each available dataset is provided on the CDR Web site (Table 1). CDR staff members are also available to answer questions and to help identify the most appropriate dataset(s) for a particular research purpose.

To initiate the data request process, researchers complete a request through the Automated Data Request System (ADRS) on the Repository Web site, which requires an executive summary describing the proposed research, IRB approval (or an IRB determination that the research is exempt from such approval), a Data Use Certification (a Material Transfer Agreement), and responses to more detailed questions about the proposed research. These documents are submitted electronically through the ADRS, and original signed copies are mailed to the CDR. The CDR submits this documentation to the project officer at NIDDK, who designates a team of NIDDK reviewers to assess the request for scientific appropriateness and consistency with the consent forms used in the requested datasets. If approval is granted, the CDR is notified and then releases the data to the requestor. Data requests are generally free of charge, although a fee applies if CDR staff provides analytic support for the requestor.19 Manuscripts using CDR data must acknowledge the NIDDK Repository, and in some cases, cite the specific dataset(s) from which the data were obtained.20

Table 1. Metadata publicly available for NIDDK Central Data Repository (CDR) datasets.
Category | Description
General Description | Narrative description of study that yielded dataset.
Metadata | Study design and objectives; participating centers and principal investigators; inclusion and exclusion criteria and enrollment details; outcome(s) of interest; funding organization.
Protocol or Manual of Operations | Governing operational document for the study.
Forms | Study case report forms (CRFs).
Publications | List of publications using study data.
Roadmap | Summary of all documents and files available for the dataset.
Dataset Integrity Check | Description of integrity check conducted in an attempt to duplicate published results; includes narrative summary, tables comparing published and calculated results, and SAS code used to replicate results.
Study Samples | List of biologic or genetic samples available from the study.

Technical Considerations

The CDR accepts donated data in multiple formats and converts all datasets into SAS format.17 Data are sent to approved requestors via a secure FTP site; datasets are not directly accessible through a Web portal because of concerns about dataset size and health data security.13 Users can browse available metadata for the datasets on the CDR Web site or perform a keyword search of the metadata, including ancillary study documents.

Resources

NIDDK’s contract with the data management vendor had a 2010 annual budget of $962,000; this included all activities related to the CDR’s multiple purposes (including coordination and administrative duties for the Biosample and Genetic Repositories), and not exclusively the curation and distribution of archived databases.16 NIDDK estimates that five to six contractor employees currently work on the contract, plus two part-time project officers at NIDDK. Overall, considering the other duties of these NIDDK and contractor employees, an estimated 2.0 full-time employees (FTEs) are working exclusively on the CDR, managing archived data from 28 studies and sample inventory from 37 additional ongoing studies in its current operational phase.16

National Database for Autism Research

Overview

NDAR is a biomedical informatics system and research data repository developed and housed at the NIH Center for Information Technology. Its stated purpose is to help accelerate research on autism spectrum disorders by creating an infrastructure that integrates heterogeneous datasets, allowing access to much more quality research data than individual investigators would be able to collect on their own.21 As of March 2011, NDAR houses data from about 20 studies,22 and another 55 studies have plans to share their data through NDAR, including many being conducted at centers operating under NIH “Autism Centers of Excellence (ACE)” grants.23 Several peer-reviewed articles have been published focusing on the technology and informational modeling behind NDAR,24-25 and NDAR staff report that they have received many requests to access the data, both from individuals associated with ACEs and from other researchers.22

Governance and Oversight

The NDAR Governing Committee, which is responsible for the ongoing management and stewardship of NDAR policy and procedures, is composed of the Director of the National Institute of Mental Health (NIMH) and several other NIH Institute and Center Directors or their designees.26 A Data Access Committee (DAC) is charged with reviewing data access requests; members of the DAC include project officers for grantees22 and Federal staff with expertise in areas such as the relevant scientific disciplines, research participant protection, and privacy.26

All investigators who receive NIH support to conduct autism research are expected to submit descriptive information about their studies (such as the protocol, questionnaires, study manuals, variables measured, and other supporting documentation) for inclusion in NDAR. Investigators may be required by the “Terms and Conditions” of their grant to submit study data.26

Patient Privacy

Data submitted to NDAR are required to be “de-identified such that the identities of data subjects cannot be readily ascertained or otherwise associated with the data by the repository staff or secondary data users.” Once received by NDAR, data are assigned Global Unique Identifiers (GUIDs), computer-generated random alphanumeric codes that are unique to each research participant in NDAR. A detailed description of the process for de-identifying data and for assigning GUIDs can be found in the NDAR policy document26 and the 2010 article by Johnson et al.25
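A minimal sketch can show how a random identifier might be issued once per participant and reused when the same participant appears again. This is not NDAR's procedure, which is documented in the policy document and the Johnson et al. article cited above. The class name GuidRegistry, the choice of identifying fields, the 12-character code length, and the use of SHA-256 as the matching key are all assumptions made for illustration.

```python
import hashlib
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits


class GuidRegistry:
    """Issue one random alphanumeric code per participant (illustrative only)."""

    def __init__(self):
        self._by_fingerprint = {}  # one-way fingerprint -> issued code
        self._issued = set()       # every code handed out so far

    @staticmethod
    def _fingerprint(first_name, last_name, birth_date):
        # Normalize so trivial spelling differences match, then hash one way.
        parts = (first_name, last_name, birth_date)
        normalized = "|".join(p.strip().lower() for p in parts)
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def assign(self, first_name, last_name, birth_date):
        key = self._fingerprint(first_name, last_name, birth_date)
        if key in self._by_fingerprint:
            # Participant already known: return the same code.
            return self._by_fingerprint[key]
        while True:
            # Random code, regenerated on the rare collision.
            guid = "".join(secrets.choice(ALPHABET) for _ in range(12))
            if guid not in self._issued:
                break
        self._issued.add(guid)
        self._by_fingerprint[key] = guid
        return guid
```

In this sketch the issued code carries no information about the participant, because it is drawn at random and only the fingerprint links it back to the identifying fields. A plain hash of low-entropy fields such as names and birth dates would not be sufficient protection on its own in a production system.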

NDAR expects that data submission to NDAR will be explicitly discussed in the informed consent process for prospective studies that are designed with the intent to submit data to NDAR. For retrospective studies, since informed consents are less likely to have explicitly addressed data donation, it is the responsibility of the submitting institution and its IRB to determine whether a study is appropriate for submission to NDAR.26 While not required, NDAR also encourages researchers submitting data to consider whether a Certificate of Confidentiality is appropriate for their data as an additional safeguard against disclosure of research participant identities.26

Research Ethics

Researchers submitting data to NDAR are required to provide a written certification from an official of the submitting institution stating that the institution approves of the data submission. An additional certification is required stating that the data submission is consistent with applicable laws and regulations and that donation of the data is ethically permissible.26 The certification criteria are identical to those in the NIH genome-wide association studies (GWAS) policy27 and are discussed in the “Institutional Certification” section of this paper.

Those wishing to access NDAR data are required to sign a Data Use Certification that confirms that they will use the data only for the approved research.26 The DAC reviewing these requests determines whether the proposed use of the dataset is scientifically and ethically appropriate.

Data Access and Use

Once a researcher is granted access to NDAR by the DAC, they receive an account to log into the NDAR research portal; the account allows them to simultaneously query data stored in the NDAR repository and data from external sources with which NDAR maintains a federated link. The queries produce datasets that can include clinical data; phenotypic, imaging, or genomic data; or other data such as outcomes data. Queries are run on the entire NDAR data repository and federated data at once, so query results can be returned from multiple studies. The GUID, described previously, ensures that patients present in more than one NDAR data source are not duplicated in query results. Researchers can use their query results to create an “NDAR Study,” a collection of GUIDs that serve as references to the original research data and can be shared with other NDAR users.28 Manuscripts and presentations using data from NDAR are expected to acknowledge the Contributing Investigator who conducted the original study, the funding organization supporting the work, and NDAR.28

Technical Considerations

NDAR is the most technologically complex of the examples profiled here. It functions as both a data warehouse and a federation portal (i.e., a single point of entry to access databases located elsewhere). In addition to the data submitted by individual investigators that are stored and maintained by NDAR, researchers have access to a wider net of data through the federation linkages that NDAR has put in place (or plans to put in place) with organizations such as the Autism Genetic Resource Exchange, the Interactive Autism Network, the NIMH Genetics Repository, the NIMH Transcriptional Atlas of Human Brain Development, and the Pediatric MRI Data Repository.28

NDAR has developed several standalone software applications to facilitate the collection and exchange of meaningful data. Separate data dictionary tools for genomics, imaging, and clinical assessment data are available to help those submitting data define their data elements in a way that is consistent with data from other studies. The GUID Tool was developed in collaboration with the Simons Foundation Autism Research Initiative to assign GUIDs to data being submitted, which involves checking existing NDAR data to ensure that duplicate patients receive the same GUID. The data validation tool is an application run on the researcher’s local machine that verifies the data conform to the required format and range values, and then packages and imports the data into NDAR.29 Because submitting data to NDAR requires effort by the investigator, NDAR provides tools to help investigators estimate the cost of preparing and submitting data. These costs can be incorporated into a project budget in cases where the investigators (e.g., those receiving NIH funding) decide to submit the data to NDAR during the study planning phase or are required to submit the data according to the terms of their funding.
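The format and range checks performed by a local validation tool of this kind can be sketched briefly. The sketch below is not the NDAR data validation tool; the data dictionary entries, the element names age_months and iq_score, their ranges, and the function name validate_rows are hypothetical.

```python
import csv
import io

# Hypothetical data dictionary: element name -> (type, minimum, maximum).
DATA_DICTIONARY = {
    "age_months": (int, 0, 1200),
    "iq_score": (int, 40, 160),
}


def validate_rows(csv_text, dictionary):
    """Return a list of format and range problems found in CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = set(dictionary) - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    errors = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for name, (kind, low, high) in dictionary.items():
            try:
                value = kind(row[name])
            except (TypeError, ValueError):
                errors.append(
                    f"line {line_no}: {name}={row[name]!r} is not {kind.__name__}"
                )
                continue
            if not low <= value <= high:
                errors.append(
                    f"line {line_no}: {name}={value} outside [{low}, {high}]"
                )
    return errors
```

A file would be packaged for submission only when the returned list is empty. Running the check on the investigator's own machine means that nonconforming records are caught before any data leave the submitting site.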

NDAR grants data access to researchers through a Web portal. Researchers can query NDAR data and download the results of their query in XML or CSV format.

Resources

Over the course of developing NDAR, the software engineering team used development, testing, demo, and production environments; this is a more complex system than was originally imagined, but it has proven to be helpful.22 In its current operational phase, NDAR employs 3 to 4 FTEs with expertise in genomics, imaging technology, management, and training. Eight to ten FTE developers worked for 4 years on developing, testing, and refining the database. The total project funding for 2010, including operation and development, was $2,400,000.30

NHLBI Biologic Specimen and Data Repository

Overview

The NHLBI Biologic Specimen and Data Repository was created by the combination of two formerly separate NHLBI programs: the NHLBI Biologic Specimen Repository and the NHLBI Data Repository. The Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC) was established in September 2008 under contract to the NHLBI Division of Blood Diseases and Resources to serve as a central management entity for both biospecimens and datasets in the Repository. The mission of the Repository is to acquire, store, and distribute research biospecimens and datasets to the scientific community. As of March 2011, datasets from 82 separate studies are available for use by approved data requestors,31 and 189 requests for data access have been received since September 2009.32 An additional 143 requests for teaching datasets have been received, and the datasets distributed, in that time.

Governance and Oversight

Repository employees review requests for data access for completeness and then forward them to the Division of Cardiovascular Sciences (DCVS) at NHLBI for review and final approval. Many NHLBI-funded studies are required to submit data to the Repository.33

Patient Privacy

Datasets submitted to the Repository must be submitted in two parts: a Non-Commercial Purpose Dataset (consisting of participants who requested during the informed consent process that their data not be used for commercial purposes) and a Commercial Purpose Dataset (consisting of participants who consented for their data to be used in this way). Participants who did not consent to their data being shared with researchers other than their principal investigator are not included in either dataset. In addition, obvious patient identifiers must be removed and potentially identifiable variables must be altered, as detailed in the Repository’s Policy for Dataset Preparation.34 The datasets for distribution meet the requirements for de-identified data as defined by HIPAA. The researcher donating the data is responsible for ensuring that the data meet these standards, but NHLBI and BioLINCC staff review the final datasets and documentation prior to making them available for sharing and provide feedback if additional redaction or recoding is needed.
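The two-part split described above can be expressed as a short sketch. The record layout and the field names share_beyond_pi and allow_commercial are hypothetical; the Repository's Policy for Dataset Preparation defines the actual requirements.

```python
def split_by_consent(participants):
    """Partition participant records into commercial and non-commercial sets.

    Each record is a dict with two hypothetical boolean fields:
      share_beyond_pi  - consented to sharing beyond the principal investigator
      allow_commercial - consented to commercial use of their data
    """
    commercial, non_commercial = [], []
    for person in participants:
        if not person["share_beyond_pi"]:
            # No consent to wider sharing: excluded from both datasets.
            continue
        if person["allow_commercial"]:
            commercial.append(person)
        else:
            non_commercial.append(person)
    return commercial, non_commercial
```

With three records, one consenting to commercial use, one restricting use to non-commercial purposes, and one declining wider sharing, the function places the first in the Commercial Purpose Dataset, the second in the Non-Commercial Purpose Dataset, and omits the third.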

Research Ethics

Data request applications must contain a summary of the proposed research, IRB approval, the investigator’s curriculum vitae, and a signed Research Materials Distribution Agreement, which clarifies the legal responsibilities of the data recipient and the Repository.35 The DCVS reviews the proposed research for significance, appropriateness, design, and “ethical and legal considerations including consistency with the terms of the informed consent and compliance with human subjects and HIPAA regulations.”33

Data Access and Use

Metadata describing each of the available datasets is posted on the Repository Web site for review by potential researchers (see Table 2). The Repository staff is also available to answer questions regarding a particular dataset’s strengths and limitations and appropriateness for a specific research use. Once a data request is approved, the applicant is given access to the data and documentation electronically through the secure BioLINCC site.33 As specified in the Research Materials Distribution Agreement, any publications or presentations resulting from Repository data are required to acknowledge the NHLBI BioLINCC as the source of that data.35

Table 2. Metadata publicly available for NHLBI Biologic Specimen and Data Repository datasets.
Category | Description
General Study Data | Link to listing on ClinicalTrials.gov; study type (epidemiology study or clinical trial); date that metadata entry was prepared and last updated; study period; consent (restricted, unrestricted, original researcher only, or proprietary); commercial use restrictions; NHLBI division that funded study.
Abstract | Objectives, background, subjects, study design, and results/conclusions.
Study Documents | May include protocol, coding manual, data dictionary.

Technical Considerations

Study datasets are stored by the Repository and released to approved researchers upon request. The Repository Web site has search functionality with the ability to query dataset metadata for keywords, type of sample available (specimens, dataset, or both), consent type, study type, and presence of commercial use restrictions. In addition, a Google Mini Search is available on the Web site to search for keywords present in metadata or ancillary study documents.36

Data are received at the Repository via secure FTP transfer from the NHLBI Program Office, which receives them on CD-ROMs. Data are stored on a secure internal network. Aside from standards applied to the data to ensure patient privacy and appropriate research use (described earlier), no further attempt is made to harmonize data elements or formats.34

Resources

The total project funding for 2010 was $599,990.30

Relevance of Existing Repositories for a Repository of Expired Registries

These examples demonstrate that a repository of clinical data is technically, legally, operationally, and ethically feasible. The number of manuscripts published using repository data and the volume of data requests they manage indicate that there is interest within the scientific community in using archived data from completed studies. The existing repositories also suggest that a new data repository for expired registries could be structured in multiple ways, with varying levels of data quality assessment, data aggregation, and research support. Common features across all three examples include a formal data access policy, availability of at least general information about the data and the parent studies that generated them, and a governing body to provide oversight. A major difference between these repositories and the potential repository of expired registries is the scope; these repositories are limited in scope to a particular disease area, while the repository of expired registries would presumably store data from registries in a wide range of disease areas.

Stakeholder Perspectives on Major Issues

The existence of other repositories of clinical study data suggests that a repository of expired registries is feasible. However, the proposed repository presents many challenges. Stakeholders attended an in-person meeting to discuss major issues related to establishing a repository of expired registries, including governance, patient privacy, research ethics, data access and use, and technical considerations. The key points related to these issues are summarized here.

Governance

Governance addresses both the policies and procedures for the repository and the oversight body that will ensure that the repository operates according to these policies and procedures. In considering a repository of expired registries, several governance issues arise. For example, the repository will need a clear plan explaining who can access the data and under what conditions. The repository may establish a data use committee that reviews data requests and approves or rejects them. This committee may also monitor ongoing research projects to ensure that they are conducted in accordance with the data use terms. An important aspect of the review may be assessing the potential for inadvertently re-identifying patients when linking data from multiple databases. Depending on available resources and interest, the repository may not be able to accept all donated databases, as each stored database will have intake and maintenance requirements. An oversight committee could be formed to review donated databases and determine which to accept based on pre-specified criteria (e.g., relevance to the priority conditions, quantity of data collected, or cost/benefit analysis of storage costs versus likelihood of future use in analyses and publications). The governance plan will also need to address data ownership, length of time that databases are stored (e.g., indefinitely or for 20 years), and any fees associated with data use. The stakeholder discussions and background research conducted for this project focused on questions related to the repository owner; purpose and composition of the governing body; review of data requests; donations and storage of registry data; and incentives for donating registry data.

Repository Owner

Stakeholders could not identify clear incentives that would drive a private entity to develop such a repository independently. The reasons cited ranged from the perceived limited ability of a private entity to recruit data donations to the lack of a business model that could leverage the data to support operations, absent a more defined understanding of risks and potential revenue sources. As such, the consensus view was that such a repository would need to be developed with government funding. This may change if the archive has inherent value and a clear business case emerges. The following discussions assume that the data repository will be created and managed by AHRQ or a contractor of AHRQ (as opposed to a private entity).

Governing Body

Stakeholders agreed that sound, transparent governance procedures are critical for ensuring that the repository is useful for researchers and compliant with legal and ethical requirements. This conclusion is consistent with the findings from discussions with staff at other data repositories. Stakeholders recommended that the governing body include individuals from across the spectrum of interested parties, including government, industry, providers, patients, researchers, and representatives of registries that contributed data. The purpose of the governing body would be to guide the overall direction and activities of the repository of expired registries. The governing body may be involved in review of data requests and other operational activities of the repository, or it may create committees to oversee these tasks.

Review of Data Requests

The procedures for reviewing data requests are a critical part of the governance plan. Stakeholders recommended using a data access committee to review and approve requests to use repository data. The data requestor would submit information on the proposed research activity, along with IRB approval for the project. This process is similar to that used by existing repositories. Stakeholders also suggested that the governing body could set use fees for accessing the data and commented that use fees could be higher for commercial uses than for non-commercial uses, although both uses should be allowed under appropriate conditions and consistent with data use agreements.

Ownership and Management of Donated Data

The repository of expired registries that stakeholders discussed could include registries that are required to submit their data (e.g., a new registry funded by AHRQ), as well as registries that donate their data voluntarily. Data ownership is an important consideration for registries that donate their data voluntarily. The repository could own the donated data; in this case, the terms and conditions of donating the data to the repository would include a transfer of ownership. In cases where copyright was claimed for the data, the copyright may also need to be licensed or transferred to the repository. Stakeholders noted that the arrangements for data ownership could encourage or discourage donation of data. For example, relinquishing all ownership rights or transferring copyright was seen as a potential disincentive to donation. Stakeholders agreed that the approach to data ownership should be guided by the available resources and the underlying goal of the repository (i.e., to serve as a last resort for data that would otherwise not be maintained or to attract donations of high-quality data and proactively facilitate new research). On balance, stakeholders felt that the ability to be flexible in data ownership, as long as the terms of use were clear and could be readily tracked with the data, would help encourage growth of the repository, but such tracking would come at a higher cost. A related issue is ensuring that the entity donating the data has the necessary rights to do so; stakeholders commented that warranties to this effect could be required in the donation agreement.

The concept of voluntary donations of data also raises questions as to whether the repository should have criteria for inclusion. For example, the NIDDK Repository focuses on studies that are considered to be of significant value based on sample size or other factors. Stakeholders suggested that, initially, the only inclusion criteria should relate to contractual issues (e.g., ability to transfer ownership legally) and ethical issues (e.g., appropriate consents or a waiver of consent for the data to be used in future research). Because the proposed repository would cover multiple disease areas and many types of studies, it would be difficult to have appropriate inclusion criteria based on size, number of sites, depth of clinical and/or biological data, length of follow up, or other factors. For example, a rare disease registry that only includes a few hundred patients could be very valuable, as could a single site registry such as the Framingham study. Stakeholders suggested that a value determination would have to be made on a case-by-case basis. In addition, the repository could begin with an inclusive policy and develop tighter restrictions if a large number of donations become available.

A related question is how long to store registry data. This question applies to both voluntarily donated data and data that are required to be submitted to the repository. The repository could store datasets indefinitely, archive them after a set number of years, or archive them after a period of non-use. Again, the stakeholders commented that having a single policy that would apply to all registries is difficult because of the variation in disease area and registry type. Stakeholders suggested that the data be stored indefinitely, unless prevented by resource constraints, or that the repository follow existing guidelines for data retention (e.g., Medicare or commercial payer guidelines). Because of the cost associated with maintaining each dataset in a usable form, there may also need to be ongoing evaluation of the utility of the datasets being stored in the repository and different processes for those actively maintained as readily retrievable (e.g., regularly updated into current technology formats) versus those simply stored in a durable medium.

Incentives for Donating Registry Data

Stakeholders discussed what factors might motivate or incentivize registry owners to donate their data to the repository and offered some suggestions. As discussed above, in some cases, the NIH repositories have set a precedent by requiring donation of data as a condition of funding; agencies such as AHRQ or other public funders might consider a similar requirement. AHRQ could also collaborate with other government agencies to encourage donations. For example, NIH Institutes without repositories could direct investigators that they fund to donate registry data to AHRQ. The FDA could also recommend donations for post-marketing commitment studies. The donation of data from post-marketing studies raises concerns related to proprietary information that may be contained in these studies; however, the FDA is currently exploring approaches to consolidating and adapting its data sets from clinical trials into research-appropriate data sets.37-38 These exploratory projects may result in policies or procedures under which data from post-marketing studies may be included in the repository of expired registries.

Second, stakeholders suggested that registry owners may be motivated to donate data if liability resulting from the data is limited or transferred to the repository. Third, because many private registries that would consider donation would likely do so at the end of their existing funding, stakeholders felt that the burden on the registry donor would need to be limited in order to encourage donations from such organizations. Several stakeholders noted that industry-funded registries are unlikely to be donated because of proprietary concerns, and one stakeholder commented that some open-ended registries (e.g., quality improvement registries) may be willing to donate data from completed years, even if the registry were still ongoing. Lastly, some stakeholders suggested that a donation could result in priority access to the data; there was no clear consensus on this point, as some stakeholders commented that it did not “feel right” in the context of a public resource, but no specific barriers were identified.

Summary of Governance Issues

Stakeholders assumed that the data repository would be created and managed by AHRQ or a contractor of AHRQ (as opposed to a private entity). The repository will need sound, transparent governance policies and procedures, and the governing body should include a broad group of stakeholders. A data access committee will be necessary to review and approve requests to use repository data. Various approaches to data ownership are possible, depending on the available resources and goals of the repository. The repository should be as inclusive as possible when accepting donated databases and should store data indefinitely, resources permitting. Incentives for donating data to the repository may include mandates from funding agencies, liability considerations, or priority access to repository data.

Patient Privacy

Protection of patient privacy is an important consideration for the repository of expired registries. Some registries collect direct patient identifiers, such as medical record numbers or patient names and contact information. Other registries may not collect direct identifiers, but may collect data, such as date of birth or dates of procedures, that could potentially be used to re-identify an individual patient. The HIPAA Privacy Rule protects these types of data. In agreeing to participate in a registry, patients may sign informed consents that allow the registry to collect and store personally identifiable information. However, the permission given in the informed consent may not be sufficient to allow the registry sponsor to donate that identifiable information to a data repository. Privacy concerns related to how the data in registries were gathered and stored may also exist. For example, were appropriate patient consents obtained? Were the data properly de-identified? Were sufficient privacy and security controls used (e.g., “old” registries may not have implemented proper procedures to ensure privacy and security of data)? The stakeholder discussions and background research on patient privacy focused on questions related to determining if informed consents were sufficient for storing identifiers, how the potential for re-identification through data linkage would be assessed, and the role of institutional certification. The legal issues related to patient privacy are discussed in the “Legal Issues in Creating a Repository of Expired Registries” section.

Storage of Patient Identifiers

Stakeholders concluded that identifiers could be stored in the repository if the informed consent allowed for this use. Discussion focused on how to determine if the informed consent was sufficient. Suggestions included storing original consents, storing model consents for studies where multiple consents were used, or tracking major changes to data collection and consents (e.g., specimens were collected from 2004 to 2006 only). These approaches may work for new registries that consider the possibility of donating their data during the design and operational phase. For new registries, the repository could develop suggested language for informed consent documents to facilitate future donations of data. However, stakeholders commented that these options could be difficult for some existing registries, particularly those that have collected data for long periods. The possibility of contacting patients from completed studies to obtain consent was also mentioned, although this approach represents a large burden in terms of costs and resources and may be infeasible for registries that did not maintain contact information.

Stakeholders also noted that reviewing consent forms represents a large administrative burden for the repository. A possible solution to this issue is discussed in the “Institutional Certification” section.

In terms of storing identifiers, stakeholders noted that additional security and technical requirements may add cost. Stakeholders also expressed concerns about data in the repository being subject to the Freedom of Information Act. The “Legal Issues in Creating a Repository of Expired Registries” section addresses this concern.

Potential Reidentification of Patients

Even if identifiers are not stored in the repository, data linkage projects using repository data could inadvertently re-identify patients. Stakeholders agreed that researchers may not fully understand the risks of re-identification when considering linkage projects and discussed whether expert review of the research project methods and/or results would be necessary. While this review could reduce the risk of re-identification, it would require additional resources. Other groups that provide data for linkage, such as CMS, do not review methods or results. Stakeholders concluded that a strong data use agreement would be necessary, and, if resources allowed, additional review of the methods and final product should be considered.

Institutional Certification

Stakeholders expressed concerns about whether the repository would be responsible for determining if the informed consent was sufficient for donating data to the repository. One suggestion was to rely on institutions (including the IRB or Privacy Board as applicable) to certify data prior to the data being accepted into the repository. With such “institutional certification,” the entity donating the data must provide certification that the use of the data in the repository is appropriate given the informed consents from individual patients (or the waiver of informed consent) that underlay the data.39

A model for the institutional certification process is the NIH GWAS policy, which applies to donations of biospecimens from NIH-funded genome-wide studies. The NIH policy on GWAS Data Submission Certification states:

"The NIH will accept GWAS data into the NIH GWAS data repository after receiving appropriate certification by the responsible Institutional Official(s) of the submitting institution that they approve submission to the NIH GWAS data repository. The certification should assure that:

  • The data submission is consistent with all applicable laws and regulations, as well as institutional policies;
  • The appropriate research uses of the data and the uses that are specifically excluded by the informed consent documents are delineated;
  • The identities of research participants will not be disclosed to the NIH GWAS data repository; and
  • An IRB and/or Privacy Board, as applicable, reviewed and verified that:
      • The submission of data to the NIH GWAS data repository and subsequent sharing for research purposes are consistent with the informed consent of study participants from whom the data were obtained;
      • The investigator’s plan for de-identifying datasets is consistent with the standards outlined in the policy;
      • It has considered the risks to individuals, their families, and groups or populations associated with data submitted to the NIH GWAS data repository; and
      • The genotype and phenotype data to be submitted were collected in a manner consistent with 45 C.F.R. Part 46."27

The GWAS policy describes specific points that IRBs should consider when reviewing informed consents to make a determination on certifying data for GWAS submission. For example, the IRB should consider whether the consent forms have any restrictions, such as types of subsequent research using the data or location of such research, and whether the study involves children or vulnerable populations.

The GWAS policy on certification could be adapted to meet the needs of the repository of expired registries. In general, certification requires the IRB to review the consent forms under which the data were collected over the duration of the study. The three possible outcomes of the certification process are (1) the data are appropriate for widespread research and can be donated to the repository; (2) the data are not appropriate for widespread research use and cannot be donated to the repository; and (3) the data may be used for research with appropriate restrictions. These restrictions could, for example, include a requirement that the data be fully de-identified or only be used in support of certain types of research (e.g., cancer research). The result of the certification process is a letter stating the IRB’s findings and detailing any restrictions, which the investigator submits to the repository along with the data.
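The three certification outcomes and their restrictions could be recorded in a structured form so that data use requests can be screened against them. The following is a minimal sketch only, not a design proposed by the stakeholders; the names `Outcome`, `Certification`, and `request_permitted`, and the two restriction fields, are hypothetical, and an actual IRB letter would likely contain free-text conditions that need human review.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    APPROVED = "appropriate for widespread research use"
    REJECTED = "not appropriate for widespread research use"
    RESTRICTED = "usable for research with restrictions"


@dataclass(frozen=True)
class Certification:
    """Hypothetical record of an IRB certification letter."""
    registry_id: str
    outcome: Outcome
    # Assumed restriction fields, modeled on the examples in the text.
    must_be_deidentified: bool = False
    allowed_research_areas: frozenset = frozenset()  # empty means any area


def can_accept(cert):
    """Data can be donated unless the IRB found them inappropriate."""
    return cert.outcome is not Outcome.REJECTED


def request_permitted(cert, research_area, wants_identifiers):
    """Screen a data use request against the certified restrictions."""
    if cert.outcome is Outcome.REJECTED:
        return False
    if cert.outcome is Outcome.APPROVED:
        # Assumption: an unrestricted certification imposes no conditions.
        return True
    if cert.must_be_deidentified and wants_identifiers:
        return False
    if cert.allowed_research_areas and research_area not in cert.allowed_research_areas:
        return False
    return True


cert = Certification(
    registry_id="reg-001",
    outcome=Outcome.RESTRICTED,
    must_be_deidentified=True,
    allowed_research_areas=frozenset({"cancer"}),
)
print(request_permitted(cert, "cancer", wants_identifiers=False))
```

A check of this kind would only flag obvious conflicts; it would not replace the repository's review of the certification letter itself.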

The same process of certification would apply to registries that have a waiver of informed consent. The IRB would approve an additional waiver of consent to allow the data to be used in the repository. The primary concerns in approving such a waiver would likely relate to the nature of the data. For example, sensitive data such as those related to infertility treatments may not be approved for donation, while less sensitive information such as data on controlling blood pressure in hospitals would most likely be approved. An open question relates to whether a multisite study would need to have institutional certification from each individual participating site; it may be possible for the data coordinating center (or entity holding the data) to seek certification from its IRB for the entire study based on a model consent form.39

Because of the broad scope of the GWAS policy, it is likely that many IRBs, particularly at large research institutions, are already familiar with the certification process. In order to use a certification process for the repository of expired registries, the repository would need to develop a certification policy, similar to the GWAS example, that outlines the key points that certification must address. In addition, the policy should describe the broad public health benefits of submitting data to the repository to assure IRBs that the repository is a public health resource and that the data would be used to improve public health. This is an important point for the IRB to consider, as it is charged with balancing individual rights and welfare with broad social goals.

In developing a certification policy, the repository would also need to consider whether to accept data that have restrictions on their use. Restrictions place an administrative burden on the repository, which must then ensure that data use requests comply with the conditions. The repository would need to determine if the data were still valuable and should be included, even with the restrictions. In addition, the repository would need to consider whether the burden of institutional certification is too high for researchers donating data. Registries that are designed with data donation in mind may be able to assemble the necessary information with little effort. However, researchers with older registries may find the process of certification, which requires the collection and submission of a potentially large amount of documentation, to be a barrier to donating data.

On a related note, existing registries may need to review their IRB applications and obtain re-review for those in which data retention plans (e.g., destruction dates or policies) were included, since donation to a repository of expired registries would extend the time to destruction, possibly indefinitely.

Summary of Patient Privacy Issues

Patient identifiers can be stored in the repository if the informed consents supporting the original data collection allowed for this use. An institutional certification process could be used to determine if the informed consents are sufficient for storing patient identifiers and for donating the data generally. The repository would need to develop a certification policy and require researchers donating data to receive institutional certification from their institution prior to donating the data. The certification would state whether identifiers could be donated and for what purposes the donated data could be used. Regardless of whether identifiers are included, some data linkage projects using repository data may raise the possibility of inadvertent re-identification of patients. A strong data use agreement would be necessary for the repository, and, if resources allowed, additional review of the methods and final product of data linkage projects should be considered to prevent accidental re-identification of patients.

Research Ethics

Related to questions about informed consent and patient privacy are questions on research ethics. The repository will need to address several research ethics issues. For example, how will the repository ensure that the data are being used for valid research purposes? How will the repository determine if any ethical issues exist with the registry data (e.g., were appropriate patient consents obtained? Were the data properly de-identified? Were sufficient privacy and security controls used?)? The stakeholder discussions and background research focused on the use of data for valid research projects and identifying ethical issues with donated data.

Use of Data for Valid Research Projects

Stakeholders agreed that requests to use repository data should be accompanied by IRB approval (or an IRB determination that the research is exempt from such approval) for the proposed research project. This approach is consistent with data use procedures for the NIDDK, NHLBI, and NDAR repositories. Stakeholders also commented that additional review on the part of the repository may be necessary to determine if the use of the data represents valid scientific research. IRB approval is primarily focused on protecting human subjects, but it does not address whether the proposed research is scientifically valid or sound, particularly considering the limitations of the proposed data source. A repository steering committee or governing body could be used to review the requests and issue approvals. Should such a committee or governing body be used, stakeholders suggested that the originators of the data collection should not have a role in determining whether access to the data should be granted. Stakeholders concluded that the necessity of having governing body approval depends on the scale and scope of the repository. The repository should have a governing body that reviews data use requests for scientific validity if the repository is providing research support, such as formatting the data, assembling documentation on the data, and assisting researchers with finding appropriate datasets. These additional services add value to the data and, in turn, give the repository more rights to review the request, review papers being published using repository data, and be acknowledged in the publication. If the repository limits flexibility in how data are received and is only charged with archiving and providing data in accordance with data use restrictions, then the additional review for scientific merit may not be necessary.

Identifying Ethical Issues With Donated Data

The institutional certification process discussed above is also a potential approach for identifying ethical issues with donated data. In particular, certification could identify issues with insufficient consents or privacy protections. Stakeholders also recommended that the repository confirm that data that are supposed to be de-identified are, in fact, de-identified, noting that while the concept of de-identification is well understood, the technical requirements are not. Stakeholders suggested that a list of registry publications be included with the registry data, but emphasized that the archive should not give preferential treatment to registries that have published their findings. Some registries might have important negative findings (i.e., finding of no effect of a certain treatment) that may have been difficult to publish. Data requestors should also submit at least some biographical information to ensure that they are part of a legitimate research organization and have not been subject to disciplinary action or named in lawsuits related to their research.

The ethics of extending the research purpose for which data were obtained originally must also be considered. Potential applications that may not have been considered when a registry was created raise questions such as (1) could the data be used for commercial purposes (e.g., to understand the background rates of certain conditions that might be meaningful to guide drug development); and (2) could data be obtained, after appropriate review, to be used in support of litigation (e.g., to evaluate the prevalence of certain conditions when trying to evaluate the causality of a purported event)?

Summary of Research Ethics Issues

Requests for data access should be accompanied, at minimum, by IRB approval (or an IRB determination that the research is exempt from such approval) for the proposed research project, a description of the project, and information on the researcher. Depending on the available resources, the repository may consider using a governing body to review requests for scientific validity and merit before releasing data. The institutional certification process for donated data could be used to identify ethical issues with donated data or determine restrictions on use of the data.

Data Access and Use

Data access and use policies and procedures will be critical for enabling the repository to meet its goal of supporting future research projects. Some data in the repository may be able to support many types of research projects, while other data may be able to support a more limited range of studies; this will likely depend on the conditions under which the data were originally collected. Information on the data will also need to be maintained and made available to researchers to ensure that researchers understand the strengths and limitations of the data and can draw appropriate conclusions from their findings. For example, researchers will need to understand if the data were audited, how the data were collected, how the data elements were defined, and how patients and sites were selected and enrolled. These factors will help researchers to assess the data quality and the potential for selection bias. The stakeholder discussions and background research focused on conditions for data access, storage of informed consent information, and storage of data documentation.

Conditions for Data Access

Stakeholders agreed that a clear, transparent data access process is necessary. The process may begin with a data use request, which describes the proposed use of the repository data. The repository would review the data use request and either approve or deny it. The level of review would depend on the type of services that the repository is providing, as noted above. For approved requests, the requestor would sign a data use agreement and return it to the repository, which in turn would release the data. Stakeholders referenced the model for accessing CMS data as an example and specifically suggested that the CMS data use agreement may be a useful model for the repository.a It was also noted that requestors should not assume any liability regarding consent and/or the results of analyses.
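The access process described above (request, review, approval or denial, signed data use agreement, release) can be illustrated as a small state machine. This is a sketch under stated assumptions: the state names and the `advance` function are hypothetical, and the report does not prescribe any particular implementation.

```python
from enum import Enum, auto


class State(Enum):
    SUBMITTED = auto()   # data use request received
    APPROVED = auto()    # repository review passed
    DENIED = auto()      # repository review failed
    DUA_SIGNED = auto()  # requestor returned the signed data use agreement
    RELEASED = auto()    # repository released the data


# Transitions permitted by the process described in the text.
ALLOWED = {
    State.SUBMITTED: {State.APPROVED, State.DENIED},
    State.APPROVED: {State.DUA_SIGNED},
    State.DUA_SIGNED: {State.RELEASED},
    State.DENIED: set(),
    State.RELEASED: set(),
}


def advance(current, target):
    """Return the new state, or raise if the step skips part of the process."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target


state = State.SUBMITTED
for step in (State.APPROVED, State.DUA_SIGNED, State.RELEASED):
    state = advance(state, step)
print(state.name)
```

Encoding the permitted transitions explicitly makes it impossible, in this sketch, to release data before a data use agreement has been signed.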

Storage of Informed Consent Information

The informed consents (or waiver of informed consent) under which the data were originally collected are important to determining how the data can be used for future research projects. Stakeholders discussed the certification process described above as one approach to determining for which research purposes the repository data can be used. This process would allow the repository to store the IRB’s findings on appropriate uses of the data, rather than storing and examining all of the informed consents. However, this model transfers some burden onto the researcher donating the data. In this discussion, stakeholders also returned to the question of including identifiable data and discussed whether registries that could not donate identifiable data could anonymize (i.e., de-identify) the data. This raised questions on the technical requirements for anonymizing data. Stakeholders questioned whether identifiers were necessary at all, but concluded that identifiers could be important for linkage studies and that the repository would need to review data use requests to ensure that identifiers were truly necessary for a project before releasing the identifiers. Whether the registries could anonymize the data prior to donation also would require review of the underlying consent and data use documents to determine if anonymization was an approved use of the data.

Storage of Data Documentation

Stakeholders agreed that information on the original study and how the data were collected is critical to ensuring that researchers use the data appropriately in new research projects. This conclusion is consistent with the approaches used by the NIDDK, NHLBI, and NDAR repositories. In the NIDDK repository, the repository staff replicates tables from the primary paper based on study data to ensure that the data in the repository are the same as those used to publish the results. Depending on the resources available, stakeholders recommended that detailed data use guides be developed for the datasets and updated periodically to incorporate lessons learned. Stakeholders also discussed whether it is reasonable to store data that were not subject to quality control measures and agreed that these datasets should be included. Information should be available in the repository as to whether the data were subject to quality measures and, if so, what measures.

Summary of Data Access and Use Issues

A clear, transparent, documented data access process is necessary. The process should, at a minimum, include a formal data request, review of the request, and completion of a data use agreement. The repository should store information on the original study and how the data were collected to ensure that researchers can use the data and draw valid conclusions. Depending on the resources available, this information may include case report forms (CRFs) and a protocol or study plan submitted with the donated data, or the repository could develop more comprehensive data use guides. An important point to include in dataset information is what quality control measures, if any, were used during the registry.

Technical Considerations

The repository will need to store donated databases in a manner that allows appropriate access in accordance with the policies and procedures of the repository. The technical challenges include implementing processes, technologies, and standards that can accommodate a diverse range of databases that use different database vendor technologies, metadata, and terminologies. In addition, storing registry data requires appropriate levels of security measures based on the types of data contained in the database. The implementation of security technologies will allow adequate access to, and availability of, the data while preventing unauthorized use. The stakeholder discussions and background research in this area focused on data storage and security, metadata, technology standards, and transfer procedures.

Data Storage and Security

Stakeholders clarified that the data storage and security requirements for the repository are different from those imposed upon individual users of the data (data recipients). For example, the repository would need to comply with National Institute of Standards and Technology (NIST) standards, but individual users of the data may not need to comply with those requirements. Stakeholders suggested that individual users of the data be held to a “reasonable” standard, such as the requirements for using CMS data. The data use agreement could include language on this point, such as, “reasonable confidentiality and security measures will be used.” However, it is unlikely that the repository would be able to audit or otherwise police or enforce these requirements.

Metadata

Some level of metadata will need to be available so that users of the repository can find and request access to data. Stakeholders suggested that, in a limited resources model, the only available metadata would be the registry listing in the RoPR system. This could be supplemented with additional documents, such as the CRFs and protocol, provided that the researcher submitted this information. If more resources are available, stakeholders suggested including additional information about the data, either compiled by the researcher or by the repository. These may include the inclusion/exclusion criteria, availability of identifiers, description of how and why the data were collected, bibliography, data format, percent missing for key variables, and summary analytics. Change management information may also be useful to include, particularly if the registry underwent major changes at any point.
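One of the suggested metadata items, percent missing for key variables, is simple to compute when a dataset is donated. The sketch below is illustrative only; the function name `percent_missing`, the representation of records as dictionaries, and the treatment of absent keys, `None`, and empty strings as missing are all assumptions rather than repository requirements.

```python
def percent_missing(rows, key_variables):
    """Return {variable: percent of rows where it is absent, None, or ""}."""
    total = len(rows)
    result = {}
    for var in key_variables:
        # A value counts as missing if the key is absent or the value is blank.
        missing = sum(1 for row in rows if row.get(var) in (None, ""))
        result[var] = round(100.0 * missing / total, 1) if total else 0.0
    return result


# Hypothetical registry records with some gaps.
records = [
    {"age": 54, "bmi": None},
    {"age": "", "bmi": 27.1},
    {"age": 61},
    {"age": 47, "bmi": 30.2},
]
summary = percent_missing(records, ["age", "bmi"])
print(summary)
```

A summary of this kind could be stored alongside the registry listing so that researchers can judge data completeness before submitting a request.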

Technology Standards

Stakeholders commented that the repository could invest significant resources in formatting data to match technology standards. If resources are limited, the repository could make no effort in this area and preserve the original format of the data. If more resources are available, the repository could modify datasets for consistency in format and standards. For example, the NIDDK repository provides all datasets in SAS format. The NDAR project aggregates data across studies by mapping common data elements. Both of these options require significant resources. Alternatively, the repository may make a statement of preferred standards, but accept data in any format.

Transfer Procedures

When a data request is approved, the data will need to be provided to the requestor. The data could be transferred to the requestor (e.g., mailed on a CD-ROM) or maintained through a data enclave. Stakeholders commented that the transfer procedures are less important for de-identified data. For example, these data may be sent on a CD-ROM with a requirement that the requestor destroy the data when the project is complete. Access to identifiable data is more complex. In these cases, the repository may consider transferring the data through other means, such as a secure FTP transfer. To prevent unauthorized use of the data, stakeholders suggested that the ramifications of breaking the data access rules should be in the agreement (e.g., no access to repository in the future, no future grants from AHRQ). A data enclave, such as the model used in NDAR, would provide greater control but would require more significant resources, including developing and maintaining the enclave and preparing data for inclusion.

Summary of Technical Considerations

While the repository itself would be held to NIST standards, individual researchers using the data could be held to a “reasonable” standard for data storage and security. Some level of metadata will need to be available so that users of the repository can find and request access to data. At a basic level, metadata could include only the registry listing in the RoPR system. Available resources would also drive technology standards and transfer procedures. If sufficient resources were available, the data could be converted to a standard format; otherwise, the data could be stored in their original formats. Data transfers could be accomplished through shipping durable media or using a more sophisticated, secure electronic interface.

Summary of Stakeholder Perspectives

Stakeholders attending the in-person meeting concluded that the creation of a repository of expired registries is feasible from an ethical, operational, and technical perspective. There is interest in the research community in developing a repository of expired registries to facilitate future research, but the value of the repository would depend on the data that were donated.

Legal Issues in Creating a Repository of Expired Registries

Refer to Footnote B for additional information about this section.

The question of whether a data repository consisting of expired registriesc can lawfully be created and used raises numerous legal questions under both state and Federal law. At the same time, numerous examples of such repositories exist, clarifying the legality of such arrangements.

At its heart, the law is about the rules of conduct that govern relationships. Depending on the relationship to be created, applicable laws can vary. In general, the creation of a data repository will involve relationships among a number of parties: the developer(s)/owner(s) of the repository; the owners of the information whose inclusion is sought; and parties with an ongoing legal interest in how the information is collected, stored, and made available to third parties. It is these relationships that must be accounted for in repository creation and maintenance.

In cases in which the parties consist of a privately supported repository linked to registries created with private funding, ownership of the repository and the information sources may be private, but at the same time other parties (i.e., the individuals whose information is being held and used) will have a legal interest in the enterprise. Thus, the contractual terms that spell out ownership, access, and use rights also will need to account for the legal interests of third parties under state and Federal law, in particular laws that apply to health information privacy and confidentiality. A similar result would pertain when a repository is created and managed under state law; that is, state law would govern the establishment and operation of the repository and would define the interests of participants. Similarly, the state’s repository enterprise would have to assure that interests and rights created under Federal laws also are protected.

This analysis focuses on the legal issues that pertain to situations in which the Federal government authorizes and supports the creation of a repository, whether maintained directly by the Federal government or through a private contractor, and where the repository holds the content of registries that in turn have been created as part of one or more federally supported project awards. This analysis uses as an example an AHRQ-sponsored repository (termed here a federally sponsored repository) that holds the contents of registries created through AHRQ-sponsored project awards, although the Federal legal principles that come into play in such an example would arise regardless of which Federal granting agency (e.g., AHRQ, Centers for Disease Control and Prevention [CDC], NIH, CMS) funded the registries that in turn feed a repository.

In creating a federally sponsored repository, two distinct situations can arise. In the first situation, the repository, along with the registry data it holds, has been created with the express assumption (and agreement on the part of individual registry holders) that the contents of individual registries would be linked in the future to the repository. In the second situation, the repository is created after the fact, that is, after numerous individual federally supported registries have been created but not in anticipation of ultimate linkage to a repository. Put another way, in one case the repository is prospective and anticipatory, while in another, the repository is created after registries authorized by Federal project awards have been established and are operating. In truth, of course, both situations may arise, since while repositories may put anticipatory agreements into place going forward, they inevitably will desire to link retrospectively to registries that pre-dated their existence. Thus, any repository project will need legal capabilities to look forward and backward in relation to current and future registries included or to be included therein.

Beyond the timing issues associated with the relationship between a repository and individual registries, other legal issues will arise. For example, legal considerations might change depending on whether a repository seeks to hold identifiable, as opposed to non-identifiable, patient information. Similarly, legal issues will depend on the types of uses that the central repository seeks to permit. Who will have access to repository information and for what purposes? Will access be to identifiable information or non-identifiable data? Who will have the power to set the rules for data access and use, and how will the repository be governed?

The analysis that follows focuses on the following areas of law:

  • Federal laws related to agency access to and oversight of federally supported projects and the information produced under such projects
  • HIPAA Privacy, Security, and Breach Notification Rules
  • The Common Rule
  • “Part 2” regulations related to alcohol and substance abuse treatment information
  • The Privacy Act of 1974
  • Federal Information Security Management Act (FISMA)
  • Federal laws related to information collection (e.g., the Paperwork Reduction Act)
  • Freedom of Information Act (FOIA)
  • State confidentiality laws

Federal Laws

U.S. Department of Health and Human Services (HHS) Regulations Governing Uniform Administration of Federal Awards

Federal regulations establish certain “Uniform Administrative Requirements for Awards and Subawards to Institutions of Higher Education, Hospitals, and Other Nonprofit Organizations and Commercial Organizations.”40 These regulations, specifically 45 C.F.R. 74.36(c), state that:

The Federal Government has the right to: (1) Obtain, reproduce, publish or otherwise use the data first produced under an award; and (2) Authorize others to receive, reproduce, publish, or otherwise use such data for Federal purposes.

Where Federal funding is awarded with the express assumption that the award will support, among other activities, the creation of a patient registry, the Federal government presumably would retain a right of access to and use of the data and would further have the power to allow others to use the data. However, where the creation of a registry is undertaken as an activity for the recipient’s own use and not one specified or funded under the award, the Federal interest in the data may be less clear, even though the entity is a federally funded recipient. Thus, to the extent that the government desires to have access to and use of patient registry data, this regulation suggests that the relationship between registries created and the award should be made an explicit aspect of the terms and conditions that attach to the award. Put another way, if creation and maintenance of a registry is a condition of the grant award, then the Federal interest is preserved.

This rule does not obviate the need to assure compliance with other applicable laws. However, it does clarify that the Federal government has a legal interest in registry data (for its own use and that of others) when created as part of federally supported projects whose terms express the Federal interest in advance and make authorized data sharing with the Federal government and legally authorized third parties a condition of the award.

HIPAA

The HIPAA Privacy Rule

The Privacy Rule applies to covered health care entities, which include health plans, health care clearinghouses, and health care providers who conduct certain electronic health care transactions.41 The Federal government, including AHRQ, is not a HIPAA-covered entity and therefore HIPAA does not apply unless HIPAA requirements are included in the terms of an agreement. However, a private contractor operating under a project award related to a data repository could be covered (or could be the business associate of another covered entity) to the extent that it is authorized to obtain and store data submitted by a Federal award recipient. In this situation the repository contractor might be a “health care clearinghouse” and thus, a covered entity, since its function is to “process[] or facilitate[] the processing of health information received from another entity in a nonstandard format or containing nonstandard data content into standard data elements.”41

A private contractor also might be considered a business associate of a covered entity, because its task is to perform or assist in the performance of “a function or activity involving the use or disclosure of individually identifiable health information.”42 This would be especially true in cases in which, as part of an award, the Federal government has both specified the creation of a registry and has conditioned the award, in advance, on the willingness of the recipient (assuming the recipient is a covered entity) to make its registry data available to a central repository contractor. Even if the recipient of the award or the repository contractor are not covered entities or business associates for purposes of HIPAA, it is likely that as downstream recipients of data (e.g., protected individually identifiable health information) from covered entities (providers or health plans), they will be bound contractually to comply with the HIPAA Privacy Rule requirements.

The Privacy Rule’s purpose is to protect individually identifiable health information (also referred to as protected health information or PHI) used by covered entities and their business associates. For this reason, the Rule requires explicit individual patient authorization or an opportunity for the individual to object to the use or disclosure of PHI unless release is otherwise permitted by the Privacy Rule.43 Specifically, the Privacy Rule requires individual patient authorization, among other cases, in the following instances: (1) for any use or disclosure of psychotherapy notes (with exceptions for treatment, payment, or health care operations)44 and (2) for marketing purposes (with some exceptions for face-to-face communication or promotional gifts of nominal value).45 Moreover, the HITECH Act adds a new provision to the Privacy Rule that prohibits, with several exceptions,46 covered entities from selling protected health information without authorization. A valid authorization must describe how the information will be used or disclosed, the purpose of use or disclosure, the expiration date for the authorization, the individual’s right to revoke consent, and the individual’s actual consent to use the information.47

Unless the activity falls within one of the exceptions to this basic requirement, the Privacy Rule would apply to any relationships involving the exchange of PHI, such as identifiable registry data, between covered entities or between a covered entity and its business associate, regardless of whether the disclosure is in furtherance of a Federal award that explicitly provides for the provision of data to a repository, a subsequent activity not contemplated under the original Federal award, or an activity between two wholly private actors.

Furthermore, because registries may exist over long periods, it is important to note that the Privacy Rule protects a deceased patient’s PHI in the same manner and to the same extent as it would a living individual’s PHI.48 Currently, this protection is afforded to the deceased’s PHI for as long as the covered entity maintains the information, with certain exceptions.d However, the Federal government has proposed significant changes to this protection in a July 2010 Notice of Proposed Rulemaking. Under the proposed rule, the definition of PHI would be amended to exclude information of persons who have been deceased for more than 50 years.49 Thus, the Privacy Rule would not protect the PHI of these deceased persons. It is unclear whether and when the Federal government will finalize this proposed change.

The Privacy Rule does permit a covered entity to disclose protected health information without individual consent for certain specified purposes, the most relevant of which are noted here:

  • Where the disclosure is for treatment, payment, and health care operations;e
  • Where the disclosure is “incident to” an otherwise permitted or required disclosure and the disclosure is the minimum necessary to accomplish the disclosure’s intended purpose;f
  • Where an individual has had an opportunity to agree to, or object to, disclosure;50
  • Where a disclosure is required by law (information disclosed must be limited to what is required);g
  • Where the disclosure is for research (regardless of the source of funding for research) provided that an IRB or Privacy Board has approved an alteration or waiver of the basic authorization requirement, and provided further that the purpose of the disclosure is preparation for research and, importantly, that none of the information will “be removed from the covered entity.”h Where disclosure for research involves a decedent’s PHI, a covered entity must obtain assurances that the use or disclosure is solely for research on the PHI of decedents, and must be provided with documentation of the death of the individual about whom information is sought and with assurance that the PHI is necessary for the research purposes.51 Disclosures of decedents’ PHI for research purposes do not require prior IRB or Privacy Board approval, or authorization by a personal representative.
  • Where the data release is restricted to LDS that do contain certain specific patient identifiers, release is possible as long as the individuals who will use the data enter into a data use agreement (DUA);52
  • Where the