Effective Health Care Program

Medicare Prescription Drug Data Development: Methods for Improving Patient Safety and Pharmacovigilance Using Observational Data

Research Report

Author Affiliations

Brian C. Sauer, Ph.D.1, 2

Judith A. Shinogle, Ph.D., M.Sc.3

Wu Xu, Ph.D.4

Matthew Samore, M.D.1, 2

Jonathan Nebeker, M.S., M.D.1, 2, 5

Zhiwei Liu, M.S.4

Rand Rupper, M.D., M.P.H.1, 2

Linda Lux, M.P.H.6

Jacqueline Amoozegar, B.A.6

Kathleen N. Lohr, Ph.D.6

1 University of Utah Department of Internal Medicine, Salt Lake City, UT.

2 VA Salt Lake City Informatics Decision Enhancement And Surveillance (IDEAS) Center, Salt Lake City, UT.

3 School of Public Health, University of Maryland, College Park, MD; formerly RTI International, Research Triangle Park, NC.

4 Office of Public Health Informatics, Utah Department of Health, Salt Lake City, UT.

5 VA Salt Lake City Geriatrics Research Education and Clinical Center (GRECC), Salt Lake City, UT.

6 RTI International, Research Triangle Park, NC.


Background: The Medicare Prescription Drug, Improvement and Modernization Act of 2003 introduced the Part D benefit for outpatient medications for Medicare beneficiaries. Anticipated increases in use of prescription drugs, coupled with the concern for drug safety, fuel the need for drug safety data beyond those from randomized controlled trials or voluntary reporting schemes.

Objective: To improve methods for using claims data to examine patient safety and pharmacovigilance issues, we developed a data analytic framework and methods for pharmacoepidemiologic research on adverse drug events (ADEs) using population-based claims and administrative data sources. We tested our framework and methods by performing pilot analyses using drugs for dementia, including Alzheimer's disease, as the illustrative case.

Design: We used an open cohort design with data structured in a longitudinal format to measure exposure accurately. We adjusted for confounding using logistic regression and for treatment selection using inverse probability weights.

Setting: Because Medicare prescription drug claims are not yet available, we used pharmacy and medical claims and death records from the State of Utah Medicaid programs.

Patients: Medicaid patients had to be ages 50 and older, be identified in the Medicaid enrollment table, and have at least one pharmacy or medical claim recorded between January 1, 2003, and December 31, 2005.

Measurements: We reconstructed patients' drug regimens and established therapeutic course through drug claims. We measured ADEs through the medical claims and death records for three types of outcomes: death, expected adverse events (gastrointestinal and psychological disorders), and novel events that are rare but serious (hematological and hepatic disorders).

Results: We were able to develop a database that allowed us to characterize drug exposure and evaluate the association between drug exposure and three types of adverse drug effects; these included death, expected events, and idiosyncratic events. Analysis of early versus late exposure within the treated cohort demonstrated a highly significant early risk for episodes of care for hematological diagnoses (incidence rate ratio, 2.86; 95 percent confidence interval, 1.6-5.11).

Conclusions: Researchers can easily apply our framework for working with observational data, particularly pharmacoepidemiologic databases; they can readily adopt or adapt our methods and stepwise approach (i.e., data integrity, exposure and persistence, and ADE analysis). Data from Medicaid, employer, insurer, and (eventually) Medicare claims can be used to examine specific drug classes and individual drugs for known and unknown ADEs. The ADE framework of initially examining mortality, expected events, and then novel reactions that are potentially severe but unlikely events will foster understanding of drug safety and generate hypotheses for future investigations. The clinical findings concerning antidementia drugs, because of their limited nature (e.g., one state, relatively small numbers of ADEs), should be used for generating hypotheses and signals for further investigation, not for clinical decisionmaking.

Key words: adverse drug events, Alzheimer's disease, dementia, drug safety, methodology, pharmaceuticals, pharmacoepidemiology, pharmacovigilance.


Background for This Study

Pharmaceuticals, like all health care interventions, offer benefits to patients but also pose risks of harms in the form of negative side effects and adverse events. The US Food and Drug Administration (FDA), in regulating drugs for the US marketplace, relies in part on safety data generated by randomized controlled trials (RCTs). The limitations of these data are widely recognized because of the characteristics of such trials (e.g., highly selected settings and patient populations, short duration of studies, less reporting of such information than of positive outcomes). Other sources of information, including various types of observational studies, voluntary schemes for reporting adverse events, and more organized postmarketing surveillance studies, contribute to the knowledge base about drug safety and tolerability.

Recent reports underscore the need for such methods, particularly to detect serious but rare adverse events that were undiscovered during premarketing trials. For example, cyclooxygenase-2 inhibitor nonsteroidal anti-inflammatory drugs (commonly known as COX-2 inhibitors, and used for pain management) are documented to increase cardiac morbidity;1,2 antipsychotic medications (especially atypical antipsychotics) are associated with an increased risk of mortality in the elderly.3 These types of findings also prompt questions about morbidity and mortality in elderly or frail individuals who are exposed to other classes of drugs.

Cutting across these concerns is the enactment of the Medicare Prescription Drug, Improvement and Modernization Act of 2003 (MMA), which introduced the Part D benefit for outpatient medications for Medicare beneficiaries. This represents arguably the largest expansion in Medicare benefits since the program's inception in 1965. In 2006, an estimated 43 million Medicare beneficiaries were eligible for "creditable" prescription drug coverage under Part D, through either Part D drug plan coverage or employer or union retiree drug coverage that qualifies for the Medicare retiree drug subsidy. The US Department of Health and Human Services (DHHS) Centers for Medicare & Medicaid Services (CMS) stated that, as of June 11, 2006, 38.2 million Medicare beneficiaries had such coverage (source accessed for this purpose August 9, 2006).

The use of prescription drugs has grown dramatically with the advent of increased insurance coverage and greater numbers of products available. Consequently, prescription drug expenditures grew from $5.5 billion in 1970 to $179.2 billion in 2003.4 Growth in the number of elderly having access to drug coverage will likely increase utilization in this population.

Section 1013 of MMA called on the Agency for Healthcare Research and Quality (AHRQ) to create mechanisms to address these types of issues. AHRQ in turn created its Effective Health Care Program, which includes some work by its long-standing RTI International (RTI)-University of North Carolina (UNC) Evidence-based Practice Center (EPC) program and a new initiative, DEcIDE (Developing Evidence to Inform Decisions about Effectiveness). DEcIDE centers will generate new scientific information through research on the outcomes of health care services and therapies, including drugs; the centers give initial emphasis to Medicare beneficiaries and older adults and to 10 priority conditions (arthritis and nontraumatic joint disorders, cancer, chronic obstructive pulmonary disease and asthma, dementia including Alzheimer's disease, depression and other mood disorders, diabetes mellitus, ischemic heart disease, peptic ulcer disease and dyspepsia, pneumonia, and stroke and hypertension).

Of particular concern are harms (adverse events) that may befall medication users. An adverse event is any injury caused by medical care or treatment. The FDA defines adverse events as "any incident where the use of a medication (drug or biologic, including HCT/P [human cells, tissues, and cellular and tissue-based products]), at any dose, a medical device (including in vitro diagnostics) or a special nutritional product (e.g., dietary supplement, infant formula, or medical food) is suspected to have resulted in an adverse outcome in a patient." The FDA voluntary reporting system limits submissions to serious events, such as death, life-threatening hospitalizations, disability or permanent damage, congenital anomalies or birth defects, medical or surgical interventions required to prevent permanent impairment or damage, and other serious medical events.

An adverse drug event (ADE) is an injury involving medication use. Thus, ADEs include expected adverse drug reactions (or side effects), drug-drug interactions, unexpected reactions, events that can be attributed to errors of various sorts, and even patient nonadherence to medication regimens.5

Recently, the overall rate of adverse events has been estimated to be 50.1 per 1,000 person-years.6 Preventable ADEs account for 13.8 per 1,000 person-years of overall adverse events. Every ADE increases costs by $1,000, and preventable ADEs increase costs by $2,000.6 Pezalla put the number of patient deaths attributable to ADEs in 2000 at about 218,000 and noted that more than half of approved drugs on the market may pose the risk of serious side effects that had not been identified before FDA approval.7

As noted, scientific evidence available for the efficacy, effectiveness, safety, and tolerability of drugs is often limited to RCT results, which are often not generalizable (i.e., applicable to broad population groups). AHRQ EPC efforts under MMA Section 1013 will expand the knowledge base in this area, because that section calls for the development of comparative effectiveness reviews (i.e., systematic reviews of published literature that focus on comparisons of pharmaceutical agents across or within drug classes). These reviews can and do examine observational studies as well as both head-to-head and placebo-controlled trials (which respectively provide direct and indirect comparative evidence).

The increased use of drugs (generally and within the elderly population) also affords an opportunity to examine adverse events in large administrative claims databases through linking prescription drug claims with medical claims. Because Medicare beneficiaries often have a complex array of health issues managed by multiple medications, this population is at risk for complications resulting from drug safety issues. The presumed availability of information from CMS, including databases that may combine Part A, Part B, and Part D information, is expected to provide a unique opportunity to study how prescription drugs are used in this population, the positive and negative effects of prescription use, and the outcomes of such use.

AHRQ commissioned the RTI DEcIDE Center to take on a specific project to develop a framework and methods for identifying ADEs in claims databases that could mimic those eventually presumed to be available on Medicare beneficiaries. In principle, Medicare pharmacy and claims databases will be ideal for large postmarketing surveillance studies of ADEs. In practical terms, databases that include outpatient pharmacy data are not yet available from CMS. For that reason, AHRQ assigned us the task of exploring how best to use similar databases and to develop and test methods and measures for studying medication safety in the elderly. We developed and applied a strategy that allowed us to study both known and reported adverse events and to search for novel reactions.

To apply methods and test measures appropriately, research must examine the application of measures before implementing them nationally. In proceeding this way, AHRQ aims to offer new resources and tools for numerous stakeholders, including those in pharmacoepidemiology and pharmacoeconomics, for studying and understanding the use, benefits, and risks of pharmaceuticals, and to do this in advance of the appearance of Parts A, B, and D Medicare data. Our work contributes to this methodological toolbox and develops a data analytic framework for pharmacoepidemiologic research on ADEs using population-based claims and administrative data sources. The project also was charged with using antidementia drugs as an illustrative example for the methods developed for the study. Thus, we used this drug class as an example of how to implement the framework and methods, and not how to supply clinical information for direct provider or patient decisionmaking.

The next sections discuss the background for our specific study and for the methods we undertook to develop and document our research. Later sections describe the goals of our project and the organization of this report.

Detecting Adverse Drug Events

Information regarding ADEs is first generated from clinical trials used to obtain FDA approvals. Although RCTs are the gold standard for clinical research, they have several limitations in identifying potential adverse events in the population. In general, the subjects in such trials are healthier and have less comorbidity than the general population. These patients may not be as susceptible to ADEs as a more general population (for whom the drugs might well be prescribed). Another concern with identifying ADEs through RCTs is that ADEs, and certainly the more serious ones, tend to be rare events. Thus, the sample size needed for adequate power to detect an ADE is often larger than is typical for standard RCTs, whose size is constrained by logistics and cost. Another concern with results from RCTs is that they often focus on short-term efficacy and safety.

The second most common method to identify ADEs is through surveillance systems that rely on clinician-driven reporting of adverse events. Examples of such systems include FDA's MedWatch and the U.S. Pharmacopeia's MEDMARX® systems. These systems rely on reports from health professionals and patients. The data sets are continuously updated and periodically subjected to statistical or rule-based algorithm testing that, when "positive," leads to further investigation (i.e., they can be used for both screening for and confirming ADEs). One drawback, however, is that these approaches may not permit analysts to control for confounding or other problems (obviously, because the systems are voluntary, they will necessarily be incomplete). Another, more serious limitation of these systems is that they lack measures of exposure to these drugs. Not having any denominator for these surveillance systems inhibits users' ability to calculate ADE rates.

A third approach is the use of observational studies. Some experts argue that even though clinical trials are the benchmark for efficacy data, pragmatic trials are the most appropriate way to evaluate safety once drugs have reached the market.8 These studies actually examine how the drug is used in "real-world" practice. Pragmatic trials will have a randomized component (although it may be a cluster randomization by, for instance, practice groups); observational studies will not have a randomized component and may or may not have prospective control or comparison groups.

In many such studies, researchers identify a cohort that includes individuals who received the drug and compare their outcomes with those for persons who did not receive the drug. The outcomes are examined through statistical associations. Observational studies with restricted cohort designs that have baseline features for inclusion and exclusion criteria similar to those for clinical trials and that adjust for differences may not overestimate any effects.9 Common methods to control for baseline differences in cohorts include matching and statistical techniques such as propensity scoring or inverse probability weighting (IPW).5,10,11 We briefly describe these techniques below insofar as they are relevant for this study.

Database Design

One key issue in using claims data for ADE analysis concerns formatting the database into a framework that is functional and flexible. The format needs to allow investigators to create accurate measures of exposure to the drug of interest, determine accurate timing of the adverse event, and ascertain comorbidities. To achieve this structure, creating the database in a longitudinal format is important.

Longitudinal data have both an individual person component and a time component. Thus, each person will be in the database multiple times depending on the time frame for the study. For example, if the time framework is months, and 2 years of data are available, each person will be in the database 24 times (12 months times 2 years). This assumes that the person is in the study the entire time, which is neither required nor always observed. This person-time structure allows the most analytic flexibility and efficiency. In addition, analysts can use the person as his or her own control if the data have pre- and postexposure information.

Thus, this data structure allows detection and assessment of multiple events of different type (different target outcomes), multiple events of the same type (recurrent events), and time-dependent covariates. In addition, it accommodates an accurate measure of exposure to the drug of interest.
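The person-time structure described above can be sketched in a few lines of code. The example below is illustrative only and is not the code used in this study; the record layouts, the field names (person_id, exposed), and the rule that a month counts as exposed when it contains a pharmacy fill are all assumptions made for the sketch.

```python
from datetime import date

def month_range(start: date, end: date):
    """Yield (year, month) for every calendar month from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1

def to_person_months(enrollment, fills):
    """Expand enrollment spans into one row per person per month.

    enrollment: {person_id: (first_date, last_date)}  (hypothetical layout)
    fills:      list of (person_id, fill_date) pharmacy claims (hypothetical layout)
    """
    # A month is flagged as exposed if it contains at least one fill (an assumption).
    filled = {(pid, d.year, d.month) for pid, d in fills}
    rows = []
    for pid, (first, last) in enrollment.items():
        for year, month in month_range(first, last):
            rows.append({
                "person_id": pid,
                "year": year,
                "month": month,
                "exposed": (pid, year, month) in filled,
            })
    return rows

enrollment = {
    "A": (date(2003, 1, 1), date(2004, 12, 31)),  # enrolled for the full 2 years
    "B": (date(2003, 6, 15), date(2003, 9, 30)),  # enrolled for part of 1 year
}
fills = [("A", date(2003, 3, 10)), ("A", date(2003, 4, 9)), ("B", date(2003, 7, 1))]
rows = to_person_months(enrollment, fills)
```

Person A contributes 24 rows, matching the 12 months times 2 years example in the text, whereas person B contributes only the months actually enrolled. Time-dependent covariates and outcome flags could be added as further columns on the same rows.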

Causality Assessment in Observational Studies

Pharmacoepidemiology is an epidemiologic discipline in which drug treatment is the exposure of interest. Considerations of sample size, selection bias, misclassification, confounding, and causal inference are comparable with those in other epidemiologic fields.

A pervasive issue in pharmacoepidemiological studies is confounding by indication, a problem that arises because the factors that influence the treatment choices made by clinicians and patients also typically affect outcomes. Similar to other types of confounding, confounding by indication biases the crude association between exposure and disease away from the true causal effect. The five broad strategies for control of confounding in observational research are restriction, matching, stratification, standardization, and multivariable adjustment.12,13

Case-Control Matching, Stratification, and Standardization

Restriction and matching aim to remove confounding via study design. The goal is to either eliminate variability in the confounding factor or balance the distribution of measured confounders across exposed and unexposed groups. Restricting the study population may reduce the number of subjects to an unacceptably low level and may compromise generalizability. Matching potentially overcomes these problems, but it also becomes problematic when attempting to match on more than a few factors.14

Control of confounding also can be accomplished through analytic or statistical adjustment. Standardization and stratification are the simplest methods of adjustment. A covariate cannot be responsible for confounding within a stratum that is internally homogeneous with respect to that covariate.14 Adjusting for multiple confounders through stratification is problematic, however, because of the sparse-data problem. In addition, examining multiple outcomes in stratified analysis often is difficult.
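A small numeric sketch may help show why stratification removes confounding by a covariate. The counts below are invented for illustration (they are not study data), and "frail" is a hypothetical confounder that raises both the chance of treatment and the risk of the outcome.

```python
def risk(events, n):
    """Proportion of subjects with the event."""
    return events / n

def risk_ratio(exposed, unexposed):
    """Risk ratio from two (events, n) pairs."""
    return risk(*exposed) / risk(*unexposed)

# Hypothetical (events, subjects) counts by stratum and exposure group.
strata = {
    "frail":     {"exposed": (80, 400), "unexposed": (20, 100)},
    "not_frail": {"exposed": (5, 100),  "unexposed": (20, 400)},
}

def pooled(group):
    """Collapse the strata to obtain crude (events, n) for one exposure group."""
    events = sum(s[group][0] for s in strata.values())
    n = sum(s[group][1] for s in strata.values())
    return events, n

crude_rr = risk_ratio(pooled("exposed"), pooled("unexposed"))
stratum_rr = {name: risk_ratio(s["exposed"], s["unexposed"])
              for name, s in strata.items()}
```

In these invented data the crude risk ratio is well above 1 even though the risk ratio within each frailty stratum is 1, because frail patients are both more likely to be exposed and more likely to have the event. With many covariates the strata quickly become too sparse for this approach, which is the sparse-data problem noted above.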

The most common method for limiting sparse-data problems is to use multivariable regression models to determine the dependence of the outcome on the treatment and covariates. Multiple regression models adjust for confounding by holding potential causal determinants of outcome constant. The underlying argument is that, within levels of specific covariates, analysts should be able to determine the causal effect of the exposure of interest, because within those strata, the covariate values are the same for all individuals and thus cannot explain differences in the mean outcome levels according to the exposure.

Case-control design is another method to examine ADEs in claims data.15 Case-control studies select subjects by their disease status, which would be, for this purpose, the presence of the ADE. Although this can be an appropriate method for known events, it does not allow the generation of hypotheses regarding unknown but potentially serious events that ADE research attempts to discover.

Propensity Scores and Inverse Probability Weighting

More recently, hybrid methods have been developed to adjust for confounding by indication. They do so, in part, by conditioning on the probability of being exposed. Epidemiologists and health services researchers increasingly are becoming experienced with the use of propensity scores and IPW to address confounding in observational studies.13*

Propensity scores. The propensity score is the probability of treatment or exposure, conditional on observed covariates. Propensity scores are typically applied to a dichotomous exposure or treatment (rather than to other, nondichotomous variables such as counts or time). The goal of propensity scores is to decrease bias and increase precision through creating a balance in measured (i.e., observed) covariates across exposed and nonexposed individuals. The general method to create propensity scores is to develop a logistic regression model of exposure to the drug treatment versus no exposure to the drug treatment as a function of observable covariates and calculate a score for each observation from the predicted values of the model. Calculation of the propensity score typically decreases the dimensionality of the regression model of the outcome; in effect, it serves as a data reduction step. Whether used for purposes of matching or stratification or as a covariate, the propensity score simplifies the process of building a regression model of the outcome.
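The general method just described (a logistic model of exposure, followed by predicted values used as scores) can be sketched as follows. This is a minimal illustration under stated assumptions, not the model used in this study: the two binary covariates (dementia_dx, nursing_home), the cell counts, and the plain gradient-ascent fitting routine are all hypothetical, and a statistical package would normally be used in practice.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def propensity(beta, xi):
    """Predicted probability of treatment for one covariate vector."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))

def fit_logistic(X, y, lr=0.5, n_iter=5000):
    """Fit logistic regression by batch gradient ascent on the log-likelihood.

    Returns [intercept, coefficient_1, ...].
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - propensity(beta, xi)
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Hypothetical cohort: (dementia_dx, nursing_home) -> (patients, number treated).
cells = {(1, 1): (10, 7), (1, 0): (10, 5), (0, 1): (10, 2), (0, 0): (10, 1)}
X, y = [], []
for covariates, (n_patients, n_treated) in cells.items():
    for i in range(n_patients):
        X.append(covariates)
        y.append(1 if i < n_treated else 0)

beta = fit_logistic(X, y)
scores = [propensity(beta, xi) for xi in X]  # one propensity score per patient
```

Each patient's several covariates are reduced to a single number between 0 and 1, which can then be used for matching, for stratification, or as a covariate in the outcome model.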

Several techniques exist to use these propensity scores to control for baseline differences. These techniques include various methods of matching, stratification, use of the propensity score as a covariate in a model of the outcome of interest (e.g., as in our study, an ADE), or combinations of these methods.

One limitation with propensity scores is that if the outcome is rare, propensity score models are difficult to build and may not perform any better in balancing groups than multivariate models.16 Another drawback is that propensity scores and similar techniques (such as IPW) assume that no unmeasured confounding exists. If this unmeasured confounding does exist, the estimates resulting from these techniques will remain biased. That is, although propensity scores can balance exposed and unexposed groups on observed factors, they may not balance the two groups on unobserved factors. Analysts may not be able to predict, or determine, what factors are unbalanced or in which direction the bias exists.

Inverse probability weighting. A less familiar approach to remove confounding from measured variables is to use IPW estimators. In this method, analysts weight each person in the population by the inverse of the conditional probability of receiving the exposure that he or she in fact received. For example, in our study for the exposed subjects (i.e., patients who received the drug), the weight is the inverse probability of receiving the antidementia drug. For those subjects who were not exposed, the weight is the inverse probability of not receiving the drug.
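The weighting rule above can be illustrated with a small sketch. It is not the estimator used in this study: for simplicity it assumes a single measured confounder with two strata and estimates the probability of treatment as the observed proportion treated within each stratum, whereas an applied analysis would typically obtain these probabilities from a fitted model. The records and stratum labels are invented.

```python
from collections import Counter

def ipw_weights(records):
    """records: list of (stratum, treated) tuples; returns one weight per record.

    Each weight is 1 / P(exposure actually received | stratum).
    """
    n_stratum = Counter(s for s, _ in records)
    n_treated = Counter(s for s, t in records if t)
    weights = []
    for s, t in records:
        p_treated = n_treated[s] / n_stratum[s]
        p_received = p_treated if t else 1.0 - p_treated
        weights.append(1.0 / p_received)
    return weights

def weighted_share(records, weights, treated, stratum):
    """Weighted proportion of one exposure group that falls in the given stratum."""
    num = sum(w for (s, t), w in zip(records, weights) if t == treated and s == stratum)
    den = sum(w for (s, t), w in zip(records, weights) if t == treated)
    return num / den

# Hypothetical cohort in which frail patients are far more likely to be treated.
records = ([("frail", True)] * 40 + [("frail", False)] * 10 +
           [("not_frail", True)] * 10 + [("not_frail", False)] * 40)
weights = ipw_weights(records)

frail_share_treated = weighted_share(records, weights, True, "frail")
frail_share_untreated = weighted_share(records, weights, False, "frail")
```

Before weighting, 80 percent of the treated but only 20 percent of the untreated are frail; after weighting, the share is the same in both groups, so frailty no longer differs by exposure in the weighted pseudo-population. As with propensity scores, this balance extends only to measured covariates.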

Unlike conventional regression methods, IPW can adequately account for time-varying confounders that are also intermediate variables. Failure to include a time-varying confounder in a regression model of the outcome leaves residual confounding. However, inclusion of an intermediate variable produces bias toward the null. Therein is the dilemma, for which IPW is an elegant solution. In general, IPW is a more powerful strategy than propensity scores for controlling for confounding for time-varying treatments. The difficulty with IPW is that it is more complicated to incorporate into analyses than propensity scores alone. Chapter 2 explains our application of IPW in more detail.

Dementia, Pharmaceuticals, and Adverse Drug Events

Dementia, including Alzheimer's disease, is one of the current federal priority conditions for CMS and AHRQ work. Because this disorder primarily affects those who will be in the Medicare data and because newer drugs have come to the market for treatment of dementia, we were asked to use this set of disorders as the illustrative condition for our study. For clinical and epidemiologic data on this condition, we relied on a recent report from the Drug Effectiveness Review Program (DERP) completed by staff of the EPC17 and a literature review by staff of the University of Maryland at Baltimore DEcIDE Center.18

Alzheimer's disease is the most common form of adult dementia; it is characterized by slow onset and progression of cognitive impairment. Although impaired recent memory is the most significant symptom, other signs and symptoms include problems with language, so-called executive control functions (judgment), behavior, and emotional well-being. This progression is not considered a "normal" part of aging.

Alzheimer's disease affects an estimated 4.5 million people in the United States. It typically begins after age 50; the percentages of individuals with the condition rise sharply after age 65 and the risks rise with age. Between 30 percent and 50 percent of persons 85 and older may be affected (source accessed for this purpose July 30, 2006). About half of all Alzheimer's disease patients have only mild forms; the other half have moderate to severe disease. Given the aging of the US population, this condition can be expected to take on increasing significance and prevalence in coming decades.

Treatment for dementia has changed dramatically over the past decade with the introduction of pharmaceutical therapy. Two main classes of drug, approved by the FDA for this purpose, are used to treat dementia. The first class comprises several acetylcholinesterase (AChE) inhibitors: donepezil hydrochloride, rivastigmine tartrate, galantamine hydrobromide, and tacrine. Tacrine currently is not used because of major concerns about toxicity (especially for the liver). The second class is N-methyl-D-aspartate (NMDA) receptor antagonists, which for Alzheimer's disease includes only memantine; this pharmaceutical is approved only for moderate to severe Alzheimer's disease.

A recent Cochrane Review found that more patients leave AChE inhibitor treatment groups (29 percent) because of adverse events than leave placebo groups (18 percent).19 Evidence suggests that the total of adverse events occurring in patients treated with an AChE inhibitor is higher than in patients given placebo. Rates of adverse events differ widely across trials and drugs. For example, the 2006 DERP report noted earlier17 indicated that "among placebo-controlled trials, adverse events were reported by 40% to 96% of randomized patients" (p. 9).

Although many types of adverse events can occur, gastrointestinal problems (e.g., nausea, vomiting, diarrhea, loss of body weight) are regarded as more frequent than others. Less common ADEs with these classes of drugs involve cardiovascular events (e.g., reductions in heart rate) and hepatotoxicity.

Objective of This Study and Organization of This Report

We sought to demonstrate how to use claims data to identify ADEs in elderly populations. We examined a population with diagnoses of dementia, including Alzheimer's disease, to illustrate our methods. We used a set of databases from the state of Utah that can serve as a stand-in for possible CMS databases in the future; the CMS databases are expected to combine pharmacy, inpatient, and outpatient claims for elderly and disabled Medicare beneficiaries.

To analyze ADEs of antidementia drugs, we needed to evaluate predictors of antidementia drug exposure and the persistence of antidementia drug use. We also had to assess the confounding relationship between antidementia drug exposure and our outcome measures, which included death, health care encounters for known ADEs for these classes of drugs, and health care encounters for rare and largely unanticipated ADEs of these pharmaceutical agents. These are all steps required for ADE analysis using claims data.

More specifically, our goal was to develop a set of methods (a toolbox) and a data analysis framework for pharmacoepidemiologic research on ADEs using population-based claims and administrative data sources. To do this, RTI staff, working with a group of researchers at the Utah Department of Health and the Salt Lake City Veterans Administration Medical Center, developed two databases that simulate the new Medicare data: (1) a pharmacy (prescription drug) claims database and (2) a comprehensive database that links pharmacy, outpatient, inpatient, physician office, and emergency department claims and death certificate records. Using these databases, we conducted pilot studies to examine the feasibility of generating prototypical measures of drug use and ADEs for antidementia drugs.

This report documents our work for the following activities:

  1. extracting data from Medicaid databases;
  2. linking databases to produce analytic files and tables;
  3. documenting the detailed methods used for the initial steps of examining data integrity, exploring issues of exposure to these drugs and persistence in the use of these drugs, and identifying ADEs;
  4. testing these methods through a case example by investigating the incidence of ADEs in patients using antidementia drugs and the factors associated with such ADEs; and
  5. developing materials for a toolbox to guide other research teams on using pharmacy claims merged with other medical claims for a variety of ADE studies.

Chapter 2 describes in detail the methods implied by items 1 through 3 above. Chapter 3 gives our results of the incidence and other analyses of ADEs in populations using antidementia drugs. Chapter 4 discusses the implications of this work in more detail, and Chapter 5 provides some "translational" conclusions beyond the technical matters dealt with to that point. Finally, Appendices A and B provide the technical details of our methods, including a data dictionary and programming steps; these will become the core of the toolbox to be submitted separately to AHRQ at the conclusion of the project.

* The RTI DEcIDE Center is completing a project on methods for comparative effectiveness and drug safety, such as those mentioned here. Statistical and policy papers on these topics from an invitational conference (held in mid-2006) were published as a supplement to Medical Care 2007 Oct;45(Suppl 2).



Study Population

We conducted this study using data from the State of Utah Medicaid programs, which are managed by the Division of Health Care Finance in the Utah Department of Health (DOH). As of January 2006, approximately 180,000 Utah residents were enrolled in one of the Medicaid populations (not including the Primary Care Network, a waiver program). Utah has several types of Medicaid programs for children, adults, disabled and blind, aged, and pregnant adults. Children constitute by far the largest group: 62 percent of enrollees. Adults and the disabled and blind account for 14 percent each. Finally, pregnant adults and the aged account for 5 percent each. Enrollment in Medicaid is on a monthly basis.

An important fraction of Medicaid enrollees can be considered "long-term" clients. More than 60 percent of those currently enrolled have been on Medicaid for more than 2 years. These percentages vary by population; figures relevant to this study include the following:

  • 82 percent of the disabled/blind have been on Medicaid for more than 2 years, with three-quarters of them being on for more than 5 years;
  • 75 percent of the aged have been on Medicaid for more than 2 years, with two-thirds of them being on for more than 5 years;
  • 62 percent of adults have been on Medicaid for more than 2 years.

The Utah Medicaid population is predominantly white. Approximately 89 percent of enrollees are reported to be white (18 percent of enrollees are white and Hispanic). The next largest group by race and ethnicity comprises American Indians and Alaskan Natives (about 4 percent of the Medicaid population). Blacks and a group comprising Asians and Pacific Islanders make up about 3 percent of Medicaid enrollees each.

About 7 percent of Utah residents are enrolled in Medicaid, but this proportion varies considerably across the state's local health districts. Rural health districts tend to have higher proportions of enrollment. Health districts near the Wasatch Front (an urbanized strip in Utah located to the west of the Wasatch Range that includes Salt Lake City) tend to be below the state average.20

The Medicaid population 50 years of age and older (the population of interest for this study) had a total of 919,998 months of potential enrollment. In almost 87 percent of these months, patients were actually enrolled. For the dual Medicare-Medicaid population, Medicaid is the primary payer for prescription drugs, and because Medicaid covers the Medicare copayments for this population, information on other health care utilization is available. In approximately 21 percent of the 798,861 months of actual enrollment, patients were enrolled in some form of managed care or health maintenance organization (HMO). To be eligible for this study, Medicaid patients had to be ages 50 and older, be identified in the Medicaid enrollment table, and have at least one medical claim recorded between January 1, 2003, and December 31, 2005.

Study Design

This study used an open cohort design, which means that exposure times and observation periods for each individual varied during the observation window. Cohort studies are essential to pharmacoepidemiology because they form the basis for the quantification of drug risk assessment.21 Because a cohort design follows users of the drug of interest, the design enables analysts to estimate the rate of occurrence of the target adverse event. Because of the variability in observation time, we needed to register each person's time at risk of developing an adverse drug event (ADE) and pool this information as person-time. Because serious adverse events are usually infrequent, the cohort must comprise a large number of subjects. For example, to quantify the risk of an adverse event occurring at the rate of 2 per 10,000 per year with a precision of ±1 per 10,000 with 95 percent probability, we would need a cohort of close to 80,000 subjects followed for 1 year.21
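The cohort-size example above can be approximated with a short calculation. The sketch below is illustrative only: it assumes a Poisson event count and a normal-approximation confidence interval, which may differ from the method used in the cited source, and the function name is hypothetical. Under these assumptions the result is on the order of 77,000 person-years, consistent with "close to 80,000."

```python
import math
from statistics import NormalDist

def person_years_for_rate(rate, half_width, confidence=0.95):
    """Person-years needed so the CI for a Poisson rate has the given half-width.

    Assumes Var(rate_hat) is approximately rate / N for N person-years and a
    normal approximation, so we solve z * sqrt(rate / N) = half_width for N.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z ** 2 * rate / half_width ** 2)

# Event rate of 2 per 10,000 per year, precision of +/- 1 per 10,000.
required = person_years_for_rate(2 / 10_000, 1 / 10_000)
print(required)
```

Because the required size scales with the inverse square of the half-width, halving the tolerated error roughly quadruples the cohort.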

Until the advent of large electronic medical and pharmacy databases, cohort studies of this nature were almost impossible to conduct. Medicare Part D will produce the data needed to power the identification of serious but rare adverse events using cohort designs. Our study provides a framework for designing databases and analysis for identifying ADEs.

Data Sources

We used Medicaid pharmacy and medical claims and death records from Utah for a 3-year period (January 1, 2003, through December 31, 2005). These data included both professional and facility codes for diagnoses and procedures. We determined eligibility and enrollment on a per-month basis. Medicaid recipients could also be enrolled in a managed care organization that did not submit medical claims, which means that we lacked medical claims for these individuals. We tracked people's movement in and out of managed care status; in our analyses, we censored patients if they left the main Medicaid system and did not return or their claims were not available. We used SAS Version 9.1.3 for all data manipulation.

Steps in Creating Analytic Data Files

We developed a three-step process for designing the structure of our database and creating data analytic tables. This three-step process included (1) variable extraction from raw claims tables, (2) processing intermediate tables, and (3) merging intermediate tables to produce analysis tables. These steps are described below.

Extracting Raw Data

According to the study design, we extracted four types of raw data from the Utah files, referred to as the Medicaid Data Warehouse: (1) Medicaid eligibility or enrollment files; (2) pharmacy claims files; (3) medical claims; and (4) death certificate records. These raw data files constituted the base for developing the intermediate tables and, eventually, the linked analysis table. We processed these raw files to produce intermediate tables. This means that we selected, labeled, and structured variables to support linkage across the different tables.

We created preliminary links across tables to identify the study population. We first had to identify our cohort (Medicaid patients ages 50 and older) from the demographics table because it provided patient age. Once we had identified our study population, we linked the patient identifiers to eligibility, pharmacy, medical, and death records data to extract our cohort.

Linking across tables was not a straightforward task. We had to apply three types of linkage steps to extract our full cohort. The three types of data linkages included (1) linking client records among the Medicaid files, (2) linking death records to Medicaid deceased clients, and (3) deidentifying and then creating pseudo-identifiers for each client.

Linking client records among the Medicaid files. All records other than eligibility records and death certificates contain claim-based information. However, information for one claim is saved in several separate tables according to the nature of the information. We linked the records by the unique client identifiers assigned by Medicaid and by unique claim identifiers.

Linking death records to Medicaid deceased clients. We used the deterministic linking method and available patient identifiers in both systems to link death records to Medicaid deceased clients.

Creating deidentified research data files. For the researchers who are not employees of Utah DOH, we excluded all patient-identifiable information from the research data files. Appendix B provides detailed information about linking records and developing the pharmacoepidemiology database for this study.

Creating Intermediate Tables

The intermediate tables are cleaned and processed but unlinked. They include an enrollment table, two primary drug exposure tables, two secondary (duplicate) exposure tables, and facility- and professional-based diagnosis and procedure tables. The enrollment table indicated enrollment status; whether a beneficiary was enrolled in an HMO product that did not submit medical claims; and basic demographic information such as age, sex, and race.

We created two primary drug exposure tables, one for each class of antidementia drugs (i.e., acetylcholinesterase [AChE] inhibitors and N-methyl-D-aspartate [NMDA] receptor antagonists), and two secondary drug exposure tables. The drug exposure tables provide day-by-day exposure status for each drug class, number of days supplied, number of units dispensed, dose, and generic name. The secondary tables were intended to store information on prescriptions that were filled on the same day for the same drug class. Simultaneous fills for the same drug class accounted for less than 1 percent of all prescription fills. The secondary tables potentially can be used to evaluate the relationship between titration strategy and experience of ADEs; however, we did no further analysis with these data.

We separated medical claims into four tables, which included professional-based claims and facility-based claims. The four tables are (1) a professional-based procedure table, using Current Procedural Terminology (CPT) codes; (2) a professional-based diagnosis table, using International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes; (3) a facility-based procedure table, using ICD-9-CM procedure codes; and (4) a facility-based diagnosis table, using ICD-9-CM codes. Indicator variables identify provider type and location.

We included the Healthcare Cost and Utilization Project (HCUP) Clinical Classification Software (CCS) codes in the facility- and professional-based diagnosis tables. The CCS is a diagnosis and procedure categorization scheme based on the ICD-9-CM. We also used the CCS-CPT, which classifies CPT codes into clinically meaningful procedure categories. Its procedure categories are identical to those of the CCS, except that the CCS-CPT has specific categories unique to the professional service codes in CPT. (Detailed descriptions of the HCUP tools are available from HCUP.)

The intermediate tables were designed to support linkage across tables to produce an analysis table. The analysis table preserves the longitudinal history of each subject and person-time information (see Appendix B).

Creating the Analysis Table

We produced the analysis table by linking data from the intermediate tables, which involved multiple steps. First, we needed to link the two primary drug exposure tables and the death table with the enrollment table. These tables (enrollment, drug exposure, and death), when linked together, form the day-by-day drug course table, which is the foundation for quantifying drug exposure history. We used Medicaid eligibility and prescription fills to define drug exposure start dates and end dates, enrollment status, and death. Second, once we had fully defined drug exposure, we transformed the drug course table to discrete time intervals of 1 week. We then appended the drug course table to the entire Medicaid cohort enrollment table. The final step was to include comorbidities and target outcomes from the diagnosis tables to produce the final analysis table.

Measures Used in This Study

Data Integrity Analysis

To examine the integrity of the Utah Medicaid database for use in pharmacoepidemiology research, we performed descriptive analyses to examine three categories of potential data error: (1) incomplete claims for certain time periods, (2) records that fail to link across tables, and (3) diagnosis codes in groups in which the condition should not occur (sex-specific diagnoses). We first explored the possibility that blocks of claims were missing by month during the study span. We also evaluated the general trends in claims by month for indications of anomalies. To examine the overall validity of diagnosis and demographic data, we identified a set of disorders that would be expected to occur only in either females or males. We used the Agency for Healthcare Research and Quality (AHRQ) HCUP CCS classifications to identify (1) female-specific disorders (HCUP CCS codes 10.3 and 11) and (2) male-specific disorders (HCUP CCS code 10.2). Then we compared the number of diagnoses in the expected demographic group with the number in the unexpected group. If the data are sound, we would expect at worst only a small fraction of mismatches.
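The sex-specific check described above can be sketched as a simple tally. This is a minimal illustration, not the study's SAS code: the record layout (pairs of recorded sex and CCS code), the 'F'/'M' coding, and the function name are assumptions, and codes are matched exactly rather than by CCS sublevel.

```python
# HCUP CCS codes named in the text as sex-specific.
FEMALE_SPECIFIC = {"10.3", "11"}
MALE_SPECIFIC = {"10.2"}

def sex_mismatch_summary(diagnoses):
    """Tally sex-specific diagnoses by whether the recorded sex matches.

    diagnoses: iterable of (recorded_sex, ccs_code), with sex coded 'F' or 'M'
    (a hypothetical layout chosen for this example).
    """
    expected = mismatched = 0
    for sex, code in diagnoses:
        if code in FEMALE_SPECIFIC:
            target = "F"
        elif code in MALE_SPECIFIC:
            target = "M"
        else:
            continue  # not a sex-specific diagnosis; ignore
        if sex == target:
            expected += 1
        else:
            mismatched += 1
    total = expected + mismatched
    fraction = mismatched / total if total else 0.0
    return {"expected": expected, "mismatched": mismatched,
            "mismatch_fraction": fraction}

sample = [("F", "10.3"), ("F", "11"), ("M", "10.2"), ("M", "11"), ("F", "5.3")]
summary = sex_mismatch_summary(sample)
print(summary)
```

A mismatch fraction well above a few percent would suggest problems in either the demographic fields or the diagnosis coding.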

Therapeutic Course Reconstruction for Exposure and Persistence

To reconstruct patients' drug regimens, we identified index dates, additions, drops, gaps, and switches. Because we were interested in time-updated status, only four regimens were possible: (1) no therapy; (2) Class 1, AChE inhibitors; (3) Class 2, NMDA receptor antagonists; and (4) dual use of Class 1 and Class 2 drugs.
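As an illustration of how day-by-day exposure status maps onto the four regimens, the sketch below expands each fill into the calendar days it covers and classifies any given day. The tuple layout, the class labels "AChE" and "NMDA", and the function names are assumptions made for this example; it also assumes that supply begins on the fill date and ignores stockpiling from early refills.

```python
from datetime import date, timedelta

def daily_classes(fills):
    """Map each covered calendar day to the set of drug classes on hand.

    fills: iterable of (drug_class, fill_date, days_supply).
    """
    covered = {}
    for drug_class, fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            covered.setdefault(day, set()).add(drug_class)
    return covered

def regimen_on(covered, day):
    """Return one of the four regimens for a given day."""
    classes = covered.get(day, set())
    if classes == {"AChE", "NMDA"}:
        return "dual"
    if classes == {"AChE"}:
        return "AChE"
    if classes == {"NMDA"}:
        return "NMDA"
    return "none"

fills = [("AChE", date(2003, 1, 1), 30), ("NMDA", date(2003, 1, 21), 30)]
covered = daily_classes(fills)
for day in (date(2003, 1, 10), date(2003, 1, 25), date(2003, 2, 5), date(2003, 3, 1)):
    print(day, regimen_on(covered, day))
```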

Once we determined the medication regimens, we could establish the therapeutic course. Patients were considered "nonincident" users (i.e., established users) for a particular therapeutic course if they received an antidementia drug prescription within 60 days from the start of the study (January 1, 2003) or 60 days from their first enrollment date after a period of nonenrollment. "Incident" users (i.e., new users) were all others in the database. Because of the limited size of the population, we did not attempt to establish an inception cohort.22 An inception cohort would require a more stringent drug-free period, such as 9 months or 1 year. An inception cohort design is intended to identify naïve users (i.e., to establish the first-ever exposure to the medication of interest).

Exposure. Figure 1 illustrates the logic for determining patients' drug courses. The top panel shows dispensing of drug classes 1 and 2 along the axis of time. The regimens deduced from these records are depicted in the center bar. The bottom panel shows the changes in drug therapy and their time course. Specifically, the hypothetical patient might start with a Class 1 drug and later add a Class 2 drug. He or she might then discontinue the Class 1 drug but continue the Class 2 drug for a while before having a significant gap in therapy (90 days). The patient then begins a second course of therapy after the gap, returning to a Class 1 drug (possibly a different medication in this same class), but apparently stops taking this drug before the end date of the period of observation. The final period of nonpersistence lasts until the end of the observation window; thus, this patient would be nonpersistent with the second course of therapy.

Figure 1. Drug regimen reconstruction for a hypothetical patient

Illustration of the logic for determining patients' drug courses. An arrow is shown with two boxes that depict the timeframe for a Class A drug being started, replaced, and then resumed. Below it is an arrow with a box depicting the timeframe for a Class B drug replacing a Class A drug and then being discontinued. A chart below that shows the course of treatment. Headings appear for Time, Start Date, and End Date. On the left are the categories Recorded regimens, Treatment status, Index, Addition, Drop, Gap, New course, and Nonpersistence. Categories are shaded between the start and end dates to show when they occurred. Recorded regimens are A, then A and B, then B, then a gap, then A(2). Index is shaded for A. Addition is shaded for A and B. Drop is shaded for B. New course is shaded for A(2). Nonpersistence is shown at the gap between B and A(2) and at the end date after A(2).

Persistence. To determine whether patients were persistent with a course of therapy, we examined two types of gaps: (1) the gap between the last recorded supply and the end of the study and (2) gaps between drug supplies. We considered patients to be "nonpersistent" if they had a gap in their drug supply of 60 days or more while under observation, which means they were currently enrolled in Medicaid and alive. (The last row of Figure 1 illustrates this point.) Those who lost eligibility or died were censored. For those whose antidementia drug regimen consisted of more than one class (i.e., both AChE inhibitors and NMDA receptor antagonists), we considered them nonpersistent only if they ran out of medications from both classes.

We set the end date for a course of therapy at 60 days after the end of the supply for the last prescription or when a 60-day gap occurred between therapies. If a patient resumed therapy after a 60-day gap (i.e., the patient was initiated on a later course), we assigned a new course identifier and reentered the patient as a new user for a second course.

Course of therapy. Patients may be noncompliant, meaning that they do not take the medication as specifically directed by their clinician, but persistent, meaning that they continue to consume the medication. To deal with this complexity, analysts have reported different approaches in the literature to quantify drug exposure and define the maximum allowable treatment gaps that a patient may have between two prescriptions to be defined as a continuous user. Some approaches include using a defined number of days or a fraction of the theoretical duration of the prescription as the buffer. The theoretical duration of a prescription can be calculated by the number of days the medication was supplied. Thus, the end date of a prescription equals the start date plus the theoretical duration of a prescription.

The method by which investigators calculate course of medication influences the identification of incident courses or users and persistence on therapy,23 and both of these measures can influence ADE estimates. For example, if one is interested in an ADE within the first 30 days of therapy or 1 week after discontinuation, then correct identification of the start and end of drug supply dates is crucial for unbiased effect estimates.

For this reason, we designed our course generator SAS macro program to accommodate different criteria for identifying incident users and the end of a therapeutic course. For example, it is programmed to produce tables for 30-, 60-, and 90-day drug-free periods. For this report, we identified drug courses and evaluated persistence using only the 60-day drug-free period.
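The course rules described in this section can be sketched as follows. This is a Python illustration under stated assumptions, not the SAS macro itself: the function name, the dictionary keys, and the handling of overlapping supplies (taking the later supply end date rather than carrying over stockpiled days) are all hypothetical choices. The `gap_days` parameter plays the role of the configurable drug-free period.

```python
from datetime import date, timedelta

def build_courses(fills, gap_days=60):
    """Group prescription fills into therapeutic courses.

    fills: iterable of (fill_date, days_supply). A fill starts a new course
    when it occurs gap_days or more after the previous supply ran out.
    """
    courses = []
    for fill_date, days_supply in sorted(fills):
        # Theoretical end of supply: start date plus days supplied.
        supply_end = fill_date + timedelta(days=days_supply)
        if courses and (fill_date - courses[-1]["supply_end"]).days < gap_days:
            # Gap is short enough: extend the current course.
            courses[-1]["supply_end"] = max(courses[-1]["supply_end"], supply_end)
        else:
            courses.append({"start": fill_date, "supply_end": supply_end})
    for number, course in enumerate(courses, start=1):
        course["course_id"] = number
        # Course end is set gap_days after the last supply ends.
        course["end"] = course["supply_end"] + timedelta(days=gap_days)
    return courses

fills = [(date(2003, 1, 1), 30), (date(2003, 2, 5), 30), (date(2003, 6, 1), 30)]
courses = build_courses(fills, gap_days=60)
for course in courses:
    print(course["course_id"], course["start"], course["end"])
```

Rerunning the same fills with `gap_days=90` merges them into a single course, which shows how sensitive the counts of incident courses are to the chosen drug-free period.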

Disease ascertainment and risk adjustment. We used the HCUP CCS for ICD-9-CM and CPT codes to identify patients with dementia or dementia-like disorders and for case ascertainment (Table 1). We used the HCUP comorbidity software for severity adjustment. We calculated a running total of the number of comorbidities and used this figure as a time-updated risk adjuster. We also calculated the numbers of clinic and emergency department visits and hospital admissions in a time-updated manner and used those for risk adjustment. We joined the target outcomes, comorbidities, and risk adjusters with the course table to produce the final analysis table.
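A time-updated running total of comorbidities might be computed along the following lines. The sketch assumes weekly intervals indexed from zero, counts each distinct comorbidity once from the week it is first recorded, and treats a comorbidity as present thereafter; these are assumptions made for illustration, and the study's own definitions are given in Appendix B.

```python
def running_comorbidity_count(diagnoses, n_weeks):
    """Return the cumulative number of distinct comorbidities for each week.

    diagnoses: iterable of (week_index, comorbidity_name); both the layout
    and the names are hypothetical.
    """
    first_seen = {}
    for week, condition in diagnoses:
        first_seen[condition] = min(week, first_seen.get(condition, week))
    return [
        sum(1 for first_week in first_seen.values() if first_week <= week)
        for week in range(n_weeks)
    ]

claims = [(1, "diabetes"), (3, "hypertension"), (4, "diabetes")]
counts = running_comorbidity_count(claims, n_weeks=5)
print(counts)
```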

Adverse Drug Events

Outcomes associated with adverse drug events. This project typifies an approach intended to complement regulatory pharmacovigilance. To illustrate the method, we sought to measure associations with three different classes of ADEs at different times in the drug course. The classes of events are death, expected reactions, and novel or idiosyncratic reactions.

Death is of particular interest for this study because, in February 2005, the U.S. Food and Drug Administration published an Alert for Healthcare Professionals concerning galantamine. The alert stated that the preliminary results of two clinical trials carried out with galantamine in patients with mild cognitive impairment indicated a risk of death three times higher in patients treated with galantamine than in those who were given placebo.

Table 1. Dementia codes and targeted outcomes codes from the Healthcare Cost and Utilization Project
HCUP CCS Codes Description
Dementia diagnoses
5.3 Senility and organic mental disorders [68]
Gastrointestinal outcomes
9.4.3 Gastritis and duodenitis [140]
9.4.4 Other disorders of stomach and duodenum [141]
9.11 Noninfectious gastroenteritis [154]
9.12.3 Other and unspecified gastrointestinal disorders
17.1.6 Nausea and vomiting [250]
17.1.7 Abdominal pain [251]
Hematological outcomes
Iron deficiency anemia; other deficiency anemia; aplastic anemia; acquired hemolytic anemia; other specified anemia; anemia, unspecified
4.2 Coagulation and hemorrhagic disorders [62]
4.3 Diseases of white blood cells [63]
4.4 Other hematological conditions [64]
Hepatic outcomes
9.8.2 Other liver disease [151]: cirrhosis of liver without mention of alcohol; other unspecified liver disorders
Psychological outcomes
5.4 Affective disorders [69]
5.6 Other psychoses [71]
5.7 Anxiety, somatoform, dissociative, and personality disorders
5.9 Other mental conditions [74]

HCUP, Healthcare Cost and Utilization Project; CCS, Clinical Classification Software

One concern with using observational data to evaluate death as an outcome is that the relationship between drug exposure and death may be highly confounded; this is true for antidementia drugs. These drugs are preferentially used when patients are expected to experience improvements in quality of life, but they are avoided in late stages of dementia.

Clinical trials usually establish ADE rates either for adverse reactions resulting in discontinuation (a very broad category) or for those meeting regulatory seriousness criteria (a very narrow category of harm). For expected ADEs, our analysis is intended to measure associations of increased health care utilization, in this case attributable to gastrointestinal and psychiatric adverse effects. We classify these ADEs, especially gastrointestinal problems, as "expected." For some drug classes and reactions, this analysis also may confirm that the analytic procedures are performing as intended.

Idiosyncratic reactions may be too rare to be detected in clinical trials. Hematological and hepatic syndromes are two examples of potentially fatal reactions discovered in nonregulatory postmarketing surveillance of AChE inhibitors.24 These reactions were chosen because they can cause severe effects to major organ systems. These two classes of idiosyncratic reactions also have the advantage of rarely being associated with unmeasured confounders.

Studies have shown that dropout from clinical trials and ADE rates are positively associated with dose of antidementia drugs.25 We were unable to evaluate the relationship between drug dose and adverse outcomes in this study because the raw Medicaid data did not include the quantity of tablets dispensed for years 2003 and 2004. The database design, however, does support analysis of dose effects, such as daily dose and cumulative dose.

Our selection of ADEs is by no means intended to represent all possible adverse reactions that can occur with antidementia drug use. Other theoretical adverse reactions that may be associated with cholinesterase inhibitors include muscular weakness and respiratory failure.26 As noted above, we elected to focus on death, two types of anticipated side effects, and two types of rare, unusual side effects.

Exposure periods. The exposure periods of interest are new use, established use, and, depending on the drug, the period immediately after discontinuation. Most adverse reactions occur during the early stages of treatment;22 restricting the analysis to incident users or drug therapy courses may increase the power to detect a measurable association between drug exposure and event. In many drug classes, including antidementia drugs, early onset is particularly true of expected reactions such as nausea and agitation. Other reactions, including some idiosyncratic reactions, may occur at any point in the treatment, so the period of established use also must be examined.

Adverse events related to withdrawing a drug that clinicians are using to treat patients with a chronic condition represent an important burden of harm.27 Moreover, when these events cannot be attributed to pharmacologic withdrawal, they are outside of regulatory pharmacovigilance activities. The geriatricians on our team have seen but not quantified cases of severe agitation and, more rarely, psychosis after withdrawal of antidementia drugs. Because of its additional complexity, the analysis of the withdrawal period is outside the scope of this contract.

Investigation of withdrawal effects would require designs that could capture transient effects. Cohort-crossover,28 case-crossover,29 case-time-control,28 and duration-specific30 designs can be used to evaluate transient effects that would be expected with drug discontinuation. Each design has its strengths and weaknesses. For sparse events, the case-crossover, case-time-control, and duration-specific designs may be more efficient than the others. Case-crossover types of studies use the exposure history of each case as its own control to examine the effects of transient exposures on acute events.

Procedures for analyses. The primary outcome variable for examining ADEs was death. Secondary measures included health care visits for specific conditions. The primary predictor was exposure to antidementia drugs. Time-updated measures of disease severity included a dementia or dementia-like diagnosis, the 29 HCUP comorbidity conditions, a running total of the number of HCUP comorbidities, a running total of office visits, and age. We also included sex and ethnicity in the model.

We obtained the date of death from the state death certificate. We identified the secondary outcomes from medical facility claims. The secondary outcomes were health care visits for gastrointestinal or psychiatric (psychological) disturbances and for hepatic and hematological complications. We did not examine hospitalizations and emergency department visits because of the low number of cases for these outcomes. The index date for the cohort was the first antidementia drug prescription. The exposure period lasted the duration of the drug course. We calculated baseline comorbidity and health care use variables on the basis of each patient's first medical claim.

Statistical Analysis

Data Integrity Analysis

Frequency of claims by month. The data integrity analysis was descriptive. We used scatter plots to evaluate the volume of claims over time to determine if blocks of claims appeared to be missing. The percentage of Medicaid recipients from each data source that linked to recipients in the enrollment table also was examined to determine if gross anomalies were evident. Finally, we examined the frequency of sex-specific diagnoses.

Linkage analysis. Each table was linked to the enrollment table by a subject identification variable. We determined the frequency of subjects who linked to the enrollment table to determine the percentage linked.

Sex-specific analysis. We compared sex-specific diagnoses against documented data on sex and evaluated the count and frequencies for anomalies.

Exposure and Persistence

Because patients began treatment at different times during the study period and thus were followed for differing lengths of time, we used Kaplan-Meier failure analysis to estimate the cumulative persistence rates. Patients were censored at the end of the observation period or if they lost eligibility or died. We assessed differences in persistence rates with log-rank tests. We also used an extension of the Cox proportional hazards model that allows for time-dependent exposures and covariates to assess differences in medication persistence by the current medication regimen (AChE, NMDA, or AChE and NMDA). This multivariable model adjusts for potential confounders, which included demographic variables, dementia diagnosis, comorbidities, and other indicators of health status.
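For readers unfamiliar with the Kaplan-Meier estimator, the sketch below shows the product-limit calculation for persistence in plain Python. It is illustrative only: the function name and data are hypothetical, durations are days from course start to discontinuation or censoring, and an event flag of 1 marks discontinuation while 0 marks censoring (end of observation, loss of eligibility, or death).

```python
def kaplan_meier(durations, events):
    """Return [(time, persistence probability)] at each distinct event time."""
    at_risk = len(durations)
    survival = 1.0
    curve = []
    ordered = sorted(zip(durations, events))
    i = 0
    while i < len(ordered):
        t = ordered[i][0]
        discontinued = 0
        leaving = 0
        # Handle all subjects whose follow-up ends at time t together.
        while i < len(ordered) and ordered[i][0] == t:
            discontinued += ordered[i][1]
            leaving += 1
            i += 1
        if discontinued:
            survival *= 1 - discontinued / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # censored subjects leave the risk set too
    return curve

durations = [30, 60, 60, 90, 120]
events = [1, 0, 1, 1, 0]
curve = kaplan_meier(durations, events)
for t, s in curve:
    print(t, round(s, 3))
```

Censored patients contribute to the risk set up to their censoring time but never reduce the persistence estimate directly, which is why the method suits follow-up of differing lengths.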

We used four time-updated measures of health status: the number of hospital admissions, emergency department visits, clinic visits, and HCUP comorbidities. A patient's survival time (i.e., persistence) was determined as the time to discontinuation of a therapeutic course. The primary exposure variable was time on therapy. Other time-dependent variables included dementia diagnosis, number of diagnoses, number of contacts with health care professionals (professional claims), and age. Finally, we also included sex and race.

Adverse Drug Events

Our analytic approach to ADEs had three components: (1) addressing confounding by indication;13 (2) evaluating adverse events and outcomes that would be predicted from previous studies, including randomized trials; and (3) identifying previously unrecognized ADEs. The third component serves a hypothesis-generating function and is intended to be preparatory for more in-depth analyses. The use of the cohort study design to examine effectiveness and safety is complementary to pharmacovigilance activities related to voluntary reporting.

We defined three categories of outcomes: death, expected adverse reactions, and idiosyncratic unexpected reactions (Table 2). The analyses fall into two main groups: (1) cumulative effect estimates measured by using a basic cohort design that includes exposed and unexposed periods, and (2) estimates of transient or acute effects where we used a cohort crossover design to evaluate outcome rates in early versus later stages of exposure. To estimate cumulative effects of chronic exposure, we performed two analyses: marginal structural models using inverse probability weights (IPW) and Cox proportional hazards models. Comparing estimates may reveal possible confounding by an intermediate variable, which would then call for attention in subsequent analyses. Comparison between early and late stages of therapy using Poisson regression is an approach to signal possible ADEs, because adverse reactions generally occur more frequently in the early stages of treatment.

Table 2. Overview of reasons for choice of outcomes and analytic methods
Outcome: Death
  • Reason for Outcome Selection: High clinical importance
  • Expected Level of Confounding: High
  • Model Drug Treatment as Time-Varying Exposure: Hazard of death before treatment or among never-treated individuals is compared with hazard of death after treatment has begun; Cox regression compared with IPW for confounding control
  • Comparison of Event Rates During Early and Late Treatment Intervals: Not performed

Outcome: Expected ADEs (gastrointestinal and psychiatric)
  • Reason for Outcome Selection: Better characterize excess health care utilization related to these ADEs; for some drugs and well-described outcomes, confirm that modeling approach is correct
  • Expected Level of Confounding: Moderate to high
  • Model Drug Treatment as Time-Varying Exposure: Similar to above; hazard ratio is estimate of effect
  • Comparison of Event Rates During Early and Late Treatment Intervals: Incidence rate ratio is estimate of effect; remove confounding from subject-specific factors; identify temporal patterns of principal diagnoses suggestive of ADE; identify transient effects

Outcome: Unexpected, idiosyncratic ADEs (hepatic and hematologic)
  • Reason for Outcome Selection: Screen for previously unknown ADEs
  • Expected Level of Confounding: Low to moderate
  • Model Drug Treatment as Time-Varying Exposure: Hazard ratio is estimate of effect
  • Comparison of Event Rates During Early and Late Treatment Intervals: Identify temporal patterns of principal diagnoses suggestive of ADE; identify transient effects

ADE, adverse drug event; IPW, inverse probability weights.

Patients were eligible for the ADE analysis if they had a dementia or dementia-related diagnosis (Table 1). We created univariate and multivariate models to evaluate the relationship between antidementia drug exposure and death or targeted clinical events. We evaluated the relationship between antidementia drug exposure and death by two multivariable methods: (1) extension of Cox proportional hazards methods that allow time-dependent covariates and (2) marginal structural models (MSM) using IPW.31 IPW methods are designed to allow proper adjustment for time-dependent confounding.

The outcome variables included death and the clinical outcomes listed in Table 2. Patient health or care use characteristics that are potential time-varying confounders include number of comorbidities, number of health care contacts (office visits, emergency department visits, and hospitalizations), and HCUP comorbidity indicators (e.g., diagnosis of depression, psychological and neurological conditions, diabetes, hypertension, congestive heart failure). The inverse probability weight for a person is the inverse of the probability of that person having received the observed therapy, given past exposure and covariates. To illustrate:

Let $A_i(t)$ be the exposure for subject $i$ at time $t$, with $A_i(t) = 1$ if the subject is exposed to antidementia medication and $A_i(t) = 0$ otherwise. Let $V_i$ be the baseline values for the time-dependent covariates, and let $L_i(t)$ be the values of these covariates at time $t$ for subject $i$, so that $V_i = L_i(0)$. An overbar denotes history, so that $\bar{A}(t)$ is the exposure history through time $t$.

The marginal structural Cox proportional hazards model can be expressed as

\[
\lambda_{T_{\bar{A}}}(t \mid V) = \lambda_0(t) \exp\{\beta_1 A(t) + \beta_2 V\}
\]

where $\lambda_{T_{\bar{A}}}(t \mid V)$ is the hazard of death at time $t$ among subjects with baseline covariates $V$ had they all been exposed to antidementia medications according to the exposure history $\bar{A}$. The parameters $\beta_1$ and $\beta_2$ are unknown and are to be estimated, and $\lambda_0(t)$ is an unspecified baseline hazard function.

The IPW is listed in Formula [1], where the denominator is the conditional probability of having the observed exposure up to time t, given past exposure and covariates. Because these weights tend to be highly variable and fail to approximate normality, we used the stabilized version of this method because of its smaller variance, tighter 95 percent confidence interval, and better coverage rates. It is given by Formula 2 below.

\[
W(t) = \prod_{k=0}^{t} \frac{1}{f[A(k) \mid \bar{A}(k-1), \bar{L}(k)]} \tag{1}
\]
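The stabilized weight itself does not appear in the text above. As an assumption drawn from the standard marginal structural model literature rather than from this report, it is usually written by replacing the numerator of 1 with the probability of the observed exposure given past exposure and the baseline covariates V only:

```latex
SW(t) = \prod_{k=0}^{t}
  \frac{f[A(k) \mid \bar{A}(k-1), V]}
       {f[A(k) \mid \bar{A}(k-1), \bar{L}(k)]}
```

Under this form, subjects whose time-varying covariates add little information about treatment beyond their baseline values receive weights close to 1, which is the source of the smaller variance.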

Because some subjects will not have experienced the event before the end of followup, they are said to be censored. Let C(t) be the binary indicator for censoring, taking the value 1 if the patient is censored at time t and 0 otherwise. The inverse-probability-of-censoring weight is calculated in a similar fashion to the IPW:

\[
SW_{\text{censoring}}(t) = \prod_{k=0}^{t}
\frac{\Pr\{C(k) = 0 \mid C(k-1) = 0,\ \bar{A}(k-1),\ V,\ T > k\}}
     {\Pr\{C(k) = 0 \mid C(k-1) = 0,\ \bar{A}(k-1),\ \bar{L}(\ldots)\}}
\]

The overall weight to put in the MSM would be the multiplication of the two stabilized weights, such that

\[
SW_{\text{overall}}(t) = SW(t) \times SW_{\text{censoring}}(t)
\]
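The weight construction can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: it assumes the per-period probabilities have already been estimated (for example, by pooled logistic regression), and the function names and example probabilities are hypothetical.

```python
from itertools import accumulate
from operator import mul


def cumulative_weights(numerators, denominators):
    """Running product of per-period probability ratios for one subject.

    numerators[k]   : probability of the observed status at period k given
                      past exposure and baseline covariates only
    denominators[k] : the same probability given past exposure and the
                      time-varying covariate history
    """
    ratios = [n / d for n, d in zip(numerators, denominators)]
    return list(accumulate(ratios, mul))


def overall_weights(sw_treatment, sw_censoring):
    """Period-by-period product of the treatment and censoring weights."""
    return [a * b for a, b in zip(sw_treatment, sw_censoring)]


# Hypothetical fitted probabilities for one subject over three periods.
sw_trt = cumulative_weights([0.5, 0.5, 0.5], [0.25, 0.5, 1.0])
sw_cen = cumulative_weights([0.75, 0.75, 0.75], [0.75, 0.5, 0.75])
sw_all = overall_weights(sw_trt, sw_cen)
```

Each subject contributes one weight per period, so the weighted outcome model is fitted on person-period records rather than on one record per subject.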

We used Cox proportional hazard methods allowing recurrent events to evaluate the causal relationship between antidementia drug exposure and clinical outcomes. Drug exposure was treated as a time-dependent covariate; patients could enter as unexposed, but once they received treatment, they were categorized as exposed for the remainder of their follow-up time.

We also evaluated the drug effects on clinical outcomes during the first month of treatment. In this analysis, we used a within-subjects design (i.e., random effects Poisson models) to compare clinical outcomes during the first 30 days of exposure with clinical outcomes during the remainder of followup for exposed individuals.

Table 3 provides a general summary of the main steps involved in producing and analyzing pharmacoepidemiologic data. This summary and the document as a whole are intended to be a guide for designing pharmacoepidemiologic studies. The analysis section includes key recommendations from the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) report.32

Table 3. Steps in the design and analysis, evaluation, and reporting
Step Description Actions
Design and Analysis
1 Design database
  1. Conceptual planning
  2. Identify data sources
2 Extract raw data
  1. Identify cohort (establish inclusion criteria)
  2. Retrieve from data warehouse
  3. Eligibility, pharmacy, medical, and death
  4. Link patient identifiers among extract records from eligibility, pharmacy, medical, and death data
  5. Deidentify patient data by creating a pseudo-identification number
3 Create intermediate tables
  1. Create primary and secondary drug tables
  2. Link death record to eligibility
  3. Produce professional-based and facility-based procedural and diagnostic tables
  4. Run disease classification and comorbidity software
  5. Run pharmacy generic-ingredient classification system (e.g., First DataBank, MULTUM)
4 Evaluate integrity of data
  1. Evaluate volume of claims by month
  2. Evaluate linkage among tables
  3. Conduct sex- and age-specific analyses
  4. Evaluate completeness of data
  5. Check to see if values are within expected ranges
5 Produce analysis table
  1. Join Primary Drug Exposure tables (one record per day from start of first antidementia drug to the end of antidementia drug use)
  2. Join Eligibility to linked Primary Drug Exposure table
  3. Reconstruct therapeutic course
  4. Identify incident and nonincident users/courses
  5. Join Therapeutic Course table to disease ascertainment and risk adjustment variables (covariates); this should include HCUP comorbidity and pharmacy data
6 Objectives
  1. State specific objectives, including any prespecified hypotheses
7 Design
  1. Describe study design, setting, inclusion criteria, sources, and methods of patient selection
  2. Give period of followup
8 Variables of interest
  1. List and define all variables of interest
  2. Give details for methods of assessment
9 Analytic methods
  1. Describe measures taken to address potential sources of bias
  2. Describe rationale for study size, including practical and statistical considerations
  3. Describe all statistical methods
  4. Describe how loss to followup and missing data were addressed
  5. If applicable, describe methods for subgroup and sensitivity analysis
Reporting of Results*
10 Participants
  1. Report number of individuals at each stage of the study (e.g., potentially eligible, actually eligible)
11 Descriptive data
  1. Give characteristics of participants
  2. Indicate completeness of data for each variable
  3. Summarize average and total amount of followup
12 Outcome data
  1. Report numbers of outcome events or summary measures over time
13 Main results
  1. Give unadjusted and confounder-adjusted effect estimates and their precision
  2. Make clear why certain confounders were included and why others were not
  3. Translate relative measures into absolute differences for a meaningful risk period

* Key recommendations from the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) report.32

Human Subjects Review and Data Protection

For this project, RTI International (RTI) and the Utah DOH had access only to linked data provided by the Salt Lake City Veterans Administration Medical Center (VAMC) IDEAS (Informatics, Decision-Enhancement, and Surveillance) program; these data lacked patient identifiers. Because this project used only secondary data with no means of tracking the claims back to individuals, it posed no physical, psychological, social, or legal risks to participants. The data used in this study are confidential medical and pharmacy claims records stored and managed by the VAMC; they are unavailable to the public. Although these claims were filed by individuals, the VAMC had removed all personal identifiers from the records and substituted an encrypted number. The Salt Lake City VA Information Resource Management System maintained and secured the data. Data were protected on a secure, password-protected network or password-protected PC.

RTI staff submitted a Request for Exemption from Institutional Review Board (IRB) to RTI's Committee for the Protection of Human Subjects. The IRB approved this request on January 5, 2006. VAMC staff requested IRB approval from the Office of the Vice President for Research at the University of Utah; they received IRB approval on June 8, 2006.

Peer Review of This Report

In accordance with procedures of the Agency for Healthcare Research and Quality (AHRQ), a draft of this report received external peer review from six experts, one of whom provided a statistical review. This review process, conducted anonymously, was organized by AHRQ's Scientific Resource Center at the Oregon Health and Science University. RTI and Utah staff compiled all comments and used them in revising the draft to create this final report.


We present a framework for using medical claims data to identify novel types of adverse drug events (ADEs), with antidementia treatment selected as the exemplar. These data specifically included the acetylcholinesterase inhibitors (AChEs) donepezil hydrochloride, rivastigmine tartrate, and galantamine hydrobromide (we excluded tacrine from this analysis) and one N-methyl-D-aspartate (NMDA) receptor antagonist, memantine. Chapter 2 documented the technical steps for data preparation. This chapter presents the results of the three steps of data analysis: verification of data integrity, characterization of persistence of drug use, and analysis of ADEs.

Data Integrity Analysis

Frequency of Enrollment, Claims, and Deaths, by Month

Visual inspection of graphs in Appendix B (Figures B.1 through B.7) did not reveal missing blocks of claims or enrollments. We did not observe outlier data points in the per-month analysis. Because of magnification, the peaks and valleys in those graphs appear more pronounced than they actually are; if the vertical axis were anchored at zero, the peaks and valleys would disappear.

Our graphical analysis did reveal an excess of pharmacy claims for 2003 and 2004. In searching for an explanation for this anomaly, we determined that our data had not originally been limited to patients ages 50 and older in these periods during the data extraction process. Thus, we excluded these excess pharmacy claims from 2003 and 2004 after linking pharmacy data to the enrollment file.
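A per-month volume check of this kind can be sketched as follows. This is an illustrative sketch only; the function names and the example service dates are hypothetical, and the study itself relied on visual inspection of graphs.

```python
from collections import Counter
from datetime import date


def claims_per_month(service_dates):
    """Count claims by (year, month) of service."""
    return Counter((d.year, d.month) for d in service_dates)


def months_without_claims(counts, first, last):
    """List (year, month) pairs from first to last, inclusive, with no claims."""
    missing = []
    year, month = first
    while (year, month) <= last:
        if counts.get((year, month), 0) == 0:
            missing.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return missing


# Hypothetical service dates; March 2003 has no claims.
dates = [date(2003, 1, 5), date(2003, 1, 20), date(2003, 2, 11), date(2003, 4, 2)]
counts = claims_per_month(dates)
gaps = months_without_claims(counts, (2003, 1), (2003, 4))
```

Plotting the monthly counts, as in Appendix B, remains the more sensitive check, because an excess such as the one found for 2003 and 2004 does not show up as an empty month.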

Linkage Analysis

A high proportion of medical claims and deaths successfully linked to enrollment tables (Table 4). The percentages of subjects from the facility, professional, and death claims who did not link to the enrollment table ranged from 0.25 percent for professional procedure claims to 6.8 percent for death records. Unsuccessful links can occur from having either "too many" individuals in the medical claims or death records or from incomplete enrollment history. Because we were unable to identify a systematic cause of the incongruence, we do not consider the unlinked patients to be a threat to the validity of this analysis.

Table 4. Frequency of linked records, by subject identification number
Periods of Linkage Linked Frequency Percent
Enrollment to Diagnosis (professional) Yes 39,762 97.7
No 935 2.3
Enrollment to Procedure (professional) Yes 39,762 99.75
No 100 0.25
Enrollment to Diagnosis (facility) Yes 39,762 98.76
No 498 1.24
Enrollment to Procedure (facility) Yes 39,762 97.77
No 906 2.23
Enrollment to Pharmacy Yes 39,762 14.75
No 229,726 85.25
Enrollment to Death Yes 39,762 93.19
No 2,907 6.81

A high percentage (85 percent) of pharmacy claims did not link to enrollment tables. This failure was an artifact of the fact that pharmacy claims from 2003 and 2004 were not limited to patients ages 50 and older, as described above. When building the analysis table, the linkage procedure automatically excluded the undesired pharmacy claims.
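The linkage check reduces to comparing sets of subject identifiers. The sketch below is an assumption about how such a check might be coded, not the procedure used to produce Table 4; the function name and the example identifiers are hypothetical.

```python
def linkage_summary(enrollment_ids, source_ids):
    """Compare subject IDs in a source table against the enrollment table.

    Returns (linked, unlinked, percent_unlinked), with the percentage taken
    over all distinct subject IDs found in the source table.
    """
    enrolled = set(enrollment_ids)
    source = set(source_ids)
    linked = len(source & enrolled)
    unlinked = len(source - enrolled)
    total = linked + unlinked
    percent_unlinked = 100.0 * unlinked / total if total else 0.0
    return linked, unlinked, percent_unlinked


# Hypothetical identifiers: subject 5 has a claim but no enrollment record.
linked, unlinked, pct = linkage_summary([1, 2, 3, 4], [2, 3, 5])
```

Listing the unlinked identifiers by year of service, rather than only counting them, would help distinguish a systematic cause (such as the age restriction described above) from scattered miscoding.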

Sex-Specific Analysis

Of the 880 female-specific diagnoses, only 11 occurred in males; these account for 0.01 percent of all diagnoses (Table 5). No male-specific disorders were identified in females. We did not identify any women ages 60 or older who had a diagnosis for complications of childbirth and pregnancy.

The frequency of discrepant sex- and age-specific diseases, albeit low, can be attributed more to miscoding of either sex or diagnosis than to a systematic problem with the database, such as a merge failure. We would expect a merge failure or other type of database management problem to produce a much higher frequency of discrepant sex- and age-specific diseases.

In summary, our exploration of data integrity did not identify uncorrectable problems. No blocks of claims appeared to be missing. A small fraction of medical claims and death records did not link to the Medicaid enrollment table, and only 0.01 percent of diagnosis claims indicated a discrepancy of sex-specific diagnoses.

Table 5. Frequency of males having female-specific diagnoses
Sex Category of Diagnosis Total Number of Diagnoses
Not Female-Specific Female-Specific
Female 93,784 869 94,653
69.36% 0.64% 70%
Male 40,551 11 40,562
29.99% 0.01% 30%
Total 134,335 880 135,215
99.35% 0.65% 100%

Exposure and Persistence

Of the 29,057 Medicaid patients who fit the inclusion criteria described in Chapter 2, 1,634 had been dispensed at least one prescription for an antidementia medication during the 3-year study period (2003-2005). These 1,634 patients had a total of 1,844 courses of drug therapy (Figure 2). Thus, approximately 12 percent of patients had multiple courses of therapy, demarcated by drug-free periods of at least 60 days. Overall, 497 of the 1,844 drug therapy courses met criteria for discontinuation.
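Splitting a patient's dispensing history into courses separated by drug-free periods of at least 60 days can be sketched as follows. The exact rules are described in Chapter 2; this sketch assumes the drug-free interval is measured from the end of the previous days' supply to the next fill date, and the function name and example fills are hypothetical.

```python
from datetime import date, timedelta


def split_into_courses(fills, gap_days=60):
    """Group dispensings into therapy courses.

    fills: iterable of (fill_date, days_supply) tuples for one patient.
    A new course starts when the drug-free interval between the end of the
    current course's supply and the next fill is at least gap_days.
    Returns a list of (course_start, course_end) date pairs.
    """
    courses = []
    for fill_date, days_supply in sorted(fills):
        supply_end = fill_date + timedelta(days=days_supply)
        if courses and (fill_date - courses[-1][1]).days < gap_days:
            # Continue the current course, extending its end if needed.
            start, end = courses[-1]
            courses[-1] = (start, max(end, supply_end))
        else:
            courses.append((fill_date, supply_end))
    return courses


# Hypothetical patient: two fills close together, then a long gap.
patient_fills = [
    (date(2003, 1, 1), 30),
    (date(2003, 2, 5), 30),
    (date(2003, 6, 1), 30),
]
patient_courses = split_into_courses(patient_fills)
```

In this example the 5-day gap before the second fill keeps it in the first course, while the 86-day gap before the third fill starts a second course.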

Kaplan-Meier survival curves illustrate median length of drug therapy courses. Figure 3 shows a plot, by week, of the Kaplan-Meier estimates of the probability of remaining on drug therapy for all antidementia drug users. A steep drop in persistence occurred at Week 5, attributable primarily to new (incident) courses in which the patient received an initial drug supply of 30 days and then stopped therapy.33 The rate of discontinuation decreased with time on therapy.
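The Kaplan-Meier estimate behind a persistence curve can be computed directly from course lengths and discontinuation indicators. The following is a minimal sketch with hypothetical data, not the software used for Figure 3; courses that end without meeting the discontinuation criteria are treated as censored.

```python
def kaplan_meier(durations, discontinued):
    """Kaplan-Meier estimate of the probability of remaining on therapy.

    durations    : course length in weeks for each therapy course
    discontinued : True if the course ended in discontinuation,
                   False if it was censored (e.g., end of followup)
    Returns a list of (week, survival_probability) at each event time.
    """
    event_times = sorted({t for t, e in zip(durations, discontinued) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, discontinued) if d == t and e)
        survival *= (at_risk - events) / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical courses: two stop at week 5, one stops at week 30.
weeks = [5, 5, 12, 30, 30, 52]
stopped = [True, True, False, True, False, False]
curve = kaplan_meier(weeks, stopped)
```

An early cluster of discontinuations, like the two at week 5 here, produces the kind of steep initial drop described above.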

We identified 517 incident (new) users with 566 courses of therapy, and 1,177 nonincident (established) users with 1,277 courses of therapy.

Incident use had a hazard ratio for drug discontinuation of 1.4 (95% confidence interval [CI], 1.16-1.68; see Figure 4). Thus, at any given point, incident users were estimated to be 1.4 times as likely to discontinue therapy as nonincident users. The persistence curves were consistent with the proportional hazards assumption, which is that the ratio of hazard rates for incident and nonincident users is constant over time. The log rank test was statistically significant (χ² = 10.02; P = 0.0015) for the difference in hazard rates for discontinuation between incident and established users.
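As a quick arithmetic check, the reported P-value can be recovered from the log rank statistic. The sketch assumes the test has 1 degree of freedom, as is standard for a two-group comparison; the function name is hypothetical.

```python
import math


def chi2_sf_1df(statistic):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Log rank statistic reported for incident versus established users.
p_value = chi2_sf_1df(10.02)
```

The result agrees with the reported P = 0.0015 to the precision given.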

Figure 2. Drug exposure and dementia diagnosis in the study population

Flowchart showing boxes from top to bottom. First is Number enrolled age 50+, 39,761. Second is Enrolled and at least one medical claim, 31,339. Third is Enrolled, at least one medical claim, and not starting in HMO (health maintenance organization), 29,057. Fourth are two choices: Dementia diagnosis, 4,859 (17%) and No dementia diagnosis, 24,198 (83%). Under Dementia diagnosis are two categories: No AD (antidementia) drug, 3,629, and AD drug star, 1,234 (1,377 courses). The star refers to a footnote stating that incident users and nonincident users are not mutually exclusive. Under AD drug are two choices: Nonincident user, 881 (962 courses), and Incident user, 381 (414 courses). Under No dementia diagnosis are the same categories, with the following data: AD drug star, 400 (467 courses). The star refers to a footnote stating that incident users and nonincident users are not mutually exclusive. No AD drug, 23,798. For AD drug, Incident user, 136 (152 courses), and Nonincident user, 296 (315 courses).

Figure 3. Persistence curve for antidementia drug course

Graph showing Week on the x axis and Percent Persistent on the y axis. Week is numbered from 0 to 150 in increments of 50. Percent Persistent is numbered from 0 to 1 in increments of .25. At week 0, percent persistent is 1. It falls as the weeks progress, ending at .5 at week 150.

Adverse Drug Events

Baseline Characteristics

The 381 patients with an incident course of therapy and a dementia diagnosis were included in the ADE analysis (Figure 2, bottom row). Restricting the cohort to patients with a dementia diagnosis limited the risk-set to a defined, more homogeneous population and provided a more straightforward approach to address confounding.

The baseline period was defined as the time interval from entering Medicaid to either the first antidementia drug dispense date or the date a dementia diagnosis was first recorded in the medical claims data. We compared baseline characteristics between the exposed and never-exposed group at the time of the first dementia diagnosis or antidementia drug claim. The exposed and never-exposed groups differed significantly in age and average number of comorbidities (Table 6). Although the differences in mean age (76 versus 78) and comorbidities (3.6 versus 3.1) were statistically significant, they were small and probably not clinically meaningful.

Figure 4. Persistence curve for antidementia drug course by incident and established users

Graph showing Week on the x axis and Percent Persistent on the y axis. Week is numbered from 0 to 150 in increments of 50. Percent Persistent is numbered from 0 to 1 in increments of .25. The graph shows established users and incident users. At week 0, for established users, percent persistent is 1. It falls as the weeks progress, ending around .5 at week 150. For incident users, at week 0, percent persistent is 1. It falls as the weeks progress, ending around .5 shortly after week 100.

Table 6. Baseline characteristics of patients at time of dementia diagnosis
Patient Characteristics Exposed (n = 381) Never Exposed (n = 3,629) P-Value
Mean age, years (SD) 76 (10.8) 78 (12.4) 0.0044
Female,% 75 71 0.0990
Depression diagnosis,% 15 13 0.1710
Psychological diagnosis,% 16 14 0.4334
Neurological diagnosis,% 35 28 0.0089
Diabetes,% 16 18 0.4450
Congestive heart failure,% 14 18 0.0227
Average number of comorbidities (SD) 3.6 (2.98) 3.1 (3.00) 0.0031
Average number of hospitalizations (SD) 0.28 (0.62) 0.32 (0.80) 0.2192
Average number of emergency department visits (SD) 0.08 (0.36) 0.17 (0.77) <0.0001
Average number of clinic visits (SD) 26 (39.5) 25 (43.3) 0.6853
Baseline period-first eligibility to dementia diagnosis or drug exposure (SD) 34 weeks (37.3) 32 weeks (39.8) 0.3428

SD, standard deviation.

The baseline periods of the two groups were similar (34 versus 32 weeks). The exposed group had a higher frequency of patients with a neurological diagnosis than the never-exposed group (35 percent versus 28 percent). The never-exposed group had a higher frequency of congestive heart failure than the exposed group (18 percent versus 14 percent) and a higher average number of emergency department visits (0.17 versus 0.08). The groups did not differ by sex; frequency of depression diagnosis, psychological diagnosis, or diabetes diagnosis; or by the average number of hospitalizations or clinic visits.

During a mean followup of 95 weeks (95% CI, 94.0-97.0), 1,707 (42 percent) of the patients died. During the follow-up period, we found 1,107 visits (28 percent of all ambulatory visits) that had a gastrointestinal-related event as the primary diagnosis, 379 (9 percent) with a psychological diagnosis, 504 (12 percent) with a hematological diagnosis, and 461 (11 percent) with a hepatic diagnosis.

Table 7 presents the number of events that occurred after exposure to an antidementia medication and the total number of events per person-time (incidence densities). The outcomes presented include death, outpatient episodes of care for the clinical outcomes, and hospital admissions and emergency department visits coded as the reason for seeking medical care (i.e., primary diagnosis). Each outcome is presented for the exposed group after first exposure and for the unexposed time, which includes those who were never exposed and the time before the first incident course of drug therapy. Clinical events that led to hospital admission or emergency department visits were sparse; for this reason, we included clinic visits in the regression models.

Table 7. Crude incidence of adverse drug events in exposed and unexposed groups
Columns: Outcome; Exposed Time: Number of Events, Incidence Density per 100 patient-years (95% Confidence Interval); Unexposed Time: Number of Events, Incidence Density per 100 patient-years (95% Confidence Interval)
Death 77 19.4 (17.20-21.62) 1,409 20.3 (19.72-20.80)
Clinical Episodes
Gastrointestinal episodes 105 26.5 (23.88-29.05) 2,030 29.2 (28.54-29.84)
Psychological episodes 66 16.6 (14.59-18.68) 1,466 21.1 (20.53-21.63)
Hematological episodes 79 19.9 (17.67-22.15) 1,321 19.0 (18.47-19.52)
Hepatic episodes 12 3.0 (2.15-3.90) 269 3.9 (3.63-4.10)
Hospital Admissions and Emergency Department Visits
Gastrointestinal diagnoses 8 2.0 (1.30-2.73) 102 1.5 (1.32-1.61)
Psychological diagnoses 0 0 75 1.1 (0.95-1.20)
Hematological diagnoses 0 0 34 0.5 (0.41-0.57)
Hepatic diagnoses 0 0 11 0.2 (0.13-0.24)

Episodes are intended to differentiate clusters of events. We required a 1-month gap in claims for the clinical outcome to start a new episode of care. The crude estimates of the incidence densities are higher during unexposed time for most events; the exceptions are hematological episodes (19.9 versus 19.0 per 100 patient-years) and gastrointestinal hospital admissions and emergency department visits (2.0 versus 1.5 per 100 patient-years). Because of the low incidence rates of clinical events related to hospital admissions and emergency department visits, we did all further ADE exploration on clinical episodes from clinic visits.
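The episode rule and the incidence density calculation can be sketched as follows. This is a simplified illustration: it assumes the 1-month gap is operationalized as 30 days between consecutive claims for the same outcome, and the function names, dates, and person-time values are hypothetical.

```python
from datetime import date


def count_episodes(claim_dates, gap_days=30):
    """Count episodes of care for one patient and one clinical outcome.

    A new episode starts when a claim follows the previous claim for the
    same outcome by at least gap_days.
    """
    ordered = sorted(claim_dates)
    if not ordered:
        return 0
    episodes = 1
    for previous, current in zip(ordered, ordered[1:]):
        if (current - previous).days >= gap_days:
            episodes += 1
    return episodes


def incidence_density(events, person_years, per=100):
    """Events per `per` patient-years of followup."""
    return per * events / person_years


# Hypothetical claims: two visits in January, two in March.
claims = [date(2004, 1, 1), date(2004, 1, 10), date(2004, 3, 1), date(2004, 3, 5)]
n_episodes = count_episodes(claims)

# Hypothetical totals: 20 episodes over 400 patient-years.
rate = incidence_density(20, 400.0)
```

Counting episodes rather than individual claims keeps a cluster of follow-up visits for one problem from inflating the event count.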

Models for Estimating Associations Between Exposure and Outcomes

We estimated associations between exposure to antidementia drugs and outcomes of interest using three procedures: (1) marginal structural models (MSM) with inverse probability weighting (IPW) techniques, (2) Cox proportional hazard modeling, and (3) Poisson regression models. Chapter 2 explains the choice of procedures in more detail.

MSMs are fitted in a two-stage process. In the first stage, we estimated each subject's probability of having his or her own treatment history and then used that individual's predicted probability of treatment to derive IPWs. In the second stage, we used the time-varying IPWs in the regression model to remove confounding from the treatment-outcome relationship. As discussed previously, the MSM with IPW has been shown to remove confounding by time-varying variables effectively.

Table 8 shows first-stage results from a logistic regression that estimates the probability of receiving an antidementia drug as a function of the time-varying covariates and their baseline values. To reduce the number of free parameters, we fitted the regression with a natural cubic spline of the week variable. By doing so, we could take the time to event into account; thus, the odds ratios (ORs) closely approximated the hazard ratios produced by a Cox proportional hazard model.

We fit multiple models to obtain the best-fitting IPW estimates. We selected variables that were expected to be time-varying confounders for analysis. These variables included indicators of disease severity (e.g., cumulative Healthcare Cost and Utilization Project [HCUP] comorbidities, cumulative hospital admissions, and cognitive function [indicators for depression, psychological and neurological diagnoses, diabetes, and congestive heart failure]). As documented in Table 8, psychological and neurological diagnoses were the best predictors of antidementia drug exposure.

Table 8. Inverse probability weight (IPW) parameter estimates*
Factors Predicting Exposure to Antidementia Drugs Odds Ratio P-Value 95% CI
Cumulative hospital admission 1.05 0.54 0.90-1.22
Cumulative ED visits 0.93 0.38 0.78-1.10
Cumulative clinic visits 1.00 0.26 0.99-1.00
Number of HCUP diagnosis 1.04 0.16 0.99-1.09
Age 1.01 0.25 1.00-1.01
Male 0.87 0.24 0.68-1.10
Depression diagnosis 1.07 0.67 0.79-1.43
Psychological diagnosis 1.43 0.02 1.07-1.91
Neurological diagnosis 2.04 <0.00 1.60-2.60
Diabetes 0.80 0.14 0.59-1.08
Congestive heart failure 0.87 0.36 0.64-1.18

ED, emergency department; HCUP, Healthcare Cost and Utilization Project; CCS, Clinical Classification Software.

* The model included cubic splines for weeks as a means of incorporating time into the model; these parameters are not shown in the table.

The area under the receiver operator characteristic (ROC) curve provides an overall measure of classification accuracy, with the value of 1 representing perfect accuracy and 0.5 representing chance. The area under the ROC curve of the IPW model was 0.61, meaning that the model discriminated only modestly better than chance and captured some, but not all, of the factors associated with exposure. Nevertheless, there still appears to be unmeasured confounding in the treatment choice model.
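The area under the ROC curve has a direct pairwise interpretation that makes the scale from 0.5 to 1 concrete. The sketch below is illustrative only; the function name and the example predicted probabilities are hypothetical, and the study's value of 0.61 came from the fitted exposure model.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from predicted probabilities.

    Computed as the fraction of (exposed, unexposed) pairs in which the
    exposed observation has the higher predicted probability; ties count
    as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one observation in each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities of exposure and observed exposure.
auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

Read this way, an area of 0.61 means that in about 61 percent of exposed-unexposed pairs the model assigns the higher probability to the exposed observation.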

The unadjusted and adjusted hazard ratios (HRs) and the IPW-adjusted OR comparing the rates of death for the exposed and the unexposed groups consistently showed a negative association with antidementia drug treatment (Table 9). For all ages combined, the values were as follows: unadjusted HR, 0.76 (95% CI, 0.60-0.96); adjusted HR, 0.70 (95% CI, 0.55-0.88); and OR, 0.76 (95% CI, 0.60-0.97), respectively.

Table 9. Unadjusted and adjusted analyses of death for groups exposed and not exposed to antidementia drugs
Target Outcome: Death Unadjusted Hazard Ratio P-Value 95% CI Adjusted Hazard Ratio P-Value 95% CI IPW Odds Ratio P-Value 95% CI
All ages 0.76 0.019 0.60-0.96 0.70 0.002 0.55-0.88 0.76 0.025 0.60-0.97
50-59 years 0.65 0.55 0.16-2.66 0.69 0.61 0.63-2.92 0.72 0.647 0.16-2.95
60-69 years 0.91 0.797 0.44-1.87 1.07 0.85 0.51-2.27 1.14 0.816 0.38-3.37
70-79 years 0.88 0.55 0.57-1.34 0.84 0.41 0.544-1.28 1.01 0.950 0.66-1.57
80-89 years 0.69 0.033 0.49-0.97 0.6 0.004 0.42-0.84 0.75 0.106 0.54-1.06
90 and over 0.52 0.07 0.25-1.05 0.57 0.13 0.084-1.18 0.50 0.100 0.21-1.14

We also performed stratified HR analyses, in 10-year age increments, to explore further the relationship between drug exposure and death. Drug exposure appeared to have a negative association with death in all age strata except the stratum of patients between 60 and 69 years of age. The IPW results were comparable with the adjusted HRs.

We used unadjusted and adjusted recurrent event survival analysis to evaluate expected and idiosyncratic clinical outcomes. In both unadjusted and adjusted analyses, we observed no statistically significant differences (Table 10). In adjusted analysis, a positive but nonsignificant association was found between antidementia drug treatment and gastrointestinal, hematological, and hepatic episodes; HRs were, respectively, 1.12 (95% CI, 0.91-1.38); 1.15 (95% CI, 0.91-1.45); and 1.01 (95% CI, 0.56-1.81). Antidementia drug exposure showed a negative but nonsignificant association with psychological episodes (HR, 0.88; 95% CI, 0.69-1.13).

Table 10. Unadjusted and adjusted analyses comparing clinical outcomes for groups exposed and not exposed to antidementia drugs
Outcome Unadjusted Adjusted
HR P-Value 95% CI HR P-Value 95% CI
Gastrointestinal episodes 0.89 0.239 0.73-1.08 1.12 0.284 0.91-1.36
Psychological episodes 0.82 0.124 0.64-1.06 0.88 0.322 0.69-1.13
Hematological episodes 1.01 0.925 0.80-1.69 1.15 0.244 0.91-1.45
Hepatic episodes 0.84 0.556 0.48-1.48 1.01 0.979 0.56-1.81

HR, hazard ratio.

Comparison of Events in Early and Late Stages of Treatment

We used random effects Poisson regression to evaluate the effect of early-stage drug exposure on clinical outcomes by comparing clinical events (episodes) within the first 4 weeks of exposure with clinical events occurring after the first 4 weeks of exposure. Table 11 presents the number of events and rates for the clinical outcomes and subcategories for the first 4 weeks of therapy ("early") and the time after the first 4 weeks ("late").

Table 11. Number of events and rates for clinical subcategories
CCS Target Outcome Early Late
Count Rates 95% CI Count Rates 95% CI
Gastrointestinal (GI) episodes 10 34.5 23.5-45.2 95 25.3 22.7-27.9
17.1.6 Nausea and vomiting 0 0.0 10 2.7 1.9-3.6
17.1.7 Abdominal pain 6 20.6 12.2-29.0 46 12.5 10.7-14.4
9.12.3 Other and unspecified GI disorder 3 10.3 4.4-16.3 31 8.4 6.9-9.9
9.4.3 Gastritis and duodenitis 1 3.4 0.0-6.9 6 1.6 1.0-2.3
Mental episodes 9 30.9 20.6-41.2 57 15.5 13.5-17.6
5.7.1 Anxiety states 0 0.0 22 6.0 4.7-7.3
5.7.2 Personality disorders 2 6.9 2.0-11.7 1 0.3 0.0-0.5
5.7.3 Other anxiety, somatoform, dissociative, and personality disorder 1 3.4 0.0-6.9 8 2.2 1.4-2.9
5.9.1 Adjustment reaction 1 3.4 0.0-6.9 8 2.2 1.4-2.9
5.9.2 Depressive disorder, not elsewhere classified 5 17.2 9.5-24.9 15 4.1 3.0-5.1
5.9.3 Other and unspecified mental conditions 0 0.0 3 0.8 0.3-1.3
Hematological episodes 15 51.5 38.2-64.8 64 17.4 15.2-19.6
4.1 Anemia 12 41.2 29.3-53.1 55 15.0 12.9-17.0
4.2 Coagulation and hemorrhagic 1 3.4 0.0-6.9 7 1.9 1.2-2.6
4.3 Diseases of white blood cells 2 6.9 2.0-11.7 2 0.5 0.2-0.9
Hepatic episodes 2 6.9 2.0-11.7 10 2.7 1.9-3.6
9.8.2 Other liver disease 2 6.9 2.0-11.7 10 2.7 1.9-3.6

CCS, Clinical Classification Software identifier.

In the early versus late treatment analysis that included only exposed individuals (Table 12), the unadjusted analyses found significantly higher rates of hematological episodes in the first 4 weeks of antidementia drug exposure. In adjusted analyses, gastrointestinal and psychological episodes, which are the expected adverse events of antidementia drug treatment, showed higher but nonsignificant incidence rate ratios (IRRs) during the first 4 weeks of treatment (1.40 [95% CI, 0.72-2.71] and 2.00 [95% CI, 0.93-4.18], respectively). Among the idiosyncratic events, the IRR of hematological episodes was significantly higher during the first 4 weeks of antidementia drug treatment (2.86; 95% CI, 1.60-5.11). The IRR of hepatic episodes was also higher during the first 4 weeks of treatment; however, the difference was not statistically significant (2.67; 95% CI, 0.56-12.65). We also used a negative binomial regression for panel data to correct for potential overdispersion in the random effects Poisson regression models. We detected no differences in the model estimates.

Table 12. Exposed population: unadjusted and adjusted incidence rate ratios comparing clinical events within the first 4 weeks of treatment with clinical events after the first 4 weeks of treatment
Target Outcome First 4 Weeks Unadjusted Adjusted
IRR P-Value 95% CI IRR P-Value 95% CI
Gastrointestinal episodes 1.26 0.485 0.65-2.44 1.40 0.323 0.72-2.71
Psychological episodes 1.91 0.077 0.93-3.92 2.00 0.067 0.93-4.18
Hematological episodes 3.00 <0.001 1.69-5.29 2.86 <0.001 1.60-5.11
Hepatic episodes 1.64 0.537 0.34-7.80 2.67 0.216 0.56-12.65

IRR, incidence rate ratio; CI, confidence interval.
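For orientation, a crude incidence rate ratio and an approximate confidence interval can be computed from event counts and person-time alone. This sketch is a simplification and not the method behind Table 12: the random effects Poisson models account for repeated observations within a patient, which this calculation ignores. The function name and the example counts and person-time are hypothetical.

```python
import math


def incidence_rate_ratio(events_a, time_a, events_b, time_b, z=1.96):
    """Crude incidence rate ratio (period a versus period b).

    Returns (irr, lower, upper), where the interval is a Wald interval
    computed on the log scale. Both event counts must be greater than zero.
    """
    irr = (events_a / time_a) / (events_b / time_b)
    se_log = math.sqrt(1.0 / events_a + 1.0 / events_b)
    lower = math.exp(math.log(irr) - z * se_log)
    upper = math.exp(math.log(irr) + z * se_log)
    return irr, lower, upper


# Hypothetical data: 10 events in 100 patient-years (early period)
# versus 20 events in 400 patient-years (late period).
irr, lower, upper = incidence_rate_ratio(10, 100.0, 20, 400.0)
```

With small early-period counts, as for several outcomes in Table 11, the interval is wide, which is consistent with the nonsignificant results reported for most outcomes.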

We also compared the incidence rates of hematological episodes within the first 4 weeks from the first dementia diagnosis, in the group that was not dispensed an antidementia drug during the 3-year observation period, with the rates in the time following the first 4 weeks to determine whether factors other than antidementia drug treatment were associated with hematological episodes. The rate of hematological episodes was higher within the first 4 weeks of a dementia diagnosis than in the time following the first 4 weeks in both unadjusted and adjusted analyses (IRR, 1.52 [95% CI, 1.20-1.93] and 1.83 [95% CI, 1.44-2.33], respectively).

Because the IRR for hematological episodes was significantly higher within the first 4 weeks of therapy, we plotted the cumulative hazard for the treated and untreated groups for hematological episodes. As shown in Figure 5, week 0 is the time of first exposure to the drug for the treated group and the time of first dementia diagnosis for the never-treated group. The cumulative hazard for the treated group (dotted line) shows a steep increase in the first and second weeks after initiating therapy, whereas the untreated group (solid line) shows a much smaller increase in cumulative hazard.

Figure 5. Cumulative hazard rates for hematological episodes in treated and never-treated groups

Graph showing Week on the x axis and Hazard on the y axis. Week is numbered from -5 to 7 in increments of 1. Hazard is numbered from 0 to .06 in increments of .02. In the group exposed to antidementia drug treatment, the hazard remains just above 0 from weeks -5 to -2, increases to just under .01 at week 0, and then doubles in week 1 to just above .02. By week 2, the hazard is above .035. It increases again in weeks 3 and 4. By week 5, the hazard has leveled off around .055. For the group not exposed to antidementia drug treatment, the hazard is higher than for the treated group up to week 1. After week 1, the hazard is lower for the untreated group and rises at a slower rate, topping out around .05 by week 7.

Dotted line: group exposed to antidementia drug treatment.

Solid line: group not exposed to antidementia drug treatment.

Week 0 = initiation of antidementia drug treatment or first dementia diagnosis in the untreated group.

These ADE signals regarding hematological and hepatic episodes require further investigation to determine a true causal relationship. Future studies should focus on diagnosis (e.g., those from the International Classification of Diseases, version 9 [ICD-9]) and frequency of events among individuals to determine if a pattern emerges.

Our review of the rates of subcategories of hematological episodes found a much higher rate of anemia during the first 4 weeks of treatment. Further analysis is required to determine if this higher rate is causally associated with initiating antidementia drug treatment. The increase in the hazard rate of hematological episodes displayed in Figure 5 needs to be further evaluated to rule out confounding by clinic visit intensity. It is plausible that patients treated with antidementia drugs have more visits to the clinic than those not initiated on antidementia therapy and that the more frequent clinic visits are associated with an increased finding of hematological disorders, specifically anemia. Furthermore, the higher rate of liver disease needs to be further evaluated to determine whether a causal association exists. Further research should also drill down on these associations by evaluating the relationship between antidementia drug class and initial dose of therapy on these clinical outcomes.

Finally, because ICD-9 codes have limitations attributable to misclassification and lack of specificity, including concomitant drug therapy in the statistical models is expected to help remove residual confounding. Removing additional confounding will, in turn, better explain the relationship between antidementia drug exposure and early adverse clinical events.


As discussed in Chapter 1, the Medicare Prescription Drug, Improvement, and Modernization Act of 2003 (MMA), together with the establishment of the Part D outpatient pharmaceutical benefit for Medicare, ushered in a new era of issues in pharmaceutical treatments relating to quality of care, costs of care, and patient safety. These issues raise significant challenges for both policymaking and research, especially for the Centers for Medicare & Medicaid Services (CMS) and the Agency for Healthcare Research and Quality (AHRQ).

AHRQ, through the Section 1013 mandate in the MMA, has the responsibility for conducting and supporting research on the comparative effectiveness, appropriateness, and outcomes of health care services, particularly pharmaceuticals and devices. To accomplish this, AHRQ created its Effective Health Care program, which includes the Developing Evidence to Inform Decisions about Effectiveness (DEcIDE) network, to carry out practical, quick turnaround studies, focusing especially on studies using administrative, clinical, and pharmacoepidemiologic databases. Such studies can target clinical questions, methods development, or both. Of particular interest initially are the benefits and risks of pharmaceuticals. With respect to risks or harms of drugs, methods to identify and characterize adverse drug events (ADEs) are critical.

The primary purpose of our work was to develop a toolbox of methods for using medical administrative claims data to support pharmacovigilance efforts. This toolbox is intended to serve as a manual for researchers and analysts who track pharmacy usage and monitor drug safety. The procedures within this toolbox are designed to facilitate exploration of the relationship between drug exposure and adverse events, within the pharmacoepidemiologic framework of a cohort study design.

Because the full range of data (e.g., hospitalizations, emergency department visits, physician and clinic visits and other ambulatory services, death certificates, and outpatient pharmaceuticals) are currently unavailable for Medicare, we carried out our methods development using Medicaid data available from the state of Utah covering the period 2003 through 2005. Our work anticipates the time when databases combining Medicare Parts A, B, and D will become available.

We applied our framework of methods to an illustrative example. We were charged to examine a clinical condition important for elderly patients (dementia, including Alzheimer's disease) and the drugs in two classes used to manage this disorder. This condition was selected because it is an important cause of morbidity and a contributor to health care costs. The drugs used to treat dementia are relatively new and their efficacy (as shown in clinical trials) is modest. Issues that have been raised about trade-offs between their benefits and risks make it particularly important to study adverse outcomes associated with these drugs.

The remainder of this chapter presents our main conclusions, outlines the challenges and limitations of this work, and discusses our work's significance and implications. We present conclusions in three sections: methodological approach, lessons learned, and clinical findings. Chapter 5 recaps these points more directly for policymakers and clinicians.


Methodological Approach

Defining conditions and pharmaceuticals. Our first step was to define the medications of interest and the target outcomes. We included both acetylcholinesterase (AChE) inhibitors (including donepezil hydrochloride, rivastigmine tartrate, and galantamine hydrobromide) and N-methyl-D-aspartate (NMDA) agents (memantine). We selected three categories of clinical outcomes: death, expected adverse reactions, and idiosyncratic reactions. Death was chosen as an all-encompassing endpoint. The expected adverse reactions comprised gastrointestinal and psychiatric conditions that have been reported in randomized clinical trials. The idiosyncratic reactions were hepatic and hematologic diagnoses, which we selected as severe reactions that involve major organ systems. Although these two reactions have not been reported in the literature, we believe that they help to illustrate the hypothesis-generating aspect of this study.

Understanding data sources. Our second step was to examine the structure of the Utah Medicaid database and conceptualize the process for translating these data into tables that were usable for statistical analysis. To this end, we developed data models for intermediate and analytic tables.

Drug course table. We concentrated particular attention on the drug course table to define as precisely as possible the time intervals during which antidementia drugs were used. We decided that a table in which rows of data represented discrete intervals of person-time experience would allow the most analytic flexibility and efficiency.

This type of "long" table, which comprises multiple rows per patient, is known as a panel, longitudinal, or counting process format. This data structure is advantageous because it supports the entire range of statistical analyses that are of interest for pharmacoepidemiologic studies. It supports the analysis of treatment exposure using inverse probability weighting (IPW) because patients are captured from cohort inception, meaning their pre-exposure history is recorded and can be used to model the probability of receiving drug therapy at each discrete time interval. The counting process format also can easily accommodate several other special features in a study cohort; these features include multiple events of different type (different target outcomes), multiple events of the same type (recurrent events), and time-dependent covariates.
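The counting process layout described above can be illustrated with a minimal sketch. The field names (`patient_id`, `start`, `stop`, `exposed`, `comorbidity_count`, `event`) and the example rows are hypothetical and are not taken from the data dictionary in Appendix A; the point is only that each row is one interval of person-time whose exposure and covariates can differ from the row before it.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """One row of a counting process ("long") table: one interval of person-time."""
    patient_id: str
    start: int               # week offset at which the interval opens
    stop: int                # week offset at which the interval closes
    exposed: bool            # time-varying exposure status during the interval
    comorbidity_count: int   # example of a time-dependent covariate
    event: bool              # whether the target outcome occurred in the interval


def person_time(rows, exposed):
    """Total person-weeks contributed under the given exposure status."""
    return sum(r.stop - r.start for r in rows if r.exposed == exposed)


def event_count(rows, exposed):
    """Number of interval-level events under the given exposure status."""
    return sum(1 for r in rows if r.exposed == exposed and r.event)


# Hypothetical rows: patient A starts treatment in week 1; patient B is never treated.
rows = [
    Interval("A", 0, 1, False, 1, False),
    Interval("A", 1, 2, True, 1, False),
    Interval("A", 2, 3, True, 2, True),
    Interval("B", 0, 1, False, 0, False),
    Interval("B", 1, 2, False, 0, True),
]

exposed_weeks = person_time(rows, exposed=True)
unexposed_weeks = person_time(rows, exposed=False)
```

Because exposure, covariates, and events are all stored per interval, the same table can feed rate calculations, weighted models, and recurrent-event analyses without being restructured.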

Our analysis of antidementia drugs exploited all of these features. For example, we used time-varying covariates to model the factors that influence treatment exposure using IPW; multiple outcomes that included persistence on drug therapy, death, and specific clinical events; and recurrent ADEs for expected and idiosyncratic reactions to antidementia drug therapy.

Analytic table. We characterized drug exposure at the day-by-day level but then constructed the final analytic table using 1-week discrete time intervals. This interval maximizes efficiency without omitting clinically important changes in patient covariate status. We utilized SAS to develop the analytic tables and we used Stata (StataCorp, 2005, Stata Statistical Software: Release 9, College Station, TX: StataCorp LP) as the primary software for statistical work because it has many features to facilitate survival analysis when data are in the multiple-row-per-patient format. However, either program as well as other statistical programs can accommodate data in this format.

We performed multiple intermediate steps to create the final analytic table. Establishing patient enrollment and characterizing drug exposure was the first and most important step because our ability to quantify drug exposure and record subsequent outcomes depends on whether the patient was under observation (i.e., the patient's monthly enrollment status). Using the longitudinal data structure described above, we determined whether a person was an incident or a nonincident (i.e., new or established) user for each course of antidementia drug therapy.

Identifying courses of drug therapy. This approach allowed us to establish drug therapy courses by determining medication start and end dates within the context of Medicaid enrollment and eligibility. We defined a new course of drug therapy as starting an antidementia medication after a 60-day drug-free period while continuously enrolled in Medicaid during that 60-day period (i.e., before the date that the first antidementia drug was dispensed).
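The course definition above can be sketched in a few lines. This is an illustration under stated assumptions, not the SAS program used in the study: the function names, the representation of fills as (dispense date, days supply) pairs, and the measurement of the drug-free gap from the end of the previous supply are all assumptions made for the example.

```python
from datetime import date, timedelta

WASHOUT_DAYS = 60  # drug-free, continuously enrolled look-back period from the text


def build_courses(fills, washout=WASHOUT_DAYS):
    """Group (dispense_date, days_supply) fills into (start, end) courses.

    Assumption: a new course begins when at least `washout` days separate the
    end of the previous supply from the next dispensing date.
    """
    courses = []
    for dispensed, days_supply in sorted(fills):
        supply_end = dispensed + timedelta(days=days_supply)
        if courses and (dispensed - courses[-1][1]).days < washout:
            # Gap is shorter than the washout: extend the current course.
            courses[-1][1] = max(courses[-1][1], supply_end)
        else:
            courses.append([dispensed, supply_end])
    return [tuple(c) for c in courses]


def is_incident(course_start, enrollment_spans, washout=WASHOUT_DAYS):
    """True if a single enrollment span covers the whole look-back window.

    Enrollment spans are (first_day, last_day) pairs, assumed to be already
    clipped to the study period by the caller.
    """
    lookback_start = course_start - timedelta(days=washout)
    return any(s <= lookback_start and course_start <= e
               for s, e in enrollment_spans)


# Hypothetical patient: two back-to-back fills, a long gap, then a restart.
fills = [(date(2004, 3, 1), 30), (date(2004, 3, 31), 30), (date(2004, 9, 1), 30)]
enrollment = [(date(2004, 2, 1), date(2004, 12, 31))]

courses = build_courses(fills)
labels = [is_incident(start, enrollment) for start, _ in courses]
```

In this example the first course is labeled nonincident only because the patient was observed for fewer than 60 days before the first fill, which is the same kind of misclassification discussed for patients who start therapy shortly after enrollment or the start of the study period.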

The ability to identify a new course of therapy is crucial for analysis of drug effects because a mixed analysis of incident and nonincident users may wash out the effects of drug therapy that occur either early or late in treatment. Specifically, nonincident users can introduce two types of bias: (1) underascertainment of events that occur early in therapy and (2) inability to adjust adequately for disease risk factors that may be altered by the study drugs.22 We thus developed robust programs for characterizing exposure for both types of users; others who need to create accurate pictures of exposure when conducting ADE (or related) research can adopt or adapt these programs.

When examining exposure to antidementia drugs in the Utah Medicaid population, we found that of the 1,844 drug therapy courses, 497 discontinued therapy (i.e., patients stopped medications while being continuously enrolled in Medicaid and they did not die). The rate of discontinuation in incident courses was 1.4 (95% confidence interval [CI], 1.16-1.68) times higher than the rate for nonincident courses. Incident courses showed a steep drop in use at Week 5, suggesting that they were discontinued after one 30-day prescription. The nonincident courses also experienced a similar but attenuated drop in persistence at Week 5, which suggests a misclassification problem.

To be classified as incident users in the ADE analysis, patients had to be observed for at least 60 days before they first filled an antidementia prescription. That is, they had to be eligible and enrolled in Medicaid for at least 60 days before that first prescription fill. Many of the patients observed to start medication within the first 60 days of the study or within 60 days of enrollment may have actually been true incident users but were misclassified as nonincident courses.

Ray described a new user design as a way of removing some biases associated with studying drug users over a mixture of different times since first exposure to the drug under evaluation.22 In this design, investigators follow each user from the start of his or her current course of drug therapy, after a minimum period of nonuse. Ray also described a more stringent wash-out period to describe a type of new user study known as an inception cohort study; this design is intended to establish the first-ever exposure to the medication of interest.

Identifying a true inception cohort is difficult when patients often receive trials of multiple drugs and multiple courses. Nevertheless, a longer drug-free period would help avoid bias arising from depletion of susceptibles and compliant users. It may also help avoid problems with adjustment for baseline covariates that could plausibly be correlates of drug-exposure itself.34 A longer period may improve the prediction of which patients receive therapy (using IPW methods, for instance).

One reason that the area under the receiver operating characteristics (ROC) curve in our analyses (0.61) may be disappointing is that 60 days is not sufficient time to characterize other health care utilization that predicts new use. The lack of adequate variables to control confounding, such as concomitant prescription medication use and prescriber information, is also a reason for the moderate performance of the model.

Again, we want to reinforce the point that our analysis was not designed with the rigor and quality of data needed to support a causal analysis of the relationship between drug exposure and the ADEs evaluated. Rather, our design can be used as an opportunistic approach to identify potential ADEs that may not be recognized by health care providers. Follow-up studies that include clinical data (e.g., medical records and laboratory test results) are needed to design causal studies. For designing causal models, a more theoretical and structured approach to variable selection is also recommended; to represent the known and theoretical confounders, this might include developing so-called directed acyclic graphs,35 which explicate the proposed protective and causative effects of one variable or another. Our approach is intended to identify potential signals for further follow-up.

Our SAS program is designed to support different criteria for defining incident courses and can be modified easily to better identify an inception cohort by requiring a longer drug-free period. In the future, researchers should evaluate the effects of different definitions of incident antidementia drug users on the persistence and adverse event hazard rate ratios.

Tracking drug use and other health conditions. A feature of our analysis table, which is produced from the drug course table, is that it permits tracking of antidementia drug use and comorbidity status for any discrete interval of time (e.g., on a weekly basis). This framework allows us to update drug exposure and covariate status as changes occur. We designed our database to import data from other tables into the drug course table, such as diagnosis or procedure codes, to help characterize the underlying health status of each individual (e.g., co-existing conditions). We defined comorbidities and outcomes with codes based on The International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM), Current Procedural Terminology (CPT) categories, and Healthcare Cost and Utilization Project (HCUP) Clinical Classification System codes; the last is a classification software that groups ICD-9-CM codes for easier analysis. The comorbidity and outcome data came from medical claims and were brought into the analysis table according to the time interval in which the events occurred. Capturing these time-varying covariates allows for the adjustment of disease risk factors that study drugs may alter.

Database Development: Lessons Learned

The use of Medicaid data to support detection of ADEs presents several significant challenges. We discuss here some of the more problematic challenges and offer some recommendations, based on our experience, for future investigators.

Understanding the data source. One important condition in developing a new database to do pharmacoepidemiology studies of this sort is to understand fully the original data source. Utah Medicaid data and death certificate data are stored in a complicated warehouse. Relationships among different data tables, definitions, and labels of data fields are not always clearly documented (or documented at all). For example, we had four different client identifications. With careful consultation from Medicaid data experts, we used each of the identifiers to link client records according to the source of records and purpose of linkage.

The Utah Medicaid program updates its data warehouse structure periodically, and this poses special challenges for standardizing longitudinal data over the years. We learned that the method used for downloading the 2003 pharmacy claims differed from the method used for later years. The Utah Department of Health spent considerable resources to prepare and reprepare the raw data files and intermediate tables for the researchers to produce analysis tables for this study.

Other state Medicaid data warehouses may well face similar challenges. We recommend that researchers who are new users of a state's Medicaid data obtain adequate technical support from the relevant Medicaid program(s) and share their data integrity analysis with their data suppliers.

Ensuring data integrity. One requirement is to perform data quality checks meticulously to identify obvious errors that might compromise study validity. We followed a standard template to evaluate data integrity.36 This useful and time-saving step identified a discrepancy in the way we had extracted data across different years, which we then quickly corrected.

Another concern regarding the use of claims data for ADE analysis is the validity and quality of the data. Claims data are collected for payment and reimbursement purposes, not for research. Therefore, the information, such as diagnosis, may be a reflection of payment policy or reimbursement requirements regarding the claim. Researchers should be familiar with any potential anomalies in the data quality. Future research should examine the validity of measures developed from claims data.

Creating intermediate tables. Another challenge was the need to construct intermediate tables. The analysis table, with an observation time span of 1 week, was produced from a day-by-day drug therapy course table. We found that constructing a one-observation-per-day table with a large cohort can be difficult if the steps are not well planned and not centered on Medicaid enrollment status.

We first produced two primary drug exposure tables, one for each drug class. We also produced secondary drug exposure tables. We designed these to capture duplicate dispensing from drugs within the same drug therapy class (i.e., AChE inhibitors or NMDA receptor antagonists) (see the data dictionary in Appendix A for details). We designed the primary exposure tables to account for day-by-day observations of drug exposure. We linked the two primary drug exposure tables to the enrollment table and the death certificate table to form the basis of the day-by-day drug course table.

When linked, these three tables (enrollment, drug exposure, and death) form the day-by-day drug course table, which is the foundation for determining whether a person is an incident or a nonincident user for each course of therapy. We used Medicaid eligibility and prescription fills to estimate drug exposure start dates, end dates, enrollment status, and death.

We attempted multiple ways of producing the day-by-day drug course table. Based on this experience, we recommend working with the exposed cohort and fully characterizing their drug exposure history before appending the never-exposed cohort and including comorbidity and concomitant therapies. We make this recommendation because the day-by-day drug course table can become large; many sorts and manipulations are required to fully define drug exposure. Working exclusively with the exposed population before linking to the never-exposed population and medical claims will keep the data tables to a more manageable size.

Once drug exposure is fully defined, we recommend taking snapshots of the specific discrete time interval required for analysis. For this project, we chose 1-week intervals. The never-exposed intermediate table should be compressed to the same discrete time intervals before being appended to the exposed drug course table. Once this is complete, comorbidities and concomitant therapies can be imported into the drug course table to form the analysis table.
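A minimal sketch of the compression step follows, assuming a hypothetical day-level layout of (patient_id, day_index, exposed, event) tuples with days counted from cohort entry. The report does not state the rule used to classify a whole week as exposed, so the sketch keeps the number of exposed days per week and leaves that decision to the analyst.

```python
from collections import defaultdict


def collapse_to_weeks(daily_rows):
    """Compress day-by-day rows into 1-week discrete intervals.

    daily_rows: iterable of (patient_id, day_index, exposed, event) tuples,
    where day_index counts days since cohort entry (an assumed layout).
    """
    weeks = defaultdict(lambda: {"days_observed": 0, "days_exposed": 0, "event": False})
    for patient_id, day, exposed, event in daily_rows:
        cell = weeks[(patient_id, day // 7)]
        cell["days_observed"] += 1
        cell["days_exposed"] += int(exposed)
        cell["event"] = cell["event"] or event
    return [
        {"patient_id": pid, "week": week, **values}
        for (pid, week), values in sorted(weeks.items())
    ]


# Hypothetical patient observed for 10 days, exposed from day 3, event on day 9.
daily = [("A", d, d >= 3, d == 9) for d in range(10)]
weekly = collapse_to_weeks(daily)
```

Applying the same function to the exposed and never-exposed tables gives both the same interval definition before they are appended, as recommended above.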

Controlling for confounding. Marginal structural models (MSM) with IPW are a relatively new class of statistical models developed by Robins et al. for estimating, from observational data, the causal effect of a time-dependent exposure in the presence of time-dependent covariates that may be simultaneously confounders and intermediate variables.31 Our study included both time-varying exposures and time-varying covariates that were potential confounders and intermediate variables. Health status (e.g., number of comorbidities, hospitalizations) can potentially influence both the decision to treat dementia symptoms and clinical outcomes such as death or subsequent clinical events. Furthermore, health status can be influenced by antidementia drug exposure.

Our experience with IPW as a method for controlling for confounding indicates that it is feasible to apply IPW to large administrative data sets. IPW is based on a distinctly different approach to control for confounding than the usual risk-adjustment methods. Similar to propensity scores, IPW starts with a model to identify factors that influence exposure or treatment. IPW is particularly useful when investigators want to examine multiple outcomes. Unlike traditional regression methods, in which the focus of risk adjustment is on separate models of each target outcome, in IPW approaches the focus is on modeling the factors that are associated with exposure. The exposure model remains the same for each outcome, but the censoring weights change according to the outcome being modeled.
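The weighting step can be sketched as follows. The sketch assumes that two exposure models have already been fitted elsewhere and that each weekly row carries their predicted probabilities of treatment: `p_num` from a model without the time-varying confounders and `p_den` from a model that includes them. These names, and the row layout, are hypothetical; the sketch shows only how stabilized weights accumulate over a patient's intervals, not the specific models fitted in this study.

```python
def stabilized_weights(rows):
    """Cumulative stabilized inverse probability of treatment weights.

    rows: iterable of (patient_id, treated, p_num, p_den) in time order
    within each patient. p_num and p_den are predicted probabilities of
    being treated in that interval (assumed to come from fitted models).
    Returns a list of (patient_id, cumulative_weight), one per input row.
    """
    running = {}
    weights = []
    for patient_id, treated, p_num, p_den in rows:
        # Probability of the treatment status actually observed.
        numerator = p_num if treated else 1.0 - p_num
        denominator = p_den if treated else 1.0 - p_den
        running[patient_id] = running.get(patient_id, 1.0) * numerator / denominator
        weights.append((patient_id, running[patient_id]))
    return weights


# Hypothetical patient: untreated in the first week, treated in the second.
example_rows = [("A", False, 0.1, 0.2), ("A", True, 0.1, 0.4)]
example_weights = stabilized_weights(example_rows)
```

Censoring weights can be built the same way from models of remaining uncensored and multiplied into the treatment weight; only that censoring component needs to be recomputed when the target outcome changes.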

Clinical Findings

Death. We found a negative association between antidementia drug treatment and all-cause mortality among patients who had a dementia diagnosis. The hazard ratio estimates were similar regardless of whether we used IPW or multivariable regression to account for confounding. Our interpretation of this finding is that it is most likely an indication of the "healthy user" phenomenon rather than a true beneficial impact of antidementia drug therapy. The healthy user effect occurs because patients with more complex disease and increased risk for death are less likely to be treated with the medication of interest. Adequately controlling for confounding using administrative data is particularly difficult in studies involving overall mortality.37

Expected adverse events. In the analysis that compared the incidence rates during the first 4 weeks of treatment with the rates after the first 4 weeks of treatment, the incidence rate ratios (IRRs) for gastrointestinal and psychological episodes were greater than 1.0; however, the IRRs did not reach statistical significance. The upper bounds of the 95% CIs, which estimate the possible ceiling of the ADE effect, were 2.71 and 4.18 for gastrointestinal and psychological episodes, respectively. These findings are consistent with the experience of geriatric practitioners on the team and product literature;38 reactions typically occur in the first few days to weeks and then attenuate or disappear with continued use.

Also, the analysis that compared the hazards of the exposed and the unexposed groups did not reveal a significant association between expected reactions and antidementia drugs. The upper bounds of the 95% CIs of the hazard ratio for gastrointestinal and psychological episodes were 1.36 and 1.13, respectively. This lack of a statistically significant association may be explained either by the transience of these effects or by a risk that declines over time. Confounding by indication may also explain this lack of effect. For example, physicians may avoid prescribing antidementia drugs to patients with loose stools or bloating.

Idiosyncratic adverse events. Our analysis of idiosyncratic reactions yielded expected and unexpected results. No known pharmacologic or empirical reasons exist for recently marketed antidementia drugs to cause hematological or hepatic toxicity. As expected, we did not find any association between these toxicities and drug exposure in Cox modeling.

Our analysis of early versus late exposure within the treated cohort, however, demonstrated a highly significant early risk for episodes of care for hematological diagnoses (incidence rate ratio [IRR] = 2.86; 95% CI, 1.6-5.11).

We also compared the incidence rates of hematological episodes within the first 4 weeks from the first dementia diagnosis in the never-exposed group with the rate in the time following the first 4 weeks to determine whether factors other than antidementia drug treatment were associated with hematological episodes. The rate of hematological episodes was higher within the first 4 weeks of a dementia diagnosis than in the time following the first 4 weeks of a dementia diagnosis, with an IRR of 1.83 (95% CI, 1.44-2.33); by contrast, the IRR in the treatment group was 2.86 (95% CI, 1.6-5.11).
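For readers who want to see how an IRR and its confidence interval of this kind are formed, the following sketch uses the standard Wald interval on the log scale. It assumes simple Poisson counts and ignores within-patient correlation among recurrent episodes, so it is not necessarily the estimator used in this study. The counts and person-time in the example are hypothetical and are not study data.

```python
import math


def irr_with_ci(events_a, time_a, events_b, time_b, z=1.96):
    """Incidence rate ratio (group a vs. group b) with a Wald CI on the log scale.

    events_*: event counts; time_*: person-time in the same units for both groups.
    Assumes Poisson-distributed counts (an assumption of this sketch).
    """
    irr = (events_a / time_a) / (events_b / time_b)
    se_log = math.sqrt(1.0 / events_a + 1.0 / events_b)
    lower = math.exp(math.log(irr) - z * se_log)
    upper = math.exp(math.log(irr) + z * se_log)
    return irr, lower, upper


# Hypothetical counts: 20 episodes in 400 early person-weeks
# versus 30 episodes in 1,500 later person-weeks.
irr, lower, upper = irr_with_ci(20, 400.0, 30, 1500.0)
```

The width of the interval depends only on the two event counts, which is why estimates based on few early episodes have wide intervals even when the point estimate is large.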

Even though receiving a dementia diagnosis was associated with an increased rate of hematological episodes, the magnitude of the effect was smaller than in the treatment group. This finding demonstrates the utility of our approach in generating signals of adverse events, but the approach should be used only for the purposes of generating hypotheses for further evaluation. Additional investigation is warranted to determine whether antidementia drugs have a causal effect on hematological episodes.

Influence of heterogeneity in ADE diagnoses. The outcomes used for our analyses represented relatively large and heterogeneous groups of diagnoses. For example, gastrointestinal events included both constipation and diarrhea, and hematological events included both agranulocytosis and anemia. The usual effect of heterogeneity is to dilute the measured association between a specific syndrome and the exposure.

This diversity suggests that analysts should consider some refinements to our methods. For the targets of expected gastrointestinal ADEs, for example, selecting code sets specific to nausea, vomiting, diarrhea, and stomach pain would be the next step in further evaluating the adverse effects of antidementia drugs. Researchers will have to evaluate the set of codes for the drugs they examine and determine the level of granularity needed to understand the relationship between drug exposure and adverse drug therapy outcomes.

Likewise, for hematological ADEs, the next step would be to evaluate common diagnoses such as anemia separately from less common and potentially more serious diagnoses of agranulocytosis and thrombocytopenia. We believe that this distinction may prove important for other diagnoses and drug associations. To characterize better the relationship between the antidementia drugs and hematological events, investigators should drill down on the medications to determine whether one class or brand accounts for a disproportionate amount of increased IRR. Likewise, researchers should determine if the ADEs display a dose-response relationship. That is, future investigations should examine whether the events depend, in some fashion, on the average daily dose or cumulative dose of the medication.

Our analysis may also suffer from heterogeneity of the drug exposure itself, because it includes three examples of AChE inhibitors and one NMDA modulator. Future researchers should study each drug separately to determine if specific products are the cause of the increased rate of events.

More numerous and more specific analyses, however, engender disadvantages that relate to multiple comparisons and low numbers of events. If we focus on more granular outcomes by examining specific drugs and/or specific adverse event codes, we may find that some or all medications under study have a larger effect on IRRs than less disaggregated outcome measures. However, dividing outcomes into smaller categories produces a multiple comparisons problem and increases the likelihood of false alarms. Smaller numbers may also make it more challenging to find statistical significance.
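The size of the multiple comparisons problem can be illustrated with a short calculation. The sketch assumes independent tests, which is a simplification, and the figure of 12 comparisons (four drugs by three hematological subcategories) is a hypothetical example rather than an analysis performed in this study. The Bonferroni threshold is shown as one common, conservative correction.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false alarm across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold that holds the familywise rate near alpha."""
    return alpha / n_tests


# Hypothetical disaggregation: 4 drugs x 3 outcome subcategories = 12 tests.
n_tests = 4 * 3
fwer = familywise_error_rate(0.05, n_tests)
per_test_alpha = bonferroni_alpha(0.05, n_tests)
```

Under these assumptions, the chance of at least one spurious signal rises from 5 percent for a single test to well over 40 percent for 12 tests, while the stricter per-test threshold makes real effects harder to detect when event counts are small.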

All investigators face these issues when performing ADE analyses of this type. We encourage future research to examine the methods described in this study for identifying possible ADEs and carrying out surveillance activities of this type, in this case using dementia and antidementia drugs as the case in point. Researchers should validate findings with more rigorous designs and clinically rich data.


Database Limitations

The Utah Data Warehouse provides a rich source of data, but we acknowledge several limitations to the data. First, we did not include managed care encounter data in the study. The Utah Medicaid Program is in the early stages of testing the encounter data that contracted health plans submit, and the medical claims data are not ready for research uses. Therefore, our study does not measure outcomes for managed care enrollees.

Second, we included only paid claims. The Data Warehouse may include multiple versions of a claim for a medical encounter or prescription; we extracted only the final (paid) claim records for the analytic database. Although this is a routine selection criterion for studying Medicaid claims, we may have undercounted the number of prescriptions by rejecting some unpaid claims. Furthermore, Medicaid enrollees move in and out of the program frequently. Our study did not include the prescriptions purchased during disenrollment periods.

Third, the short time span for measuring outcomes may have led to underestimation of ADEs among the study population. However, this limitation might have affected only those who used antidementia drugs in late 2005, toward the end of the study period.

Fourth, using a deterministic method to link death and Medicaid eligibility records may miss some cases of persons who had used antidementia drugs and are now deceased. Another method for linkage, probabilistic linking, may increase the number of linked cases, which could enhance the power of the study. This approach, however, adds another source of variability in the data that could decrease the precision of estimates.

Fifth, enrollment in Medicaid is monthly. This factor produces multiple methodological challenges. For example, when identifying incident medication users, we had to consider the use of the medication since the start of the study period and each individual's enrollment status. Monthly enrollment also produces gaps in observations during which important diagnoses and targeted outcomes may be missed.

Methodological Limitations

This study illustrates methods for ADE detection in claims databases, but several intrinsic limitations should be noted. First, the design is observational, not experimental, in nature. A drawback of any observational study is the potential for confounding due to unmeasured variables, such as the selection of treatment. For example, in this illustrative study, the choice of prescribing the drug is not random and is probably related to unmeasured characteristics, such as health status of the patient; this factor may confound our results. Methods such as multivariable regression and IPW can remove only confounding related to measured factors. Confounding related to unmeasured factors is the likely explanation for the observed negative association between antidementia drug use and mortality.

Including concomitant and other drug therapy would have helped to remove confounding and better identify clinical outcomes. For example, use of atypical antipsychotics, drugs used to treat agitation and aggression in severely demented patients, is associated with an increased rate of cognitive and psychological disturbance39 and death.40

Statistical models that rely on instrumental variables have the potential to account for confounding from unmeasured variables. An instrumental variable is a factor that influences the likelihood of exposure but is not associated with the outcome except through its effect on exposure. For a study such as ours, a good candidate instrumental variable may be the prescribing physician's preference for whether to treat dementia with antidementia drug therapy; we also could operationalize the instrumental variable as the choice of antidementia agents.

We were unable to use prescriber as an instrumental variable because prescriber identifiers were unavailable in the pharmacy claims. Because a weak instrument can introduce more problems into the analysis than it resolves,41 we opted not to pursue this method. We advise that future studies on ADEs in observational claims should examine the effects of unobserved confounding through the use of an appropriate instrument. These studies should include sensitivity analysis to examine the degree of confounding that might be attributed to unobserved variables.

Second, incorporation of concomitant medications into our analyses likely would have improved our ability to adjust for confounding by indication. Time and resource constraints precluded our taking this step in this initial project. Before using concomitant medications, we first needed to update the National Drug Code (NDC) codes for therapeutic and risk-score calculations. We are now adding the concomitant prescription information to our database to use in future analyses. We expect that inclusion of concomitant medication data will improve the performance of IPW and help remove at least a portion of the residual confounding. Including this information also will permit evaluation of potential drug-drug interactions.

Third, another weakness of our specific project was the relatively small sample of subjects with incident use of antidementia drugs. This markedly limited our power to detect rare adverse events. Future studies should combine Medicaid claims from several states or employ simulation methods (or both) to deal with the challenges presented by small samples and rare events. In the near future, the samples available for this disease in Medicare claims should be large enough to overcome small sample size problems.

We explored the sample size needed to detect significant differences with a type I error of 5 percent and 80 percent power, assuming that 10 percent of subjects are exposed and 10 percent are censored. To do this, we calculated the sample size while varying the incidence rate and the hazard ratio. Table 13 provides the sample size requirements for combinations of incidence rates and hazard ratios. As the incidence rate or the hazard ratio increases (or both increase), the sample size requirements become smaller.

Table 13. Sample size requirements based on hazard ratio and incidence rate
                Incidence Rate
Hazard Ratio    20%      10%      5%       1%
1.5             1,283    2,513    4,976    24,693
2               394      759      1,491    7,353
2.5             209      397      775      3,805
3               130      258      501      2,451
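The report does not state which formula produced Table 13, so the following sketch should not be expected to reproduce those figures. It uses one common approximation (Schoenfeld's formula for the number of events required under a proportional hazards model) only to illustrate why the required sample shrinks as the hazard ratio or the incidence rate grows. The function names, the treatment of incidence as the overall probability of an event during follow-up, and the handling of censoring as a simple proportion are all assumptions.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hazard_ratio, prop_exposed, alpha=0.05, power=0.80):
    """Approximate number of events needed (Schoenfeld's formula)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_beta = NormalDist().inv_cdf(power)
    allocation = prop_exposed * (1 - prop_exposed)
    return (z_alpha + z_beta) ** 2 / (allocation * math.log(hazard_ratio) ** 2)


def approx_sample_size(hazard_ratio, incidence,
                       prop_exposed=0.10, censoring=0.10,
                       alpha=0.05, power=0.80):
    """Convert required events to subjects.

    Assumption for this sketch: each subject yields an observed event
    with probability incidence * (1 - censoring).
    """
    events = schoenfeld_events(hazard_ratio, prop_exposed, alpha, power)
    return math.ceil(events / (incidence * (1 - censoring)))


# Required sample falls as either the hazard ratio or the incidence rises.
grid = {
    (hr, inc): approx_sample_size(hr, inc)
    for hr in (1.5, 2.0, 2.5, 3.0)
    for inc in (0.20, 0.10, 0.05, 0.01)
}
```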

Fourth, the outcomes used in this analysis have not been validated. Pharmacoepidemiology studies that rely on claims data often are forced to develop outcomes that are close to a validated measure but are not the actual measure. This problem may arise because of lack of information in the claim required for the validated measure, such as laboratory values, or the lack of any valid measures for the outcome of interest. The endpoints need to be validated against information documented in the patient's medical record.

Finally, although our sample on the antidementia drugs is small, the comparison group of those not exposed to the drugs is not small. In most situations, researchers will either sample data or restrict the analysis to a specific subpopulation of individuals. By restricting our analysis of mortality to patients who had a dementia diagnosis, we reduced confounding and improved the ease of analysis by condensing the size of the data set.

When sampling, researchers need to be cognizant of the rate of censoring, factors related to time in study, and time-varying confounders. As researchers use larger databases, such as Medicare pharmacy claims, the need for appropriate sampling methodology will become even more critical. Such methods must be explored in future research.

Significance and Implications

Previous ADE studies have often relied on data from federal reporting agencies, such as MedWatch at the U.S. Food and Drug Administration and the U.S. Pharmacopeia's MEDMARX® systems, and on information from randomized controlled trials. Our study presents a methodological framework for researchers to use in working with observational data, specifically from pharmacoepidemiologic databases. The methods outlined in this report, and the stepwise approach (i.e., data integrity, exposure and persistence, and ADE analysis) are ones that numerous research teams, in DEcIDE Centers as well as other groups such as the Centers for Education and Research in Therapeutics (also supported by AHRQ), can readily adopt or adapt.

These studies would generate hypotheses for future research regarding the adverse events associated with specific drugs or drug classes. Data available from Medicaid claims, employer claims, and (eventually) Medicare claims can now be used to examine specific drug classes and agents within those drug classes for ADEs. The ADE framework of initially examining mortality, known events from the clinical trials, and then potentially severe but unobserved events (as, in our study, hepatological and hematological events) will further our understanding of drug safety. The advantage of the framework and method outlined in this report is that they allow claims databases to identify novel signals for previously unrecognized ADEs, as well as to examine the number of previously identified ADEs.

This framework has additional strengths beyond the study of ADEs. Currently, most of the state Medicaid drug utilization review programs focus on high-utilization cases and brand name drug prescriptions. Our study provides a new approach for drug safety review methods. State Medicaid programs can apply the method and data structure to explore drug classes of interest.

Translation of Findings


This study developed an analytic framework including technical and statistical methods for examining adverse drug events (ADEs) that might occur among new or established users of pharmaceuticals; the illustrative case was drugs for dementia, including Alzheimer's disease. This work relied on Medicaid hospital, outpatient, emergency department, and pharmacy claims and death certificate records from the state of Utah for the pilot study. This study provides a template for ADE research for other priority conditions and drug classes, other populations, and other claims databases, which eventually should include Medicare Parts A, B, and D.

Our approach involved numerous separate stages, which we believe are needed for all similar research involving these types of pharmacoepidemiologic data. In an initial stage, steps included linking separate databases and creating analytic files. We also tested the integrity of the data.

Subsequent stages involved specifying diagnoses of interest. In our case, this meant specifying the diagnoses and types of services that would qualify as reflecting previously identified (expected) and rare (idiosyncratic and unexpected) adverse events. We created complex models and programs for classifying patients according to their exposure status (or lack thereof) to the pharmaceuticals in question and for determining their level of persistence on drugs over time.

Only when all these steps had been completed could we move to the final stage: studying ADEs associated with the example drug class of antidementia drugs as the illustrative test of our methods. We draw particular attention to this flow of steps as guidance for DEcIDE Centers, other research groups conducting similar studies, and federal agencies responsible for assuring the quality and safety of pharmaceuticals and for supporting studies of these issues in the future.

Clinicians and Providers

Clinicians can benefit from the analysis framework and approach for two main reasons. First, our design provides a longitudinal record of actual patient adherence patterns, which can be of use in improving the quality of care delivered. Second, our approach can help create a long-term (evolutionary) evidence base about the expected occurrence and timing of different ADEs.

By being familiar with persistence data, clinicians can target confirmation of adherence to medication regimens by their patients and probe for ADEs at points when patients are most vulnerable to discontinuation, regardless of whether they appear to be experiencing an ADE. Moreover, persistence data provide a metric by which clinicians and others can understand the effectiveness or net benefits of such drugs in the context of what is known about their efficacy. For instance, in the case of antidementia medications reported in this study, many patients are using these medications for longer periods than can be well supported by existing, trial-based evidence (i.e., efficacy information). For clinicians and their professional societies, our findings regarding length of time on antidementia drugs, albeit preliminary, will assist in the development of clinical practice guidelines for the appropriate use of these drugs.


In the context of this study, policymakers include several stakeholder groups: officials responsible for large public-sector health programs (i.e., Medicare and Medicaid); persons responsible for supporting research into the quality and costs of health care; and representatives of payers, employers, insurers, health plans, and integrated delivery systems of various sorts.

With respect to research, the framework we presented for examining ADEs (the general strategy and the specific programming necessary to create an analytic database from enrollment, claims files, and death certificate records) will be more powerful when it is replicated in larger populations and across other disorders and therapeutic categories. Explicit explanations or programming code involved in these steps are infrequently included in published literature. Appendices in this report will become part of a toolkit that the Agency for Healthcare Research and Quality can disseminate; the toolkit will be a unique resource to future researchers both inside and outside government agencies.

Additionally, the framework provides a systematic enquiry into ADE detection and surveillance. Specifically, we offer a methodology, examining mortality, known ADEs, and suspected, serious ADEs, that allows researchers to generate hypotheses for future studies. This framework also provides a structure for ADE research that will ease interpretation of results for numerous different audiences.

Our statistical methods do improve on the reporting of commonly used statistical adjustments in the literature. Specific adjustment for treatment selection is crucial for these types of studies. For all issues of safety, effectiveness, and comparative effectiveness, these are significant advances in the methodology available to researchers and policymakers who need to make decisions about coverage, reimbursement, formularies, and quality improvement programs.

We also emphasize the utility of this work for all state Medicaid agencies. The Utah Medicaid Program (but also others) can use the findings to educate health care providers about the appropriate prescribing and discontinuation of antidementia drugs. Such education could lead to decreases in the number of inappropriate antidementia prescriptions and improved quality of treatment.

Similarly, state departments of health can adopt this methodology to develop statewide pharmacoepidemiologic databases and indicators of prescription drug use (at least for publicly supported programs) and to improve public health surveillance and reporting systems. The Utah Department of Health, for instance, proposes to incorporate our methodology into the statewide prescription utilization indicators, especially the concepts (and operationalized definitions) of the persistence measure and drug therapy course.

Health Plans, Payers, and Self-Funded Employers

Health plans, health insurance companies, and self-funded employers have claims databases similar to those for Medicaid. As we suggested above with respect to Medicaid programs, health plans, insurers, and employers also can adopt the publicly available methodological toolbox developed by this study to assess their patients' ADEs. They can further modify the methodology to analyze other types of prescriptions and diseases.


This study was supported by Contract HHSA29020050036I from the Agency for Healthcare Research and Quality (AHRQ), Task No. 1. We acknowledge the continuing support of Scott R. Smith, R.Ph., Ph.D., Director of AHRQ's Developing Evidence to Inform Decisions about Effectiveness (DEcIDE) Program and the AHRQ Task Order Officer for this project. We are also grateful for the support of Lia Snyder, M.P.H., AHRQ's Effective Health Care (EHC) Program Manager. We extend our appreciation to Ann Gruber-Baldini, Ph.D., and Ilene Zuckerman, Pharm.D., Ph.D., at the University of Maryland at Baltimore for their willingness to share literature review results. We also thank Norman Thurston, Ph.D., and Tim Morley, R.Ph., for their consultation and the Utah Department of Health Division of Health Care Finance for their contributions on data.

The investigators deeply appreciate the considerable support, commitment, and contributions of the DEcIDE team staff at RTI International. We also express our gratitude to Loraine Monroe, DEcIDE word processing specialist and Melissa Fisch, B.A., editor.


1. Mukherjee D, Nissen SE, Topol EJ. Risk of cardiovascular events associated with selective COX-2 inhibitors. J Am Med Assoc. 2001 Aug 22-29;286(8):954-9.

2. Sanghi S, MacLaughlin EJ, Jewell CW, Chaffer S, Naus PJ, Watson LE, et al. Cyclooxygenase-2 inhibitors: a painful lesson. Cardiovasc Hematol Disord Drug Targets. 2006 Jun;6(2):85-100.

3. Wagner TH, Cruz AM, Chadwick GL. Economies of scale in institutional review boards. Med Care. 2004;42(8):817-23.

4. Smith BD, Smith GL, Haffty BG. Postmastectomy radiation and mortality in women with T1-2 node-positive breast cancer. J Clin Oncol. 2005;23(7):1409-19.

5. Rosenbaum P, Rubin D. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41-55.

6. Field TS, Gilman BH, Subramanian S, Fuller JC, Bates DW, Gurwitz JH. The costs associated with adverse drug events among older adults in the ambulatory setting. Med Care. 2005;43(12):1171-6.

7. Pezalla E. Preventing adverse drug reactions in the general population. Manag Care Interface. 2005;18(10):49-52.

8. Vray M, Hamelin B, Jaillon P. The respective roles of controlled clinical trials and cohort monitoring studies in the pre- and postmarketing assessment of drugs. Therapie. 2005;60(4):339-44, 45-9.

9. Concato J, Shah N, Horwitz RI. Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med. 2000;342(25):1887-92.

10. Hernan MA, Robins JM. Estimating causal effects from epidemiological data. J Epidemiol Community Health. 2006 Jul;60(7):578-86.

11. Garner SE, Fidan DD, Frankish RR, Judd MG, Towheed TE, Wells G, et al. Rofecoxib for rheumatoid arthritis. Cochrane Database Syst Rev. 2005(1):CD003685.

12. Greenland S, Robins JM. Confounding and misclassification. Am J Epidemiol. 1985 Sep;122(3):495-506.

13. Walker AM. Confounding by indication. Epidemiology. 1996;7:335-6.

14. Greenland S, Morgenstern H. Confounding in health research. Annu Rev Public Health. 2001;22:189-212.

15. Corrao G, Botteri E, Bagnardi V, Zambon A, Carobbio A, Falcone C, et al. Generating signals of drug-adverse effects from prescription databases and application to the risk of arrhythmia associated with antibacterials. Pharmacoepidemiol Drug Saf. 2005 Jan;14(1):31-40.

16. Rubin DB. Using multivariate matched sampling and regression adjustment to control bias in observational studies. J Am Stat Assoc. 1979;74(366):318-28.

17. Hansen RA, Gartlehner G, Kaufer D, Lohr K, Carey T. Drug class review of Alzheimer's drugs. Final Report Update 1. 2006.

18. Gruber-Baldini AL, Zuckerman I, Du D, Fang G, Miller R, Stuart B, et al. Methods for studying dementia treatment and outcomes in observational databases. University of Maryland, Baltimore, Maryland: Literature review prepared for the Agency for Healthcare Research and Quality under Contract No. HHSA290200500391; 2005.

19. Birks J. Cholinesterase inhibitors for Alzheimer's disease. Cochrane Database Syst Rev. 2006(1):CD005593.

20. Utah Department of Health. Utah Health Status Update: Characterizing the Utah Medicaid Population. 2006 [cited; Available from:

21. Suissa S. Novel approaches to pharmacoepidemiology study design and statistical analysis. In: Strom B, editor. Pharmacoepidemiology. 4th ed. Philadelphia: Wiley & Sons, Ltd.; 2005. p. 811-29.

22. Ray WA. Evaluating medication effects outside of clinical trials: new-user designs. Am J Epidemiol. 2003 Nov 1;158(9):915-20.

23. van Wijk BL, Klungel OH, Heerdink ER, de Boer A. Refill persistence with chronic medication assessed from a pharmacy database was influenced by method of calculation. J Clin Epidemiol. 2005;59(1):11-7.

24. Bennett CL, Nebeker JR, Lyons EA, Samore MH, Feldman MD, McKoy JM, et al. The Research on Adverse Drug Events and Reports (RADAR) project. J Am Med Assoc. 2005 May 4;293(17):2131-40.

25. Ritchie CW, Ames D, Clayton T, Lai R. Metaanalysis of randomized trials of the efficacy and safety of donepezil, galantamine, and rivastigmine for the treatment of Alzheimer disease. Am J Geriatr Psychiatry. 2004 Jul-Aug;12(4):358-69.

26. Blass JP, Cyrus PA, Bieber F, Gulanski B. Randomized, double-blind, placebo-controlled, multicenter study to evaluate the safety and tolerability of metrifonate in patients with probable Alzheimer disease. The Metrifonate Study Group. Alzheimer Dis Assoc Disord. 2000 Jan-Mar;14(1):39-45.

27. Graves T, Hanlon JT, Schmader KE, Landsman PB, Samsa GP, Pieper CF, et al. Adverse events after discontinuing medications in elderly outpatients. Arch Intern Med. 1997 Oct 27;157(19):2205-10.

28. Schneeweiss S, Sturmer T, Maclure M. Case-crossover and case-time-control designs as alternatives in pharmacoepidemiologic research. Pharmacoepidemiol Drug Saf. 1997 Oct;6 Suppl 3:S51-9.

29. Maclure M. The case-crossover design: a method for studying transient effects on the risk of acute events. Am J Epidemiol. 1991 Jan 15;133(2):144-53.

30. Lefebvre G, Angers JF, Blais L. Estimation of time-dependent rate ratios in case-control studies: comparison of two approaches for exposure assessment. Pharmacoepidemiol Drug Saf. 2006 May;15(5):304-16.

31. Robins JM, Hernan MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000 Sep;11(5):550-60.

32. von Elm E, Egger M. The scandal of poor epidemiological research. BMJ. 2004 Oct 16;329(7471):868-9.

33. Caro J, Salas M, Speckman J, Raggio G, Jackson J. Persistence with treatment for hypertension in actual practice. CMAJ. 1999;160(1):31-7.

34. Guess HA. Exposure-time-varying hazard function ratios in case-control studies of drug effects. Pharmacoepidemiol Drug Saf. 2006 Feb;15(2):81-92.

35. Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology. 1999;10:37-48.

36. Hennessy S, Bilker WB, Weber A, Strom BL. Descriptive analyses of the integrity of a US Medicaid claims database. Pharmacoepidemiol Drug Saf. 2003 Mar;12(2):103-11.

37. Ray WA. Observational studies of drugs and mortality. N Engl J Med. 2005 Dec 1;353(22):2319-21.

38. AHFS Drug Information. American Society of Health-System Pharmacists. 2006 [cited August 10, 2006]; Available from:

39. Schneider LS, Tariot PN, Dagerman KS, Davis SM, Hsiao JK, Ismail MS, et al. Effectiveness of Atypical Antipsychotic Drugs in Patients with Alzheimer's Disease. N Engl J Med. 2006;355(15):1525-38.

40. Wang PS, Schneeweiss S, Avorn J, Fischer MA, Mogun H, Solomon DH, et al. Risk of death in elderly users of conventional vs. atypical antipsychotic medications. N Engl J Med. 2005;353(22):2335-41.

41. Staiger D, Stock JH. Instrumental variables regressions with weak instruments. Econometrica. 1997;65(3):557-86.

Appendix A. Design of a Pharmacoepidemiology Database: Data Dictionary for Antidementia Drug Evaluation


With the recent discoveries that cyclooxygenase-2 inhibitor nonsteroidal anti-inflammatory drugs increase cardiac morbidity and that antipsychotic medications are associated with an increased risk of mortality in the elderly, it has become clear that more formal postmarketing surveillance methods are needed to detect serious but rare adverse drug events (ADEs) that were not uncovered during premarketing trials. The finding that antipsychotics are associated with an increased risk of mortality in the frail elderly also prompts the question of whether increased morbidity and mortality occur in elderly or frail individuals exposed to other classes of drugs.

Medicare pharmacy and claims databases are ideal for large postmarketing ADE surveillance studies. This document describes our methods for organizing Medicare pharmacy and claims data (using Medicaid data as an example) for ADE discovery and surveillance.

This project was a methods exercise concerned with ADE detection, using pharmaceuticals for dementia, including Alzheimer's disease, as the case study. Treatment for dementia has changed dramatically over the past decade with the introduction of pharmaceutical therapy. Two main classes of drug, approved by the US Food and Drug Administration for this purpose, are used to treat dementia. The first class comprises several acetylcholinesterase (AChE) inhibitors: donepezil hydrochloride, rivastigmine tartrate, galantamine hydrobromide, and tacrine. Tacrine currently is not used because of major concerns about toxicity (especially for the liver). The second class is N-methyl-D-aspartate (NMDA) receptor antagonists, which for Alzheimer's disease includes only memantine; this pharmaceutical is approved only for moderate to severe Alzheimer's disease.

Overview of Database Design

The database for this project was designed to support an historical cohort study. Cohort designs are needed because multiple outcomes often are evaluated when searching databases for unknown (or known but extremely rare or unlikely) ADEs. The three-step design process allows easy linkage across intermediate tables while preserving the longitudinal history of each subject and person-time information. The three steps involve

  1. extracting variables from raw claims tables,
  2. processing intermediate tables, and
  3. merging intermediate tables to produce analysis tables.

Appendix B supplies more detailed information about data management and data integrity analysis.

Drug Exposure Table

The drug exposure table is the foundational (i.e., analytic) table; all other intermediate tables link with it for analysis. The exposure table was designed to be flexible and can be modified to fit other exposure specifications.

For the antidementia drug study, each row represents a time span. Time spans vary during unexposed and exposed periods. During exposure to a medication of interest (i.e., antidementia therapy), the time span reduces to a single day. This design supports person-time calculations and survival analysis.
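The span structure described above can be sketched as follows. This is an illustration only, not the study's SAS code: the function name and the tuple layout are hypothetical, and the sketch assumes inclusive D1 and D2 dates, so that an exposed day is stored as a row with D1 equal to D2 and an unexposed span ends the day before exposure begins.

```python
from datetime import date, timedelta


def build_exposure_spans(enroll_start, enroll_end, exposed_days):
    """Return (d1, d2, exposed) rows for one patient.

    Unexposed periods collapse into one wide span; each exposed day
    gets its own 1-day row. Dates are treated as inclusive.
    """
    one_day = timedelta(days=1)
    exposed = sorted(
        d for d in set(exposed_days) if enroll_start <= d <= enroll_end
    )
    rows = []
    cursor = enroll_start  # first day not yet covered by a row
    for day in exposed:
        if cursor < day:
            rows.append((cursor, day - one_day, False))  # unexposed span
        rows.append((day, day, True))  # exposed: one row per day
        cursor = day + one_day
    if cursor <= enroll_end:
        rows.append((cursor, enroll_end, False))  # post-exposure span
    return rows


def person_days(rows):
    """Total person-time (in days) represented by the rows."""
    return sum((d2 - d1).days + 1 for d1, d2, _ in rows)


# Hypothetical patient: enrolled 10 days, exposed on days 4 through 6.
spans = build_exposure_spans(
    date(2005, 1, 1),
    date(2005, 1, 10),
    [date(2005, 1, 4), date(2005, 1, 5), date(2005, 1, 6)],
)
```

Because every enrolled day falls in exactly one row, summing row lengths recovers total person-time, which is what makes the layout convenient for survival analysis.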

Antidementia drugs require dose titration. Because of this, multiple prescriptions may be simultaneously dispensed (i.e., dispensed on the same day). To maintain the one-record-per-day structure for exposed individuals, auxiliary exposure tables store data for patients who were simultaneously dispensed medications from the same drug class. This design will allow for the evaluation of initial dose, speed of titration, and overlap in early prescription fills.

Data Requirements

The raw data needed for ADE detection and surveillance studies include the following four main types: 1. Eligibility File; 2. Pharmacy Claims File; 3. Medical Claims; and 4. Death records. Additional databases and programs include:

  1. MULTUM (
    • Antidementia drug NDC codes
    • Unit dose, preparation
  2. First DataBank
    • Therapeutic categories
  3. H-CUP (Healthcare Cost and Utilization Project) Risk Adjuster, Chronic Condition indicator, Clinical classification tools for outcome conditions (

Finally, the intermediate tables produced for analysis include the following, and they are documented more fully below.

  1. Demographic and Enrollment tables
  2. Drug exposure tables
    • Cholinesterase inhibitor
    • Cholinesterase duplicate table
    • NMDA receptor antagonist
    • NMDA duplicate table
  3. Cotherapy tables
  4. Medical tables
    • Professional diagnoses
    • Professional procedures (Current Procedural Terminology [CPT])
    • Facility diagnoses
    • Facility procedures (International Classification of Diseases, 9th Revision [ICD-9])
  5. Death table.

Demographic and Enrollment Table

The Demographic/Enrollment Table (Table A-1) provides basic demographic information and enrollment status for beneficiaries. The time span (E1 through E2) is based on 1-month intervals. Definitions are provided following the table itself.

Table A-1. Demographic/enrollment table summary (monthly enrollment)
No.  Field Name       Type  Length  Label
1    Scrambled_id     Char  20      Scrambled patient identifier
2    E1               Num   8       Start date for enrollment interval
3    E2               Num   8       End date for enrollment interval
4    Eligible         Num   3       Eligible and enrolled in Medicaid, 1=Yes, 0=No
5    AidCategoryCode  Char  4       Aid category
6    HMOstatus        Num   3       Enrolled in HMO product, 1=Yes
7    Age1             Num   3       Single year age, 50+
8    County           Num   3       County code
9    DeathDate        Num   8       Date of death
10   Ethnicity        Char  4       Ethnicity code
11   Gender           Char  3       Gender, F = Female, M = Male
12   MaritalStatus    Char  2       Marital status
13   Race             Char  2       Race code
  1. Field Name: Scrambled_ID
    • Label: Scrambled patient identifier
    • Definition: A unique combination of symbols, characters, and numbers assigned to each individual that is used to identify medical and pharmacy claims for that individual.
    • Field Description: Char (20)
  2. Field Name: E1
    • Label: Start date for enrollment interval
    • Definition: Start date for monthly enrollment interval. Medicaid eligibility is by month and the enrollment table is designed to capture monthly enrollment status.
    • Field Description: Num (8)
  3. Field Name: E2
    • Label: End date for enrollment interval
    • Definition: End date for enrollment interval.
    • Field Description: Num (8)
  4. Field Name: Eligible
    • Label: Eligible and enrolled in Medicaid, 1=Yes, 0=No
    • Definition: Variable indicates whether the beneficiary was eligible and enrolled in Medicaid during the specific time span (month). This is important because eligibility status may change from month to month.
    • Field Description: Num (3)
  5. Field Name: AidCategoryCode
    • Label: Aid Category
    • Definition: Code that describes the reason for Medicaid eligibility.
    • Field Description: Char (4)
  6. Field Name: HMOstatus
    • Label: Enrolled in HMO product, 1=Yes
    • Definition: Variable indicates whether beneficiary enrolled in HMO product. This is important because Medical Claims data are unavailable for HMO-enrolled beneficiaries.
    • Field Description: Num (3)
  7. Field Name: Age1
    • Label: Single year age, 50+
    • Definition: Time-dependent age variable updated for each enrollment interval.
    • Field Description: Num (3)
    • Coding Requirements: record date-date of birth (DOB)
  8. Field Name: County
    • Label: County Code
    • Definition: County in which recipient resides.
    • Field Description: Num (3)
  9. Field Name: DeathDate
    • Label: Date of death
    • Definition: Date patient died.
    • Field Description: Num (8)
  10. Field Name: Ethnicity
    • Label: Ethnicity Code
    • Definition: H = Hispanic and N = not Hispanic.
    • Field Description: Char (4)
  11. Field Name: Gender
    • Label: Gender, F = Female, M = Male
    • Definition: Gender.
    • Field Description: Char (3)
  12. Field Name: MaritalStatus
    • Label: Marital Status
    • Definition:

      AW = Married-Both Spouses on AW

      CL = Common Law Marriage

      DV = Divorced

      IS = Institutionalized Spouse

      LS = Legally Separated

      LT = Living Together as Married

      MA = Married

      MO = Mar/Sep ORS Form 48

      NM = Never Married

      SL = Separated less than 1 year

      SM = Separated more than 1 year

      WI = Widowed
    • Field Description: Char (2)
  13. Field Name: Race
    • Label: Race Code
    • Definition:

      I = Native American (AI)

      B = Black (BL)

      0 = Asian (AS) or Asian/Pacific Islander (AP - old code)

      P = Pacific Islander (PI) * AP got split into AS and PI recently

      R = Refugee (Citizenship=RF)-All refugees are coded into R, regardless of race

      X = Other or Missing

      W = White (WH)
    • Field Description: Char (2)
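A minimal sketch of how the monthly enrollment rows and the time-dependent Age1 field might be generated is shown below. It is not the study's code: the function names and the set of eligible months are hypothetical, and evaluating age at the start of each interval is an assumption (the data dictionary says only that age is the record date minus the date of birth).

```python
from datetime import date, timedelta


def age_on(record_date, dob):
    """Age in completed years on record_date."""
    before_birthday = (record_date.month, record_date.day) < (dob.month, dob.day)
    return record_date.year - dob.year - int(before_birthday)


def monthly_enrollment_rows(scrambled_id, dob, start, end, eligible_months):
    """Build one row per calendar month from start's month to end's month.

    eligible_months: set of (year, month) pairs in which the beneficiary
    was eligible and enrolled (hypothetical input for this sketch).
    """
    rows = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        e1 = date(year, month, 1)
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        e2 = date(next_year, next_month, 1) - timedelta(days=1)  # month end
        rows.append({
            "Scrambled_id": scrambled_id,
            "E1": e1,
            "E2": e2,
            "Eligible": int((year, month) in eligible_months),
            "Age1": age_on(e1, dob),  # assumption: age at interval start
        })
        year, month = next_year, next_month
    return rows


# Hypothetical beneficiary with a one-month eligibility gap.
enrollment = monthly_enrollment_rows(
    "ABC123", date(1930, 2, 15), date(2004, 1, 1), date(2004, 3, 1),
    eligible_months={(2004, 1), (2004, 3)},
)
```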

Exposure Tables

The Exposure Tables consist of four tables; each drug class (Class 1, acetylcholinesterase inhibitors; Class 2, NMDA receptor antagonists) has a primary table and a duplicate table. Tables A-2 and A-3 illustrate this approach for cholinesterase inhibitors. In these tables, AChEs are the class 1 drugs; NDC is the National Drug Code; Rx is prescription; and mg is milligrams.

The primary tables are the foundational tables that characterize the beneficiaries' antidementia drug exposure history. If a member is simultaneously dispensed two medications from within the same drug class, then the medication with the largest number of days supplied is maintained in the drug-class-specific primary table. If the days supplied are the same, then the medication with the higher dose is included in the primary exposure table. The duplicate is flagged and the drug exposure information for all simultaneously dispensed intra-class drugs is stored in the class-specific duplicate drug table.
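The selection rule for simultaneously dispensed medications can be sketched as below. The record layout (dictionaries with `ndc`, `days_supply`, and `dose_mg` keys) and the function name are hypothetical; the rule itself follows the paragraph above: the longest days supplied wins, ties go to the higher dose, and all same-day fills are copied to the duplicate table with the flag set.

```python
def split_primary_and_duplicates(same_day_fills):
    """Pick the primary record among same-day, same-class fills.

    same_day_fills: list of dicts for one patient, one drug class, and
    one dispense date. Returns (primary_row, duplicate_table_rows).
    """
    if not same_day_fills:
        raise ValueError("at least one fill is required")
    if len(same_day_fills) == 1:
        # A single fill is not a duplicate; nothing goes to the duplicate table.
        return dict(same_day_fills[0], duplicate=0), []

    # Longest days supplied first; higher dose breaks ties.
    primary = max(same_day_fills, key=lambda f: (f["days_supply"], f["dose_mg"]))
    duplicate_rows = [dict(f, duplicate=1) for f in same_day_fills]
    return dict(primary, duplicate=1), duplicate_rows


# Hypothetical titration example: two strengths dispensed on the same day.
fills = [
    {"ndc": "A", "days_supply": 30, "dose_mg": 5},
    {"ndc": "B", "days_supply": 30, "dose_mg": 10},
]
primary_row, duplicate_rows = split_primary_and_duplicates(fills)
```

The sketch does not define what happens when both days supplied and dose are tied, because the source does not specify a rule for that case; `max` simply returns the first such record.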

Table A-2. Class 1 primary drug exposure table summary (AChE)*
No.  Variable             Type  Length  Label
1    Scrambled_Id         Char  20      Scrambled Patient Identifier
2    D1                   Num   8       Interval Start Date
3    D2                   Num   8       Interval End Date
4    RxNum_Class1         Num   3       AChEs Rx Number
5    DaysSupply_Class1    Num   3       Number of days supplied (AChEs)
6    NumDisp_Class1       Num   3       Number of tablets dispensed (AChEs)
7    Dose_Class1          Num   3       AChEs Dose in mg
8    SubClass_Class1      Num   3       AChEs Subclass
9    NDC_Class1           Char  12      AChE NDC code
10   PrescriberID_Class1  Char  10      AChEs Prescriber ID
11   Possession_Class1    Num   8       Days AChEs likely possessed
12   Duplicate_Class1     Num   3       Flag for simultaneously dispensed AChEs

* Once exposed, the span from d1 to d2 equals 1 day.

AChEs, acetylcholinesterase inhibitors.

Table A-3. Class 1 duplicate drug exposure table summary*
No.  Variable                Type  Length  Label
1    Scrambled_Id            Char  20      Scrambled Patient Identifier
2    D1                      Num   8       Interval Start Date
3    D2                      Num   8       Interval End Date
4    RxNumDup_Class1         Num   3       AChEs Rx Number
5    DaysSupplyDup_Class1    Num   3       Number of days supplied (AChEs)
6    NumDispDup_Class1       Num   3       Number of tablets dispensed (AChEs)
7    DoseDup_Class1          Num   3       AChEs Dose in mg
8    SubClassDup_Class1      Num   3       AChEs Subclass
9    NDCDup_Class1           Char  12      AChE NDC code
10   PrescriberIDdup_Class1  Char  10      AChEs Prescriber ID
11   Duplicate_Class1        Num   3       Flag for simultaneously dispensed AChEs

* Each row represents a dispensed prescription; there are no projections for whether patient likely possessed the medication.

AChEs, acetylcholinesterase inhibitors.
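The possession projection that distinguishes the primary table (the Possession_Class1 field in Table A-2) from the duplicate table can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: the function name is hypothetical, and a fill is assumed to cover exactly the number of days supplied, starting on the dispense date. Days on which two supplies overlap receive a count of 2.

```python
from collections import Counter
from datetime import date, timedelta


def possession_by_day(fills):
    """Count medications likely in possession on each day.

    fills: iterable of (dispense_date, days_supplied) pairs for one
    patient and one drug class. Returns a Counter mapping each covered
    day to the number of overlapping supplies (1, 2, ...).
    """
    counts = Counter()
    for dispense_date, days_supplied in fills:
        for offset in range(days_supplied):
            counts[dispense_date + timedelta(days=offset)] += 1
    return counts


# Hypothetical example: a 30-day fill refilled 5 days early.
possession = possession_by_day([
    (date(2005, 1, 1), 30),
    (date(2005, 1, 26), 30),
])
overlap_days = sorted(day for day, n in possession.items() if n == 2)
```

Days absent from the returned counter are days with no projected possession, which corresponds to the unexposed spans in the primary table.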

Details for Class 1 Drug Exposure Tables

To illustrate the overall approach, we present below the details for the cholinesterase inhibitors primary drug exposure table. Details for the duplicate table are exactly the same as those for the primary table, except that medication possession is not projected (i.e., the duplicate table does not estimate the days a patient likely possessed a specific AChE). Summary and details are identical for class 2 (NMDA receptor antagonist) tables.

  1. Field Name: Scrambled_ID
    • Label: Scrambled Patient Identifier
    • Definition: A unique number assigned to each individual used to identify medical and pharmacy claims for that individual.
    • Field Description: Char (20)
  2. Field Name: D1
    • Label: Interval Start Date
    • Definition: D1 is the start of the interval and D2 is the interval end date.
      • For pre-exposure to antidementia drugs the D1-D2 span starts at first enrollment and ends on the first day of antidementia drug exposure.
      • Once exposed the span unit will be days.
      • Fill dates will be linked to D1 and not D2
      • Once exposure to all antidementia drugs ends, the D1-D2 span will increase from last day supplied to end of enrollment or end of study
    • Field Description: Num (8); Format (SAS system date)
    • Coding Requirements: First enrollment date initiates D1 interval.
  3. Field Name: D2
    • Label: Interval End Date
    • Definition: D2 is the end date for the interval. Once exposed to antidementia drugs, the span unit is days.
    • Field Description: Num (8); Format (SAS system date)
  4. Field Name: RxNum_Class1
    • Label: AChEs Rx Number
    • Definition: Variable that indicates the date that a cholinesterase inhibitor was dispensed and the fill/refill number.
    • Field Description: Num (3)
    • Coding Requirements: NDC codes for cholinesterase inhibitors provided below (acquired from MULTUM January, 2006). RxNumClass1 date is linked to the corresponding D1-D2 interval. RxFillNoClass1: the refill information is obtained from the Prescripfillno variable from the Medicaid table. See Annex 1.
  5. Field Name: DaysSupply_Class1
    • Label: Number of days supplied (AChEs)
    • Definition: Number of days supplied for each cholinesterase inhibitor dispensed.
    • Field Description: Num (3)
  6. Field Name: NumDisp_Class1
    • Label: Number of tablets dispensed (AChEs)
    • Definition: Quantity of AChEs units dispensed.
    • Field Description: Num (3)
  7. Field Name: Dose_Class1
    • Label: AChEs Dose in mg
    • Definition: Dose of AChE in mg.
    • Field Description: Num (3)
    • Coding Requirements: The variable "strength_num_amount" was obtained from ndc_denorm file from the Multum lexicon drug_mlt.mdb access database and linked to the Primary Exposure table by NDC code.
  8. Field Name: SubClass_Class1
    • Label: AChEsSubclass
    • Definition: Indicator of specific medication in AChE class of antidementia drugs. Donepezil = 11, Tacrine = 12, Rivastigmine = 13, Galantamine = 14
    • Field Description: Num (3)
    • Coding Requirements: Annex 1 provides the NDC codes by AChE (subclass codes 11, 12, 13, and 14).
  9. Field Name: NDC_Class1
    • Label: AChE NDC code
    • Definition: NDC code for drugs in AChE class.
    • Field Description: Num (12)
  10. Field Name: PrescriberID_Class1
    • Label: AChEs Prescriber ID
    • Definition: Prescriber identification number.
    • Field Description: Num (10)
  11. Field Name: Possession_Class1
    • Label: Days AChEs likely possessed
    • Definition: Indicator for days AChE likely possessed.
    • Field Description: Num (8); 1 = 1 medication in possession, 2 = 2 medications in possession, etc.
    • Coding Requirements: A 1 is coded from the dispense date to the end-of-supply date (dispense date + number of days supplied). When fills overlap, the number of medications likely in possession is recorded (e.g., 2 for the days on which two medications overlap).
  12. Field Name: Duplicate_Class1
    • Label: Flag for simultaneously dispensed AChEs
    • Definition: Flag to indicate when medications are simultaneously dispensed from the same drug class.
    • Field Description: Num (3); 1 = in possession
    • Coding Requirements: If drugs are dispensed from the same class on the same day, then the medication with the most days supplied is included in the Primary Drug Exposure table and information on both medications is stored in the Duplicate Drug table. If both medications have the same number of days supply, then the medication with the higher dose is stored in the Primary Drug Exposure table.
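The coding rules for Possession_Class1 and Duplicate_Class1 can be illustrated with a minimal Python sketch. It is not the SAS code used in the study. The function names, the dictionary keys `days_supplied` and `dose_mg`, and the choice to count the dispense date as the first covered day are assumptions made for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def possession_by_day(fills):
    """Count the medications likely in possession on each calendar day.

    `fills` is a list of (dispense_date, days_supplied) pairs. Each fill is
    assumed to cover days_supplied consecutive days starting on the dispense
    date; the source does not state whether the end-of-supply date is counted.
    """
    counts = Counter()
    for dispense_date, days_supplied in fills:
        for offset in range(days_supplied):
            counts[dispense_date + timedelta(days=offset)] += 1
    return dict(counts)


def pick_primary(same_day_fills):
    """Pick the fill kept in the primary table among same-day, same-class fills.

    The fill with the most days supplied wins; ties go to the higher dose.
    """
    return max(same_day_fills, key=lambda f: (f["days_supplied"], f["dose_mg"]))


# Two 30-day fills whose supply overlaps from January 21 through January 30.
fills = [(date(2004, 1, 1), 30), (date(2004, 1, 21), 30)]
possession = possession_by_day(fills)
print(possession[date(2004, 1, 10)])  # only the first fill covers this day
print(possession[date(2004, 1, 25)])  # both fills cover this day
```

Days with no supply are simply absent from the returned dictionary, which corresponds to no medication in possession.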

Cotherapy Table

The Cotherapy Table is used to account for concomitant medications and drug interactions. First Databank's drug classification system is used to organize NDCs by therapeutic category (Table A-4).

Table A-4. Cotherapy Table Summary
Field Name Type Length Label
1 Scrambled_id Char 20 Scrambled Patient Identifier
2 Provider_ID Char 12 Provider Identifier
3 PrescribedDate Char 12 Prescription Date
4 DispenseDate Char 12 Dispense Date
5 RefillInd Char 3 Refill Indicator
6 DrugQuantity Num 3 Quantity Dispensed
7 DaysSupplied Num 3 Number of Days Supplied
8 DrugCode Char 12 NDC Code
9 DrugTherapeuticClass Char 3 Therapeutic Class

Details for Cotherapy Table

  1. Field Name: Scrambled_ID
    • Label: Scrambled Patient Identifier
    • Definition: A unique combination of symbols, characters, and numbers assigned to each individual that is used to identify medical and pharmacy claims for that individual.
    • Field Description: Char (20)
  2. Field Name: Provider_ID
    • Label: Provider Identifier
    • Definition: Unique identification number used to identify prescribers.
    • Field Description: Char (12)
  3. Field Name: PrescribedDate
    • Label: Prescription Date
    • Definition: Date drug was prescribed.
    • Field Description: Char (12)
  4. Field Name: DispensedDate
    • Label: Dispense Date
    • Definition: Date prescription was dispensed.
    • Field Description: Char (12)
  5. Field Name: ReFillInd
    • Label: Refill Indicator
    • Definition: Variable that indicates whether the prescription is a refill.
    • Field Description: Char (3)
  6. Field Name: DrugQuantity
    • Label: Quantity Dispensed
    • Definition: Number of units dispensed.
    • Field Description: Num (3)
  7. Field Name: DaysSupplied
    • Label: Number of Days Supplied
    • Definition: Number of days the medication was prescribed.
    • Field Description: Num (3)
  8. Field Name: DrugCode
    • Label: NDC Code
    • Definition: National Drug Code (NDC).
    • Field Description: Char (12)
  9. Field Name: DrugTherapeuticClass
    • Label: Therapeutic Class
    • Definition: First Databank's classification code that indicates the specific therapeutic class in which the NDC belongs.
    • Field Description: Char (3)
    • Coding Requirements: Obtained from First Databank.

Medical Claims Table

Medical claims are separated into four tables: (1) a clinic-based procedure table (CPT; Table A-5); (2) a facility-based procedure table (ICD-9 procedure codes; Table A-6); (3) a clinic-based diagnosis table (ICD-9); and (4) a facility-based diagnosis table (ICD-9). Both diagnosis tables are summarized in Table A-7. Indicator variables identify provider type and location. We used the Healthcare Cost and Utilization Project (HCUP) clinical classification system (CCS) to classify procedure and diagnosis codes.

To reduce redundancy, the details are listed for the four medical tables together. The numbers do not correspond to the table summary numbers.

Table A-5. Clinic Procedure Table Summary
Variable Type Length Label
1 Scrambled_id Char 20 Scrambled Patient Identifier
2 ServiceBeginDate Char 12 Service Begin Date
3 ServiceEndDate Char 12 Service End Date
4 TCN Char 17 Transaction Number
5 ProviderCategory Num 3 Provider Category
6 Cpt4 Char 6 CPT 4
7 RevenueCode Char 5 Revenue Code
8 Ccs_cpt Num 3 Clinical Classification System (CPT)

Table A-6. Facility Procedure Table Summary
Variable Type Length Label
1 Scrambled_id Char 20 Scrambled Patient Identifier
2 TCN Char 17 Transaction Number
3 ProcCode Char 5 Procedure Code (ICD-9)
4 SurgeryDate Char 12 Surgery Date
5 DXRcdCode Char 3
6 SPRCCS1 Char 4 Single-Level Procedure CCS 1
7 L1PCCS1 Char 5 Level 1 Multilevel Procedure CCS 1
8 L2PCCS1 Char 5 Level 2 Multilevel Procedure CCS 1
9 L3PCCS1 Char 7 Level 3 Multilevel Procedure CCS 1

Tables for professional and facility diagnoses are identical. Variables 60-88 in the clinic diagnosis and facility diagnosis tables are disease categories from the HCUP comorbidity software.

Table A-7. Diagnosis Table Summary
Variable Type Length Label
1 Scrambled_id Char 20 Scrambled Patient Identifier
2 TCN Char 17 Transaction Number
3 ServiceBeginDate Char 12 Service Begin Date
4 ServiceEndDate Char 12 Service End Date
5 Dx1 Char 6 Primary Diagnosis
6 Dx2 -dx5 Char 6 Diagnosis # 2-#5
7 Ndx Num 6 Number of Diagnoses
8 RevenueCode Char 5 Revenue Code
9 ProviderCategory Char 2 Provider Category of Service
10 DiagnosisCode Char 5 Diagnosis Code
11 AdmissionType Num 3 Type of Admission
12 AdmissionSource Num 3 Source of Admission
13 DischargePatientStatus Char 1 Patient's Discharge Status
14 DRG Num 4 Diagnostic Related Group
15 SDXCCS1 - SDXCCS5 Char 4 Single-Level Diagnosis CCS 1-5
16 L1DCCS1 - L1DCCS5 Char 5 Level 1 Multilevel Diagnosis CCS 1-5
17 L2DCCS1- L2DCCS5 Char 5 Level 2 Multilevel Diagnosis CCS 1-5
18 L3DCCS1 - L3DCCS5 Char 7 Level 3 Multilevel Diagnosis CCS 1-5
19 L4DCCS1 - L4DCCS5 Char 9 Level 4 Multilevel Diagnosis CCS 1-5
20 Chronic_indicator1 Num 3 Chronic Disease Indicator (primary diagnosis)
21 Chronic_indicator2 - 5 Num 3 Chronic Disease Indicator (diagnosis #2 - #5)
22 CHF Num 3 Congestive heart failure
23 VALVE Num 3 Valvular disease
24 PULMCIRC Num 3 Pulmonary circulation disease
25 PERIVASC Num 3 Peripheral vascular disease
26 PARA Num 3 Paralysis
27 NEURO Num 3 Other neurological disorders
28 CHRNLUNG Num 3 Chronic pulmonary disease
29 DM Num 3 Diabetes w/o chronic complications
30 DMCX Num 3 Diabetes w/ chronic complications
31 HYPOTHY Num 3 Hypothyroidism
32 RENLFAIL Num 3 Renal failure
33 LIVER Num 3 Liver disease
34 ULCER Num 3 Peptic ulcer disease excluding bleeding
35 AIDS Num 3 AIDS
36 LYMPH Num 3 Lymphoma
37 METS Num 3 Metastatic cancer
38 TUMOR Num 3 Solid tumor w/out metastasis
39 ARTH Num 3 Rheumatoid arthritis/collagen vascular diseases
40 COAG Num 3 Coagulopathy
41 OBESE Num 3 Obesity
42 WGHTLOSS Num 3 Weight loss
43 LYTES Num 3 Fluid and electrolyte disorders
44 BLDLOSS Num 3 Chronic blood loss anemia
45 ANEMDEF Num 3 Deficiency anemias
46 ALCOHOL Num 3 Alcohol abuse
47 DRUG Num 3 Drug abuse
48 PSYCH Num 3 Psychoses
49 DEPRESS Num 3 Depression
50 HTN_C Num 3 Hypertension

Details for Procedure and Diagnosis Tables

Items 1-8 correspond to Table A-5. We do not provide details for Table A-6.

  1. Field Name: scrambled_ID
    • Label: Scrambled Patient Identifier
    • Definition: A unique combination of symbols, characters, and numbers assigned to each individual that is used to identify medical and pharmacy claims for that individual.
    • Field Description: Char (20)
  2. Field Name: serviceBeginDate
    • Label: Service Begin Date
    • Definition: Service start date.
    • Field Description: Char (12)
  3. Field Name: serviceEndDate
    • Label: Service End Date
    • Definition: End of service date.
    • Field Description: Char (12)
  4. Field Name: TCN
    • Label: Transaction Number
    • Definition: Unique number to indicate a specific transaction. This number is used to link diagnoses to process claims.
    • Field Description: Char (17)
  5. Field Name: ProviderCategoryOfService
    • Label: Service Location
    • Definition: Indicator of service location; 01 = Inpatient Hospital General, 07 = Outpatient Hospital, 24 = Ambulatory Surgical. See Annex 2 for more details.
    • Field Description: Char (3)
  6. Field Name: Cpt4
    • Label: CPT4 Code
    • Definition: Billing code used by health providers to bill for services.
    • Field Description: Char (6)
  7. Field Name: revenueCode
    • Label: Revenue Code
    • Definition: Used to identify emergency department visits; 450 and 0450 = emergency department visits.
    • Field Description: Char (5)
  8. Field Name: ccs_cpt
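
As an illustration of the revenueCode definition in item 7, emergency department visits could be flagged as sketched below. This is a minimal Python sketch, not the study's SAS code. The function name and the stripping of surrounding whitespace are assumptions; only the two code values come from the definition above.

```python
# Revenue code values that the definition above treats as emergency department visits.
ED_REVENUE_CODES = {"450", "0450"}


def is_ed_visit(revenue_code):
    """Return True when a revenueCode value marks an emergency department visit."""
    if revenue_code is None:
        return False
    # Char fields are often space padded, so surrounding blanks are removed first.
    return revenue_code.strip() in ED_REVENUE_CODES


print(is_ed_visit("0450"))
print(is_ed_visit("0360"))
```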

Details below correspond to Table A-7 for elements not defined above.

  1. Field Name: dx1
    • Label: Primary Diagnosis
    • Definition: Diagnosis code indicating the primary reason that services were rendered (the cause of the medical visit or hospitalization).
    • Field Description: Char (6)
  2. Field Name: dx2-dx5
    • Label: Diagnosis #2-Diagnosis #5
    • Definition: Diagnosis codes by sequence.
    • Field Description: Char (6)
    • Note: Some databases contain nine or more diagnosis codes.
  3. Field Name: ndx
    • Label: Number of Diagnoses
    • Definition: Count of the number of ICD-9 diagnosis codes for a particular transaction.
    • Field Description: Num (6)
  4. Field Name: DRG
    • Label: Diagnostic Related Group
    • Definition: A system to classify hospital cases into one of approximately 500 groups, also referred to as DRGs, expected to have similar hospital resource use, developed for Medicare as part of the prospective payment system.
    • Field Description: Num (4)
  5. Field Name: SDXCCS1-SDXCCS5
  6. Field Name: L1DCCS1-L1DCCS5
    • Label: Level 1 Multilevel Diagnosis CCS 1-Level 1 Multilevel Diagnosis CCS 5
    • Definition: HCUP level 1 from multilevel diagnosis classification system for dx1 to dx5.
    • Field Description: Num (5)
    • Coding Requirements:
  7. Field Name: L2DCCS1-L2DCCS5
    • Label: Level 2 Multilevel Diagnosis CCS 1-Level 2 Multilevel Diagnosis CCS 5
    • Definition: HCUP level 2 from multi-level diagnosis classification system for dx1 to dx5.
    • Field Description: Num (5)
    • Coding Requirements:
  8. Field Name: L3DCCS1-L3DCCS5
    • Label: Level 3 Multilevel Diagnosis CCS 1-Level 3 Multilevel Diagnosis CCS 5
    • Definition: HCUP level 3 from multi-level diagnosis classification system for dx1 to dx5.
    • Field Description: Num (7)
    • Coding Requirements:
  9. Field Name: L4DCCS1-L4DCCS5
    • Label: Level 4 Multilevel Diagnosis CCS 1-Level 4 Multilevel Diagnosis CCS 5
    • Definition: HCUP level 4 from multi-level diagnosis classification system for dx1 to dx5.
    • Field Description: Num (9)
    • Coding Requirements:
  10. Field Name: chronic_indicator1-chronic_indicator5

Death Table

Table A-8. Death Table
Variable Type Length Label
1 Scrambled_id Char 20 Scrambled Patient Identifier
2 StateOfDeath Char 10 State of Death
3 AutopsyDone Char 5 Autopsy Done
4 AutopsyUsed Char 5 Autopsy Used
5 CountyOfDeath Char 10 County of Death
6 TimeOfDeath Char 10 Time of Death
7 Cause1 Char 8 Underlying cause of death
8 Cause2 - 9 Char 8 Cause of death 2-9
9 MannerOfDeath Char 3 Manner of Death
10 Cause Char 8 Attributed cause
11 InjuryAtWork Char 3 Injury at Work
12 InjuryCounty Char 8 Injury County
13 InjuryDate Char 10 Injury Date
14 InjuryMotorVehicle Char 3 Motor Vehicle Injury
15 InjuryPlaceCd Char 5 Injury Place Code
16 InjuryState Char 5 Injury State
17 InjuryTime Char 10 Injury Time
18 AgeInYears Num 5 Age at death by single year
19 ArmedForces Char 3 Death in Armed Forces
20 BirthCountryCd Char 3 Birth Country Code
21 IndustryCd Char 3 Industry Code
22 DateOfDeath Char 10 Date of Death

Details for Death Table

  1. Field Name: Scrambled_ID
    • Label: Scrambled Patient Identifier
    • Definition: A unique combination of symbols, characters, and numbers assigned to each individual that is used to identify medical and pharmacy claims for that individual. The unique number is often a scrambled social security number or enrollee policy number.
    • Field Description: Char (20)
  2. Field Name: StateOfDeath
    • Definition: State where death occurred.
    • Field Description: Char (10)
  3. Field Name: AutopsyDone
    • Definition: Indicator variable for whether an autopsy was done.
    • Field Description: Char (5)
  4. Field Name: AutopsyUsed
    • Definition: Indicator for whether autopsy was used.
    • Field Description: Char (5)
  5. Field Name: CountyOfDeath
    • Definition: County where death occurred.
    • Field Description: Char (10)
  6. Field Name: TimeOfDeath
    • Definition: Time of day death occurred.
    • Field Description: Char (10)
  7. Field Name: Cause1
    • Definition: Underlying cause of death.
    • Field Description: Char (8)
  8. Field Name: Cause2 - 9
    • Definition: Additional causes of death 2-9.
    • Field Description: Char (8)
  9. Field Name: MannerOfDeath
    • Definition: Manner of Death, how the person died.
    • Field Description: Char (3)
  10. Field Name: Cause
    • Definition: Attributed cause of death.
    • Field Description: Char (8)
  11. Field Name: InjuryAtWork
    • Definition: Indicator for whether injury was caused at or while working.
    • Field Description: Char (3)
  12. Field Name: InjuryCounty
    • Definition: County where the injury occurred.
    • Field Description: Char (8)
  13. Field Name: InjuryDate
    • Definition: Date of the injury to which the patient's death was attributed.
    • Field Description: Char (10)
  14. Field Name: InjuryMotorVehicle
    • Definition: Indicator for whether the injury was from a motor vehicle.
    • Field Description: Char (3)
  15. Field Name: InjuryPlaceCd
    • Definition: Code for place of injury.
    • Field Description: Char (5)
  16. Field Name: InjuryState
    • Definition: State where the injury occurred.
    • Field Description: Char (5)
  17. Field Name: InjuryTime
    • Definition: Time of the injury that led to death.
    • Field Description: Char (10)
  18. Field Name: AgeInYears
    • Definition: Age at death by single year.
    • Field Description: Char (5)
  19. Field Name: ArmedForces
    • Definition: Indicator for whether person died while in the armed forces.
    • Field Description: Char (3)
  20. Field Name: BirthCountryCd
    • Definition: Birth country code.
    • Field Description: Char (3)
  21. Field Name: IndustryCd
    • Definition: Industry code.
    • Field Description: Char (3)
  22. Field Name: DateOfDeath
    • Definition: Date person died.
    • Field Description: Char (10)
Annex 1. National Drug Codes (NDCs) for Antidementia Medications
Class of Medication Medication Codes
Cholinesterase Inhibitors(1) Donepezil Hydrochloride (11)
  • 54868395200
  • 54868424500
  • 62856083130
  • 62856024590
  • 62856024530
  • 62856024541
  • 62856083230
  • 62856024690
  • 62856024630
  • 62856024641
Tacrine Hydrochloride (12)
  • 00071009840
  • 00071009525
  • 00071009540
  • 00071009825
  • 00071009740
  • 00071009640
  • 00071009725
  • 00071009625
  • 59630019212
  • 59630019312
  • 59630019012
  • 59630019112
Rivastigmine Tartrate (13)
  • 00078032306
  • 00078032406
  • 00078032344
  • 00078033931
  • 00078032644
  • 00078032444
  • 00078032544
  • 00078032606
  • 00078032506
Galantamine Hydrobromide (14)
  • 50458038730
  • 50458038930
  • 50458038830
  • 50458039060
  • 50458039260
  • 50458039160
  • 50458039910
  • 50458039860
  • 50458049010
  • 50458039760
  • 50458039660
Glutamate Pathway Modifiers(2) Memantine Hydrochloride (21)
  • 00456320212
  • 00456320014
  • 00456320563
  • 00456320560
  • 00456321063
  • 00456321060

Annex 2. Value Labels for Category of Service

08 ICF/MR1 (LOC 4)

09 ICF/MR2 (LOC 5)

10 ICF/MR3 (LOC 6)

12 USDC IMR-1 (LOC 4)

13 USDC IMR-2 (LOC 5)

14 USDC IMR-3 (LOC 6)

15 ICF-1 (LOC 7) NF-II

16 ICF-2 (LOC 2) NF-III


18 SNF-1 (LOC 8) ISC

19 SNF-2 (LOC 3) NF-I

Appendix B. Documentation of Data Management


This appendix provides detailed information on data management steps for extracting, processing, linking, and merging data for this study. We first provide an overview of the Utah Medicaid Data Warehouse, from which data for this project were taken. Subsequent sections describe the database and study design and three critical steps for the work: extracting raw data, producing intermediate files, and linking files. The final section describes data quality assurance.

Overview of Utah Medicaid Data Warehouse

The work was done through the Utah Office of Health Care Statistics (OHCS), a contracted business associate of the Division of Health Care Finance (Utah Medicaid) in the Utah Department of Health (DOH). The Office of Vital Records and Statistics in the DOH has an ongoing data-sharing agreement with Utah Medicaid. Vital records data are stored in the Data Warehouse, and authorized users access them there. OHCS staff downloaded the study data from the Utah Medicaid Data Warehouse for this project.

The data models in the Medicaid Data Warehouse are designed for paying claims; they are not designed for, or readily usable in, pharmacoepidemiologic analysis. Raw data tables are organized at various levels, ranging from an individual or a claim to a single diagnosis or procedure code. We reconstructed the payment-based data files into analytical data sets.

Overview of Pharmacoepidemiology Database Design

The pharmacoepidemiology database is designed for historical cohort studies. Cohort designs are needed because multiple outcomes are often evaluated when mining for unknown adverse drug effects.

We developed a three-step process that allows easy linkage across intermediate tables while preserving the longitudinal history of each subject and person-time information. The steps are (1) extracting variables from raw claims tables; (2) processing intermediate tables; and (3) merging intermediate tables to produce analysis tables. These steps are described below.

Extracting Study Data from Raw Data Tables in the Medicaid Data Warehouse

The study population included all Utah Medicaid enrollees who were 50 years or older and eligible for Medicaid from January 1, 2003, to December 31, 2005. We extracted four types of raw data from the Utah Medicaid Data Warehouse. We used ( to process raw data files and convert the text data files into SAS data sets. The data files were downloaded by calendar year. The 2004 data files illustrate the types of files used:

  • Medicaid Eligibility Files
    • Eligibility.50plus2004.qrd
    • MCOPremiumPayment(Enrollment).2004.qrd
    • MaritalStatus-2004-50Plus-PseudoID.qrd
    • EthnicRace-ClientID-2004-50Plus.qrd
  • Pharmacy Claims Files
    • 2004-Jan-Jun.StatePharmacyInitiativeDrugQuantity.AllAges.qrd
    • 2004-July-Dec.StatePharmacyInitiativeDrugQuantity.AllAges.qrd
    • DrugQuantity-2004-50Plus.qrd
  • Medical Claims
    • CPTandRevenue-2004-50Plus.qrd
    • DiagnosisCode (exceptForPharmacy)-2004.qrd
    • SurgeryProcedreCode-2004-50Plus.qrd
    • DRG-AdmS-Disch-status-2004-50Plus-fClm.qrd
  • Death records
    • DeathAllClinical.txt
    • Medicaid-Linkage-file-2004-50Plus.qrd
    • DeathLinkageInfo-2004-50Plus.qrd.

Producing Intermediate Tables

The project created five types of intermediate tables for the analyses. SAS programs, log files, and output files are listed under each type of intermediate table below.

  1. Demographic/enrollment tables
    • Produce eligibility table
  2. Drug exposure table
    • Produce antidementia table
    • Produce antidementia table by drug class
    • Produce antidementia exposure tables
    • Add antidementia flag to diagnosis table
  3. Co-therapy table
    • Produce pharmacy table
  4. Medical claims tables
    • Produce diagnosis and procedure tables
    • Add DRG and discharge status to diagnosis tables
    • Produce diagnosis and procedure tables for facility and professional
  5. Death record table
    • Produce linked death dataset

Linking Records

We applied three types of linkage for this study and the pharmacoepidemiologic database. A total of 14 raw data sets were downloaded from the Utah Medicaid Data Warehouse. Each data file includes at least one data element as the linkage key.

Client identification and linking client records among the Medicaid files. Four Medicaid-assigned unique client identifiers are stored in the Data Warehouse. The first, "ClientID," is created for each client at the time of enrollment and is stored in the eligibility tables. The second is the element labeled "RecipientID" in the ClaimHeaderV table. This table also includes the third and most frequently used linkage element, "RecipientPseudoID," which is an edited and deduplicated key. The fourth is this same deduplicated identifier as it appears in the Medicaid Managed Care System (MMCS) Capitation tables, where it is labeled "PseudoID." All raw data records in this study include at least one of the four client identifiers as the client-level linking key.

These Medicaid-assigned client IDs are either identical or related to each other through patient names and Social Security Numbers within the Data Warehouse. During the process of data management, we converted the different IDs to RecipientPseudoID and then to Scrambled_ID (an encrypted, deidentified number) for the research data files.

Claim identification and linking claims to clients among the Medicaid files. Except for eligibility records, all records contain claim-based information. However, the information for one claim is saved in several separate tables according to the nature of the information. For example, patient demographics, service type, and dates are in the ClaimHeaderV table. Diagnostic elements are in the ClaimDiagnosis, DiagnosisICD9, and DiagnosisICDGroup tables. Procedure information is stored in the SurgeryData (UB92), SurgicalProcedure, and SurgicalProcedureGroup tables. Each ICD code, together with its code sequence index and other information, occupies its own record line.

The linkage element for claims is the Transaction Control Number (TCN). We used the TCN, service date, and final claim status to assemble the parts of each claim. Every claim-related record contains at least two linkage data elements: TCN and Scrambled_ID.
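
To show how claim fragments held in separate tables can be brought together, the following is a minimal Python sketch, not the study's SAS code. It keys only on Scrambled_ID and TCN and leaves out the service date and final claim status that were also used. The function name and the row field `code` are hypothetical.

```python
def assemble_claims(headers, diagnoses, procedures):
    """Group header, diagnosis, and procedure rows into one record per claim.

    Each row is a dict carrying the two linkage elements Scrambled_ID and TCN.
    Diagnosis and procedure rows hold one code per row in a hypothetical
    'code' field.
    """
    claims = {}
    for header in headers:
        key = (header["Scrambled_ID"], header["TCN"])
        claims[key] = {"header": header, "diagnoses": [], "procedures": []}
    for rows, slot in ((diagnoses, "diagnoses"), (procedures, "procedures")):
        for row in rows:
            key = (row["Scrambled_ID"], row["TCN"])
            if key in claims:  # rows without a matching header are skipped
                claims[key][slot].append(row["code"])
    return claims


headers = [{"Scrambled_ID": "A1", "TCN": "T100", "ServiceBeginDate": "2004-03-01"}]
diagnoses = [
    {"Scrambled_ID": "A1", "TCN": "T100", "code": "331.0"},
    {"Scrambled_ID": "A1", "TCN": "T999", "code": "250.00"},  # no header for this TCN
]
procedures = [{"Scrambled_ID": "A1", "TCN": "T100", "code": "99213"}]
claims = assemble_claims(headers, diagnoses, procedures)
print(claims[("A1", "T100")]["diagnoses"])
```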

Linking death records to deceased Medicaid clients. Linkage keys (data elements). The Utah Medicaid Data Warehouse contains the certificate records of births, deaths, and fetal deaths. However, the Vital Records data tables are not linked to any of the Medicaid data tables in the Data Warehouse.

Table B.1 shows the data elements in the Medicaid eligibility data table that are possibly usable for a linking purpose. Table B.2 shows possible linkage elements from the Death Certificate database.

Table B.1. Medicaid eligibility data elements for linking records
Variable Name Functions
PseudoID Linkage number for Medicaid DW; SSN may be used for some clients
ClientID Linkage number for Medicaid DW; SSN may be used for some clients
DeathDate Use for verification
DOB Use for linking
Gender Use for verification
RaceCd Use for verification
Ethnicity Use for verification
RecipientSSN Use for linking
RecipientLastName Use for linking
RecipientFirstName Use for linking
RecipientMiddleInitial Use for linking
RecipAreaCd Reference information

Table B.2. Death certificate data elements for linking records
Variable Name Functions
StateFileNumber Linkage number for Death Certificates
DateOfDeath Use for verification
DecedentLastName Use for linking
DecedentFirstName Use for linking
DecedentMiddleName Use for linking
StateOfDeath Data Quality Flag
Alias1FirstName Use for linking
Alias1Initial Use for linking
Alias1LastName Use for linking
Alias2FirstName Use for linking
Alias2Initial Use for linking
Alias2LastName Use for linking
AliasFlag Linkage Flag
DateOfBirth Use for linking
GenderCd Use for verification
HispanicOrigin Use for verification
HispanicType Use for verification
RaceAsian Use for verification
RaceCd Use for verification
RaceCd1 Use for verification
RaceCd2 Use for verification
RaceCd3 Use for verification
RaceCd4 Use for verification
RaceCd5 Use for verification
RaceIslander Use for verification
RaceOther Use for verification
RaceTribe Use for verification
SSN Use for linking

Deterministic linking method. We used the deterministic linking method and available patient identifiers in both systems to link death records to Medicaid deceased clients. We used the following SAS programs:

  • Reading raw data file:
    • (Linkage elements for Medicaid files)
    • (Linkage elements for death records)
    • (Other information for death records)
  • Linking records:
    • (Linking records by Social Security Number only)
    • (Linking records by Date of Birth and Names only)
    • (Combined linked records).
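
The deterministic linkage described above can be illustrated with a minimal Python sketch. It is not the SAS programs listed. The field names come from Tables B.1 and B.2, but the function names, the case-insensitive name comparison, and the handling of conflicting matches (the SSN match is kept) are assumptions. The identifiers in the example are fabricated placeholders.

```python
def normalize(name):
    """Uppercase and trim a name so that comparison ignores case and padding."""
    return (name or "").strip().upper()


def link_deterministic(medicaid, deaths):
    """Return {PseudoID: (StateFileNumber, matched_on)} for linked records."""
    by_ssn = {d["SSN"]: d for d in deaths if d.get("SSN")}
    # If two death records share a key, the later one wins in this sketch.
    by_dob_name = {
        (d["DateOfBirth"], normalize(d["DecedentLastName"]), normalize(d["DecedentFirstName"])): d
        for d in deaths
    }
    links = {}
    for m in medicaid:
        ssn = m.get("RecipientSSN")
        ssn_hit = by_ssn.get(ssn) if ssn else None
        name_key = (m["DOB"], normalize(m["RecipientLastName"]), normalize(m["RecipientFirstName"]))
        name_hit = by_dob_name.get(name_key)
        if ssn_hit is not None and ssn_hit is name_hit:
            links[m["PseudoID"]] = (ssn_hit["StateFileNumber"], "SSN, DOB, and names")
        elif ssn_hit is not None:
            links[m["PseudoID"]] = (ssn_hit["StateFileNumber"], "SSN only")
        elif name_hit is not None:
            links[m["PseudoID"]] = (name_hit["StateFileNumber"], "name and DOB only")
    return links


medicaid = [
    {"PseudoID": "P1", "RecipientSSN": "111", "DOB": "1930-01-01",
     "RecipientLastName": "Smith", "RecipientFirstName": "Ann"},
    {"PseudoID": "P2", "RecipientSSN": "222", "DOB": "1931-02-02",
     "RecipientLastName": "Jones", "RecipientFirstName": "Bob"},
    {"PseudoID": "P3", "RecipientSSN": None, "DOB": "1932-03-03",
     "RecipientLastName": "lee", "RecipientFirstName": "Cy"},
    {"PseudoID": "P4", "RecipientSSN": "444", "DOB": "1933-04-04",
     "RecipientLastName": "Doe", "RecipientFirstName": "Di"},
]
deaths = [
    {"StateFileNumber": "D1", "SSN": "111", "DateOfBirth": "1930-01-01",
     "DecedentLastName": "SMITH", "DecedentFirstName": "ANN"},
    {"StateFileNumber": "D2", "SSN": "222", "DateOfBirth": "1931-02-02",
     "DecedentLastName": "JONES", "DecedentFirstName": "ROBERT"},
    {"StateFileNumber": "D3", "SSN": None, "DateOfBirth": "1932-03-03",
     "DecedentLastName": "Lee", "DecedentFirstName": "Cy"},
]
links = link_deterministic(medicaid, deaths)
for pseudo_id, (state_file_number, matched_on) in sorted(links.items()):
    print(pseudo_id, state_file_number, matched_on)
```

Recording which elements produced each match is what allows a breakdown by linkage element of the kind reported in Table B.4.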

Linking results. Results of linking death certificates to the Medicaid eligibility file are documented in Tables B.3 and B.4. A total of 3,022 death records were linked between the Medicaid eligibility records from January 1, 2003, to December 31, 2005, and the death certificate records from January 1, 2003, to April 30, 2006. Of these, 107 were Medicaid enrollees who were enrolled between 2003 and 2005 and died between January 1 and April 30, 2006 (see Table B.3).

Approximately 76 percent of linked records were linked by all three elements: date of birth, Social Security Number, and names. The Social Security Number alone linked more records (20%) than did name and date of birth alone (4%) (Table B.4).

Value added. The Utah Medicaid Program independently tracks deceased clients in the data field "DeathDate." When a client dies, the known date of death is added to the Data Warehouse. However, the Data Warehouse does not have an ongoing linking process to obtain timely notification of deaths of Medicaid clients from the Vital Records system.

Table B.5 describes the discrepancy between the two systems. Among all 3,022 linked records, 782 deaths (26%) were not recorded in the Medicaid Data Warehouse. The Utah Medicaid Director was notified, and further investigation is planned.

Limitations. Using deterministic methods to link death and Medicaid eligibility records may miss some deaths among antidementia drug users. A probabilistic linking method might increase the number of linked cases, which could enhance the detection power of the study. Even within the deterministic method, the current linking algorithm in the SAS programs can be improved. For example, linking with alias names is an option not pursued in this study.

Merging linked records with the full death records. After identifying the Medicaid death certificates, we used the Vital Records State File Number to extract the selected records and elements from the death certificate file and linked those records to the pharmacoepidemiology database.

Table B.3. Results of linking death records to Medicaid eligibility records
Year of Death Number of Linked Cases Percentage of Linked Cases
2003 1,036 34.3%
2004 861 28.5%
2005 1,018 33.7%
2006 Jan. 1–April 30 107 3.5%
Total 3,022 100.0%

Table B.4. Evaluation of linkage elements
Linked by Number of Linked Cases Percentage of Linked Cases
Name and Date of Birth (DOB) only 125 4.1
Social Security Number (SSN) only 610 20.2
SSN, DOB, and Names 2,287 75.7
Total 3,022 100.0

Table B.5. Additional information on Medicaid data quality
Date of death Number of Linked Cases Percentage of Linked Cases
Recorded in Medicaid Data Warehouse 2,240 74.1
Not recorded in Medicaid Data Warehouse 782 25.9
Total 3,022 100.0
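
The percentages in Tables B.3 through B.5 follow directly from the counts. As a check on the arithmetic, the short Python sketch below recomputes them; the counts are copied from the tables, and the function and dictionary names are illustrative only.

```python
def percentages(counts):
    """Return each count as a percentage of the total (one decimal) and the total."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}, total


by_year = {"2003": 1036, "2004": 861, "2005": 1018, "2006 Jan. 1-April 30": 107}
by_element = {"Name and DOB only": 125, "SSN only": 610, "SSN, DOB, and names": 2287}
by_recording = {"Recorded": 2240, "Not recorded": 782}

for table in (by_year, by_element, by_recording):
    shares, total = percentages(table)
    print(total, shares)
```

Each table sums to the same 3,022 linked records, so the three breakdowns describe the same set of deaths.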

Data Integrity Analysis

Presented below are seven figures displaying graphs for claims, enrollments, and deaths (Figures B.1 - B.7). As part of our data quality assurance steps, we visually inspected these for problems.

Figure B.1. Frequency of facility diagnosis claims by month

Graph with dates on the x axis ranging from January 2003 to January 2006 and number of claims on the y axis. The y axis goes from 3,000 to 6,000. Values fluctuate between around 3,500 and 4,000 between January 2003 and January 2005. Claims peak in June 2005 at close to 6,000. They fall to about 4,000 in September 2005 and to under 3,500 in December 2005. Data not shown for January 2006.

Figure B.2. Frequency of facility procedures by month

Graph with dates on the x axis ranging from January 2003 to January 2006 and number of procedures on the y axis. The y axis goes from 50,000 to 90,000. The number of procedures starts at around 63,000. Between January 2003 and January 2004, the number fluctuates between 60,000 and 70,000, dropping to about 53,000 in February 2004. Values remain between 50,000 and 60,000 until January 2005, when they jump to 80,000. Values fluctuate between 75,000 and 80,000, rising to nearly 85,000 in August 2005 before dropping to 75,000. The number then rises slightly before dropping to about 63,000 in December 2005. Data not shown for January 2006.

Figure B.3. Frequency of professional diagnoses by month

Graph with dates on the x axis ranging from January 2003 to January 2006 and number of professional diagnoses on the y axis. The y axis goes from 26,000 to 35,000. The number fluctuates between 31,000 and about 35,000 from January 2003 to May 2005, when it drops to just below 27,000. It then rises to 34,000 in August 2005, drops to about 30,000, rises slightly, and then falls to 26,000 in December 2005. Data not shown for January 2006.

Figure B.4. Frequency of professional procedures (current procedural terminology) by month

Graph with dates on the x axis ranging from January 2003 to January 2006 and number of professional procedures on the y axis. The y axis goes from 20,000 to 80,000. The number rises steadily from about 30,000 in January 2003 to about 40,000 in January 2005, with one slight decrease between September 2003 and January 2004. The number jumps to about 75,000 in January 2005, fluctuates a bit during the year, falls to about 68,000 in September 2005, rises slightly, and then falls to about 58,000 in December 2005. Data not shown for January 2006.

Figure B.5. Frequency of pharmacy claims by month

Graph with dates on the x axis ranging from January 2003 to January 2006 and number of pharmacy claims on the y axis. The y axis goes from 100,000 to 320,000. Between January 2003 and January 2005, the number of claims fluctuates between 260,000 and 300,000, with one large drop in June 2003, to 160,000. In January 2005, the number of claims drops from 300,000 to 120,000 and fluctuates between 120,000 and 130,000 until December 2005. Data not shown for January 2006.

Figure B.6. Frequency of Medicaid enrollees by month

Graph with dates on the x axis ranging from January 2003 to January 2006 and number of Medicaid enrollees on the y axis. The y axis goes from 21,000 to 30,000. The number starts around 26,000 in January 2003, falls steadily to about 21,500 in December 2003, then jumps to about 28,500 in January 2004. The number falls steadily to just under 23,000 in December 2004, then jumps to about 30,000 in January 2005. It then falls steadily, to just under 24,000 in December 2005. Data not shown for January 2006.

Figure B.7. Frequency of death by month

Graph with dates on the x axis ranging from January 2003 to May 2006 and number of deaths on the y axis. The y axis goes from 0 to 180. The number of deaths starts at around 30 in January 2003 and rises steadily, with a few fluctuations, peaking near 180 in December 2003. It drops to about 30 in January 2004 and rises steadily, with a few fluctuations, peaking around 120 in December 2004. The number drops just below in January 2005 and rises steadily, with a few fluctuations, peaking around 120 in December 2005. The number drops to about 105 in January 2006 and then falls close to 0 in February 2006. Data not shown beyond February 2006.

We did not identify any apparent missing blocks of claims. For example, the frequency of claims did not drop to zero or far below the average frequency in any month. Overall, the volume of facility and procedure claims appeared to decrease in 2004.
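
The checks in this study were made by visual inspection of the figures. A programmatic screen for the same pattern, months with zero claims or far fewer claims than average, could look like the minimal Python sketch below. The function name, the ISO date strings, and the cutoff of half the mean monthly count are assumptions, not values used in the study.

```python
from collections import Counter
from statistics import mean


def flag_low_months(service_dates, all_months, fraction=0.5):
    """Return the months whose claim count is zero or below fraction * mean.

    `service_dates` are ISO strings ('YYYY-MM-DD'); `all_months` lists every
    month expected in the study period as 'YYYY-MM', so that months with no
    claims at all are still examined.
    """
    counts = Counter(d[:7] for d in service_dates)
    monthly = {month: counts.get(month, 0) for month in all_months}
    threshold = fraction * mean(list(monthly.values()))
    return [month for month, n in monthly.items() if n == 0 or n < threshold]


dates = ["2004-01-15"] * 10 + ["2004-02-15"] * 10 + ["2004-03-15"] * 2
months = ["2004-01", "2004-02", "2004-03", "2004-04"]
print(flag_low_months(dates, months))
```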

Pharmacy claims apparently had a steep dropoff in 2005, but inspection of the data revealed that our original data runs had not limited Medicaid patients to age 50 and over for years 2003 and 2004. This problem also affected the linkage analysis for pharmacy claims. We rectified this error for all our final analyses.