A Primer on Comparative Effectiveness Research Methods

The American Journal of Pharmacy Benefits, October 2010, Volume 2, Issue 5

At the population level, comparative effectiveness research can inform policies and incentives that result in improvements in health and minimize risk.

Comparative effectiveness research (CER) has been defined by the Federal Coordinating Council as studies that involve pragmatic comparative trials and/or synthesis of existing research that is applicable to real-world settings.1 The council further states, “The purpose of this research is to improve health outcomes by developing and disseminating evidence-based information to patients, clinicians, and other decision-makers, responding to their expressed needs, about which interventions are most effective for which patients under specific circumstances.”

The American Recovery and Reinvestment Act of 2009 directed the Institute of Medicine to develop a broad-based list of priority areas for CER.2 The Institute of Medicine called for building a “broad and supportive infrastructure to carry out a sustainable national CER strategy.”2 The institute rated “health care delivery system” as the most important topic area, as it was included in 50 of 100 priority research areas as ranked by primary and secondary researchers. Clearly, one of the most critical steps for improving the country’s health is to assist healthcare systems in implementing CER evidence.2

There has been much discussion of CER in the literature over the past 24 months, and about as much speculation about its impact. Comparative effectiveness research has 6 extant themes, including the “generation and synthesis of evidence” that is comparative.3 Ultimately, CER should help to inform decisions that improve health. Another defining component of CER is that it distinguishes between efficacy and effectiveness.4

In the pharmacy benefits world, it has long been recognized that randomized clinical trials involving placebo comparators, surrogate measures of disease improvement, strict inclusion/exclusion criteria, high medication adherence, and close monitoring of patients provide estimates of the efficacy of medications. The effectiveness of medications is measured under real-world conditions, where medications are prescribed for a variety of patients rather than a specific studied population, often are used after a series of treatment failures or in combination with other agents that modify the disease process, and are taken less often than prescribed; also, patients are seen rather infrequently. Comparative effectiveness research is intended to reflect real-world use, and the corresponding benefits and risks.

It is important to note that CER involves multiple types of research activities. One could categorize CER into 3 different study types: (1) large simple randomized controlled trials; (2) analysis of administrative and other existing data; and (3) synthesis of multiple trials using meta-analysis and/or Bayesian techniques. In this commentary, I provide a brief description of each of these study designs, as well as a discussion of their advantages and limitations.

Large simple randomized trials such as ALLHAT (Antihypertensive and Lipid Lowering Treatment to Prevent Heart Attack Trial), ACCORD (Action to Control Cardiovascular Risk in Diabetes), and STAR (Study of Tamoxifen and Raloxifene) are extremely costly. The large number of patients (ALLHAT N = 42,418; ACCORD N = 10,251; STAR N = 19,747) required to make statements about treatment effectiveness drives the cost of these studies well into the millions of dollars.5 The advantage of these studies is that they evaluate final health outcomes such as heart attacks and mortality. In addition, they often involve multiple treatment regimens and subpopulations of interest. However, due to the time required to see final health outcomes and the large numbers of patients, these studies are very expensive to conduct. In addition, practice patterns often change over the course of the study, and the treatments may not be relevant once the trial is completed.6

Another emerging type of CER study is the evaluation of existing data such as healthcare claims and electronic medical records.7 These studies have become increasingly popular over the past 20 years because they are relatively inexpensive and can be conducted in a relatively short period of time. With millions of Americans enrolled in commercial and government programs that rely on the electronic exchange of healthcare data, finding sufficient numbers of patients who have been exposed to the treatment of interest is possible. Furthermore, retrospective studies have the advantage of examining the treatments in real-world settings, not in the artificial environment often created with phase III clinical trials.

The major criticism of retrospective cohort studies is the inability to control for disease severity and confounding by indication. Administrative data often contain only diagnosis and procedural codes to measure disease severity. Diagnosis codes are well known for a multitude of issues with respect to disease severity. International Classification of Diseases (ICD) codes can be up to 5 digits, with the fourth and fifth digits occurring after a decimal point. For some diagnoses the absence of a fourth or fifth digit is not relevant; other times, it is. Few diagnosis codes provide any insight into the severity of the condition of interest. In addition, the lack of specificity may result in “rule-out” admissions being misclassified. Coding of conditions also is prone to misspecification, miscoding, incorrect sequencing, and clerical mistakes. ICD codes often fail to reflect whether the condition or etiology had a rapid onset. Procedural ICD codes may permit tracking of sequential procedures.

Confounding by indication is another critical issue that affects retrospective studies of treatments. Varying levels of disease severity can result in the use of different treatments or combinations of treatments. Recently, a number of statistical methods have been developed to overcome this limitation. These methods include propensity scores and instrumental variables.8-10

Propensity scores are used to help select control subjects who are similar to the treated subjects, thereby allowing differences in outcome to be attributed to the treatment or intervention. Propensity scoring, essentially a more complex approach to matching, is becoming widely accepted and is considered to be a valid approach in evaluating treatments, especially as the treatment relates to harm. See, for example, Johannes et al and Seeger et al.11,12
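The matching step described above can be illustrated with a minimal sketch. It assumes that each subject's propensity score (the estimated probability of receiving the treatment, typically obtained from a logistic regression of treatment on measured covariates) has already been computed. The function name match_on_propensity, the subject identifiers, the scores, and the caliper of 0.05 are all hypothetical choices made for illustration, and greedy 1:1 nearest-neighbor matching is only one of several matching strategies.

```python
def match_on_propensity(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, controls: dicts mapping a subject id to an estimated
    propensity score (assumed to be computed elsewhere).
    caliper: largest score difference accepted for a match (hypothetical default).
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)  # controls not yet used in a pair
    pairs = []
    # Process treated subjects from lowest to highest score.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control on the propensity score.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs


# Hypothetical scores for illustration only.
treated = {"T1": 0.62, "T2": 0.35, "T3": 0.90}
controls = {"C1": 0.30, "C2": 0.60, "C3": 0.36, "C4": 0.10}
pairs = match_on_propensity(treated, controls, caliper=0.05)
for t_id, c_id in pairs:
    print(t_id, "matched to", c_id)
```

Treated subjects with no control inside the caliper are left unmatched and drop out of the comparison, which is why matched analyses describe the effect only in the range of scores where treated and control subjects overlap.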

Because unmeasured factors such as extent of disease could affect both the outcome and use of interacting therapies, instrumental-variables methods can be used. Numerous instrumental-variables studies have assessed the effects of different treatment rates across patients grouped by instrumental variables or “instruments.” The use of instrumental variables is appropriate in situations where it is not possible to assess causal relationships through other types of experiments. Instrumental variables are strongly related to the exposure (ie, treatment), but not related directly to the outcome.10 The selection of instrumental variables requires solid statistical and clinical knowledge. Therefore, this technique currently is less common than propensity score methods.
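The logic of an instrument can be sketched with the simplest instrumental-variables estimator, the Wald (ratio) estimator for a binary instrument: the difference in mean outcome between instrument groups divided by the difference in treatment rates between those groups. The sketch below is illustrative only. The function name wald_iv_estimate, the record layout, and the data are hypothetical, and the result is interpretable only if the instrument truly affects the outcome solely through treatment.

```python
from statistics import mean


def wald_iv_estimate(records):
    """Wald (ratio) instrumental-variables estimate.

    records: iterable of (instrument, treated, outcome) tuples, where
    instrument and treated are coded 0/1 (hypothetical layout).
    """
    records = list(records)
    y1 = [y for z, d, y in records if z == 1]
    y0 = [y for z, d, y in records if z == 0]
    d1 = [d for z, d, y in records if z == 1]
    d0 = [d for z, d, y in records if z == 0]

    # How strongly the instrument shifts the treatment rate.
    first_stage = mean(d1) - mean(d0)
    if abs(first_stage) < 1e-12:
        raise ValueError("instrument does not shift treatment; estimate undefined")

    # How much the mean outcome differs between instrument groups.
    reduced_form = mean(y1) - mean(y0)
    return reduced_form / first_stage


# Hypothetical data: (instrument, treated, outcome).
records = [
    (1, 1, 10), (1, 1, 12), (1, 1, 11), (1, 0, 7),
    (0, 1, 11), (0, 0, 6), (0, 0, 8), (0, 0, 7),
]
estimate = wald_iv_estimate(records)
print("IV estimate of treatment effect:", estimate)
```

The guard on the first-stage difference reflects the requirement that the instrument be strongly related to the exposure: when the treatment rates in the two instrument groups are nearly equal, the ratio becomes unstable.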

The third study design relevant to CER is evidence synthesis. The vast majority of these analyses are meta-analyses, which combine multiple studies that use similar constructs or outcomes to evaluate a particular therapy. Critics of meta-analyses often complain of combining apples and oranges.13 These differences can be due to differences in severity of disease across trials, differences in inclusion/exclusion criteria, or approaches to measuring the outcomes of interest, to name a few. To determine whether this problem exists, it is recommended that the analyst assess and report the degree of homogeneity across the studies. Meta-analyses are limited by the trials that have been conducted, as well as the manner in which they are reported. A common issue with reported studies is the failure of the authors to include measures of variability, such as standard deviation or standard error, for the outcome measures. Another limitation of meta-analyses is that because they are based on existing studies with placebo arms, there often is little information about comparative effectiveness. Because of this issue, traditional meta-analyses may fall short of providing useful comparative effectiveness data.
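One common way to assess homogeneity is to pool the study estimates with inverse-variance weights and then compute Cochran's Q and the I-squared statistic. The sketch below is a minimal fixed-effect version under stated assumptions: each study contributes an effect estimate on a common scale (for example, a log odds ratio) and a standard error. The function name fixed_effect_summary and the example values are hypothetical. The check on standard errors mirrors the reporting problem noted above, because a study without a measure of variability cannot be weighted.

```python
def fixed_effect_summary(effects, std_errors):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I-squared.

    effects: per-study effect estimates on a common scale.
    std_errors: per-study standard errors (each must be present and positive).
    """
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need at least two studies, each with a standard error")
    if any(se is None or se <= 0 for se in std_errors):
        raise ValueError("every study needs a positive standard error")

    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_weight
    pooled_se = (1.0 / total_weight) ** 0.5

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of variation beyond what chance alone would produce.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0

    return {"pooled": pooled, "pooled_se": pooled_se,
            "Q": q, "df": df, "I2": i_squared}


# Hypothetical log odds ratios and standard errors from three trials.
summary = fixed_effect_summary([-0.30, -0.10, -0.45], [0.12, 0.15, 0.20])
print(summary)
```

A large Q relative to its degrees of freedom, or a high I-squared, suggests that the trials may be measuring different things, which is the apples-and-oranges concern in numerical form.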

An alternative to the traditional meta-analysis is the Bayesian indirect treatment comparison approach.14 Although many clinicians and decision makers may not fully understand the details of this approach, the results from such an analysis are informative. The concept of indirect comparisons has been in the medical literature since 1994, when O’Brien et al evaluated enoxaparin versus warfarin prophylaxis for deep vein thrombosis after hip replacement.15 Bayesian indirect analysis is similar to other statistical methods in which differences in effect between treatments are evaluated using multiple independent-variable regression models. Bucher et al provide a simple illustration of this approach.16 The validity of this type of analysis appears to depend on the comparability of the original clinical trials.17,18 Indirect comparisons and meta-analyses share this limitation. The key advantage of the Bayesian approach is that the results can be rank-ordered.14 Those treatments ranked first would have the highest likelihood of achieving treatment success. This approach is appealing because it provides decision makers with knowledge that can be acted on directly. It is important to keep in mind that Bayesian analyses are not static, meaning that as new information becomes available (ie, new studies are published), the analysis can be rerun and the results evaluated to determine whether a different decision should be made.
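The arithmetic behind a simple indirect comparison can be sketched as follows. This is the adjusted indirect comparison commonly attributed to Bucher et al, which is a frequentist calculation rather than a full Bayesian model: if treatments A and B have each been compared with a common comparator C, the log odds ratio of A versus B is estimated as the difference of the two log odds ratios, and the variances add. The function name bucher_indirect and the input values are hypothetical, and the sketch assumes the two sets of trials are comparable.

```python
import math


def bucher_indirect(log_or_ac, se_ac, log_or_bc, se_bc, z=1.96):
    """Adjusted indirect comparison of A vs B through a common comparator C.

    log_or_ac, log_or_bc: log odds ratios of A vs C and B vs C.
    se_ac, se_bc: their standard errors.
    z: normal quantile for the confidence interval (1.96 for about 95%).
    Returns (odds_ratio_ab, se_of_log_or_ab, (ci_low, ci_high)).
    """
    log_or_ab = log_or_ac - log_or_bc
    # Uncertainty from both direct comparisons carries over, so variances add.
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)
    ci = (math.exp(log_or_ab - z * se_ab), math.exp(log_or_ab + z * se_ab))
    return math.exp(log_or_ab), se_ab, ci


# Hypothetical inputs: A vs C odds ratio 0.5, B vs C odds ratio 0.8.
or_ab, se_ab, ci_ab = bucher_indirect(math.log(0.5), 0.20, math.log(0.8), 0.15)
print("Indirect OR, A vs B:", round(or_ab, 3), "95% CI:", ci_ab)
```

Because the variances add, an indirect estimate is always less precise than either of the direct comparisons that feed it, which is one reason the comparability of the underlying trials matters so much.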

Certainty in healthcare is a rare commodity. Therefore, health professionals are forced to make decisions about treatment selection that may or may not benefit the patient. There also is no assurance that patients will experience positive outcomes. Not every treatment works in every patient, but some treatments may have a higher probability of success. At the population level, CER should help inform policies and incentives that result in improvements in health and minimize risk. However, CER is not a panacea for all healthcare decision making, because results may differ depending on the methodology used to answer the question.19 Rather, CER should be thought of as a tool to assist and inform decision making.