COMBINING DISTRIBUTION- AND ANCHOR-BASED APPROACHES TO DETERMINE MINIMALLY IMPORTANT DIFFERENCES The FACIT Experience

10.1177/0163278705275340 Evaluation & the Health Professions / June 2005 Yost, Eton / MIDS—THE FACIT EXPERIENCE

Health-related quality of life (HRQOL) is an important endpoint in cancer clinical trials and in cancer treatment in general; however, the meaningfulness of HRQOL scores may not be apparent to clinicians or researchers. Minimally important differences (MIDs) can enhance the interpretability of HRQOL scores by identifying differences likely to be meaningful to patients and clinicians. This article’s objective is to describe and provide examples of approaches we have used to identify MIDs for instruments in the Functional Assessment of Chronic Illness Therapy (FACIT) measurement system. Distribution- and anchor-based approaches are described and illustrated. We also discuss the importance of assessing the appropriateness of anchors, and we provide suggestions for combining results into a single range of plausible MIDs. MIDs for FACIT instruments established to date are summarized, and general guidelines that can be used to estimate MIDs for other FACIT instruments are provided. Applications of MIDs in research are illustrated.

Keywords: health-related quality of life; cancer; minimally important difference; clinical significance

KATHLEEN J. YOST
DAVID T. ETON

Evanston Northwestern Healthcare Research Institute

AUTHORS’ NOTE: Data presented herein were collected and/or analyzed in previous research supported by the following institutions: Agency for Healthcare Research and Quality (R01 HS09869), National Cancer Institute (N01-PC-35136, CA49883, CA13650, CA32102, CA16116, CA61679, CA51926, CA23318, CA66636, CA17145, and CA49957), Eastern Cooperative Oncology Group (NCI grant CA21115), AstraZeneca, Aventis Pharmaceuticals, Genentech Inc., Novartis Pharmaceuticals, and Ortho-Biotech Inc.

EVALUATION & THE HEALTH PROFESSIONS, Vol. 28 No. 2, June 2005 172-191 DOI: 10.1177/0163278705275340 © 2005 Sage Publications


The field of health-related quality of life (HRQOL) measurement has gone through several phases of development; the first was defining and conceptualizing the term health-related quality of life (Cella, 1994; de Haes, 1988; Ferrell, Dow, Leigh, Ly, & Gulasekaram, 1995; Wilson & Cleary, 1995). When definitions were established, focus turned to developing tools to measure HRQOL. Many instruments now exist to measure HRQOL in a variety of patient populations and health-care delivery settings (MAPI Research Institute, 2004) including cancer (Aaronson et al., 1993; Cella et al., 1993). Although HRQOL is currently recognized as an important endpoint in cancer clinical trials, and in cancer treatment in general, the meaningfulness of HRQOL scores may not be apparent to patients, clinicians, or researchers. Thus, we now find ourselves in the third phase of development: interpreting HRQOL scores.

One way to enhance the interpretability of HRQOL scores is to identify score differences that can be considered meaningful. Guyatt and colleagues (Guyatt, Osoba, Wu, Wyrwich, & Norman, 2002) defined a minimally important difference (MID) on an HRQOL measure as the “smallest difference in score in the domain of interest that patients perceive as important, either beneficial or harmful, and which would lead the clinician to consider a change in the patient’s management” (p. 377). Implicit within this definition is that the MID represents the smallest score difference on an HRQOL measure that is clinically significant and likely to be meaningful to patients and clinicians.

Considerable work has been done in recent years to identify MIDs for scales and subscales from several HRQOL instruments in the Functional Assessment of Chronic Illness Therapy (FACIT) measurement system (Cella, 2004). To date, MIDs have been established for 19 scales and subscales in the FACIT measurement system.
The purpose of this article is to describe and illustrate approaches we have used to identify MIDs for FACIT questionnaires. Examples are taken from our previously published work.

THE FUNCTIONAL ASSESSMENT OF CHRONIC ILLNESS THERAPY (FACIT)

The FACIT measurement system is a collection of HRQOL questionnaires targeted to the management of chronic illness (Cella, 2004). At the core of the measurement system is the 27-item Functional Assessment of Cancer Therapy–General (FACT-General) (Cella et al., 1993), which measures four domains: Physical Well-Being (PWB, 7 items), Social/Family Well-Being (SWB, 7 items), Emotional Well-Being (EWB, 6 items), and Functional Well-Being (FWB, 7 items). All FACIT items have a five-category response ranging from 0 (not at all) to 4 (very much). The core measure can be supplemented with Additional Concerns subscales, which contain disease-, treatment-, or condition-specific items. For example, the FACT–Breast Cancer questionnaire consists of the 27-item FACT-General plus a 9-item subscale addressing concerns specific to patients with breast cancer such as arm swelling and tenderness (Brady et al., 1997). A Total scale score for an instrument is derived by summing the four general domains (PWB, SWB, EWB, and FWB) and the Additional Concerns subscale. Trial Outcome Index (TOI) scores can be derived as the sum of the PWB, FWB, and an Additional Concerns subscale. TOIs have proven to be valuable summary measures of physical function and well-being, particularly in clinical trials. Additional Concerns subscales vary in number of items; therefore, possible TOI and Total score ranges are not constant across FACIT questionnaires. FACIT questionnaires have been shown to be reliable, valid, and responsive to change in clinical and observational settings. For more information on the FACIT measurement system see Webster, Cella, and Yost (2003) or www.facit.org.
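Because score ranges depend on the number of items, it can help to see how the highest possible Total and TOI scores follow from the item counts given above. The following is a minimal sketch, not the official FACIT scoring procedure: the function and constant names are illustrative, and it deliberately ignores any item-level scoring rules documented in the FACIT manual.

```python
# Item counts for the four FACT-General domains, as stated in the text.
CORE_ITEMS = {"PWB": 7, "SWB": 7, "EWB": 6, "FWB": 7}
MAX_ITEM_SCORE = 4  # every FACIT item is scored from 0 to 4


def max_total_score(additional_items: int = 0) -> int:
    """Highest possible Total score: four core domains plus the Additional Concerns subscale."""
    return (sum(CORE_ITEMS.values()) + additional_items) * MAX_ITEM_SCORE


def max_toi_score(additional_items: int) -> int:
    """Highest possible TOI score: PWB + FWB + the Additional Concerns subscale."""
    return (CORE_ITEMS["PWB"] + CORE_ITEMS["FWB"] + additional_items) * MAX_ITEM_SCORE


# FACT-Breast Cancer: 27 core items plus a 9-item breast cancer subscale.
print(max_total_score(9))  # 144
print(max_toi_score(9))    # 92
```

These two values correspond to the possible score ranges of 0 to 144 for the Total FACT–Breast and 0 to 92 for the TOI–Breast.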

APPROACHES USED TO IDENTIFY THE MID

MIDs can be determined using distribution-based and anchor-based methods (Crosby, Kolotkin, & Williams, 2003; Lydick & Epstein, 1993). Distribution-based measures are based on statistical properties of the scale and include effect size measures (Cohen, 1988; Deyo, Diehr, & Patrick, 1991; Kazis, Anderson, & Meenan, 1989; Sloan et al., 2003), the standard error of measurement (SEM) (Wyrwich, Nienaber, Tierney, & Wolinsky, 1999; Wyrwich, Tierney, & Wolinsky, 1999), the responsiveness index (Guyatt, Walter, & Norman, 1987) and the reliable change index (Jacobson & Truax, 1991). Anchor-based methods anchor or map score differences onto differences in clinical measures, which can be objective measures, such as response to treatment, or subjective measures, such as patient-reported global ratings of change in health status. Anchor-based differences in HRQOL scores can be determined either cross-sectionally at a single time point or longitudinally across multiple time points.

No single method for identifying MIDs is generally accepted; therefore, using multiple strategies simultaneously has been recommended (Guyatt, Osoba, et al., 2002). Because MID estimates may vary slightly across patients and possibly across patient groups, reporting a range of plausible MIDs rather than a single number is preferred (Guyatt, Osoba, et al., 2002; Hays & Woolley, 2000). We have used a combination approach with FACIT scales that synthesizes multiple methods of estimating the MID and, where possible, evaluating data for more than one patient group or clinical setting (e.g., metastatic patients in a clinical trial and early-stage patients on adjuvant therapy). We believe that this combination approach produces MIDs that are less likely to be sample specific.

DISTRIBUTION-BASED APPROACH

We have utilized effect sizes and 1.0 SEM to identify distribution-based estimates of MIDs for FACIT scales. An effect size is computed by dividing a score difference by the overall standard deviation (SD; effect size = [X1 – X2]/SD). Cohen (1988) recommended cutoffs to aid the interpretation of effect sizes, where an effect size of .2 (i.e., one fifth SD) is considered small, .5 (one half SD) is considered moderate, and .8 (four fifths SD) is considered large. Because effect sizes are unitless, they are not interpretable in the context of actual HRQOL score differences until they are translated into the units of the original scale (Guyatt, Osoba, et al., 2002). However, the SD is in the units of the original scale; therefore, we use one third SD and one half SD to identify score differences that are associated with effect sizes of .33 and .50, respectively. We consider the lower bound of this effect size range (.33) to be an adequate approximation of an MID because it falls between a small and moderate effect size. Based on the work of Wyrwich and colleagues (Wyrwich, Nienaber, et al., 1999; Wyrwich, Tierney, et al., 1999), we computed the SEM for scales and subscales in the FACIT measurement system using the following formula:

TABLE 1
Example of a Distribution-Based Approach for Identifying MIDs (Cella, Eton, Fairclough, et al., 2002)

| Scale and Subscale | Assessment | SD | 1/3 SD (a) | 1/2 SD (a) | Cronbach’s alpha | SEM (a) |
|---|---|---|---|---|---|---|
| Lung Cancer subscale | Baseline | 5.1 | 1.7 | 2.5 | .68 | 2.9 |
| | Week 12 | 4.5 | 1.5 | 2.3 | .64 | 2.7 |
| | Baseline to Week 12 change | 5.2 | 1.7 | 2.6 | NA | NA |
| TOI-Lung | Baseline | 13.8 | 4.6 | 6.9 | .87 | 5.0 |
| | Week 12 | 13.8 | 4.6 | 6.9 | .89 | 4.5 |
| | Baseline to Week 12 change | 14.5 | 4.8 | 7.3 | NA | NA |

NOTE: MIDs = minimally important differences; TOI = Trial Outcome Index.
a. Scores in these columns are estimates of the MID.

$$\mathrm{SEM} = \sigma_x \sqrt{1 - \mathrm{rel}_x}$$

where $\sigma_x$ is the standard deviation of the scale or subscale and $\mathrm{rel}_x$ is the reliability of the scale or subscale (test-retest reliability or internal consistency [alpha]).

Reliability and SDs may vary across samples and settings; thus, the SEM has been favored by some because it is considered relatively stable across samples and settings (Guyatt, Osoba, et al., 2002; Wyrwich, Nienaber, et al., 1999; Wyrwich, Tierney, et al., 1999). However, we have also observed some variability in SEMs across settings and samples. Table 1 illustrates distribution-based MID estimates for the Lung Cancer subscale and TOI-Lung based on data for 599 patients with advanced non-small-cell lung cancer participating in a clinical trial (Cella, Eton, Fairclough, et al., 2002). Distribution-based estimates were computed for scores measured at two assessments (baseline and Week 12) and for the baseline to Week 12 change score. The SEM was not computed for the change score. This table illustrates that for scales with high reliability (i.e., ~.9) the SEM is closer to one third SD, whereas for scales with low reliability (~.7), the SEM is closer to one half SD.
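The three distribution-based estimates can be computed directly from a scale's SD and reliability. The sketch below is illustrative only: the function names are our own, and the SEM is taken to be the SD multiplied by the square root of one minus the reliability, which is consistent with the values reported in Table 1.

```python
import math
from typing import Dict, Optional


def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD times the square root of (1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def distribution_based_estimates(sd: float, reliability: Optional[float] = None) -> Dict[str, float]:
    """Return 1/3 SD, 1/2 SD and, when a reliability is supplied, the SEM."""
    estimates = {"one_third_sd": sd / 3, "one_half_sd": sd / 2}
    if reliability is not None:
        estimates["sem"] = sem(sd, reliability)
    return estimates


# Lung Cancer subscale at baseline (Table 1): SD = 5.1, Cronbach's alpha = .68.
print(distribution_based_estimates(5.1, 0.68))

# Change scores have no reliability estimate in Table 1, so the SEM is omitted.
print(distribution_based_estimates(5.2))
```

With a reliability near .9 the square-root term is about one third, and with a reliability near .7 it is about one half, which is the pattern the table illustrates.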

TABLE 2
Example of an Anchor-Based Approach for Identifying MIDs Using Cross-Sectional Data (Eton et al., 2004)

| Baseline Pain Level | n | Baseline TOI-Breast Score, M (SD) | Adjacent Category Difference | Overall SD at Baseline | Effect Size |
|---|---|---|---|---|---|
| None (0) | 283 | 68.8 (14.2) | | | |
| Nonnarcotic (1) | 166 | 61.7 (13.9) | (0 vs. 1) 7.1 (a) | 14.1 | .50 |
| Narcotic (2) | 147 | 56.5 (14.0) | (1 vs. 2) 5.2 (a) | 14.1 | .37 |

NOTE: MIDs = minimally important differences; TOI = Trial Outcome Index.
a. These mean score differences are estimates of the MID.
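The adjacent-category differences and effect sizes in Table 2 can be checked with a few lines of code. This is a minimal sketch using the group means and overall SD reported in the table; the function name and data layout are illustrative assumptions rather than part of the original analysis.

```python
def adjacent_category_estimates(group_means, overall_sd):
    """Return (comparison, difference, effect size) for each pair of adjacent anchor categories.

    group_means is a list of (label, mean score) pairs ordered along the anchor.
    """
    results = []
    for (label_a, mean_a), (label_b, mean_b) in zip(group_means, group_means[1:]):
        difference = mean_a - mean_b
        results.append((f"{label_a} vs. {label_b}", difference, difference / overall_sd))
    return results


# Mean baseline TOI-Breast scores by pain level, and the overall baseline SD (Table 2).
toi_breast_means = [("None", 68.8), ("Nonnarcotic", 61.7), ("Narcotic", 56.5)]

for comparison, difference, effect_size in adjacent_category_estimates(toi_breast_means, 14.1):
    print(f"{comparison}: difference = {difference:.1f}, effect size = {effect_size:.2f}")
```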

ANCHOR-BASED APPROACH (CROSS-SECTIONAL)

In a cross-sectional analysis, anchors are used to categorize patients into clinically distinct groups. Any anchor can be used, provided the classification of individuals into distinct categories is clinically relevant. We have found that the best anchors are those that are correlated with HRQOL outcomes. FACIT score differences between adjacent, clinically distinct categories represent estimates of the MID. Effect sizes for these estimates are computed by dividing the adjacent category score difference by the overall SD for the sample. Incorporating effect sizes into our anchor-based analyses is another feature that makes ours a combination approach.

This approach was used by Eton et al. (2004) in an evaluation of MIDs for the FACT–Breast Cancer questionnaire. Baseline HRQOL scores were compared across clinically distinct groups of patients with metastatic breast cancer participating in a Phase III clinical trial. Physician-reported pain severity was used as an anchor, where pain severity was based on level of analgesic necessary to control pain: none, nonnarcotic medication required, narcotic medication required. Results are shown in Table 2 for the TOI-Breast, which is a sum of the PWB, FWB, and Breast Cancer subscales (Eton et al., 2004). Mean differences and corresponding effect sizes were determined for adjacent clinical categories (i.e., none vs. nonnarcotic and nonnarcotic vs. narcotic). For example, the difference in mean TOI-Breast scores between patients in the None vs. the Nonnarcotic groups was 7.1 points, and this score difference represents an estimate of the MID. Dividing this mean score difference by the overall SD at baseline yields an effect size of .50.

ANCHOR-BASED APPROACH (LONGITUDINAL)

In longitudinal anchor-based analyses, the association between changes over time in clinical status and changes over time in FACIT scores is assessed. The most commonly used anchor-based approach in longitudinal studies utilizes global ratings of change (GRC; Crosby et al., 2003), a method that was developed by Jaeschke, Singer, and Guyatt (1989). We and others have applied the Jaeschke et al. GRC method to the analysis of HRQOL data (Cella, Hahn, & Dineen, 2002; Osoba, Rodrigues, Myles, Zee, & Pater, 1998; Yost et al., in press). The GRC is a retrospective assessment of change. Patients think back to a previous time point and state whether they have experienced change in a domain of health or HRQOL from that time point to the present. A potential problem with retrospective assessments of change is that they may be overly influenced by patients’ current health state (Guyatt, Norman, Juniper, & Griffith, 2002; Yost et al., in press). The original GRC scale had 15 response options ranging from –7 (a very great deal worse) through 0 (no change) to 7 (a very great deal better). We and others, including the original developers of the scale, collapsed data into fewer categories to yield sufficient numbers of cases in each category and to provide more interpretable information. Cella, Hahn, et al. (2002) created five patient groups based on GRC responses: sizeably worse (–5, –6, –7), minimally worse (–2, –3, –4), no change (–1, 0, 1), minimally better (+2, +3, +4), and sizeably better (+5, +6, +7). Patients were stratified by category of change on the GRC, and mean HRQOL change scores within each stratum were computed. HRQOL change scores for patients in the minimally worse and minimally better categories represent estimates of MIDs. An example illustrating the GRC as an anchor for identifying MIDs for the PWB subscale is presented in Table 3a.
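The collapsing and stratification steps described above can be sketched as follows. The five categories follow the grouping attributed to Cella, Hahn, et al. (2002) in the text; the function names and the example records are hypothetical and are not data from any of the cited studies.

```python
from collections import defaultdict
from statistics import mean


def collapse_grc(rating: int) -> str:
    """Map a 15-point global rating of change (-7 to +7) onto five categories."""
    if not -7 <= rating <= 7:
        raise ValueError("GRC ratings range from -7 to +7")
    if rating <= -5:
        return "sizeably worse"
    if rating <= -2:
        return "minimally worse"
    if rating <= 1:
        return "no change"
    if rating <= 4:
        return "minimally better"
    return "sizeably better"


def mean_change_by_category(records):
    """Average HRQOL change score within each collapsed GRC category.

    records is an iterable of (grc_rating, hrqol_change_score) pairs.
    """
    grouped = defaultdict(list)
    for rating, change in records:
        grouped[collapse_grc(rating)].append(change)
    return {category: mean(changes) for category, changes in grouped.items()}


# Hypothetical records for illustration only.
records = [(-3, -2.0), (-2, -1.0), (0, 0.5), (1, -0.5), (3, 2.0), (4, 3.0), (6, 4.0)]
print(mean_change_by_category(records))
```

In this scheme, the means for the minimally worse and minimally better strata are the values that would be read off as MID estimates.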
TABLE 3a
Example of an Anchor-Based Approach for Identifying MIDs Using Longitudinal Data for a Retrospective Anchor (Cella, Hahn, & Dineen, 2002)

| GRC Category | n | PWB Change Score, M (SD) | Overall SD at Baseline | Effect Size |
|---|---|---|---|---|
| Sizeably worse | 9 | –5.8 (7.2) | 5.9 | –.98 |
| Minimally worse | 16 | –1.7 (a) (4.6) | 5.9 | –.29 |
| No change | 50 | –.04 (5.6) | 5.9 | –.01 |
| Minimally better | 20 | 2.4 (a) (4.9) | 5.9 | .41 |
| Sizeably better | 19 | 3.2 (4.6) | 5.9 | .55 |

NOTE: MIDs = minimally important differences; GRC = global ratings change; PWB = physical well-being.
a. These mean score differences are estimates of the MID.

We have also used prospective anchors to identify MIDs (Cella, Eton, Fairclough, et al., 2002; Eton et al., 2004; Yost et al., 2004; Yost et al., in press). When using a prospective anchor, a patient is rated with respect to the clinical anchor at two time points; for example, once at baseline prior to beginning therapy and again at later follow-up. The change in the anchor over these two time points can be used to assess the meaningfulness of change in FACIT scores over the same time. For example, patients with cancer enrolled in a clinical trial were classified as worse if their performance status rating declined from Month 2 to Month 3 of the trial by at least one category, about the same if it did not change, and better if it improved by at least one category. Mean changes in TOI–Biological Response Modifier scores (PWB + FWB + Biological Response Modifier subscale) over the same time period for patients in the worse or better categories represented estimates of MIDs (Yost et al., in press). Table 3b illustrates this approach. Defining clinically meaningful cut-points for anchors measured on a continuum (e.g., hemoglobin levels) rather than in categories (e.g., performance status rating) can be challenging; thus, we have found it helpful to consult with clinical experts when making these determinations.

ASSESSING THE USEFULNESS OF ANCHORS

TABLE 3b
Example of an Anchor-Based Approach for Identifying MIDs Using Longitudinal Data for a Prospective Anchor (Yost et al., in press)

| Change in Performance Status Rating (a) | n | TOI–Biological Response Modifier Change Score (a), M (SD) | Overall SD at Month 2 | Effect Size |
|---|---|---|---|---|
| Worse (b) | 22 | –5.5 (e) (10.9) | 21.0 | –.26 |
| About the same (c) | 108 | .7 (10.7) | 21.0 | .03 |
| Better (d) | 16 | 9.9 (e) (21.4) | 21.0 | .47 |

NOTE: MIDs = minimally important differences; TOI = Trial Outcome Index.
a. Change was measured from Month 2 to Month 3 of a clinical trial.
b. Declined by at least one category.
c. No change.
d. Improved by at least one category.
e. These mean score differences are estimates of the MID.

One measure of the usefulness of an anchor is the correlation between HRQOL scores and the anchor (Crosby et al., 2003; Guyatt, Osoba, et al., 2002). Using inappropriate anchors (i.e., anchors that are poorly correlated with HRQOL) may yield MID estimates that are too small. Thus, a recent amendment to our combination approach is to first assess the usefulness of an anchor prior to using it to establish MIDs for a scale or subscale. We feel that in a cross-sectional analysis, the anchor and HRQOL scores should be linearly related and have at least a moderate positive correlation. For longitudinal analyses, we feel that the anchor change scores and the HRQOL change scores should be linearly related and have at least a moderate positive correlation. We use Cohen’s (1988) guidelines for interpreting the magnitude of correlation coefficients, where r = .1 is small, r = .3 is moderate, and r = .5 is large.

We assessed the usefulness of anchors in recent MID analyses for the FACT–Biological Response Modifiers (Yost et al., in press) and FACT–Colorectal (Yost et al., 2004) questionnaires. Figure 1 shows the relationship between change scores on the FACT–Colorectal and changes in the general health anchor for 568 patients with colorectal cancer participating in an observational study (Yost et al., 2004). The relationships between the anchor change scores and the FACT–C change scores are linear. Correlations between anchor change scores and Total FACT–Colorectal and TOI–Colorectal change scores were moderate (i.e., greater than .30), whereas that for the Colorectal Cancer subscale was small (only .16). Based on these data, we concluded that general health met both of our criteria for a suitable anchor for assessing change in total FACT–Colorectal and the TOI–Colorectal scores, but that it only met one criterion for a suitable anchor for assessing change in the Colorectal Cancer subscale scores. Based on these results, we did not use general health to assess meaningful change in the Colorectal Cancer subscale.
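The correlation criterion can be expressed as a simple screening step. The sketch below is a hedged illustration: it treats Cohen's benchmarks as cut-points, adds a "negligible" label of our own for values under .1, and uses hypothetical function names. It does not address the linearity criterion, which the text assesses separately.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_magnitude(r: float) -> str:
    """Label |r| using the benchmarks cited in the text (.1 small, .3 moderate, .5 large)."""
    size = abs(r)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "moderate"
    if size >= 0.1:
        return "small"
    return "negligible"  # label assumed here; not defined in the text


def meets_correlation_criterion(r: float) -> bool:
    """At least a moderate positive correlation between anchor and HRQOL scores."""
    return r >= 0.3


# Correlations reported for the general health anchor (Yost et al., 2004).
reported = {"Total FACT-Colorectal": 0.35, "TOI-Colorectal": 0.33, "Colorectal Cancer subscale": 0.16}
for scale, r in reported.items():
    print(scale, correlation_magnitude(r), meets_correlation_criterion(r))
```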

[Figure 1 plot: mean change in HRQOL score (vertical axis, –14 to 14) by change in general health (horizontal axis: Much Worse, Worse, Same, Better, Much Better) for the Total FACT-Colorectal (r = .35), the TOI-Colorectal (r = .33), and the Colorectal Cancer subscale (r = .16).]

Figure 1: Relationship Between HRQOL Change Scores and Change in Clinical Anchor (Yost et al., 2004)
NOTE: FACT = Functional Assessment of Cancer Therapy; TOI = Trial Outcome Index; HRQOL = health-related quality of life.

SUMMARIZING DISTRIBUTION- AND ANCHOR-BASED RESULTS TO DERIVE A RANGE OF RECOMMENDED MIDS

We began using graphical displays of MID estimates to facilitate identifying a recommended range of MIDs for a given FACIT scale or subscale. The different types of estimates (i.e., distribution- and anchor-based) are plotted on the same graph. If multiple values of any one type of estimate are available, the mean or the median is plotted. Figure 2 shows the summary of MIDs based on data from Ward et al. (1999) for three scales from the FACT-Colorectal. In addition to the Ward et al. (1999) data, we also evaluated data for samples of patients with colorectal cancer from a Phase II clinical trial and an observational study. Graphically summarizing the data for these three samples helped us identify MIDs of 2 to 3 points for the Colorectal Cancer subscale, 4 to 6 points for the TOI-Colorectal, and 5 to 8 points for the FACT–Colorectal total (Yost et al., 2004).

[Figure 2 plot: MID estimates (vertical axis, 1 to 10 points) for the Colorectal Cancer subscale, the TOI-Colorectal, and the FACT-Colorectal total, with five types of estimate plotted for each scale: 1/3 standard deviation, 1/2 standard deviation, standard error of measurement, anchor-based cross-sectional, and anchor-based longitudinal.]

Figure 2: Summary of Distribution- and Anchor-Based MIDs for the FACT-Colorectal. Based on data from Ward et al. (1999) and Yost et al. (2004)
NOTE: MIDs = minimally important differences; FACT = Functional Assessment of Cancer Therapy; TOI = Trial Outcome Index.

In choosing the high or low end of the range as the criterion for interpreting HRQOL scores, one should consider the consequences of false positives (i.e., concluding a difference is meaningful when it is not) and false negatives (i.e., concluding a difference is not meaningful when it is) in a specific patient group or clinical setting. The choice also depends on whether change in an individual patient or average change in a group of patients is of interest. Because of the measurement error inherent in HRQOL scores for an individual patient, one might be inclined to select the high end of the MID range to interpret score differences for a single patient, whereas the low end of the range could be selected for interpreting group differences. Similarly, one could use the range of MID estimates in a sensitivity analysis.
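One way to picture such a sensitivity analysis is to classify the same score change under both ends of the MID range and see whether the conclusion holds. This is only a sketch under stated assumptions: the function names and category labels are ours, higher scores are taken to mean better HRQOL, and the 6-point change is a hypothetical value.

```python
def classify_change(change: float, mid: float) -> str:
    """Classify a score change against a single MID threshold."""
    if change >= mid:
        return "improved"
    if change <= -mid:
        return "declined"
    return "unchanged"


def classify_across_mid_range(change: float, mid_range) -> dict:
    """Classify the same change using the low and the high end of an MID range."""
    low, high = mid_range
    return {"low end": classify_change(change, low), "high end": classify_change(change, high)}


# FACT-Colorectal total: recommended MID range of 5 to 8 points (Yost et al., 2004).
# The 6-point change is hypothetical.
print(classify_across_mid_range(6, (5, 8)))
# {'low end': 'improved', 'high end': 'unchanged'}
```

A change that is classified the same way at both ends of the range supports a firmer conclusion than one whose classification depends on which end is chosen.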

SUMMARY OF ESTABLISHED MIDS FOR FACIT INSTRUMENTS

ESTABLISHED MIDS

To date, MIDs have been established for 19 scales and subscales in the FACIT measurement system. These are summarized in Table 4.

TABLE 4
Summary of Established MIDs for FACIT Scales and Subscales

| Instrument | Scale or Subscale | No. Items | Total Score | MID (Points) | MID in Points per Item | MID in % of Total Score | Reference |
|---|---|---|---|---|---|---|---|
| FACT–General | Physical Well-Being | 7 | 28 | 2 to 3 | .29 to .43 | 7.1% to 10.7% | (Cella, Hahn, & Dineen, 2002) |
| | Emotional Well-Being | 6 | 24 | 2 to 3 | .33 to .50 | 8.3% to 12.5% | (Cella, Hahn, & Dineen, 2002; Yost et al., in press) |
| | Social/Family Well-Being | 7 | 28 | 2 to 3 | .29 to .43 | 7.1% to 10.7% | (Yost et al., in press) |
| | Functional Well-Being | 7 | 28 | 2 to 3 | .29 to .43 | 7.1% to 10.7% | (Cella, Hahn, & Dineen, 2002) |
| | Total FACT–General | 27 | 108 | 3 to 7 | .15 to .26 | 3.7% to 6.5% | (Cella, Eton, Lai, Peterman, & Merkel, 2002; Cella, Hahn, & Dineen, 2002; Eton et al., 2004; Patrick, Gagnon, Zagari, Mathijs, & Sweetenham, 2003) |
| FACT–Anemia | Fatigue subscale | 13 | 52 | 3 to 4 | .23 to .31 | 5.8% to 7.7% | (Cella, Eton, Lai, et al., 2002; Patrick et al., 2003) |
| | TOI–Fatigue | 27 | 108 | 5 | .19 | 4.6% | (Cella, Eton, Lai, et al., 2002) |
| | TOI–Anemia | 20 | 80 | 6 | .30 | 7.5% | (Cella, Eton, Lai, et al., 2002) |
| | Total FACT–Anemia | 47 | 188 | 7 | .15 | 3.7% | (Cella, Eton, Lai, et al., 2002) |
| FACT–Biological Response Modifiers | TOI–Biological Response Modifiers | 27 | 108 | 5 to 7 | .19 to .26 | 4.6% to 6.5% | (Yost et al., in press) |
| FACT–Breast | Breast Cancer subscale | 9 | 36 | 2 to 3 | .22 to .33 | 5.6% to 8.3% | (Eton et al., 2004) |
| | TOI–Breast | 23 | 92 | 5 to 6 | .22 to .26 | 5.4% to 6.5% | (Eton et al., 2004) |
| | Total FACT–Breast | 36 | 144 | 7 to 8 | .19 to .22 | 4.9% to 5.6% | (Eton et al., 2004) |
| FACT–Colorectal | Colorectal Cancer subscale | 7 | 28 | 2 to 3 | .29 to .43 | 7.1% to 10.7% | (Yost et al., 2004) |
| | TOI–Colorectal | 21 | 84 | 4 to 6 | .19 to .29 | 4.8% to 7.1% | (Yost et al., 2004) |
| | Total FACT–Colorectal | 34 | 136 | 5 to 8 | .15 to .24 | 3.7% to 5.9% | (Yost et al., 2004) |
| FACT–Head & Neck | Total FACT–Head & Neck | 37 | 148 | 6 to 12 | .16 to .32 | 4.1% to 8.1% | (Ringash, Bezjak, O’Sullivan, & Redelmeier, 2004) |
| FACT–Lung | Lung Cancer subscale | 7 | 28 | 2 to 3 | .29 to .43 | 7.1% to 10.7% | (Cella, Eton, Fairclough, et al., 2002) |
| | TOI–Lung | 21 | 84 | 5 to 6 | .24 to .29 | 6.0% to 7.1% | (Cella, Eton, Fairclough, et al., 2002) |

NOTE: MIDs = minimally important differences; TOI = Trial Outcome Index; FACIT = Functional Assessment of Chronic Illness Therapy; FACT = Functional Assessment of Cancer Therapy.

GENERAL GUIDELINES

MIDs for FACIT scales and subscales are fairly stable across patient populations. This can be illustrated by comparing established MIDs for various scales with respect to corresponding points per item differences. For example, the MID for the 7-item Lung Cancer subscale is 2 to 3 points (Cella, Eton, Fairclough, et al., 2002). Dividing the MID by the number of items yields an average of .29 to .43 points per item (all individual FACIT items have a possible score range of 0 to 4). We repeated these calculations for the instruments listed in Table 4. We computed the mean and median of the lower and upper bounds of the range of points per item across the different scales and subscales. We found that for MIDs for cancer-specific subscales, the mean and median for the lower bound were approximately .30, and for the upper bound they were approximately .40. Thus, a general MID guideline for cancer-specific subscales is .30 to .40 points per item. A general guideline for TOIs of .20 to .30 points per item was identified. Finally, a general guideline for FACIT Total scales (i.e., the FACT–General alone or FACT–General plus a disease-, treatment-, or condition-specific subscale) is .15 to .25 points per item (Yost et al., 2004). Another way to illustrate the stability of MIDs is by computing the percentage of the total scale score represented by the MID (i.e., dividing the MID by the highest possible scale score). Because the lowest possible score on every scale or subscale is 0, dividing the MID by the highest possible score is equivalent to dividing it by the possible score range. For example, the MID for the Total FACT–Breast scale is 7 to 8 points, which represents 4.9% to 5.6% of the total score of 144 points. The percentage of score represented by other MIDs is summarized in Table 4.
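The two standardizations described above are simple ratios. As a minimal sketch (the function names are illustrative, and the highest possible score is assumed to be four points per item, as stated in the text):

```python
def mid_points_per_item(mid_points: float, n_items: int) -> float:
    """MID expressed as points per item."""
    return mid_points / n_items


def mid_percent_of_total(mid_points: float, n_items: int, max_item_score: int = 4) -> float:
    """MID expressed as a percentage of the highest possible scale score."""
    return 100.0 * mid_points / (n_items * max_item_score)


# Lung Cancer subscale: MID of 2 to 3 points on 7 items.
print([round(mid_points_per_item(m, 7), 2) for m in (2, 3)])    # [0.29, 0.43]

# Total FACT-Breast: MID of 7 to 8 points on 36 items (highest possible score 144).
print([round(mid_percent_of_total(m, 36), 1) for m in (7, 8)])  # [4.9, 5.6]
```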
Again, we evaluated the means and medians for the lower and upper bounds of the percentage of score across the scales and subscales, and we determined general guidelines for MIDs of 7% to 11% of the total score for cancer-specific subscales, 5% to 7% for TOIs, and 4% to 6% for Total FACT scale scores. It is important to note that these general guidelines are applicable to FACIT instruments only, and they may change on the receipt of additional information.

The points per item and percentage of range estimates differ by the type of scale (e.g., cancer-specific subscale vs. TOI), which is understandable given the content of the scales. The TOIs and to a greater extent the Total FACT scales blend multiple dimensions of HRQOL into single summary scores; however, not all of the subscales that make up a summary scale will behave similarly with respect to clinical criteria. For example, the Total FACT–Colorectal is a sum of the PWB, SWB, EWB, FWB, and Colorectal Cancer (CCS) subscales. As one might expect, the PWB, FWB, and CCS subscales are moderately or strongly correlated with bowel function, whereas EWB and SWB subscales are weakly correlated with bowel function. Thus, anchor-based estimates of the score differences for the Total FACT–Colorectal determined using bowel function as the anchor will be slightly diminished because of the presence of the SWB and EWB in the Total scale (Vickers, 2004).

The general guidelines can be used by clinicians or researchers interested in interpreting scores for a FACIT instrument that has not undergone a thorough analysis to identify MIDs, as shown in Table 5. In this example, general guidelines based on points per item and on percentage of score range yielded similar estimates of the MID for the FACT–Cervical.
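Applying the general guidelines to a new instrument amounts to multiplying each guideline by the number of items or by the highest possible score. The sketch below reproduces the FACT–Cervical example using the item counts from Table 5; the dictionary layout and function name are our own, and a maximum of four points per item is assumed.

```python
# General guidelines from the text, by type of scale.
GUIDELINES = {
    "cancer-specific subscale": {"points_per_item": (0.30, 0.40), "percent": (7, 11)},
    "TOI": {"points_per_item": (0.20, 0.30), "percent": (5, 7)},
    "Total": {"points_per_item": (0.15, 0.25), "percent": (4, 6)},
}


def estimate_mid(scale_type: str, n_items: int, max_item_score: int = 4) -> dict:
    """Estimate an MID range (in points) from both general guidelines."""
    guideline = GUIDELINES[scale_type]
    max_score = n_items * max_item_score
    by_item = tuple(round(p * n_items, 1) for p in guideline["points_per_item"])
    by_percent = tuple(round(pct / 100 * max_score, 1) for pct in guideline["percent"])
    return {"by_points_per_item": by_item, "by_percent_of_total": by_percent}


# FACT-Cervical: 15-item subscale, 29-item TOI, 44-item Total (Table 5).
for label, scale_type, n_items in [
    ("Cervical Cancer subscale", "cancer-specific subscale", 15),
    ("TOI-Cervical", "TOI", 29),
    ("Total FACT-Cervical", "Total", 44),
]:
    print(label, estimate_mid(scale_type, n_items))
```

The two guidelines give overlapping ranges for each scale, which is the agreement noted in the text.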

TABLE 5
Example Illustrating the Use of General Guidelines to Estimate MIDs for the FACT-Cervical

| Scale/Subscale | No. Items | Total Score | General Guideline (Points Per Item) | MID Based on Points Per Item | General Guideline (% of Total Score) | MID Based on % of Total Score |
|---|---|---|---|---|---|---|
| Cervical Cancer subscale | 15 | 60 | .30 to .40 | 4.5 to 6.0 | 7% to 11% | 4.2 to 6.6 |
| TOI–Cervical | 29 | 116 | .20 to .30 | 5.8 to 8.7 | 5% to 7% | 5.8 to 8.1 |
| Total FACT–Cervical | 44 | 176 | .15 to .25 | 6.6 to 11.0 | 4% to 6% | 7.0 to 10.6 |

NOTE: MIDs = minimally important differences; TOI = Trial Outcome Index; FACT = Functional Assessment of Cancer Therapy.

APPLICATIONS OF MIDS

Perhaps the most apparent use of MIDs is for interpreting HRQOL score differences. MIDs can help investigators and clinicians understand whether HRQOL score differences between two treatment groups are meaningful, or if changes within one group over time are meaningful. A recent example can be found in Hahn et al. (2003), who used MIDs to interpret the results from a clinical trial and concluded that TOI–Biological Response Modifier scores in one treatment arm of a clinical trial were statistically significantly and clinically meaningfully higher (i.e., better physical function and well-being) than in the other treatment arm. They also used the MID of 5 points (Yost et al., in press) to classify patients into three categories based on their TOI–Biological Response Modifier change scores. For example, patients were classified as having had a clinically relevant improvement if their TOI–Biological Response Modifier score increased from baseline by 5 or more points. Patients had a clinically relevant decline if their scores decreased 5 or more points from baseline, and they were considered unchanged if their score increased or decreased less than 5 points from baseline. The percentages of patients in each treatment arm experiencing clinically relevant declines or clinically relevant improvements in TOI–Biological Response Modifier scores were reported (Hahn et al., 2003). HRQOL results presented in this manner may be useful to clinicians and patients faced with making treatment decisions.

MIDs are also useful for estimating sample size or power for a future study. For example, suppose a researcher was interested in detecting a meaningful difference of five or more points in mean TOI–Breast scores between two groups (see Table 4). For a two-sided independent sample t test with 95% confidence and 80% power, and assuming a standard deviation for the TOI–Breast of 12.5 (Brady et al., 1997), the researcher would need 100 patients per group (Elashoff, 2000).
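The sample size example can be approximated with the standard normal-approximation formula for comparing two means. This sketch is not the calculation used in the text, which cites a t-test based result; the normal approximation is an assumption made here for simplicity and gives a slightly smaller number, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def n_per_group(difference: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients per group for a two-sided, two-sample comparison of means.

    Uses n = 2 * ((z_(1 - alpha/2) + z_power) * sd / difference) ** 2, rounded up.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / difference) ** 2)


# MID of 5 points on the TOI-Breast, with an assumed SD of 12.5 (Brady et al., 1997).
print(n_per_group(5, 12.5))  # 99
```

The result of 99 per group is close to the 100 per group reported in the text; a calculation based on the t distribution is expected to be slightly larger than the normal approximation.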

CONCLUSIONS

We have illustrated a methodology that combines distribution- and anchor-based approaches to determine minimally important score differences on health outcome assessments from the FACIT measurement system. We reviewed results from our previous studies in which MIDs were estimated. These studies provided evidence that MIDs for FACIT scales and subscales are fairly stable across patient populations and settings. This stability justified formulating general guidelines for estimating MIDs on other FACIT scales and subscales. This is an important step as it allows clinical investigators to set a priori benchmarks for treatment efficacy and establish appropriate study sample sizes without performing extensive preliminary analyses. Although such guidelines are valuable when limited self-report data are available, greater precision in MID estimation can always be obtained from more thorough analyses, as illustrated by the examples in this article.

Although considerable progress has been made to enhance the interpretability of HRQOL scores, additional research is needed to better understand the nuances related to establishing and using MIDs for the purpose of interpreting HRQOL data for a group of patients versus interpreting HRQOL data for an individual patient. Answers to these questions, along with our current knowledge on interpreting HRQOL, will allow us to confidently move forward to future phases of HRQOL research, such as evaluating the impact on patient care of incorporating assessments of meaningful HRQOL differences into clinical practice.
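The general guidelines referred to above reduce to simple arithmetic, which the following sketch reproduces for the FACT-Cervical example in Table 5. The guideline ranges and item counts are taken from that table; the function names are hypothetical, and the maximum item score of 4 is an assumption inferred from the table's total scores (60, 116, and 176 for 15, 29, and 44 items).

```python
MAX_ITEM_SCORE = 4  # assumption: each item contributes 0 to 4 points

# (scale, number of items, points-per-item guideline, percent-of-total guideline)
SCALES = [
    ("Cervical Cancer subscale", 15, (0.30, 0.40), (0.07, 0.11)),
    ("TOI-Cervical", 29, (0.20, 0.30), (0.05, 0.07)),
    ("Total FACT-Cervical", 44, (0.15, 0.25), (0.04, 0.06)),
]


def mid_from_points_per_item(n_items, guideline):
    """MID range as (points per item) x (number of items)."""
    low, high = guideline
    return round(n_items * low, 1), round(n_items * high, 1)


def mid_from_percent(n_items, guideline):
    """MID range as (fraction of total score) x (maximum total score)."""
    total = n_items * MAX_ITEM_SCORE
    low, high = guideline
    return round(total * low, 1), round(total * high, 1)


results = {
    name: (mid_from_points_per_item(n, per_item), mid_from_percent(n, pct))
    for name, n, per_item, pct in SCALES
}

for name, (by_item, by_pct) in results.items():
    print(f"{name}: {by_item[0]} to {by_item[1]} (per item), "
          f"{by_pct[0]} to {by_pct[1]} (% of total)")
```

The two guidelines give overlapping but not identical ranges for each scale, so an investigator using them would still need to choose a working value within the plausible range.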

REFERENCES

Aaronson, N. K., Ahmedzai, S., Bergman, B., Bullinger, M., Cull, A., Duez, N. J., et al. (1993). The European Organization for Research and Treatment of Cancer QLQ-C30: A quality-of-life instrument for use in international clinical trials in oncology. Journal of the National Cancer Institute, 85(5), 365-376.

Brady, M. J., Cella, D. F., Mo, F., Bonomi, A. E., Tulsky, D. S., Lloyd, S. R., et al. (1997). Reliability and validity of the Functional Assessment of Cancer Therapy–Breast quality-of-life instrument. Journal of Clinical Oncology, 15(3), 974-986.

Cella, D. (1994). Quality of life: Concepts and definitions. Journal of Pain and Symptom Management, 9(3), 186-192.

Cella, D. (Ed.). (2004). Manual of the Functional Assessment of Chronic Illness Therapy (FACIT) measurement system, Version 4.1. Evanston, IL: Center on Outcomes, Research and Education (CORE) Evanston Northwestern Healthcare and Northwestern University.

Cella, D., Eton, D. T., Fairclough, D. L., Bonomi, P., Heyes, A. E., Silberman, C., et al. (2002). What is a clinically meaningful change on the Functional Assessment of Cancer Therapy Lung (FACT-L): Results from the Eastern Cooperative Oncology Group (ECOG) Study 5592. Journal of Clinical Epidemiology, 55, 285-295.

Cella, D., Eton, D. T., Lai, J. S., Peterman, A., & Merkel, D. E. (2002). Combining anchor and distribution based methods to derive minimal clinically important differences on the Functional Assessment of Cancer Therapy (FACT) Anemia and Fatigue scales. Journal of Pain and Symptom Management, 24(6), 547-561.

Cella, D., Hahn, E. A., & Dineen, K. (2002). Meaningful change in cancer-specific quality of life scores: Differences between improvement and worsening. Quality of Life Research, 11(3), 207-221.

Cella, D., Tulsky, D. S., Gray, G., Sarafian, B., Linn, E., Bonomi, A., et al. (1993). The Functional Assessment of Cancer Therapy Scale: Development and validation of the general measure. Journal of Clinical Oncology, 11(3), 570-579.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

Crosby, R. D., Kolotkin, R. L., & Williams, G. R. (2003). Defining clinically meaningful change in health-related quality of life. Journal of Clinical Epidemiology, 56(5), 395-407.

de Haes, J. C. (1988). Quality of life: Conceptual and theoretical considerations. In M. Watson, S. Greer, & C. Thomas (Eds.), Psychosocial oncology (pp. 61-70). Oxford, UK: Pergamon.

Deyo, R. A., Diehr, P., & Patrick, D. L. (1991). Reproducibility and responsiveness of health status measures: Statistics and strategies for evaluation. Controlled Clinical Trials, 12(4 Suppl), 142S-158S.

Elashoff, J. D. (2000). nQuery Advisor (4.0 ed.). Cork, Ireland: Statistical Solutions.

Eton, D. T., Cella, D., Yost, K. J., Yount, S. E., Peterman, A. H., Neuberg, D. S., et al. (2004). A combination of distribution- and anchor-based approaches determined minimally important differences (MIDs) for four endpoints in a breast cancer scale. Journal of Clinical Epidemiology, 57(9), 898-910.


Ferrell, B. R., Dow, K. H., Leigh, S., Ly, J., & Gulasekaram, P. (1995). Quality of life in long-term cancer survivors. Oncology Nursing Forum, 22(6), 915-922.

Guyatt, G. H., Norman, G. R., Juniper, E. F., & Griffith, L. E. (2002). A critical look at transition ratings. Journal of Clinical Epidemiology, 55(9), 900-908.

Guyatt, G. H., Osoba, D., Wu, A. W., Wyrwich, K. W., & Norman, G. R. (2002). Methods to explain the clinical significance of health status measures. Mayo Clinic Proceedings, 77(4), 371-383.

Guyatt, G. H., Walter, S., & Norman, G. R. (1987). Measuring change over time: Assessing the usefulness of evaluative instruments. Journal of Chronic Diseases, 40(2), 171-178.

Hahn, E. A., Glendenning, G. A., Sorensen, M. V., Hudgens, S. A., Druker, B. J., Guilhot, F., et al. (2003). Quality of life in patients with newly diagnosed chronic phase chronic myeloid leukemia on imatinib versus interferon alfa plus low-dose cytarabine: Results from the IRIS Study. Journal of Clinical Oncology, 21(11), 2138-2146.

Hays, R. D., & Woolley, J. M. (2000). The concept of clinically meaningful difference in health-related quality-of-life research: How meaningful is it? Pharmacoeconomics, 18(5), 419-423.

Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59(1), 12-19.

Jaeschke, R., Singer, J., & Guyatt, G. H. (1989). Measurement of health status: Ascertaining the minimal clinically important difference. Controlled Clinical Trials, 10(4), 407-415.

Kazis, L. E., Anderson, J. J., & Meenan, R. F. (1989). Effect sizes for interpreting changes in health status. Medical Care, 27(3 Suppl), S178-S189.

Lydick, E., & Epstein, R. S. (1993). Interpretation of quality of life changes. Quality of Life Research, 2(3), 221-226.

MAPI Research Institute. (2004). Quality of Life Instrument Database (QOLID). Lyon, France: MAPI Research Trust. Available at www.qolid.org/

Osoba, D., Rodrigues, G., Myles, J., Zee, B., & Pater, J. (1998). Interpreting the significance of changes in health-related quality-of-life scores. Journal of Clinical Oncology, 16(1), 139-144.

Patrick, D. L., Gagnon, D. D., Zagari, M. J., Mathijs, R., & Sweetenham, J. (2003). Assessing the clinical significance of health-related quality of life (HRQOL) improvements in anaemic cancer patients receiving epoetin alfa. European Journal of Cancer, 39(3), 335-345.

Ringash, J., Bezjak, A., O’Sullivan, B., & Redelmeier, D. A. (2004). Interpreting differences in quality of life: The FACT-H&N in laryngeal cancer patients. Quality of Life Research, 13(4), 725-733.

Sloan, J., Vargas-Chanes, D., Kamath, C., Sargent, D., Novotny, P., Atherton, P., et al. (2003). Detecting worms, ducks and elephants: A simple approach for defining clinically relevant effects in quality-of-life measures. Journal of Cancer Integrative Medicine, 1(1), 41-47.

Vickers, A. J. (2004). Statistical considerations for use of composite health-related quality-of-life scores in randomized trials. Quality of Life Research, 13(4), 717-723.

Ward, W. L., Hahn, W. A., Mo, F., Hernandez, L., Tulsky, D. S., & Cella, D. (1999). Reliability and validity of the Functional Assessment of Cancer Therapy-Colorectal (FACT-C) quality of life instrument. Quality of Life Research, 8(3), 181-195.

Webster, K., Cella, D., & Yost, K. (2003). The Functional Assessment of Chronic Illness Therapy (FACIT) measurement system: Properties, applications, and interpretation. Health and Quality of Life Outcomes, 1, 79.

Wilson, I. B., & Cleary, P. D. (1995). Linking clinical variables with health-related quality of life: A conceptual model of patient outcomes. Journal of the American Medical Association, 273(1), 59-65.


Wyrwich, K. W., Nienaber, N. A., Tierney, W. M., & Wolinsky, F. D. (1999). Linking clinical relevance and statistical significance in evaluating intra-individual changes in health-related quality of life. Medical Care, 37(5), 469-478.

Wyrwich, K. W., Tierney, W. M., & Wolinsky, F. D. (1999). Further evidence supporting an SEM-based criterion for identifying meaningful intra-individual changes in health-related quality of life. Journal of Clinical Epidemiology, 52(9), 861-873.

Yost, K. J., Cella, D., Chawla, A., Holmgren, E., Eton, D. T., Ayanian, J. Z., et al. (2004). Estimating minimally important differences in the Functional Assessment of Cancer Therapy-Colorectal (FACT-C). Manuscript submitted for publication.

Yost, K. J., Sorensen, M. V., Hahn, E. A., Glendenning, G. A., Gnanasakthy, A., & Cella, D. (in press). Using multiple anchor- and distribution-based estimates to evaluate clinically meaningful change on the Functional Assessment of Cancer Therapy-Biologic Response Modifiers (FACT-BRM) instrument. Value in Health.
