
ORIGINAL ARTICLE

The Validity of Prospective and Retrospective Global Change Criterion Measures
John Schmitt, PhD, PT, Richard P. Di Fabio, PhD, PT
ABSTRACT. Schmitt J, Di Fabio RP. The validity of prospective and retrospective global change criterion measures. Arch Phys Med Rehabil 2005;86:2270-6.

Objective: To assess the validity of retrospective versus prospective criterions of change.
Design: Single cohort pretest-posttest design.
Setting: Physical or occupational therapy outpatient clinics.
Participants: Volunteer sample of 211 patients with upper-extremity musculoskeletal problems.
Interventions: Not applicable.
Main Outcome Measures: Disabilities of the Arm, Shoulder, and Hand questionnaire, the Shoulder Pain and Disability Index, the Patient-Rated Wrist Evaluation, the Medical Outcomes Study 12-Item Short-Form Health Survey; global disability rating (GDR), retrospective global rating of change (GRC), and patient satisfaction.
Results: Correlations were calculated among the baseline, 3-month follow-up, and change scores for each outcome measure with the change criterion instruments. Retrospective GRC and patient satisfaction ratings showed moderate correlations with the 3-month follow-up scores, but nonsignificant correlations with baseline scores. By contrast, the prospective GDR criterion showed significant correlations with both baseline and 3-month follow-up scores ranging between 0.3 and 0.4 (absolute value).
Conclusions: Retrospective self-report measures of change do not accurately reflect true change over time. The retrospective GRC and patient satisfaction were heavily influenced by current (posttreatment) status whereas the prospective global change measure reflected both baseline and posttreatment status equally and thus appeared to be a more valid measure of change over time. This study demonstrates the need for an alternative criterion for establishing true individual change.
Key Words: Outcome assessment (health care); Recall; Rehabilitation; Treatment effectiveness; Validity of results.
© 2005 by the American Congress of Rehabilitation Medicine and American Academy of Physical Medicine and Rehabilitation

From the College of St. Catherine, Minneapolis, MN (Schmitt); and Department of Physical Medicine & Rehabilitation, School of Medicine, University of Minnesota, Minneapolis, MN (Di Fabio).
Presented as a poster to the American Physical Therapy Association Combined Sections Meeting, February 2004, Nashville, TN.
Supported in part by the American Physical Therapy Association Orthopaedic Section.
No commercial party having a direct financial interest in the results of the research supporting this article has or will confer a benefit upon the author(s) or upon any organization with which the author(s) is/are associated.
Reprint requests to Richard P. Di Fabio, PhD, PT, Program in Physical Therapy, Mayo Mail Code 388, University of Minnesota, Minneapolis, MN 55455, e-mail: difab001@umn.edu.
0003-9993/05/8612-9785$30.00/0
doi:10.1016/j.apmr.2005.07.290

Responsiveness to clinically important change over time is an essential characteristic of an instrument designed to assess patients’ response to treatment.1-4 There is no agreement in the literature on a method to determine responsiveness. A critical challenge is to identify a group that has truly changed. A randomized controlled trial using a treatment of known efficacy is the strongest design to accomplish this purpose because it provides for the differentiation of “changed” from stable patients. However, drawbacks of this approach include the requirement for treatments of proven efficacy for a given condition, and the ethical considerations of withholding such efficacious treatment from a placebo group simply to determine the responsiveness of an outcome measure.5 Instead, an observational design is often employed, using an external criterion to determine change status.

There is no accepted criterion to ascertain true change status.5-8 One approach is to utilize a retrospective global rating of change instrument, consisting of a single question about degree of change since baseline.5,9-11 The retrospective global rating serves as an external criterion to differentiate between changed and unchanged patients at the end of a longitudinal study. It has been described as a “credible alternative” in the absence of a criterion standard for change in functional status.12 Retrospective ratings are also known as “transitional” instruments that require the respondent to compare current status with baseline status and judge whether change has taken place (yes/no) and, if so, how much change.

Norman et al13 have challenged the use of such global ratings of change (GRC) to categorize improvement. These authors point out that retrospective judgments require comparison of a current state to that experienced weeks or months previously. This is a difficult recall task for patients. Ross14 has discussed the potential for recall bias related to recent history using numerous examples from the social science literature. Rather than an accurate representation of true change, retrospective ratings are more likely to correlate with the subject’s current status. These arguments bring into question the validity of responsiveness indices based on retrospective ratings of change. Nevertheless, authors continue to use these global instruments as a criterion to determine responsiveness8,15-19 and to define treatment success.20,21

We have recently reported22 on a comparison of various statistics for reporting responsiveness. As an alternative to retrospective global ratings, we chose to use the change score for a patient’s global rating of current status, assessed at baseline and again 3 months later. Our rationale was to avoid the necessity for patient recall inherent in retrospective global ratings. However, few studies have compared the validity of this approach with other methods for determining a patient’s change status.

According to Norman,13 a retrospective estimate of change is invalid to the extent that it correlates more highly with the current status of the patient (at follow-up) than with the baseline state. A true measure of change would correlate equally in magnitude (but with opposite sign) with the initial and final patient condition.13 In other words, true change would be equally influenced by baseline and follow-up status. In a 2002

Arch Phys Med Rehabil Vol 86, December 2005


VALIDITY OF CLINICAL CHANGE MEASURES, Schmitt 2271

article, Guyatt et al23 offer a mathematical proof of this equal and opposite correlation with a valid transition rating. However, few studies have used clinical data to test the validity of change criterions. In the analysis described in Norman et al,13 the investigator’s retrospective judgment of change was used as a criterion standard for comparison with the global ratings. No evidence was offered to support the validity of the investigator’s ratings. Further research is needed to explore the validity of other possible criterion measures for change. The purpose of this study was to compare the validity of retrospective measures of change to alternative criterions for change, including a prospectively administered global function measure.

METHODS

Participants
Patients with a physician’s diagnosis of a musculoskeletal upper-extremity problem were recruited at the initial visit to local physical and occupational (hand therapy) outpatient clinics. All patients 18 years and older who could read and understand English were eligible for the study. Subjects were excluded if they had primary or coexisting systemic conditions including multiple sclerosis, cancer, rheumatoid arthritis, stroke, or mental or cognitive impairment. The recruitment goal was at least 200 subjects to minimize sampling error and allow sufficient power for subgroup analysis. This study was approved by local institutional review boards and informed consent was obtained from each research subject.

Study Design
A single cohort of patients with upper-extremity diagnoses was followed up from the initial clinical visit for a total of 3 months. The duration and method of therapy provided was left to the individual therapist. The 3-month follow-up was completed by mail. The interval between baseline and 3-month follow-up was used as a construct for change.

Outcome Measures
Disabilities of the Arm, Shoulder, and Hand Questionnaire. The Disabilities of the Arm, Shoulder, and Hand questionnaire (DASH)24 has 30 items and is intended to evaluate change over time.25 The DASH is region-specific and so allows comparisons across diagnoses of the upper extremity. Scores may range from 0 to 100, with 0 reflecting no disability. The DASH has shown high internal consistency25 using the Cronbach α, and intraclass correlation coefficients (ICCs) for test-retest reliability have been reported as .92 and above.26-28 The construct validity and responsiveness of the DASH have been supported in a number of patient populations.26-30

Shoulder Pain and Disability Index. The Shoulder Pain and Disability Index (SPADI)31,32 is a 13-item questionnaire related to pain and function of the shoulder. The SPADI includes a 5-item pain scale and an 8-item disability scale; each scale is scored as a percentage of the maximum possible scale score. The total SPADI is calculated as the mean of the pain and disability scales. Scores range from 0 to 100 with higher scores indicating greater disability. The SPADI has shown reliability,33 validity,31,32,34 and responsiveness33,35 among patients with various shoulder diagnoses. The SPADI was given to the subgroup of patients with diagnoses affecting the proximal upper extremity.

Patient-Rated Wrist Evaluation. The Patient-Rated Wrist Evaluation (PRWE)30,36 is a joint-specific instrument that contains 5 items regarding pain and 10 items pertaining to function of the wrist and hand.30 Item totals are summed and transformed to a 0 to 50 scale for both pain and function. The total score is the sum of the 2 subscales and may range from 0 to 100, with higher scores indicating greater pain and disability. Reliability, validity, and responsiveness to change have been reported in patients recovering from wrist fractures.30,36 The PRWE was given to the subgroup of patients with conditions affecting the distal upper extremity.

Medical Outcomes Study 12-Item Short-Form Health Survey. The Medical Outcomes Study 12-Item Short-Form Health Survey (SF-12) is a generic health-related quality of life measure with items derived from the 36-Item Short-Form Health Survey (SF-36).37,38 It has been shown to reproduce the physical (PCS) and mental component summary (MCS) scales of the SF-36 with less respondent burden.38

Global disability rating. A prospectively administered global disability rating (GDR) scale was used to provide the patient’s current global functional assessment without requiring extensive recall.22 Patients were instructed to answer the question, “How much has this problem affected your normal daily activities using the arm(s) during the past week?” Disability levels ranged from 1 (no disability) to 7 (maximum disability). Written descriptors defined each level of disability. Patients completed this single global question at each outcome assessment. The examining therapists also rated the patient’s baseline disability using this scale.

The clinician interrater reliability was examined using records of therapy evaluations obtained for 10 patients with a variety of disorders affecting the shoulder, elbow, wrist, and/or hand. Each participating therapist was asked to independently rate the upper-extremity disability on each case. A subset of therapists was asked to repeat this process approximately 5 months later to examine intrarater reliability. Therapists were blind to their earlier ratings of these identical cases.

The test-retest reliability of the GDR in this population has been previously reported with ICCs of .75 for patient ratings and .85 for therapist ratings.22 Significant correlations between the patient and therapist ratings supported the construct validity of this instrument.22 Patient ratings showed Spearman correlations ranging from .56 to .71 with the 3 upper-extremity outcome measures (DASH, SPADI, PRWE) at baseline and 3 months.22

Retrospective GRC. This global rating format has been used previously to provide an external standard for change in functional status.5 The patient judges whether change has occurred and rates both the extent and the importance of this change on 7-point Likert scales. Degree and importance ratings of change are summed for a range of 1 to 14 for improvement, 0 for no change, or −1 to −14 for deterioration.

Global satisfaction. At the 3-month follow-up, patients were asked, “How satisfied were you with your physical/occupational therapy treatment(s)?” The 5-item responses ranged from “very dissatisfied” to “very satisfied” with “neutral” as the middle descriptor.

Protocol
At the initial therapy visit, each patient completed a demographic summary form, the DASH, SF-12, SPADI, and GDR. In addition, patients with injuries affecting the distal extremity (elbow, wrist, and/or hand) completed the PRWE. The examining therapist completed a rating of disability based on his/her evaluation of the patient’s injury.

Three months after the initial visit, the complete packet of questionnaires was mailed to each subject with the addition of the retrospective GRC. If there was no response, follow-up postcards and a second set of questionnaires were mailed in subsequent weeks.
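
The SPADI and PRWE scoring rules described under Outcome Measures can be sketched in a few lines of Python. This is a minimal illustration rather than official scoring code for either instrument: the function names are hypothetical, and the 0-to-10 response range per item is an assumption, because the item response range is not stated in this article.

```python
def percent_of_max(items, item_max=10):
    """Score a subscale as a percentage of its maximum possible score.

    item_max=10 is an assumed item response ceiling (not given in the article).
    """
    if not items:
        raise ValueError("subscale has no item responses")
    return 100.0 * sum(items) / (item_max * len(items))


def spadi_total(pain_items, disability_items, item_max=10):
    """Total SPADI: mean of the pain and disability percentage scores (0-100)."""
    pain = percent_of_max(pain_items, item_max)
    disability = percent_of_max(disability_items, item_max)
    return (pain + disability) / 2.0


def prwe_total(pain_items, function_items, item_max=10):
    """Total PRWE: pain and function each rescaled to 0-50, then summed (0-100)."""
    pain_50 = percent_of_max(pain_items, item_max) / 2.0
    function_50 = percent_of_max(function_items, item_max) / 2.0
    return pain_50 + function_50


if __name__ == "__main__":
    # Hypothetical responses: 5 pain items and 8 disability items for the SPADI.
    print(spadi_total([5] * 5, [10] * 8))
    # Hypothetical responses: 5 pain items and 10 function items for the PRWE.
    print(prwe_total([10] * 5, [0] * 10))
```

Because each subscale is first expressed relative to its own maximum, the two subscales contribute equally to the total even though they contain different numbers of items.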


A total of 326 subjects were invited to participate in this study. Of these, 115 subjects declined to participate (50% women; mean age, 49.1y), with lack of time being the primary reason for refusal. A total of 211 eligible subjects were enrolled and 155 (73.5%) returned the 3-month follow-up packet of questionnaires (table 1). This subset of patients completing follow-up was similar to the baseline sample in age, sex, location of symptoms, occupation, education, and race.

Table 1: Demographic Summary

Characteristics                  Baseline (N=211)   3-Month Follow-up (n=155)
Mean age ± SD (y)                47.5±14.1          49.6±14.3
Age range (y)                    18–88              18–88
Sex, n (%)
  Male                           105 (49.8)         70 (45.2)
  Female                         106 (50.2)         85 (54.8)
Symptom location, n (%)
  Shoulder                       133 (63.0)         98 (63.2)
  Elbow, wrist, or hand          69 (32.7)          52 (33.5)
  Both                           9 (4.3)            5 (3.2)
Symptom duration (mo)
  Mean ± SD                      11.1±23            12.7±26
  Median                         4                  4
Workers’ compensation, n (%)     38 (18)            19 (12)
Surgery, n (%)                   33 (15.6)          24 (15.5)
Occupation, n (%)
  Manual labor                   67 (31.8)          35 (22.6)
  Office/professional            101 (47.9)         80 (51.6)
  Retired                        29 (13.7)          27 (17.4)
  Other                          14 (6.8)           13 (8.4)

NOTE. Descriptive statistics for eligible subjects enrolled at baseline and the subset of subjects completing the 3-month follow-up.
Abbreviation: SD, standard deviation.

Analysis
All statistics were run using NCSS 2001.a Instruments were consistently scored by 1 investigator according to the original authors’ instructions. When items were missing from the DASH, SPADI, or PRWE, percentage scores were calculated based on the reduced total number of items. When responses were missing from the SF-12, the PCS and MCS scores were imputed using a multiple linear regression formula with completed items as predictors of the final scale scores. To test the effect of this procedure, Pearson correlation coefficients were calculated to compare imputed scores with scores resulting from substitution of sample median item responses for the missing items.

The clinician GDRs of the case records were analyzed with a 2-way repeated-measures analysis of variance (ANOVA) with case as a repeated factor and rater as an independent variable. We used mean square terms to calculate the ICC (model 2,1) for interrater reliability. Test-retest reliability data were entered into a 1-way ANOVA with subjects as the independent variable. We used mean square terms to calculate an ICC(1,1) for intrarater reliability.

To examine the validity of the GDR, the relation between baseline therapist and patient ratings of upper-extremity function was analyzed using the Spearman rank-order correlation coefficient. We used Spearman rank-order coefficients and 95% confidence intervals (CIs)39 to correlate the GDR, retrospective GRC, and patient satisfaction with the 3 upper-extremity outcome measures (DASH, SPADI, PRWE) for baseline, follow-up, and change scores.

RESULTS
A summary of the subjects’ demographic information is listed in table 1. The patients had a variety of diagnoses involving the upper extremity as is seen in many outpatient orthopedic practices. The most common diagnoses were shoulder pain (n=45 patients, 21% of the sample), shoulder tendonitis (n=33, 16%), shoulder impingement (n=16, 8%), and elbow tendonitis (n=24, 11%).

Six percent of the SF-12 questionnaires had missing items (typically 1 item was omitted). The correlation of the imputed scores with scores using median response replacement was .98 for both the SF-12 PCS and MCS scores.

Reliability and Validity of GDRs
Thirty-nine physical and occupational therapists participated in subject recruitment and therapist GDRs. The ICC(2,1) for interrater reliability for case study ratings by the 39 participating therapists was .67. Five months later, 9 of the participating therapists rated the same set of 10 case studies with an intrarater ICC(1,1) of .96. At baseline, the correlations between patient and therapist GDRs were .64 for patients with proximal diagnoses and .53 for patients with distal diagnoses.

Correlations Among Change Measures
The correlations of both retrospective and prospective criterion measures of change with change scores of the generic SF-12 PCS, the region-specific DASH, and the joint-specific SPADI and PRWE total scores are listed in table 2. Both the prospective and retrospective criterion measures showed moderate correlations with change scores on the upper-extremity outcome measures (range, .54–.60), while correlations with patient satisfaction were somewhat lower (range, .27–.46).

Table 2: Correlations of External Criterion Measures With Questionnaire Change Scores

Outcome Measures        DASH Change         SF-12 PCS Change    SPADI Change        PRWE Change
                        (95% CI) (n=139)    (95% CI) (n=139)    (95% CI) (n=91)     (95% CI) (n=40)
Prospective change*     .67 (.57–.75)       .54 (.41–.65)       .63 (.49–.74)       .61 (.37–.77)
Retrospective change    .66 (.55–.74)       .57 (.45–.67)       .62 (.48–.73)       .62 (.38–.78)
Patient satisfaction    .43 (.28–.56)       .27 (.11–.42)       .32 (.12–.49)       .46 (.17–.68)

*Based on change scores on the GDR.

A breakdown of the correlation of the retrospective and prospective change criterion measures with baseline and 3-month follow-up status is shown in table 3. As predicted, the retrospective GRC correlated much more highly with the current status of the patient (Spearman ρ range, −.42 to −.59) than at baseline (ρ range, .05–.17). In fact, correlations between the retrospective ratings and baseline scores were nonsignificant for each questionnaire. Similarly, the correlations with patient satisfaction ranged from −.26 to −.32 at follow-up, but


were nonsignificant at baseline. The correlations with prospective GDR change scores, however, were opposite in sign and nearly equal in magnitude, as predicted for an unbiased measure of change. Baseline correlations ranged from .26 to .42, and follow-up correlations were from −.20 to −.40. Despite this finding, table 2 shows that the retrospective and prospective global change criterion measures had similar correlations with the change scores of the DASH, SPADI, and PRWE.

Table 3: Spearman Correlations of Global Change Criterion With Baseline, 3-Month Follow-Up, and Change Scores of Upper-Extremity Outcome Measures

Outcome Measure              Prospective           Retrospective         Patient
                             Global Change*        Global Change         Satisfaction
DASH scores (n=139)
  Initial                    .42 (.27, .55)        .16 (−.01, .32)       .10 (−.07, .26)
  3-Month                    −.30 (−.14, −.44)     −.54 (−.41, −.65)     −.32 (−.12, −.49)
SF-12 PCS scores (n=139)†
  Initial                    .37 (.22, .50)        .17 (.00, .33)        .01 (−.16, .18)
  3-Month                    −.20 (−.04, −.36)     −.42 (−.27, −.55)     −.26 (−.10, −.41)
SPADI scores (n=91)
  Initial                    .30 (.10, .48)        .05 (−.16, .25)       .09 (−.12, .29)
  3-Month                    −.40 (−.21, −.56)     −.59 (−.44, −.71)     −.26 (−.06, −.44)
PRWE scores (n=40)
  Initial                    .26 (−.06, .53)       .10 (−.22, .40)       .04 (−.28, .35)
  3-Month                    −.34 (−.03, −.59)     −.52 (−.25, −.72)     −.31 (.00, −.57)

*Based on change scores on the GDR.
†Sign of correlation coefficients were reversed for consistency with other outcome measures in this table.

DISCUSSION
Retrospective global change instruments do not accurately measure change over time. The retrospective measures used in this study were heavily influenced by current (posttreatment) status. As predicted, baseline (pretreatment) status had little or no impact on the retrospective ratings. Van Stel et al40 reported similar results in a validation study for the Quality of Life for Respiratory Illness Questionnaire. These findings have implications for researchers who want to categorize patients according to change status. Perhaps more important, it should concern clinicians who base treatment decisions on a patient’s retrospective judgment.41 Transitional questions such as, “How much have you changed since your first visit?” yield responses that are poor reflections of true change across time. Accurate assessment of change in health status would require consideration of baseline health relative to current health, and both assessments should contribute equally to a valid rating of change.

Advocates of evidence-based practice have placed more emphasis on dichotomous measures of treatment effect such as experimental and control event rates and number needed to treat. Frequently, retrospective global measures have been used to categorize patients as improved or not.20,21 Despite the questions raised by Norman et al,13 researchers continue to assume validity in the use of such outcome measures. These assumptions do not appear to be warranted.

Guyatt et al23 cite data from 3 different clinical trials to argue for the use of transition measures as valid retrospective criterions for change, useful guides to score interpretation, and even as primary outcome measures in both research and clinical practice. This argument was based on a regression analysis where the baseline scores were significant predictors of the retrospective global change scores. However, these authors acknowledge that rarely did the transition measures perform as predicted for a true change criterion, with follow-up scores accounting for a much greater variance in the transition scores in approximately two thirds of the measures cited. This study involved a 4-week recall period. For longer interim periods, transition ratings would be even more suspect.

Retrospective ratings are prone to bias. Factors such as social desirability or transient variations in health produce undue influence on a rating that is intended to reflect actual change across weeks or months of time. The patient making a retrospective assessment of global change appears to be greatly affected by perceived health status on the day the instrument is completed. Ross14 asserts that memory may be selective, and events consistent with our current state may be recalled preferentially. Patients may not be able to accurately recall baseline status, or they are unable to make judgments regarding differences in disability or health between 2 points in time. Yet retrospective GRC have frequently been used as the criterion standard for validating change in health-assessment questionnaires. Retrospective ratings of change appear to be inadequate for this purpose.
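
The criterion discussed here (a valid change measure should correlate about equally, but with opposite sign, with baseline and follow-up status) can be illustrated with a small simulation. The sketch below is not the analysis performed in this study. All distributions, noise levels, and function names are hypothetical, and the retrospective rating is assumed, for illustration only, to depend solely on follow-up status. Spearman coefficients are computed as Pearson correlations of average ranks.

```python
import random


def ranks(values):
    """Return 1-based average ranks; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            result[order[pos]] = mean_rank
        start = end + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def simulate(n=2000, seed=1):
    """Simulate hypothetical disability scores and two change criteria."""
    rng = random.Random(seed)
    baseline, followup, prospective, retrospective = [], [], [], []
    for _ in range(n):
        b = rng.gauss(50, 15)            # baseline disability (higher = worse)
        f = 0.5 * b + rng.gauss(25, 13)  # follow-up disability (assumed model)
        baseline.append(b)
        followup.append(f)
        # Prospective criterion: serial change score (improvement) plus noise.
        prospective.append((b - f) + rng.gauss(0, 8))
        # Retrospective criterion: assumed to reflect current status only.
        retrospective.append(-f + rng.gauss(0, 8))
    return {
        "prospective_vs_baseline": spearman(prospective, baseline),
        "prospective_vs_followup": spearman(prospective, followup),
        "retrospective_vs_baseline": spearman(retrospective, baseline),
        "retrospective_vs_followup": spearman(retrospective, followup),
    }


if __name__ == "__main__":
    for name, rho in simulate().items():
        print(f"{name}: {rho:+.2f}")
```

Under these assumptions, the simulated change score correlates with baseline and follow-up scores to a similar degree but with opposite signs, whereas the simulated retrospective rating tracks follow-up status more closely than baseline status. The symmetry of the change score depends on baseline and follow-up scores having similar variances, which the simulation builds in.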
Previous research has highlighted the difficulty of retrospec-
tive recall of health status. Mancuso and Charlson42 have
shown that patients do not accurately recall baseline state, and


that the direction of error (ie, under- or overestimation) is often not predictable. Overall, however, patients tended to overestimate treatment effects following hip replacement. Aseltine et al41 confirmed this tendency among patients with prostate and gynecologic conditions, especially following surgical interventions. Ross14 also reviewed several studies that revealed a tendency for subjects to recall greater change than actually occurred. Various investigators have reported that retrospective GRC were more responsive than a number of functional outcome measures,43-45 but this may simply reflect overestimation of change due to recall or experimenter bias.

Herrmann46 has summarized arguments against relying on the patient’s recall ability with retrospective global measures, including the lack of specificity of the question, the common occurrence of “guessing” when recall is deficient, and recall bias due to intervening treatment. Inability to recall baseline status calls into question the validity of retrospective judgments of change.

Other researchers have reported different results from prospective versus retrospective global change instruments.26,41,44,47,48 Different estimations of change yield different samples for calculation of responsiveness indices. It would be useful to identify a preferred criterion measure of global change. As an alternative to GRC, previous investigators have used patient satisfaction,7,49-52 retrospective rating of change in overall pain or function,6 and estimates of the degree of treatment effectiveness43 or recovery from surgery.53 In each case, recall to an earlier state is required to make retrospective judgments about change over time.

In contrast to retrospective ratings, change scores from a prospectively administered global function item correlated equally and in opposite direction to baseline and follow-up functional status measured by the DASH, SPADI, and PRWE. That is, both baseline and follow-up status contributed equally to the prospective change criterion, as would be expected for a true measure of change. Clearly, prospectively measured global change would be less prone to bias by recall or therapist influence. The single-item GDR used in this study was shown to be reliable and showed validity concurrent with clinician global ratings. However, both prospective and retrospective global change measures demonstrated similar modest correlations (ρ range, .59–.67) with change scores on these same upper-extremity questionnaires. Individually, the coefficient of determination calculated from correlations of either of the change criterions with the DASH, SPADI, and PRWE ranged from 9% to 45%. If the global change instrument is chosen as the criterion standard, then the DASH and other accepted functional questionnaires leave a large proportion of variance unexplained.

Overall, these results highlight the inherent difficulty of using a single-item global rating as a criterion for change. As Norman et al comment, “Indeed, if the single global rating could be shown to have superior measurement characteristics, there is no reason to not simply use this as a measure of health-related quality of life.”13(p871-2) The assumption in using the global change rating as a criterion is that the patient’s judgment will account for the varying symptom levels and disabilities for varying activities over an extended period of time and create an accurate “overall” rating of health and disability. By contrast, multiple-item health status questionnaires are designed to assess functional ability over a range of daily tasks and activities. Errors in the estimation of difficulty with any given activity would not unduly influence the overall score. Because transitional questions are avoided, error caused by inaccurate recall would be minimized.

There is agreement in the literature that there is no criterion standard for measuring function or functional change. Given the challenges inherent in determining a valid global criterion for change, it seems preferable to avoid the use of one altogether. A priori prognostic ratings have been used to identify the subgroup of patients most likely to change in response to intervention.54,55 Although a priori ratings may not produce the bias inherent in retrospective measures, they are still likely to result in a high degree of misclassification. Patient satisfaction has been used as a criterion for change, but this approach has drawbacks similar to the use of retrospective global ratings. Satisfaction levels reflect many aspects of care in addition to functional change. In this study, satisfaction also reflected the current state of the patient rather than true change over time. Previous research has shown a high correlation between satisfaction and retrospective global change ratings, but a low correlation between satisfaction and a serial (prospectively administered) assessment of global change, suggesting that satisfaction is also biased toward posttreatment status rather than true change over time.44 There is a need for an acceptable alternative to the use of global change criterion measures.

We have recently described an approach to determine change status based on pre-post treatment scores on the standardized multi-item functional outcome measures used in this study.22 Each subject’s change score is compared with an established threshold to determine whether true or meaningful change has occurred. The threshold for change may be defined according to statistically important (minimum detectable change) or clinically important change (minimal clinically important difference). These approaches do not require the use of a global change criterion and may serve as a valid alternative for determining true change status.

CONCLUSIONS
Criteria for global change are used frequently in research and clinical practice. Retrospective global change and satisfaction questions appear to reflect follow-up status rather than true change over time. Change scores from a prospectively measured global functional status question reflected both baseline and follow-up status fairly equally, but still accounted for a small to moderate amount of the variance in commonly accepted functional outcome measures for the upper extremity. This study suggests that global change measures are not valid indicators of true change. We recommend the development of alternatives to global change instruments in studies of responsiveness.

Acknowledgments: We thank the physical and occupational therapists at Park Nicollet Clinic and the Institute for Athletic Medicine for their support in recruiting patients.

References
1. Beurskens AJ, de Vet HC, Koke AJ, van der Heijden GJ, Knipschild PG. Measuring the functional status of patients with low back pain. Assessment of the quality of four disease-specific questionnaires. Spine 1995;20:1017-28.
2. Guyatt G, Walter S, Norman G. Measuring change over time: assessing the usefulness of evaluative instruments. J Chronic Dis 1987;40:171-8.
3. Deyo RA, Diehr P, Patrick DL. Reproducibility and responsiveness of health status measures. Statistics and strategies for evaluation. Control Clin Trials 1991;12:142S-58S.
4. Kirshner B, Guyatt G. A methodological framework for assessing health indices. J Chronic Dis 1985;38:27-36.
5. Stratford PW, Binkley JM, Riddle DL. Health status measures: strategies and analytic methods for assessing change scores. Phys Ther 1996;76:1109-23.


6. Wright JG, Young NL. A comparison of different indices of responsiveness. J Clin Epidemiol 1997;50:239-46.
7. Stucki G, Liang MH, Fossel AH, Katz JN. Relative responsiveness of condition-specific and generic health status measures in degenerative lumbar spinal stenosis. J Clin Epidemiol 1995;48:1369-78.
8. Middel B, Stewart R, Bouma J, van Sonderen E, van den Heuvel WJ. How to validate clinically important change in health-related functional status. Is the magnitude of the effect size consistently related to magnitude of change as indicated by a global question rating? J Eval Clin Pract 2001;7:399-410.
9. Taylor SJ, Taylor AE, Foy MA, Fogg AJ. Responsiveness of common outcome measures for patients with low back pain. Spine 1999;24:1805-12.
10. Santanello NC, Zhang J, Seidenberg B, Reiss TF, Barber BL. What are minimal important changes for asthma measures in a clinical trial? Eur Respir J 1999;14:23-7.
11. Stratford PW, Binkley JM, Riddle DL, Guyatt GH. Sensitivity to change of the Roland-Morris Back Pain Questionnaire: part 1. Phys Ther 1998;78:1186-96.
12. Jaeschke R, Singer J, Guyatt GH. Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials 1989;10:407-15.
13. Norman GR, Stratford P, Regehr G. Methodological problems in the retrospective computation of responsiveness to change: the lesson of Cronbach. J Clin Epidemiol 1997;50:869-79.
14. Ross M. Relation of implicit theories to the construction of personal histories. Psychol Rev 1989;96:341-57.
15. van der Lee JH, Beckerman H, Knol DL, de Vet HC, Bouter LM. Clinimetric properties of the motor activity log for the assessment of arm use in hemiparetic patients. Stroke 2004;35:1410-4.
16. Williams NH, Wilkinson C, Russell IT. Extending the Aberdeen Back Pain Scale to include the whole spine: a set of outcome measures for the neck, upper and lower back. Pain 2001;94:261-74.
17. Dunkl PR, Taylor AG, McConnell GG, Alfano AP, Conaway MR. Responsiveness of fibromyalgia clinical trial outcome measures. J Rheumatol 2000;27:2683-91.
18. Chang E, Abrahamowicz M, Ferland D, Fortin PR. Comparison of the responsiveness of lupus disease activity measures to changes in systemic lupus erythematosus activity relevant to patients and physicians. J Clin Epidemiol 2002;55:488-97.
19. Locker D, Jokovic A, Clarke M. Assessing the responsiveness of measures of oral health-related quality of life. Community Dent Oral Epidemiol 2004;32:10-8.
20. Hoving JL, Koes BW, de Vet HC, et al. Manual therapy, physical therapy, or continued care by a general practitioner for patients with neck pain. A randomized, controlled trial. Ann Intern Med 2002;136:713-22.
21. Smidt N, van der Windt DA, Assendelft WJ, Deville WL, Korthals-de Bos IB, Bouter LM. Corticosteroid injections, physiotherapy, or a wait-and-see policy for lateral epicondylitis: a randomised controlled trial. Lancet 2002;359:657-62.
22. Schmitt JS, Di Fabio RP. Reliable change and minimum important difference (MID) proportions facilitated group responsiveness comparisons using individual threshold criteria. J Clin Epidemiol 2004;57:1008-18.
23. Guyatt GH, Norman GR, Juniper EF, Griffith LE. A critical look at transition ratings. J Clin Epidemiol 2002;55:900-8.
24. Hudak PL, Amadio PC, Bombardier C. Development of an upper extremity outcome measure: the DASH (disabilities of the arm, shoulder and hand) [corrected]. The Upper Extremity Collaborative Group (UECG). Am J Ind Med 1996;29:602-8.
25. Marx RG, Bombardier C, Hogg-Johnson S, Wright JG. Clinimetric and psychometric strategies for development of a health measurement scale. J Clin Epidemiol 1999;52:105-11.
26. Beaton DE, Katz JN, Fossel AH, Wright JG, Tarasuk V, Bombardier C. Measuring the whole or the parts? Validity, reliability, and responsiveness of the Disabilities of the Arm, Shoulder and Hand outcome measure in different regions of the upper extremity. J Hand Ther 2001;14:128-46.
27. MacDermid JC. Outcome evaluation in patients with elbow pathology: issues in instrument development and evaluation. J Hand Ther 2001;14:105-14.
28. Turchin DC, Beaton DE, Richards RR. Validity of observer-based aggregate scoring systems as descriptors of elbow pain, function, and disability. J Bone Joint Surg Am 1998;80:154-62.
29. Kirkley A, Griffin S, McLintock H, Ng L. The development and evaluation of a disease-specific quality of life measurement tool for shoulder instability. The Western Ontario Shoulder Instability Index (WOSI). Am J Sports Med 1998;26:764-72.
30. MacDermid JC, Richards RS, Donner A, Bellamy N, Roth JH. Responsiveness of the short form-36, disability of the arm, shoulder, and hand questionnaire, patient-rated wrist evaluation, and physical impairment measurements in evaluating recovery after a distal radius fracture. J Hand Surg [Am] 2000;25:330-40.
31. Roach KE, Budiman-Mak E, Songsiridej N, Lertratanakul Y. Development of a shoulder pain and disability index. Arthritis Care Res 1991;4:143-9.
32. Williams JW Jr, Holleman DR Jr, Simel DL. Measuring shoulder function with the Shoulder Pain and Disability Index. J Rheumatol 1995;22:727-32.
33. Beaton D, Richards RR. Assessing the reliability and responsiveness of 5 shoulder questionnaires. J Shoulder Elbow Surg 1998;7:565-72.
34. Beaton DE, Richards RR. Measuring function of the shoulder. A cross-sectional comparison of five questionnaires. J Bone Joint Surg Am 1996;78:882-90.
35. Heald SL, Riddle DL, Lamb RL. The shoulder pain and disability index: the construct validity and responsiveness of a region-specific disability measure. Phys Ther 1997;77:1079-89.
36. MacDermid JC, Turgeon T, Richards RS, Beadle M, Roth JH. Patient rating of wrist pain and disability: a reliable and valid measurement tool. J Orthop Trauma 1998;12:577-86.
37. Jenkinson C, Layte R, Jenkinson D, et al. A shorter form health survey: can the SF-12 replicate results from the SF-36 in longitudinal studies? J Public Health Med 1997;19:179-86.
38. Ware J Jr, Kosinski M, Keller SD. A 12-Item Short-Form Health Survey: construction of scales and preliminary tests of reliability and validity. Med Care 1996;34:220-33.
39. Luus HG, Muller FO, Meyer BH. Statistical significance versus clinical relevance. Part III. Methods for calculating confidence intervals. S Afr Med J 1989;76:681-5.
40. van Stel HF, Maille AR, Colland VT, Everaerd W. Interpretation of change and longitudinal validity of the quality of life for respiratory illness questionnaire (QoLRIQ) in inpatient pulmonary rehabilitation. Qual Life Res 2003;12:133-45.
41. Aseltine RH Jr, Carlson KJ, Fowler FJ Jr, Barry MJ. Comparing prospective and retrospective measures of treatment outcomes. Med Care 1995;33:AS67-76.
42. Mancuso CA, Charlson ME. Does recollection error threaten the validity of cross-sectional studies of effectiveness? Med Care 1995;33:AS77-88.
43. Gotzsche PC. Sensitivity of effect variables in rheumatoid arthritis: a meta-analysis of 130 placebo controlled NSAID trials. J Clin Epidemiol 1990;43:1313-8.
44. Fischer D, Stewart AL, Bloch DA, Lorig K, Laurent D, Holman H. Capturing the patient’s view of change as a clinical outcome measure. JAMA 1999;282:1157-62.
45. Ziebland S, Fitzpatrick R, Jenkinson C, Mowat A. Comparison of two approaches to measuring change in health status in rheumatoid arthritis: the Health Assessment Questionnaire (HAQ) and modified HAQ. Ann Rheum Dis 1992;51:1202-5.
46. Herrmann D. Reporting current, past, and changed health status. What we know about distortion. Med Care 1995;33:AS89-94.
47. Barber BL, Santanello NC, Epstein RS. Impact of the global on patient perceivable change in an asthma specific QOL questionnaire. Qual Life Res 1996;5:117-22.
48. Timmerman AA, Anteunis LJ, Meesters CM. Response-shift bias and parent-reported quality of life in children with otitis media. Arch Otolaryngol Head Neck Surg 2003;129:987-91.
49. L’Insalata JC, Warren RF, Cohen SB, Altchek DW, Peterson MG. A self-administered questionnaire for assessment of symptoms and function of the shoulder. J Bone Joint Surg Am 1997;79:738-48.
50. BenDebba M, Heller J, Ducker TB, Eisinger JM. Cervical spine outcomes questionnaire: its development and psychometric properties. Spine 2002;27:2116-24.
51. Katz JN, Gelberman RH, Wright EA, Lew RA, Liang MH. Responsiveness of self-reported and objective measures of disease severity in carpal tunnel syndrome. Med Care 1994;32:1127-33.
52. Bessette L, Sangha O, Kuntz KM, et al. Comparative responsiveness of generic versus disease-specific and weighted versus unweighted health status measures in carpal tunnel syndrome. Med Care 1998;36:491-502.
53. van der Windt DA, van der Heijden GJ, de Winter AF, Koes BW, Deville W, Bouter LM. The responsiveness of the Shoulder Disability Questionnaire. Ann Rheum Dis 1998;57:82-7.
54. Binkley JM, Stratford PW, Lott SA, Riddle DL. The Lower Extremity Functional Scale (LEFS): scale development, measurement properties, and clinical application. North American Orthopaedic Rehabilitation Research Network. Phys Ther 1999;79:371-83.
55. Stratford PW, Binkley JM, Riddle DL. Development and initial validation of the back pain functional scale. Spine 2000;25:2095-102.

Supplier
a. Number Cruncher Statistical Systems, 329 N 1000 E, Kaysville, UT 84037.
