
Advances in Physiotherapy, 2005; 7: 2–6

ORIGINAL ARTICLE

Inter- and intra-rater reliability of the B. Lindmark Motor Assessment

MARIE KIERKEGAARD & ANNA TOLLBÄCK


Department of Physical Therapy, Karolinska Institute, Karolinska Hospital, Stockholm, Sweden

Abstract

B. Lindmark motor assessment (BLMA) is an instrument designed for evaluation of functional capacity after stroke. The objective of this study was to evaluate intra- and inter-rater reliability of the BLMA from assessments of videotaped examinations with the BLMA. Three physiotherapists served as raters and 21 acute stroke patients, with different levels of impairment, served as subjects for the videotapes. The videotaped examinations were viewed and scored by the raters individually on two occasions. Kappa statistics were used for calculation of reliability. For intra-rater reliability, the kappa values for the paretic side ranged from 0.43 to 1 and from 0.66 to 0.84 for mobility and balance. For inter-rater reliability, the kappa values for the paretic side ranged from 0.46 to 1 and from 0.45 to 0.86 for mobility and balance. It can be concluded that the BLMA has a good intra- and inter-rater reliability in an acute stroke care setting.

Key Words: Cerebrovascular disorders, functional assessment, physical therapy, reliability

Correspondence: Marie Kierkegaard, Department of Physical Therapy, R1:07 Karolinska Hospital, SE-171 76 Stockholm, Sweden. Tel: +46 8517 72022. Fax: +46 8517 74967. E-mail: marie.kierkegaard@karolinska.se

(Received 8 January 2004; accepted 7 September 2004)

ISSN 1403-8196 print/ISSN 1651-1948 online © 2005 Taylor & Francis DOI: 10.1080/14038190510008776

Introduction

Immediate assessment of patients to identify those who may benefit from stroke rehabilitation is regarded as being of utmost importance in acute stroke care [1,2]. Reliable and valid outcome measures are needed in order to discriminate among subjects, to predict future states, and to evaluate patient outcomes or the effectiveness of an intervention [3].

B. Lindmark motor assessment (BLMA) [4] is a modification of the assessment method of Fugl-Meyer et al. [5]. The domains investigated by the BLMA are: the ability to perform active selective movements (part A, 31 items), rapid movement changes, i.e. co-ordination (part B, four items), mobility including walking (part C, eight items), balance (part D, seven items), sensation (part E, 13 items), joint pain (part F, nine items), and passive range of joint motion (part G, 26 items). Most of the items are scored on a four-point scale from 0 (no function/cannot perform the activity) to 3 (normal function/can perform the activity without help). A seven-point scale from 0 (cannot) to 6 (normal) is used to assess walking ability and a three-point scale to assess sensation, joint pain and joint motion. Both the paretic side and the non-paretic side are evaluated and an examination takes approximately 30 min to complete. Based on motor scores, i.e. the total scores of parts A and B together, from 231 acute stroke patients, Lindmark has described four functional groups according to the level of impairment [6].

The BLMA is mainly an impairment measure but has one dimension that evaluates disability. Construct, concurrent and predictive validity of the BLMA have been reported to be satisfactory [6]. Inter-rater reliability, assessed by having two observers evaluate 12 patients (1 year post-stroke onset) independently at the same time, showed a level of agreement for the whole instrument of 96% [7]. A test of intra-rater reliability, performed by having the same person assess 28 stroke patients (at least 6 months post-stroke onset) on two occasions, 3 weeks apart, showed a level of agreement for the whole instrument of 98% [7]. The correlation coefficients of the different parts between the assessment occasions were reported to be high, in most cases above 0.95 [7]. However, percentage agreement methods do not take chance agreement into account and their use as an index of agreement among raters has been discouraged [8,9]. Furthermore, the correlation coefficient is a measure of linear association between two variables and not a measure of agreement. The calculation of correlation coefficients has been considered an inappropriate approach to judge agreement [10]. An approach to assess reliability is to examine observer agreement, and kappa is a reliability coefficient designed for use with nominal or ordinal data [11].

When reviewing the literature, no study was found that determined the reliability of measurements obtained on acute stroke patients with the BLMA. Thus, further reliability testing of the instrument would be of great value in order to establish the intra- and inter-rater reliability for physiotherapists with different experience of assessing patients suffering from stroke. The purpose of this study was to examine the inter- and intra-rater reliability of parts A to E of the BLMA in an acute stroke care setting.

Materials and methods

Raters

Three physiotherapists working at the neurological department at the Karolinska Hospital took part in the study. The raters were chosen in order to represent different levels of skill in neurological rehabilitation and in evaluating stroke patients. They had been registered physiotherapists for 4–20 years and had varying experience in neurological rehabilitation, ranging from 2 to 16 years. Two of the raters had previous experience of the BLMA. Each rater had been informed about the purpose of the study and was fully advised about the procedures to be used.

Patients

A sample of 21 patients (median age 76 years, range 56–88 years; 11 men, 10 women; paretic side nine right, 12 left) admitted to the acute stroke ward at the Karolinska Hospital in Stockholm, Sweden, participated in the study. The patients were selected so that the sample consisted of approximately five patients in each functional group according to level of impairment as described by B. Lindmark [6]. Criteria for inclusion in the study were: that the patient had a diagnosed brain infarction or intracerebral haemorrhage verified by computerized tomography scan, that the patient was rated to be at grade 1 or 2 according to the Reaction Level Scale (RLS 85) [12], i.e. fully alert or drowsy, and that the patient could give their consent to participate in the study. Each patient received verbal and written information and signed a document of informed consent before entering the study.

Ethics

An application was sent to the Committee of Ethics at the Karolinska Hospital prior to the study. The Committee of Ethics considered the study a quality assurance project and their approval was not required.

Equipment and procedure

Initial screening among the patients to determine who met the inclusion criteria was conducted by one of the authors (MK) prior to the study. The patients were assessed with the BLMA and the motor score was calculated in order to make sure that the patients as a group represented all functional groups described by B. Lindmark. One BLMA examination of each individual patient was administered by one of the authors (MK) and recorded on videotape. The video camera was secured on a tripod and was placed in such a way that the best possible view of the patient for each test situation was recorded. All videotaping was carried out by the same individual. The patients were videotaped on days 1–9, with a mean of 3 days, after admission to the stroke ward. For all but one patient, the admission day was the same day as stroke onset.

The videotapes were used for the intra- and inter-rater reliability study. Prior to commencing the study, the raters were orientated to the evaluation form and the scoring protocol. The raters were then trained on three occasions, approximately 3 h altogether, in assessing videotaped examinations with the BLMA. For these training sessions, videotapes of three stroke patients not included in the present study were used.

The videotaped examinations were viewed and scored by the raters individually. The scoring sheet was collected immediately after a rater had completed an assessment. The raters were instructed to evaluate the videotapes in a prescribed order, starting with patient 1 and ending with patient 21. The raters were asked to perform a second evaluation of the videotaped examinations at least 4 weeks after the completion of the first evaluation. Since viewing and scoring the videotapes was rather time consuming, the raters were allowed to assess the tapes during a 6-month period. No discussion regarding the assessments was allowed among the raters during this period. All but one rater followed the prescribed rating order of the individual patient films. She did, however, rate the videotaped examinations in the same order on both occasions. Data from the first and second evaluations were used for the examination of intra-rater reliability and data from the second evaluation were used for the examination of inter-rater reliability.


Table II. Intra-rater reliability results; kappa values for each rater and function of the non-paretic side.

                                     Kappa coefficient
Function                             Rater 1   Rater 2   Rater 3
Part B: Co-ordination (4 items)      0.46      0.1       0.61

Statistics

Kappa statistics [10] were used for calculation of both inter- and intra-rater reliability. The kappa statistic is a coefficient of agreement for categorical data that corrects for agreement expected by chance. When agreement is perfect, the kappa coefficient has a value of 1.00 and a value of zero indicates no agreement better than chance. Altman [10] has proposed the following guidelines for classification of strength of agreement for kappa values: <0.20 = poor, 0.21–0.40 = fair, 0.41–0.60 = moderate, 0.61–0.80 = good and 0.81–1.00 = very good. Since the BLMA is an ordinal scale with ordered categorical data, where the categorical labels do not represent any mathematical value but only an order, arithmetic cannot be applied [13]. The scores from individual items should therefore not be added to a total score. The scale was therefore operationalized so that the median for a function, corresponding to some of the sub-scales and total scores described by B. Lindmark [4], was calculated and used for data analysis.

Results

The intra-rater reliability results are shown in Table I. The kappa values for the paretic side ranged from 0.43 to 1. The level of agreement was very good for 13, good for 10 and moderate for four of the coefficients. Kappa values for functions of the non-paretic side could not be calculated due to uniformity of scores except for part B (Table II).

The inter-rater reliability results are shown in Table III. The kappa values for the paretic side ranged from 0.45 to 1. The level of agreement was very good for five, good for 15 and moderate for seven of the coefficients. Kappa values for functions of the non-paretic side could not be calculated due to uniformity of scores except for lower extremity of part A and part B (Table IV).
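The analysis described under Statistics, collapsing the items of a function to a median and then computing a chance-corrected agreement coefficient, can be sketched as follows. This is only an illustration under stated assumptions: the article does not say whether unweighted or weighted kappa was used, nor how a median was taken for an even number of items, so the sketch uses unweighted Cohen's kappa and `median_low`. The function names and the ten example scores are hypothetical and are not study data.

```python
from collections import Counter
from statistics import median_low


def function_score(item_scores):
    """Collapse the ordinal item scores of one function into a single score.

    Assumption: median_low keeps the result on the original scale when the
    number of items is even; the article does not say how this was handled.
    """
    return median_low(item_scores)


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(first)
    # Proportion of patients given the same score on both occasions.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected by chance, from the marginal score frequencies.
    counts_first, counts_second = Counter(first), Counter(second)
    expected = sum(counts_first[c] * counts_second[c] for c in counts_first) / (n * n)
    if expected == 1.0:
        return None  # every score identical: kappa is undefined
    return (observed - expected) / (1.0 - expected)


# Hypothetical function scores for ten patients, rated twice by one rater.
occasion_1 = [0, 1, 1, 2, 2, 3, 3, 3, 0, 2]
occasion_2 = [0, 1, 2, 2, 2, 3, 3, 3, 0, 1]
print(round(cohen_kappa(occasion_1, occasion_2), 2))
```

Returning `None` when all scores are identical mirrors the situation reported for the non-paretic side, where kappa could not be calculated because the scores were uniform.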
Table I. Intra-rater reliability results; kappa values for each rater and function of the paretic side.

                                       Kappa coefficient
Function                               Rater 1   Rater 2   Rater 3
Part A: Arm (8 items)                  0.43      0.66      0.86
Part A: Wrist (3 items)                0.54      0.82      0.78
Part A: Hand function (8 items)        1         0.82      0.86
Part A: Lower extremity (12 items)     0.85      0.87      0.79
Part B: Co-ordination (4 items)        0.54      0.86      0.92
Part C: Mobility (8 items)             0.74      0.84      0.68
Part D: Balance (7 items)              0.68      0.66      0.73
Part E: Light touch (4 items)          0.73      1         1
Part E: Joint position (9 items)       0.7       1         0.5
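As a check on the Results, the Altman classification can be applied to the 27 coefficients of Table I. The sketch below assumes the values are read rater by rater (nine per rater, in the order of the listed functions); that reading is an assumption, but it reproduces the 0.66 to 0.84 range for mobility and balance given in the abstract. Treating a value of exactly 0.20 as "poor" is also an assumption, since the guideline leaves that boundary open.

```python
from collections import Counter

# Table I, read rater by rater (assumed order: the nine functions as listed).
TABLE_I = {
    "Rater 1": [0.43, 0.54, 1.00, 0.85, 0.54, 0.74, 0.68, 0.73, 0.70],
    "Rater 2": [0.66, 0.82, 0.82, 0.87, 0.86, 0.84, 0.66, 1.00, 1.00],
    "Rater 3": [0.86, 0.78, 0.86, 0.79, 0.92, 0.68, 0.73, 1.00, 0.50],
}


def altman_label(kappa):
    """Strength of agreement according to the guideline cited from Altman [10]."""
    if kappa <= 0.20:  # assumption: the boundary value 0.20 counts as poor
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


tally = Counter(altman_label(k) for values in TABLE_I.values() for k in values)

# Mobility and balance are the sixth and seventh functions in the table.
mobility_balance = [values[i] for values in TABLE_I.values() for i in (5, 6)]

print(dict(tally))
print(min(mobility_balance), max(mobility_balance))
```

The tally gives 13 very good, 10 good and four moderate coefficients, which agrees with the counts stated in the Results.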

Three patients were excluded from the evaluation of joint position due to aphasia or perceptual problems, which made assessment not applicable.

Discussion

Studies of reliability of existing clinical measures are of great importance. Although outcome measures have been published, including information on reliability, corroborative studies of reliability are rare. It is not self-evident that reliability will be maintained when a measure is used by different people under different circumstances. Factors that influence the reliability of a measurement include the sources of variability studied, the subjects selected and the range of scores exhibited [14]. Differences found in repeated measurements of the same characteristic can be attributed to instrument, intra-rater, inter-rater and intra-subject components. The number of sources of variability within a reliability component is controlled by the degree of standardization. A highly standardized approach seeks to document reliability in an ideal situation, in which as many conceivable sources of variability as possible are controlled.

Inter- and intra-rater reliability were the components of focus in this study. In order to standardize the test situation and minimize the within-patient variation, one of the authors (MK) administered all examinations, which were videotaped for later scoring according to the BLMA. The reasons for this were that the acute stroke patient cannot be considered functionally
Table III. Inter-rater reliability results; kappa values for each rater pair and function of the paretic side.

                                       Kappa coefficient
Function                               Rater 1–2   Rater 1–3   Rater 2–3
Part A: Arm (8 items)                  0.46        0.49        0.74
Part A: Wrist (3 items)                0.61        0.77        0.74
Part A: Hand function (8 items)        0.79        0.71        0.8
Part A: Lower extremity (12 items)     0.65        0.58        0.79
Part B: Co-ordination (4 items)        0.86        0.79        0.78
Part C: Mobility (8 items)             0.76        0.75        0.68
Part D: Balance (7 items)              0.58        0.45        0.86
Part E: Light touch (4 items)          1           0.81        0.82
Part E: Joint position (9 items)       0.5         0.73        0.56


Table IV. Inter-rater reliability results; kappa values for each rater pair and function of the non-paretic side.

                                       Kappa coefficient
Function                               Rater 1–2   Rater 1–3   Rater 2–3
Part A: Lower extremity (12 items)     0           0.64        0.41
Part B: Co-ordination (4 items)        0.27        0.25        0.3

stable in the acute phase, and in order to study the intra-rater reliability, this design was considered the only one possible. There are several advantages and disadvantages of using video-recordings to capture observational data. As mentioned, it is possible to eliminate factors such as variation in the patient's performance and variation over time, but this approach does not take into account other sources of variability that are present when clinicians assess patients in practice. Another design used in inter-rater reliability studies is the use of a gold standard, i.e. each rater's evaluation is compared to a master rating that is supposed to be absolutely correct. It is not possible to establish that there is one correct rating, and this design was therefore not applicable.

The statistics used to describe reliability depend upon the type of reliability and the data level. For nominal and ordinal data, non-parametric tests are recommended. The kappa statistic has been recommended as the best approach for measuring agreement with this type of data [10]. A shortcoming of the kappa statistic is that the value of kappa depends upon the proportion of subjects in each category, the variability among subjects and the number of categories. Meaningful values of kappa can therefore only be derived when there is sufficient variability in scores. In this study, there was a lack of variability of scores for items of the non-paretic side, which influenced the value of kappa. Even though the patients were selected to represent different levels of motor function, it is possible that the variability among subjects was insufficient. Streiner & Norman state, "Reliability cannot be conceived of as a property which a particular instrument does or does not possess: rather any measure will have a certain degree of reliability when applied to certain population under certain conditions" [11]. What is considered an acceptable reliability value depends upon the circumstances.
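The dependence of kappa on the distribution of scores can be illustrated with invented data. In the sketch below, two hypothetical raters score 21 patients on the 0 to 3 scale and almost every score is 3, as might happen on a non-paretic side. The scores, variable names and the use of unweighted Cohen's kappa are assumptions made for illustration, not data or methods taken from the study.

```python
from collections import Counter


def percent_agreement(first, second):
    """Proportion of patients given identical scores by both raters."""
    return sum(a == b for a, b in zip(first, second)) / len(first)


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa; returns None when it is undefined."""
    n = len(first)
    observed = percent_agreement(first, second)
    counts_first, counts_second = Counter(first), Counter(second)
    expected = sum(counts_first[c] * counts_second[c] for c in counts_first) / (n * n)
    if expected == 1.0:
        return None  # no variability at all in the scores
    return (observed - expected) / (1.0 - expected)


# Case 1: completely uniform scores, so there is nothing for kappa to correct.
uniform_kappa = cohen_kappa([3] * 21, [3] * 21)

# Case 2: each rater gives a single score of 2, but to different patients.
rater_a = [2] + [3] * 20
rater_b = [3] * 20 + [2]
skewed_agreement = percent_agreement(rater_a, rater_b)
skewed_kappa = cohen_kappa(rater_a, rater_b)

print(uniform_kappa)
print(round(skewed_agreement, 2), round(skewed_kappa, 2))
```

In the second case the raters agree on 19 of 21 patients, yet kappa is slightly negative because nearly all of that agreement is expected by chance when one category dominates. This is why low or incalculable kappa values on the non-paretic side say little about the raters themselves.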
In this study 75% of the kappa values that were calculated for intra-rater reliability reached a good or very good level of agreement according to the criteria described by Altman [10]. Of the kappa coefficients obtained for the different parts of the BLMA concerning inter-rater reliability, 67% were within a good or very good level of agreement. Only three of 36 kappa coefficients reached a fair level of agreement. As expected, there was a slightly higher level of intra-rater agreement compared with the level of inter-rater agreement.

There was, however, a difference in kappa values among the raters, as seen in Tables I and II. Kappa values for rater 1 tended to be lower than for raters 2 and 3. This could be because rater 1, although skilled in neurological rehabilitation, had no previous experience in evaluating stroke patients with the BLMA. Since the raters evaluated exactly the same videotaped examinations, rater 1 either had difficulties in scoring because it was a videotaped and not a live examination, or the difference could be due to the understanding and use of the manual and scoring protocol of the BLMA. The domain co-ordination, especially on the non-paretic side, seemed to be more difficult for all the raters to score. This may be due to the complexity of the scoring protocol and the lack of consensus on what is considered normal speed for rapid movement changes.

The BLMA was developed as a part of an investigation initiated in 1984 at the University Hospital in Uppsala, with the aim of assessing and following the functional capacity of stroke patients [15,16]. The assessment has since then been used both clinically and in research [17]. An advantage of the BLMA is that it focuses on the total motor capacity, not only the paretic side. This makes the assessment useful for obtaining a complete view of the stroke patient's motor impairments. However, a disadvantage of the BLMA is the extensiveness and the relative complexity of the measurement, and it has been proposed that there is a high degree of redundancy and over-sampling [18].
From this study, it can be concluded that the BLMA has a good intra- and inter-rater reliability when used in an acute stroke care setting.

Acknowledgements

This study was supported by funds from Karolinska Institute (Forskningsnämnden för vårdvetenskap, and Vårdnämnd) and by grants from the Swedish Stroke Association. Special thanks to Kristina Norgren who videotaped all examinations and to Margareta Jonsson, Susanne Littorin and Louise Martinsson who served as participating raters.


References

[1] Socialstyrelsen. Nationella riktlinjer för strokesjukvård. Stockholm: Modin Tryck; 2000.
[2] Slaganfall. The Swedish Council on Technology Assessment in Health Care. ISBN 91-87890-18-16 1992.
[3] Lennon S. Using standardised scales to document outcome in stroke rehabilitation. Physiotherapy 1995;81:200–2.
[4] Lindmark B, Hamrin E. Evaluation of functional capacity after stroke as basis for active intervention. Presentation of a modified chart for motor capacity assessment and its reliability. Scand J Rehab Med 1988;20:103–9.
[5] Fugl-Meyer AR, Jääskö L, Leyman I, Olsson S, Steglind S. The post-stroke hemiplegic patient. I. A method for evaluation of physical performance. Scand J Rehab Med 1975;7:13–31.
[6] Lindmark B, Hamrin E. Evaluation of functional capacity after stroke as basis for active intervention. Validation of a modified chart for motor capacity assessment. Scand J Rehab Med 1988;20:111–5.
[7] Lindmark B. Evaluation of functional capacity after stroke with special emphasis on motor function and activities of daily living. Scand J Rehab Med 1988;Suppl 21:4–40.
[8] Bartko JJ, Carpenter WT. On the methods and theory of reliability. J Nerv Ment Dis 1976;163:307–17.
[9] Ottenbacher KJ, Tomchek SD. Reliability analysis in therapeutic research: Practice and procedures. Am J Occup Ther 1993;47:10–6.
[10] Altman DG. Practical statistics for medical research. London: Chapman & Hall; 1991.
[11] Streiner DL, Norman GR. Health Measurement Scales: A practical guide to their development and use. New York: Oxford University Press Inc; 1989.
[12] Starmark J-E, Stålhammar D, Holmgren E. The Reaction Level Scale (RLS 85). Manual and guidelines. Acta Neurochir 1988;91:12–20.
[13] Merbitz C, Morris J, Grip JC. Ordinal scales and foundations of misinference. Arch Phys Med Rehabil 1989;70:308–12.
[14] Domholdt E. Physical therapy research: Principles and applications. Philadelphia: W. B. Saunders Company; 1993.
[15] Lindmark B. The improvement of different motor functions after stroke. Clin Rehabil 1988;2:275–83.
[16] Lindmark B. A five-year follow-up of stroke survivors: Motor function and activities of daily living. Clin Rehabil 1995;9:1–9.
[17] Widén Holmqvist L. Development and evaluation of rehabilitation at home after stroke in south-west Stockholm: Karolinska Institute, Stockholm, Sweden; 1997.
[18] Lyden PD, Lau GT. A critical appraisal of stroke evaluation and rating scales. Stroke 1991;22:1345–52.
