
Medical Teacher, Vol. 27, No. 6, 2005, pp. 534-538

A short questionnaire to evaluate the effectiveness of tutors in PBL: validity and reliability

DIANA H.J.M. DOLMANS¹ & PAUL GINNS²

¹University of Maastricht, The Netherlands; ²University of Sydney, Australia

Correspondence: Dr Diana Dolmans, PhD, University of Maastricht, Department of Educational Development & Research, PO Box 616, 6200 MD Maastricht, The Netherlands. Tel: 31-43-3885730; email: d.dolmans@educ.unimaas.nl

ISSN 0142-159X print/ISSN 1466-187X online/05/060534-5 © 2005 Taylor & Francis
DOI: 10.1080/01421590500136477

SUMMARY The tutor plays a central role in problem-based learning (PBL). In many PBL curricula the effectiveness of the tutor is evaluated in order to provide tutors with feedback. In the literature, several tutor evaluation instruments have been described. The problem with many of these instruments is that they are quite long, so that students become tired of filling them out. Using a short questionnaire is more convenient for students, but the question is whether such a short instrument can be valid and reliable. The purpose of this article is to demonstrate the validity and reliability of a short questionnaire (11 items), representing five underlying factors. A confirmatory factor analysis was carried out to assess the adequacy of the five factors. The results demonstrated that the five-factor model had a good fit to the data. The alpha coefficients also demonstrated acceptable levels. In conclusion, the short tutor evaluation instrument (11 items) is reliable and valid and can be used for formative and summative purposes.

Introduction

The tutor plays an important role in the problem-based curriculum. Tutor performance has a direct influence on tutorial group functioning (van Berkel & Schmidt, 2000). Because the tutor plays a central role, it is necessary to provide the tutor with feedback about his/her performance that can be used to carry out improvements. Therefore, instruments need to be available to evaluate the performance of the tutor. The data collected can also be used for summative purposes. Recurring poor tutor performances should be discussed during annual review sessions with the department chair and could, for example, have negative consequences for promotion decisions. However, if these data are used for accountability purposes, they should be reliable and valid (Marsh, 1987).

Several tutor evaluation instruments have been described in the research literature. For example, De Grave et al. (1998) and Dolmans et al. (2003) both developed and validated a tutor evaluation questionnaire in which students are asked to rate the performance of the tutor on several dimensions. The results of these studies demonstrate that the instruments are reliable and valid if at least six student responses are available for one tutor. Since, in most problem-based settings, the average group sizes are higher than six students, these tutor evaluation questionnaires provide reliable information. However, the problem with these instruments is that they are quite long. The instrument developed by De Grave et al. contains 33 items and the instrument developed by Dolmans et al. contains 22 items. If students are asked on a regular basis to evaluate the performance of the tutor and are also asked to evaluate other aspects of the problem-based courses, they become tired of filling out these questionnaires as they are quite long. Using a short questionnaire is more convenient for the students. The question is, however, whether a short questionnaire can be valid and reliable.

The questionnaire developed earlier by Dolmans et al. was based on theoretical notions underlying contemporary constructivist approaches to learning and instruction on which problem-based learning is based. The common principles utilized by constructivists include active or constructive learning, self-directed learning, contextual learning and collaborative learning. In addition, modern theories on teaching stress that the teacher's intra-personal behaviour is important. The instrument developed was based on these insights and included items on the five main topics mentioned: active/constructive, self-directed, contextual and collaborative learning and intra-personal behaviour.

The initially developed and validated questionnaire contained about three to five items for each factor. Students' complaints about the length of the questionnaire prompted the current investigation, to determine whether the number of items per scale might be reduced to two or three while retaining acceptable levels of reliability and validity. The first author of this paper and another researcher together decided which items within each factor best represented the underlying factor; these items were included in the questionnaire. Both persons who decided which items would be included were experts on PBL and tutoring and were conversant in research on effective tutoring. Furthermore, these two persons had extensive experience with tutor evaluation questionnaires.

Method

Setting

The study was conducted in the problem-based curriculum of the Medical School at Maastricht University, in the academic years 2001-2002 and 2002-2003. Students meet in tutorial groups of about 10 students, twice per week in 2-hour sessions. In these groups they discuss problems. During the discussion, issues emerge that need further self-study. In between the tutorial meetings, students collect information regarding the issues that need to be studied. In the next tutorial meeting they report to each other what they have found, synthesize the acquired information and apply it to the problem. A faculty member, the so-called tutor, guides
each tutorial group. In each 6-week course, about 23 to 30 tutorial groups are involved, each guided by a tutor. Each tutor is obliged to attend a 4-day staff development training programme focusing on the principles behind problem-based learning and tutoring.

Participants

Data were collected about the tutors' performance in the tutorial groups during 22 6-week courses in the academic years 2001-2002 and 2002-2003: six first year courses, five courses in the second year, six courses in the third year and five courses in the fourth year. The number of students participating in each group was either nine or ten. The number of students completing the instrument per group was at least six. The average response rate was above 90%. The number of unique tutors included in the study was 287 in 2001-2002. In 2002-2003, 281 unique tutors were involved. Each tutor guided on average two tutorial groups within one academic year. The majority of the tutors involved in 2002-2003 were the same tutors as in 2001-2002.

Instrument

The instrument consists of 11 statements. At the end of each course, students were asked to indicate how much they agreed with each statement on a scale from 1 to 5 (1 = strongly disagree, 5 = strongly agree). An example of a statement is 'the tutor stimulated us to understand underlying mechanisms/theories'. An example of another statement is 'the tutor stimulated us to apply knowledge to the discussed problem'. The items of the instrument are shown in the Appendix. Five factors were assumed to represent the 11 items. The names of the factors and their underlying items are also reported in the Appendix. Students were also asked to give an overall judgement of the performance of the tutor on a scale from 1 to 10, where 6 was sufficient and 10 was excellent (question 12). Furthermore, students were asked to indicate how many times the tutor was absent and whether he/she was replaced by another tutor or not. Finally, students were asked to give tips for improvement.

Statistical analysis

When assessing the psychometric properties of teaching evaluation instruments, it is standard practice to conduct such analyses on class averages, rather than individual level data (Marsh, 1987, 1991). Accordingly, tutorial averages for the 11 items were used in the following analyses, with average scores reflecting ratings of the same tutor across different tutorials. This is justified on the basis of Marsh's (1987) finding that the correlation between overall ratings of different instructors teaching the same course (i.e. a course effect) was negligible (r = 0.05), while correlations for the same instructor in different courses were considerably larger (r = 0.61) and in two different offerings of the same course were larger again (r = 0.72). The hypothesized model was first tested on class averages for 287 tutors evaluated in the 2001-2002 academic year and cross-validated on 281 tutors evaluated in the 2002-2003 academic year.

A confirmatory factor analysis was carried out to assess the adequacy of the five factors underlying the items. For the confirmatory factor analysis, data were aggregated at the tutor level by computing average scores across students per tutor. In the confirmatory factor model, as specified in this study, all common factors were correlated, observed variables 1 through 3 were affected by the first common factor, observed variables 4 and 5 were affected by the second common factor, variables 6 and 7 by the third common factor, 8 and 9 by the fourth common factor and 10 and 11 by the fifth common factor. For the confirmatory factor analysis, all observed variables were assumed to be affected by a unique factor (error in each variable) and no pairs of unique factors were correlated.

In the following analyses, models were evaluated according to the recommendations of Hu & Bentler (1999). These authors empirically examined the behaviour of a variety of fit indices and concluded that control of both Type I and Type II error is best achieved through a combination of relative fit indices (e.g. the Tucker-Lewis Index, also known as the Bentler-Bonett non-normed fit index, NNFI, and the Incremental Fit Index, IFI), where good model fit is indicated by values greater than or equal to 0.95, and the standardized root mean square residual (SRMR), where good models have values below 0.08. Model fit was estimated using robust maximum likelihood estimation, a method less sensitive to violations of the normality assumption than other estimation methods (Boomsma & Hoogland, 2001). When a latent construct has fewer than three indicators, this may lead to problems in identification, such that the resulting estimated parameters are arbitrary and an infinite number of alternative values might also be appropriate. In order to identify each latent variable, the loading of one of the items onto the latent variable was fixed to 1 (Bollen, 1989).

Results

Descriptive statistics

The mean score for the 11 items varies between 3.37 (SD = 0.43, scale 1-5) and 4.20 (SD = 0.59, scale 1-5). The lowest scoring item deals with searching for various resources. The highest scoring item deals with the tutor's motivation for the tutor role. The mean scores and standard deviations for the 11 items are reported in Table 1. In Table 1, the average scores for the five factors are also given. As can be seen there, these scores range from 3.44 (SD = 0.58, scale 1-5) to 3.96 (SD = 0.49, scale 1-5). The average score for the overall performance of the tutor was 7.46 (SD = 0.77, scale 1-10).

Cronbach's alpha reliability coefficients for the five factors and 95% confidence intervals for the reliability coefficients (Fan & Thompson, 2001) are given in Table 2. Statistics for the 2002-2003 dataset are given. The confidence intervals around the estimates for reliability for each scale indicate the plausible range of values for internal consistency. The data in Table 2 demonstrate that the intervals are generally above levels for acceptability commonly used in the social sciences, e.g. 0.70.

Construct validity

In Table 3, the correlations between the five factors are given. These correlations vary between 0.54 and 0.94 for the 2002-2003 dataset.

A confirmatory factor analysis was conducted to assess the construct validity of the instrument. The hypothesized factor
structure was first tested on the dataset of 2001-2002 and cross-validated on the dataset of 2002-2003.

Table 1. Mean score (scale 1-5) and corresponding standard deviation (SD) for all the 11 items and the five factors (F1 to F5), 2002-2003 (n = 281 tutors).

                                                       Mean (1-5)   SD
F1: Constructive/active learning                       3.78         0.46
  1. Summarizing in own words                          3.52         0.46
  2. Searching for links between topics                3.87         0.49
  3. Understanding mechanisms/theories                 3.95         0.49
F2: Self-directed learning                             3.46         0.39
  4. Generation of learning issues by students         3.55         0.44
  5. Searching for various resources                   3.37         0.43
F3: Contextual learning                                3.80         0.45
  6. Application of knowledge to problem               3.96         0.44
  7. Application of knowledge to other situations      3.64         0.52
F4: Collaborative learning                             3.44         0.58
  8. Giving constructive feedback                      3.39         0.55
  9. Evaluation of group co-operation                  3.48         0.63
F5: Intra-personal behaviour                           3.96         0.49
  10. Knowledge about strengths/weaknesses as tutor    3.71         0.47
  11. Motivation for tutor role                        4.20         0.59

Table 2. Number of items per factor (n items), Cronbach's alpha reliability coefficient with 95% confidence intervals for 2002-2003 dataset (n = 281).

                                  No. of items   Alpha coefficient   95% confidence interval for alpha
F1: Constructive/active learning  3              0.95                0.94-0.96
F2: Self-directed learning        2              0.79                0.73-0.83
F3: Contextual learning           2              0.89                0.86-0.91
F4: Collaborative learning        2              0.93                0.92-0.95
F5: Intra-personal behaviour      2              0.82                0.78-0.86

Table 3. Correlations between factor scores, 2002-2003.

                              1      2      3      4      5
1. Active                            0.74   0.94   0.53   0.80
2. Self-directed                            0.74   0.76   0.88
3. Contextual                                      0.54   0.84
4. Collaborative                                          0.79
5. Intra-personal behaviour

Note: All correlation coefficients are significant at the 0.01 level (2-tailed).

Model testing of the 2001-2002 dataset

Tests of multivariate normality were conducted on the 11 items using PRELIS 2.54. Results indicated significant multivariate non-normality: multivariate skewness, z = 13.38, p < 0.001, multivariate kurtosis, z = 9.10, p < 0.001. Tutor average scores were transformed to normal equivalent deviates using PRELIS, a process which maintains the mean, standard deviation and rank ordering of scores but reduces skewness and kurtosis (Jöreskog et al., 2001). Results of further checks of multivariate normality following transformation indicated reduced multivariate non-normality: multivariate skewness, z = 9.32, p < 0.001, multivariate kurtosis, z = 7.86, p < 0.001. A one-factor model with no covariances between item residuals was fitted to the normal equivalent deviate scores as a baseline model. This model had poor fit to the data, NNFI = 0.86, IFI = 0.88, SRMR = 0.096. A five-factor model based on the five groupings of items on the survey form was tested next. Inspection of parameter estimates revealed a small negative error variance for item 8 ('The tutor stimulated us to give constructive feedback about our group work'). While large negative error variances can indicate model misspecification, small negative error variances are often due to sampling fluctuations, particularly when fewer than three indicators per latent construct are used (Chen et al., 2001). The error variance was fixed to a small value, 0.0001, and the resulting measurement model had good fit to the data, NNFI = 0.96, IFI = 0.98, SRMR = 0.054. Inspection of the latent factor correlation matrix indicated several large inter-correlations, suggesting the possibility of a second order factor. This model was tested but did not have as good fit to the data as the five-factor model, NNFI = 0.94, IFI = 0.96, SRMR = 0.074. The five-factor model was, therefore, accepted as the most parsimonious explanation of the covariances within the data.

Model testing of the 2002-2003 dataset

The hypothesized factor structure was cross-validated on tutor average data from the 2002-2003 academic year. As for the 2001-2002 dataset, tests of multivariate normality indicated significant multivariate non-normality: multivariate skewness, z = 16.16, p < 0.001, multivariate kurtosis, z = 10.91, p < 0.001. Normalization of the class averages reduced multivariate non-normality: multivariate skewness, z = 10.01, p < 0.001, multivariate kurtosis, z = 8.33, p < 0.001. As for the 2001-2002 dataset, a one-factor model did not fit the data well, NNFI = 0.86, IFI = 0.88, SRMR = 0.099. The five-factor model again had good fit to the data, NNFI = 0.98, IFI = 0.99, SRMR = 0.043. A model with a higher order factor did not have as good fit to the data as the five-factor model, NNFI = 0.95, IFI = 0.96, SRMR = 0.075. The correlation between the students' overall judgement of a tutor and the scores on the 11 items also provides information about the construct validity of the questionnaire. The overall judgement correlates highly with all 11 items, with correlations ranging from 0.65 to 0.87 (all correlations significant at p < 0.01).

Conclusion

The purpose of this study was to report on the validity and reliability of a short instrument to assess tutor performance in PBL. The initial questionnaire consisted of 22 items and was based on theoretical notions underlying effective tutoring. Because the students complained about getting tired of filling out this quite long questionnaire, it was decided to shorten the questionnaire. Two researchers with extensive
experience with tutor evaluation questionnaires drew upon theories and literature on effective tutoring to decide which items within each factor best represented the underlying factor; these items were included in the questionnaire. Thus, the decisions about which items were removed and included were based on knowledge and experience with research and theories on tutoring and experience with tutor evaluations.

The results of this study indicate that the five-factor model had a good fit to the data of both academic years. The alpha coefficients and the 95% confidence intervals demonstrated acceptable levels. Based on these findings, it can be concluded that this short instrument (11 items) is valid and reliable. The data can be used for improvement and accountability purposes or, in other words, the data can be used to judge the performance of the tutors on the five factors underlying the 11 items.

While some of the factors were highly correlated, the above analyses indicate that a distinct five-factor model fits the data better than a model in which each lower order factor simply loads on a single higher order factor. Furthermore, it is important to keep in mind that the analyses were conducted on aggregated scores (the scores were averaged and at least six student responses were available per tutor). This implies that the instrument is only valid if at least six student responses are available for one tutor. Since the average tutorial group size is normally above six, tutor evaluation with this short instrument provides reliable information. However, if fewer than six student responses are available for a tutor, the data should not be reported to the tutor and should not be used for summative purposes. Furthermore, we would recommend that negative consequences concerning promotion decisions should NOT be based on the performance of a tutor in one tutorial group; preferably, a tutor's performance is assessed over time. In addition, one needs to keep in mind that tutoring is only one specific aspect of a teacher's competencies. Measuring a teacher's competencies is preferably based not only on student ratings, but also on other sources of information, such as peer assessment, self-assessment, etc.

Practice points

• In practice, short questionnaires are needed to evaluate a tutor's performance, because this is more convenient for students.
• A short tutor evaluation questionnaire consisting of 11 statements proved to be valid and reliable.
• The analyses revealed that the 11 statements loaded on five factors: active/constructive, self-directed, contextual and collaborative learning and intra-personal behaviour.

Acknowledgements

The authors thank Diana Riksen for her support in setting up the data-set.

Notes on contributors

DIANA DOLMANS is associate professor at the Department of Educational Development & Research, University of Maastricht in the Netherlands. Her research interests include problem-based learning and quality improvement.

PAUL GINNS is Survey Officer at the Institute for Teaching and Learning of the University of Sydney in Australia. His research interests include the design and evaluation of student-centred teaching evaluation instruments and the application of cognitive psychology to the design of instructional materials.

References

BOLLEN, K.A. (1989) Structural equations with latent variables (New York, Wiley).
BOOMSMA, A. & HOOGLAND, J.J. (2001) The robustness of LISREL modeling revisited, in: R. CUDECK, S. DU TOIT & D. SÖRBOM (Eds), Structural Equation Modeling: Present and Future: A Festschrift in honor of Karl Jöreskog (Lincolnwood, Scientific Software International).
CHEN, F., BOLLEN, K.A., PAXTON, P., CURRAN, P. & KIRBY, J. (2001) Improper solutions in structural equation models: causes, consequences, and strategies, Sociological Methods & Research, 29, pp. 468-508.
DE GRAVE, W.S., DOLMANS, D.H.J.M. & VAN DER VLEUTEN, C.P.M. (1998) Tutor intervention profile: reliability and validity, Medical Education, 32, pp. 262-268.
DOLMANS, D.H.J.M., WOLFHAGEN, H.A.P., SCHERPBIER, A.J.J.A. & VAN DER VLEUTEN, C.P.M. (2003) Development of an instrument to evaluate the effectiveness of teachers in guiding small groups, Higher Education, 46, pp. 431-446.
FAN, X. & THOMPSON, B. (2001) Confidence intervals around score reliability coefficients, please: an EPM guidelines editorial, Educational & Psychological Measurement, 61, pp. 517-531.
HU, L. & BENTLER, P.M. (1999) Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives, Structural Equation Modeling, 6, pp. 1-55.
JÖRESKOG, K.G., SÖRBOM, D., DU TOIT, S. & DU TOIT, M. (2001) LISREL 8: New statistical features (Lincolnwood, Scientific Software International).
MARSH, H.W. (1987) Students' evaluations of university teaching: research findings, methodological issues, and directions for future research, International Journal of Educational Research, 11, pp. 253-388.
MARSH, H.W. (1991) Multidimensional students' evaluations of teaching effectiveness: a test of alternative higher-order structures, Journal of Educational Psychology, 62, pp. 17-34.
VAN BERKEL, H.J.M. & SCHMIDT, H.G. (2000) Motivation to commit oneself as a determinant of achievement in problem-based learning, Higher Education, 40, pp. 231-242.

Appendix: Short tutor evaluation questionnaire, Maastricht Medical School, 2002-2003

Items 1 to 11 are answered on the scale 1 2 3 4 5 (1 = strongly disagree, 5 = strongly agree).

Constructive/active learning
The tutor stimulated us ...
1. ... to summarize what we had learnt in our own words
2. ... to search for links between issues discussed in the tutorial group
3. ... to understand underlying mechanisms/theories

Self-directed learning
The tutor stimulated us ...
4. ... to generate clear learning issues by ourselves
5. ... to search for various resources by ourselves

Contextual learning
The tutor stimulated us ...
6. ... to apply knowledge to the discussed problem
7. ... to apply knowledge to other situations/problems

Collaborative learning
The tutor stimulated us ...
8. ... to give constructive feedback about our group work
9. ... to evaluate group co-operation regularly

Intra-personal behaviour as tutor
10. The tutor had a clear picture about his/her strengths/weaknesses as a tutor
11. The tutor was clearly motivated to fulfil his/her role as a tutor

Global score
12. Give a grade (1-10) for the overall performance of the tutor (6 being sufficient, 10 being excellent)
    Scale: 1 2 3 4 5 6 7 8 9 10

Absence/replacement
13. How often was your own tutor absent?
    Scale: 0 1 2 3 4 5 6>
14. How often did your tutor take care of replacement when being absent?
    Scale: 0 1 2 3 4 5 6>

Open question
15. Give the tutor tips for improvement (formulate briefly). Do this especially if you gave your tutor a score below six.
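
The grouping of items 1 to 11 under the five headings of the questionnaire corresponds to the loading pattern described under Statistical analysis. A sketch of that measurement model in conventional structural equation notation follows. The symbols are assumptions introduced here, since the article gives no equations, and fixing the loading of the first item of each factor to 1 is also an assumption: the text only states that one loading per factor was fixed.

```latex
% Requires amsmath. Assumed notation (not from the article):
% x = vector of the 11 tutor-level item averages,
% \xi = vector of the 5 common factors, \delta = vector of unique factors.
\begin{align*}
x &= \Lambda \xi + \delta, \qquad
\Sigma = \Lambda \Phi \Lambda^{\top} + \Theta_{\delta}, \\
\Lambda &=
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
\lambda_{2,1} & 0 & 0 & 0 & 0 \\
\lambda_{3,1} & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & \lambda_{5,2} & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & \lambda_{7,3} & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & \lambda_{9,4} & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & \lambda_{11,5}
\end{pmatrix}, \\
\Phi &= \operatorname{Cov}(\xi) \ \text{(5 by 5, all elements free: factors correlated)}, \\
\Theta_{\delta} &= \operatorname{diag}(\theta_{1}, \dots, \theta_{11})
\ \text{(unique factors uncorrelated)}.
\end{align*}
```

Each row of the loading matrix has a single non-zero entry, which expresses that every item is affected by exactly one common factor, and the diagonal form of the unique-factor covariance matrix expresses that no pairs of unique factors were correlated.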

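
The tutor-level aggregation described under Statistical analysis, together with the recommendation not to report results based on fewer than six student responses, can be sketched in Python as follows. This is a minimal illustration and not the authors' software: the function name, the input layout and the factor labels are hypothetical, and computing a factor score as the mean of its tutor-level item averages is an assumption (it is consistent with the factor means in Table 1, but the article does not spell out the computation).

```python
from collections import defaultdict
from statistics import mean

# Item-to-factor grouping taken from the Appendix (items 1 to 11).
# The dictionary keys are hypothetical labels chosen for this sketch.
FACTORS = {
    "F1 constructive/active": (1, 2, 3),
    "F2 self-directed": (4, 5),
    "F3 contextual": (6, 7),
    "F4 collaborative": (8, 9),
    "F5 intra-personal": (10, 11),
}
MIN_RESPONSES = 6  # minimum number of student forms per tutor stated in the text


def tutor_factor_scores(responses):
    """Aggregate student forms to tutor-level factor scores.

    responses: iterable of (tutor_id, ratings) pairs, where ratings maps
    item number (1..11) to a rating on the 1-5 scale.
    Returns {tutor_id: {factor_label: score}} for tutors with enough forms.
    """
    forms_by_tutor = defaultdict(list)
    for tutor_id, ratings in responses:
        forms_by_tutor[tutor_id].append(ratings)

    scores = {}
    for tutor_id, forms in forms_by_tutor.items():
        if len(forms) < MIN_RESPONSES:
            continue  # too few forms: leave this tutor out of the report
        # Step 1: average each item across the students who rated this tutor.
        item_means = {i: mean(form[i] for form in forms) for i in range(1, 12)}
        # Step 2 (assumption): factor score = mean of its item averages.
        scores[tutor_id] = {
            label: mean(item_means[i] for i in items)
            for label, items in FACTORS.items()
        }
    return scores
```

A tutor rated by only five students is dropped from the result rather than reported with an unreliable score, which mirrors the six-response rule in the Conclusion.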

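
The internal consistency figures in Table 2 are Cronbach's alpha coefficients computed per factor. As a hedged illustration of the standard textbook formula (not of the software the authors used), alpha for one factor can be computed from tutor-level item averages as shown below; the function name and the input layout are hypothetical, and the confidence intervals of Fan & Thompson (2001) are not reproduced here.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for one scale.

    rows: list of equal-length sequences, one row per tutor and one column
    per item of the scale (for example, the two items of a two-item factor).
    Uses alpha = k / (k - 1) * (1 - sum of item variances / variance of totals).
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    if len(rows) < 2:
        raise ValueError("alpha needs at least two rows")
    # Sample variance of each item across tutors.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the summed scale score across tutors.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

With two items per factor, as in four of the five scales here, alpha depends only on the two item variances and their covariance, which is one reason the article reports confidence intervals alongside the point estimates.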