Sarah (dean of the faculty, to Roger, head of the accounting department, at a cheese and wine
party to welcome new staff): Hi, Roger. How are you?
Roger: Well things are looking good. I think we are going to get a 5. It might be touch and go
but I’ve been told that we’re almost there (but keep that to yourself… James told me the news
but I’m not supposed to tell anyone). Refereed articles are up on the last two years. We could
do with a few more external grants but I don’t think there is much possibility that Faculty
Board will be able to turn down our bid for two new members of staff. The future’s looking
good for us. With a bit of luck, in the next couple of years, we’ll be able to…
Sarah (interrupting Roger): No, Roger. I meant how are you…personally.
Roger (taken aback): Oh, mmm, uh, oh, fine, Sarah, just fine…er, how are you?
The UGC was reported as saying that the research rating was based on
published research work and the amount of outside funding received by
departments. Apart from these general criteria, however, the UGC did not
disclose how the ratings were arrived at. Individual accounting departments
were able to write to Professor John Sizer, who chaired the assessment sub-
committee, but no details were revealed as to why departments attracted the
rating they did nor was any appeals procedure allowed.
Evaluation of departmental research output was not wholly new to British
university accounting. The first attempt to assess the research performance of
accounting departments occurred in 1978 when Lyall published his analysis of
26 professional and academic journals for the period 1972-76. Lyall (1978) counted
the pages attributable to each department on several different bases. In terms of
“pages in refereed journals”, the top five research departments were
Manchester, Lancaster, Edinburgh, Liverpool and London Business School,
whereas using “unweighted standard pages” the top five departments were
Manchester, Heriot-Watt, Loughborough, Edinburgh and Lancaster. The advent
of the publication of the British Accounting Association’s (BAA) Research
Register subsequently encouraged further analysis of research outputs. Groves
and Perks’s (1984) examination of the 1984 BAA Research Register (containing
publications between 1982 and 1983) classified the top five research
departments as Lancaster, Strathclyde, Liverpool, LSE and Exeter. Gray et al.
(1987) analysed the 1986 BAA Research Register (covering 1984-1985). They
produced a different top five of Birmingham, Manchester, Warwick, Leeds and
Nottingham – a lack of comparability which was seen as merely reinforcing the
practical problems faced by such an evaluation process.
The UGC’s 1986 ratings did little to break such an observational trend, with
Nobes’s comparison of the inconsistencies between Gray et al.’s rankings and
those of the UGC concluding that:
The fact that there is little correlation…does not help to determine whether either (evaluation)
is meaningful. These exercises are clearly subject to rapid out-dating and to difficulties of
interpretation (Nobes, 1987, p. 289).
Such inconsistencies reinforce the above noted concerns as to the lack of
transparency in the UGC’s ratings. Appeals were made for more information to
be provided as to how particular ratings were arrived at – especially given that
the UGC’s ratings had a distinctive status compared with those of any previous
assessments in that the underlying official motivation of the ratings exercises
was to direct research funding in a more selective manner. However, the
anticipated financial consequences of receiving a particular research rating,
somewhat ironically, did not materialize, with Bourn (1986) noting, in an
accounting context, that the disquiet felt in many institutions as a result of the
assigned ratings was out of all proportion to their relatively small immediate
financial impact.
Later in the report, the UGC lent its unequivocal support to the submission
received from the Conference of Accounting Professors, which had argued
against the establishment of “teaching-only” departments:
In the course of our review, we were concerned to discover that pressures exist which, whether
by accident or design, would have the effect of turning some accounting groups into teaching
only units. These pressures include … increasing selectivity in resourcing within institutions
and by research councils. It is our contention that the arguments which might support the
creation of teaching only groups in some subjects do not apply in the BMS (business and
management studies) and accounting field. Research costs consist mainly of salaries and
economies therefore cannot easily be achieved by exclusive concentration of research efforts
in large units … There are reasons for doubting the quality of university teaching unsupported
by scholarship or research. We recommend that teaching only groups should not be supported.
(UGC, 1988, emphasis added).
In 1988, the Times Higher Education Supplement also undertook its Peer Review of
Accountancy (see Times Higher Education Supplement, 24 June 1988). The
survey was based on the replies received from heads of department to a simple
questionnaire, including the question: “Which in your view are the five best
departments in British higher education institutions in your subject bearing in
mind mainly the output and quality of research?” The responses closely
followed the UGC’s 1986 ranking with Lancaster, Manchester and the LSE “way
ahead of the few others to attract more than one vote”. Some departmental
heads were unhappy with the survey, including Professor Tom Lee, then of
Edinburgh University, who was quoted as follows:
This department does not wish to participate in your beauty contest as it would not know how
to rank its sister departments. Nor does it wish to be ranked by the latter (Times Higher
Education Supplement, 24 June 1988).
The UGC carried out its second research selectivity exercise in 1989, making
various changes in response to criticisms of the first exercise. The UGC
acknowledged publicly that it had sought to rectify nine main weaknesses (see
HEFC, 1993, para. 5)[4]. The 1989 exercise sought more information concerning
research activities than in 1986 and focused explicitly on the individual units of
assessment rather than on university-wide data. Details of up to two
publications per member of staff were required, in addition to information on
research students, external research income and research planning and
priorities. A common 5-point rating scale was used and expressed in numerical
form. By the time the results were published, the UGC had been transformed
into the UFC and the 1989 UFC research categories were as follows:
● 5 = international excellence in many areas, national excellence in all
others;
● 4 = national excellence with some evidence of international excellence;
● 3 = national excellence in a majority of areas or limited international
excellence;
● 2 = national excellence in up to half of areas;
● 1 = little or no national excellence.
In accounting, the general standard of research performance appeared
relatively low, with only Lancaster obtaining a 5 rating, and Bristol,
Manchester, the LSE and Aberystwyth each receiving a 4 rating. Fifteen (60 per
cent) of the 25 rated accountancy departments obtained only a 1 or 2 rating. In
law, a subject often compared with accountancy, just 22 per cent of the rated
departments received a rating of 1 or 2, with 38 per cent receiving a 4 or 5 rating
(the comparative figure in accounting being 20 per cent).
Following the 1989 review, the UFC strove to place increasing importance
on research ratings as the basis for allocating research funds. Government
policy in this respect had been reiterated in an education White Paper in May
1991 and in subsequent letters from Secretaries of State for Education to the
various national funding bodies. The UFC created a new formula funding
approach for 1991/92 in which the total block recurrent grant was determined
through allocations across the three categories of teaching (T), research (R)
and special factors (S). The allocation of funds for research was made up of
money for direct research (DR), contract research (CR), staff research (SR) and
judgemental research (JR). The money a university received through DR and
CR was related directly to research grant income received from non-UFC
sources (this had little impact on accounting groups). The SR figure was
dependent on the total number of UK weighted students while JR was
influenced by the product of weighted student numbers and the research
rating of the group (see Mace, 1993, p. 72). It can therefore be seen that, while
research ratings were now a more explicit revenue determinant, the use of the
student multiplier meant that universities could, theoretically, compensate for
any falling research income (caused by poor ratings) merely by expanding
student numbers.
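The arithmetic described above can be illustrated with a short, purely hypothetical sketch: the actual UFC cash rates and weightings are not reproduced here, so every rate, figure and function name below is an assumption introduced for illustration only. The key point is that the judgemental research (JR) element is the product of weighted student numbers and the research rating.

# Hypothetical sketch of the 1991/92 UFC research funding logic described in the text.
# All rates are illustrative placeholders, not actual UFC figures.
def research_allocation(external_grant_income, weighted_students, rating,
                        dr_rate=0.10, cr_rate=0.05, sr_rate=100.0, jr_rate=50.0):
    """Return a breakdown of the research (R) element of the block grant."""
    dr = dr_rate * external_grant_income       # direct research: tied to non-UFC grant income
    cr = cr_rate * external_grant_income       # contract research: also grant-income driven
    sr = sr_rate * weighted_students           # staff research: driven by weighted student numbers
    jr = jr_rate * weighted_students * rating  # judgemental research: students x research rating
    return {"DR": dr, "CR": cr, "SR": sr, "JR": jr, "total": dr + cr + sr + jr}

# The student multiplier means a low-rated group can offset a poor rating by expanding:
print(research_allocation(0, weighted_students=400, rating=2)["JR"])  # 40000.0
print(research_allocation(0, weighted_students=800, rating=2)["JR"])  # 80000.0 - same rating, double the JR income

This simply restates, in illustrative form, why the student multiplier meant that universities could compensate for falling research income by expanding student numbers.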
The UFC originally proposed to carry out its third assessment exercise in
1993, but the creation of the HEFCs for England, Scotland and Wales led to the
exercise being brought forward to 30 June 1992, so that the ratings could be
used by the new funding councils in the determination of grants for research
with effect from 1993-94. The exercise now covered all higher education
establishments (including the “old” universities and the ex-polytechnics/
colleges of higher education, now called universities). As with the 1989 exercise,
the one in 1992 differed from its predecessor in several important respects,
although the “improved” system utilized in 1989 still contained seven key
weaknesses (HEFC, 1993, para. 7)[5]. The 1992 exercise required all submitting
institutions to put forward only those staff who were actively engaged in
research, and the exercise was made less retrospective by seeking detailed
information on staff in post on 30 June 1992 (the “snapshot” approach), rather
than those who had been in post at any time during the assessment period. In
recognition of the longer time scale for research in the arts and humanities, the
assessment period for these units of assessment was extended by one year to
four-and-a-half years[6] and work accepted for publication was allowed to be
listed by departments.
The 1992 exercise also introduced changes to the funding formula. The
allocation of research money on the basis of student numbers was phased out,
with the number of active research staff (as at June 1992) being used as the
volume multiplier. The judgemental research allocation was now dependent on
a formula which included the “assigned research rating less 1”, meaning that an
accounting group with a rating of one would receive no judgemental research
funds. The impact of this change meant that departments, in completing their
1992 return, essentially had to gamble on the financial benefits of increasing
research volume by submitting more staff in the category of “active”
researchers against the costs of diluting the overall quality of the departmental
research being assessed. The more marginal the member of staff in terms of
research quality, the greater was the risk that a diminution of quality would
offset any gains in research volume[7]. Despite protestations of the illogicality of
requiring universities to gamble (e.g. see Whittington, 1993), the same system
will be in place for 1996, although the HEFCs have stressed that, in removing
the need for departments to produce a total list of publications, they are
reinforcing the importance attached in the assessments to research “quality”
rather than “volume” (HEFC, 1994, para. 24). Nevertheless, they still insist
(when discussing research active staff) that departments will “need to be aware
that research funding will continue to be influenced by the volume of research
(including staff) assessed” (HEFC, 1994, Annex C, para. 15).
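The gamble described in the preceding paragraph can be made concrete with a small illustrative sketch. The per-capita funding unit below (loosely based on the figure of roughly £7,000 per rating point mentioned in note 8) and the ratings outcomes are assumptions, not reported results; the point is only that judgemental research funds scale with the number of submitted staff multiplied by the assigned rating less one.

# Hypothetical sketch of the post-1992 trade-off between research volume and rating.
# UNIT is an assumed funding value per research-active staff member per rating point above 1.
UNIT = 7000

def jr_funds(active_staff, rating):
    return active_staff * (rating - 1) * UNIT

# Submitting two extra, more marginal researchers only pays off if the rating holds:
print(jr_funds(10, 4))  # 210000: smaller, higher-quality submission
print(jr_funds(12, 4))  # 252000: extra staff included and the rating holds - the volume gain wins
print(jr_funds(12, 3))  # 168000: the extra staff dilute quality and the rating slips - a net loss
print(jr_funds(10, 1))  # 0: a rating of 1 attracts no judgemental research funds at all

The sketch also shows why, as note 7 explains, a group expecting a 1 rating had a clear incentive to exclude its most marginal researchers: any move above a rating of 1 increases income from zero.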
The outcome of the 1992 exercise revealed that of the 31 accounting
departments submitting returns, 12 included 95-100 per cent of their staff,
seven entered 80-94 per cent, two 40-79 per cent and ten submitted less than 39
per cent. There was also a reasonable correlation between the eventual rating
and the proportion of staff entered, so that those at the top of the assessment
tended to enter all their staff. The LSE and Manchester each received a 5 rating
(Lancaster, the other 5-rated accounting department in 1989, was assessed as a
part of the Lancaster University Management School under the business and
management studies panel). Eight departments received a 4 rating, six a 3
rating, five a 2 rating, while ten departments (nine of which were in “new”
universities) obtained the minimum 1 rating.
All resource calculations flowing from such ratings had, up to and
including the 1992 exercise, been conducted on a university basis. There had
been no explicit requirement for universities to distribute research funds in
strict accordance with the formulae being used by the HEFCs. This lack of
transparency between the funding calculations of the HEFCs and the internal
allocation within a university had its roots in a perceived need to preserve
university independence[8] and there is some evidence to suggest that the
funding councils originally never had any intention to seek to “follow the
money” through into individual departments (see Swinnerton-Dyer, 1985). Over
the course of the ratings exercises, however, there have been some unofficial
calls and promptings for the funding councils to become more active in
monitoring university expenditure. Nevertheless, some universities continued,
even in the immediate aftermath of the 1992 exercise, to ignore the HEFC’s
desire for the results of research selectivity to strengthen “centres of excellence”.
For example, in a 5-rated accounting department, each “active” researcher
would generate approximately £28,000 of research funding for the university.
However, instead of rewarding the highly rated departments in this manner,
some universities chose to use the HEFC money to strengthen lowly rated
departments – this policy being perceived to be both more attractive in terms of
the university’s overall “image” and an easier way of increasing income in the
next selectivity exercise. As Mace (1993, p. 19) commented, universities have
seemed “to see the law of diminishing returns applying if resources are
allocated to already highly rated cost centres”.
In the run up to the 1996 assessment exercise, however, the HEFCs have
given their clearest indication to date that they intend to trace the way in which
research funds are deployed within both universities and their composite
departments (see HEFC, 1993, para. 83). There have also been reports that the
HEFCs are planning to launch a comprehensive audit system to monitor
university expenditure. Speaking of a pilot audit programme, Graeme Davies,
chief executive of the Higher Education Funding Council for England, noted
that “it is a major exercise and will get progressively more intensive. And if we
uncover data suggesting that funds are being used incorrectly, we will look
seriously at the nature of the funding for the institution concerned” (Times
Higher Education Supplement, 4 March 1994, p. 1).
Despite the above assertions of the UFC, and the acknowledged significance of
research ratings to the process of allocating funds, a relatively widespread view
among assessment panels was that the rating scale of 1 to 5, and associated
definitions, was difficult to apply, especially in the arts and humanities, where it
was suggested that a different scale could have been more helpful (HEFC, 1993,
paras 46-7).
The 1996 rating exercise will utilize a 7-point scale, but it is unlikely that any
past or future ratings exercise will live up to the above assertions of the UFC,
given the essentially subjective nature of the whole ratings process:
The first conclusion must be that the research rating exercise is ultimately a matter of
subjective judgement, however many “objective” measures are fed into it. This means that
there are no simple statistical rules, such as “publish n papers in refereed academic journals”
which will guarantee a high or improved rating. Given the essentially arbitrary nature of such
rules, this is probably no bad thing (Whittington, 1993, p. 393).
The HEFCs offered no guidance on the “weights” panels might give to different
aspects of research, such as journal articles, books, research studentships, the
generation of research income, research potential and plans for the future (see
O’Brien, 1994). It is known that some panels relied more heavily than others on
statistical analyses of performance indicators, while others, as in accounting, took
a more pragmatic view of research quality (“we know it when we see it” –
Whittington, 1993, p. 385). Interestingly, despite claims that the research rating
process contains no simple rules as to how to secure a high rating and was a
complex professional assessment of research quality dependent on “the
competence and integrity” of panel members (see Whittington, 1993, pp. 390-3),
statistical surveys of the ratings exercise have portrayed a rather more
predictable process in some subjects (see the July 1994 report of the Joint
Performance Indicators Working Group of the HEFCs). For instance, in the case
of business and management studies, eight of the 12 possible elements of research
output[10] had contributed significantly to the research ratings awarded –
whereas, in accounting, only three of the factors correlated significantly with the
research ratings (these being articles in academic journals, total publications and
short works).
Ratings across units of assessment also vary widely, the extent of dispersion
again casting doubt on the comparability of the assessment methods applied by
subject panels. For instance, in 1992, 65 per cent of departments assessed in
anthropology were awarded a 4 or a 5, compared with 32 per cent in accountancy,
27 per cent in sociology, 14 per cent in business and management studies,
and just 12 per cent in social work (see O’Brien, 1994). In interpreting such
differences it could be claimed that some assessment panels retained a greater
degree of loyalty to their subject than to the ratings exercise. For instance, in
accountancy, among the 12 departments which were strictly comparable, the
average rating improved from 2.5 in 1989 to 3.5 in 1992, and in no case did the
rating decline. However, in law, in a period in which it is widely acknowledged
that law departments had devoted unprecedented efforts to research, only three
were deemed to have improved their research quality, while ten departments had
their rating downgraded.
The impact of ratings exercises on research activity
Having acknowledged that research ratings exercises are essentially subjective
processes in terms of their assessment of research quality, it may seem a little
strange to ask to what extent such exercises have improved research “quality”.
However, such questions are essential if the
purpose of the assessment exercises is not to be lost. More significantly for this
article, the answers to such questions provide powerful illustrations of the
fundamental limitations of research assessment exercises in their stated task of
improving research “quality”.
Surprisingly, the HEFCs (and their predecessor, the UFC) are rather cautious
on this issue. Despite claims that the 1992 ratings were comparable across all
units of assessment, no such confidence is applied to any comparisons of
“research quality” across time:
Although the same rating scale has been used in both the 1989 and 1992 Exercises, with the
same definitions, the 1992 Exercise has been carried out on a different basis and the results
should not be directly compared with the earlier ratings (UFC, 1992a, para. 17).
One does not have to look very far to find other indications that the ratings
reveal very little about changes in research quality. Perhaps some of the most
severe criticisms of research selectivity have been directed at its encouragement
of increased output at the expense of quality. For instance, the Times Higher
Education Supplement (4 December 1992, p. 1) highlighted work which
suggested that, while the number of articles published by British academics in
leading scientific journals had increased, the number of citations of British
papers went down (in contrast to the rest of the European Union and the USA
where citations increased). Sir David Phillips, head of the advisory board for the
research councils, pointed out that in the past many dramatic scientific
breakthroughs had been achieved by scientists who had worked on projects
even for ten years without publishing anything. He commented:
I suspect many scientists have been changing their behaviour or even the nature of the
research they do in order to optimise their performance in the RAE. If that leads to people
always doing research that leads to publishable research in three years’ time, I certainly do not
think it is a good thing (Times Higher Education Supplement, 4 December 1992, p. 1).
Notes
1. For example, see Murphy’s (1994) discussion of recent developments in Australia.
2. A recent report in The Guardian’s education supplement (1994, p. 2) on the impact of
market-led approaches in universities included the following observation from Colwyn
Williamson, founder of the Council for Academic Freedom and Academic Standards: “The
disincentive to exposing what you see as serious academic problems are considerable. For
a start there’s a general demoralisation among academics at the moment. Every stage of
your progress is subject to the goodwill of your superiors. Every year there are increments
you may be granted or denied...everything depends on not making enemies of your
superiors”.
3. For instance, by a government which aside from advocating research selection has, over a
13-year period, given out £31 billion in income tax cuts to the predominantly wealthy (the
top 1 per cent of taxpayers receiving 27 per cent of this amount – see Dean, 1993).
4. These included, among other things, that: the criteria for assessing research quality had
not been made clear to universities; interdisciplinary research had not been properly
assessed; different assessment standards had been used for different subjects; an appeals
mechanism had not been in existence; and the exercise had been largely retrospective,
taking little account of work in progress and research potential.
5. Most significantly, these still included a lack of clarity regarding assessment criteria and
the continuing retrospective nature of the assessment exercise.
6. Accounting was included under arts and humanities, although for the forthcoming 1996
exercise it has been excluded.
7. In general terms, as the effect of an improvement in research rating was greatest at the
bottom of the scale, it would have been beneficial for a group expecting to receive a 1 rating
(if all its staff were included) to exclude its most marginal researchers in the hope that this
would move the rating up to a 2, since this would increase the income from zero. The
position for those expecting middle-ranking ratings was much less certain, since a group
which might have got a 3 if all staff were included, might suppose that it would get a 4 if,
say, two members were excluded. However, if the forecast was wrong and the group still
received only a 3 the group would have lost two members of staff who would have qualified
for the per capita research funding allocations.
8. The maintenance of this position was also a possible reflection of a high degree of
arbitrariness in the way the funding councils apportioned research funds across subject
groups. For instance, an accounting researcher is valued at over £7,000 per rating point
and a management researcher at just over £4,000 per rating point. Similar inequities exist
in other subjects, e.g. chemistry and physics, where a research chemist is valued at
considerably more than a research physicist.
9. Although it could be argued that applied research was compensated through external
research funding – with it being easier for “applied” than for “pure” subjects to secure such
funding from industry and commerce.
10. These were: authored books, edited books, short works, refereed and other conference
papers, articles in academic, professional and popular journals, reviews of academic
books, other publications and output, and total publications.
11. For Arrington, a major problem in the accounting discipline lies in the fact that
“prescriptions about what constitutes ‘good’ research have been laid out by a hegemonic
academic élite whose ideas read like the antithesis of current philosophy of science” (p. 6).
Similar concerns have been voiced in economics and business management by
practitioners as well as academics about the relevance and value of such research (for
example, see Harley and Lee, 1995; Harvard Business Review, September-October and
November-December 1992).
12. Hutchinson (1989) makes a similar point when drawing attention to the wider definition of
research adopted by the body responsible for maintaining academic standards in the old
polytechnic sector, the Council for National Academic Awards.
13. This is particularly true of joint research grants, where funding bodies require the grant to
be assigned to one person and therefore one institution, preventing collaborating
researchers in different universities from including the grant in their department’s
submission.
14. The relative absence of such policy critiques in accounting journals should not be seen as
particularly convincing counter-evidence – after all, as Puxty et al. (1994) have stressed, such
activity on the part of academic accountants currently receives little reward in the research
ratings process.