
A Research Note


Traditional sensory difference tests demonstrate whether a differ-
ence exists or not between two flavors; to determine the degree of
difference requires further scaling. The two processes can be com-
bined for perceptually small differences using difference tests based
on signal detection measures. Such measures provide a measure of
degree of difference directly which, being a probability value, is
susceptible to analysis by parametric statistics. Traditional signal
detection measures are complex and time consuming, but the present
paper outlines a short and simple means of calculating such measures.

THE COMMONLY USED flavor difference tests: triangle,
pair comparison, duo-trio, etc. (Amerine et al., 1965) are
designed to determine whether a difference occurs between
the flavors of two foods. Should there be a difference it is
sometimes useful to determine the degree of difference. For
perceptually large differences this can be achieved using
traditional scaling methods (Stevens, 1960) but for very
slight differences, the lack of judges' skill in using numbers
may create enough variance to swamp any slight differ-
ences. Further, there is circumstantial evidence in sensory
psychophysics to suggest that humans are so unskilled in
their use of numbers as to call into doubt the use of para-
metric statistics in analysing numerical data generated by
them; this latter point, however, remains controversial.
However, the degree of difference for these small differences
can be measured directly by using the so-called Signal
Detection measures (Green and Swets, 1966), which are
difference tests yielding a measure of the degree of difference
directly. The most general signal detection measure
applicable to difference testing is the index P(A), which is
sometimes denoted as R by those who use a short-cut
procedure to obtain it (Brown, 1974). The traditional tech-
niques for finding P(A) are too lengthy for sensory analysis
but the shorter technique, developed for studies on memory
(Brown, 1974), is very applicable. Basically, R or
P(A) can be visualised as a probability value: the probability
of correctly choosing one of two samples in a pair comparison
task. It has the advantage that, in its determination,
judges are not required to generate numbers, so avoiding
the reliability issue; they are merely required to state
whether they are sure of their judgement or not. From
these simple data can be calculated a probability value,
more safely amenable to parametric analysis than scaling data.

P(A) or R can be calculated simply as follows. Let us
assume a judge tastes 10 samples of foodstuff B and 10
samples of a reformulation A in random order. Immedi-
ately the sensory analyst will protest the large number of
samples per judge which, though desirable, can usually only
be tasted in the setting of an academic laboratory. However,
the required number of samples (say 20) could be
obtained by spreading the load evenly over replications
(taste 4 over 5 replications) and/or subjects (5 subjects, 4
replications each, etc.), thus obtaining a composite panel
R-value rather than one per judge. It is advisable here that
the data required for the calculation be distributed evenly
over judges and replications rather than allow one judge or
replication an unevenly large influence over the final value.

Author O'Mahony is with the Dept. of Food Science & Technology,
University of California, Davis, CA 95616.
©1979 Institute of Food Technologists
302-JOURNAL OF FOOD SCIENCE-Vol. 44, No. 1 (1979)

Thus the judge (or panel over replications) tastes, say, 10
samples of foodstuff B and 10 samples of a reformulation,
A, in random order. He is required to rate each of the 20
samples as definitely A (A), perhaps A (A?), perhaps B (B?)
or definitely B (B). Let us suppose that when tasting A he
rated eight samples A, one A? and one B?, and when
tasting B he rated seven as B, two B? and one A?. The
results can be summarized in the response matrix (Fig. 1).

                  Response
Sample      A     A?    B?    B
A           8     1     1     0
B           0     1     2     7

Fig. 1-Response matrix.

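The tallying step just described can be sketched in a few lines of Python. This is only an illustration: the function name `response_matrix` and the representation of each tasting as a (true sample, rating) pair are assumptions made here, not part of the original procedure. The ratings used are those of the example in the text.

```python
from collections import Counter

# Rating categories, ordered from "definitely A" to "definitely B".
CATEGORIES = ("A", "A?", "B?", "B")


def response_matrix(trials):
    """Tally (true_sample, rating) pairs into one row of counts per sample.

    `trials` may come from one judge or be pooled over judges and
    replications; the representation is an assumption of this sketch.
    """
    counts = Counter(trials)
    return {
        sample: [counts[(sample, category)] for category in CATEGORIES]
        for sample in ("A", "B")
    }


# Ratings from the worked example in the text.
trials = (
    [("A", "A")] * 8 + [("A", "A?")] + [("A", "B?")]
    + [("B", "B")] * 7 + [("B", "B?")] * 2 + [("B", "A?")]
)

matrix = response_matrix(trials)
for sample, row in matrix.items():
    print(sample, dict(zip(CATEGORIES, row)))
```

Each row lists how often samples of that type received each of the four ratings; pooling simply means concatenating the trial lists before tallying.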
To calculate R, we now predict from these data what
might happen in a hypothetical experiment should each of
the A samples be presented in pair comparison with each of
the B samples. How many times would A be correctly iden-
tified (out of 10 x 10 = 100 pair comparisons)?
Let us consider the A samples rated as definitely A (A).
They would be correctly identified when paired with any of
the B samples which were rated A?, B? or B. Even
though one of the B samples was identified as A? it would
still be chosen as B when compared to A samples rated as
A. So this gives us 8 x (1 + 2 + 7) = 80 correct identifications
of A, so far. The A sample rated A? would be correctly
identified if compared with the two B samples rated
B? or the seven samples rated B (another (2 + 7) x 1 = 9
correct), but when compared with the B sample rated A?
the subject would not know which to choose because they
were both rated as the same by him (so score one comparison
as "don't know"). Similarly, the A sample rated B?
would be correctly identified when compared with the
seven B samples rated B (another 7 correct), but again the
subject would be undecided with the two B samples rated B?
(score another 2 "don't knows"). So the predicted final tally
of pair comparisons is 80 + 9 + 7 = 96 correct identifications
of A and 3 "don't knows"; similarly, one of the pair
comparisons would be totally wrong, namely the A sample
rated B? when compared with the B sample rated A?. It is
assumed that when the subject is undecided he guesses correctly
half the time. This makes the final score 96 + 3/2 =
97.5 correct identifications out of 100. This score is the
R-index or P(A). Hence R = 97.5% or 0.975.
This exercise is a conceptually easy way of manipulating
the data so as to arrive at a value that is exactly equal to
P(A) (O'Mahony, 1977). It is not the traditional method
(Green and Swets, 1966) but it is equivalent and provides a
simple conceptual understanding of the index.
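The hypothetical pair comparison exercise can be written out as a short Python sketch. The function name `r_index` and the list layout (counts in the order A, A?, B?, B) are assumptions made for illustration; the counts are those of the worked example, and ties are scored as half correct, as assumed in the text.

```python
def r_index(a_counts, b_counts):
    """Return (R, correct, ties, wrong) for two rows of rating counts.

    Both rows are ordered A, A?, B?, B. Every A sample is notionally
    paired with every B sample: the pair is correct when the A sample
    was rated nearer the "definitely A" end, a tie when both received
    the same rating, and wrong otherwise.
    """
    correct = ties = wrong = 0
    for i, n_a in enumerate(a_counts):
        for j, n_b in enumerate(b_counts):
            if i < j:
                correct += n_a * n_b
            elif i == j:
                ties += n_a * n_b
            else:
                wrong += n_a * n_b
    total = sum(a_counts) * sum(b_counts)
    # Undecided pairs are assumed to be guessed correctly half the time.
    return (correct + 0.5 * ties) / total, correct, ties, wrong


r, correct, ties, wrong = r_index([8, 1, 1, 0], [0, 1, 2, 7])
print(correct, ties, wrong)  # 96 3 1
print(r)                     # 0.975
```

The three tallies reproduce the 96 correct identifications, 3 undecided comparisons and 1 wrong comparison counted by hand above.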
Fig. 2-Response matrices (I), (II) and (III), each over the responses A, A?, B?, B; R = 100% in each case.
The value actually obtained has the advantage
that it is numerical, rather than a traditional
difference/no difference result, thus providing more information.
Admittedly the error rate in a traditional test is such a
numerical measure but the R-index also takes into account
information regarding the judges' degree of certainty, which
is ignored by the usual tests. As for the meaning of R =
97.5%, it simply means that the judge is expected to cor-
rectly differentiate A and B 97.5% of the time which is
good differentiation. To ask what exactly is the level that
decides good or bad differentiation is defeating the purpose
of the test. It loses information by converting the answer
back into a difference/no difference situation; any such
procedure is as arbitrary as choosing p < 0.01 or p < 0.05
as the statistical level of difference.
It is instructive to consider how this technique, by considering
the certainty of the judges' responses, overcomes
response bias, the tendency to name all samples A or all
samples B. To illustrate this, consider the above three
response matrices that could be produced by the aforemen-
tioned experiment. It should be pointed out these matrices
are not what is usually obtained in such situations; a more
even distribution of numbers across the matrix, indicating
less certainty, would be more likely. However, they illus-
trate the point simply (Fig. 2).
In Figure 2(I) all A samples are rated A, and all B
samples are rated B. Using a standard difference test, the
samples would be distinguished, while R = 100%.
In Figure 2(II) all B samples were rated B and all A samples
B?. Although the samples were distinguished by the
fact that the judge was unsure with some judgements, a test
that merely required subjects to identify whether the samples
were A or B would register no difference. The judge's
response bias towards believing that all samples were B can be
overcome using the additional certainty judgements. The
same is true for the third matrix where the bias is towards
calling everything A.
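The effect of response bias can be illustrated with a small Python sketch comparing the R-index with a plain A-or-B identification score for the three matrices. The counts for matrices (I) and (II) follow the description in the text; those for matrix (III) are an assumption (all A samples rated A, all B samples rated A?), since the text states only the direction of the bias. The function names are illustrative.

```python
def r_index(a_counts, b_counts):
    """R-index from rating counts ordered A, A?, B?, B (ties count half)."""
    score = 0.0
    for i, n_a in enumerate(a_counts):
        for j, n_b in enumerate(b_counts):
            if i < j:
                score += n_a * n_b
            elif i == j:
                score += 0.5 * n_a * n_b
    return score / (sum(a_counts) * sum(b_counts))


def identification_rate(a_counts, b_counts):
    """Proportion of samples named correctly when certainty is ignored,
    i.e. A and A? both count as "A", B? and B both count as "B"."""
    named_correctly = sum(a_counts[:2]) + sum(b_counts[2:])
    return named_correctly / (sum(a_counts) + sum(b_counts))


matrices = {
    "I": ([10, 0, 0, 0], [0, 0, 0, 10]),    # no bias
    "II": ([0, 0, 10, 0], [0, 0, 0, 10]),   # bias towards B
    "III": ([10, 0, 0, 0], [0, 10, 0, 0]),  # bias towards A (assumed counts)
}

results = {
    name: (r_index(a, b), identification_rate(a, b))
    for name, (a, b) in matrices.items()
}
for name, (r, rate) in results.items():
    print(f"Matrix {name}: R = {r:.2f}, identification rate = {rate:.2f}")
```

Under these counts R is 1.00 for all three matrices, while the identification rate falls to chance (0.50) for the two biased matrices.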
Naturally with these degrees of difference, the refine-
ment of the R-index as a measure would hardly be neces-
sary, but they illustrate the point. It should also be noted
that overcoming response bias is not exclusive to signal de-
tection; forced-choice procedures do the same for the tradi-
tional tests.
An advantage of this approach compared with tradi-
tional signal detection measurement is brevity. Using tradi-
tional signal detection techniques, around 200 readings are
required to compute P(A); using this method it can be done
in approximately 20. Using computations of R every 20
presentations over a series of 200, for a difference test between
water and 3 mM NaCl, it was found that no significant
differences occurred in R-values. Thus any error involved
in using fewer readings is random; there are no systematic
effects. This is also the case for traditional difference
tests. It is necessary to establish this before using the
test; a systematic improvement in performance would have
demonstrated the need for obtaining a larger sample of
readings. Certainly the test was found simple to use when
six judges tested flavor differences in sherry samples, using
twenty presentations of each sherry sample and a dorsal
flow technique (O'Mahony and Davies, 1978).
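A check of this kind, computing R for each successive block of 20 presentations, might be sketched as follows. The function names and the trial representation are assumptions, and the data are placeholders that simply repeat the worked example ten times; they are not the water and NaCl results. How the resulting R-values are then compared statistically is not specified in this note and is left out.

```python
from collections import Counter

CATEGORIES = ("A", "A?", "B?", "B")


def r_index(trials):
    """R-index from a list of (true_sample, rating) pairs."""
    counts = Counter(trials)
    a = [counts[("A", c)] for c in CATEGORIES]
    b = [counts[("B", c)] for c in CATEGORIES]
    if sum(a) == 0 or sum(b) == 0:
        raise ValueError("a block needs at least one A and one B sample")
    score = 0.0
    for i, n_a in enumerate(a):
        for j, n_b in enumerate(b):
            if i < j:
                score += n_a * n_b
            elif i == j:
                score += 0.5 * n_a * n_b
    return score / (sum(a) * sum(b))


def blockwise_r(trials, block_size=20):
    """R for each complete consecutive block of `block_size` presentations."""
    last_start = len(trials) - block_size
    return [
        r_index(trials[start:start + block_size])
        for start in range(0, last_start + 1, block_size)
    ]


# Placeholder series: the 20 ratings of the worked example, repeated 10 times.
block = (
    [("A", "A")] * 8 + [("A", "A?")] + [("A", "B?")]
    + [("B", "B")] * 7 + [("B", "B?")] * 2 + [("B", "A?")]
)
series = block * 10

r_values = blockwise_r(series)
print(r_values)
```

A systematic rise or fall across the list of R-values would indicate that 20 readings are too few; a flat, randomly scattered series would not.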
The R-index procedure is flexible. Instead of testing for
differences between one product B and its reformulation A,
two reformulations (A1 and A2) can be tested simultaneously.
A random presentation of A1, A2 and B can be rated
(A, A?, B?, B; A could be simply not-B) and two R-indices
calculated by comparison to B's ratings. This was found to
be a simple task when judges compared differences between
colas (O'Mahony et al., 1978).
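With two reformulations, the same calculation is simply run twice against the single set of B ratings. The following Python sketch shows the idea; the counts for A1 and B reuse the worked example, while those for A2 are hypothetical values chosen only for illustration.

```python
def r_index(x_counts, b_counts):
    """R-index of a test sample against B; counts ordered A, A?, B?, B."""
    score = 0.0
    for i, n_x in enumerate(x_counts):
        for j, n_b in enumerate(b_counts):
            if i < j:
                score += n_x * n_b
            elif i == j:
                score += 0.5 * n_x * n_b
    return score / (sum(x_counts) * sum(b_counts))


# One shared row of ratings for the standard product B.
b_counts = [0, 1, 2, 7]

# Ratings for each reformulation ("A" here may simply mean "not-B").
reformulations = {
    "A1": [8, 1, 1, 0],  # counts from the worked example
    "A2": [5, 3, 2, 0],  # hypothetical counts, for illustration only
}

r_values = {
    name: r_index(counts, b_counts)
    for name, counts in reformulations.items()
}
for name, r in r_values.items():
    print(f"R({name} vs B) = {r:.3f}")
```

With these counts the sketch gives R = 0.975 for A1 and R = 0.945 for A2, each expressing how well that reformulation is told apart from B.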
R-indices could also be calculated by using a ranking
technique. Ranked data, say for eight samples, can be readily
transformed to rated data (1st, 2nd = A; 3rd, 4th = A?,
etc.). The difference here is that the experimenter, rather
than the subject, defines the difference between certainty
and uncertainty. Such techniques are, at present, being
compared to rating procedures. If viable, they allow the
possibility of analysis of ranked data by parametric statistics.

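The transformation from ranks to ratings can be sketched in Python as below. The text gives only the first two bands (1st, 2nd = A; 3rd, 4th = A?); continuing the pattern as 5th, 6th = B? and 7th, 8th = B is an assumption of this sketch, as are the function name and the requirement that the number of samples be a multiple of four.

```python
CATEGORIES = ("A", "A?", "B?", "B")


def rank_to_rating(rank, n_samples=8):
    """Map a rank (1 = most A-like) to one of the four rating categories.

    Ranks are cut into four equal bands by the experimenter; the bands
    beyond A and A? are assumed to continue the pattern given in the text.
    """
    if n_samples % len(CATEGORIES) != 0:
        raise ValueError("n_samples must be a multiple of 4 in this sketch")
    if not 1 <= rank <= n_samples:
        raise ValueError("rank out of range")
    band_width = n_samples // len(CATEGORIES)
    return CATEGORIES[(rank - 1) // band_width]


ratings = [rank_to_rating(rank) for rank in range(1, 9)]
print(ratings)
# ['A', 'A', 'A?', 'A?', 'B?', 'B?', 'B', 'B']
```

Once ranks have been converted in this way, the ratings can be tallied and the R-index computed exactly as for directly rated data.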
Thus Signal Detection measures can be seen to be a via-
ble and flexible tool for flavor measurements. They are not
a panacea; they do not replace existing techniques but
merely add a possible refinement to difference testing,
should it be needed, while keeping the task of the judge
simple. They add just one more small weapon to the arsenal of
sensory testing.

Amerine, M.A., Pangborn, R.M. and Roessler, E.B. 1965. Principles
of Sensory Evaluation of Foods. Academic Press, New York, N.Y.
Brown, J. 1974. Recognition assessed by rating and ranking. Brit. J.
Psychol. 65: 13.
Green, D.M. and Swets, J.A. 1966. Signal Detection Theory and
Psychophysics. Wiley, New York, N.Y.
O'Mahony, M. 1977. Towards shorter criterion free sensitivity measures.
Presented at 6th International Symposium on Olfaction
and Taste, July, 1977. Orsay, Paris, France.
O'Mahony, M. and Davies, M. 1978. A signal detection approach to
taste difference testing between two levels of alcohol in a flowise
presented sherry stimulus. IRCS Medic. Sci. 6: 189.
O'Mahony, M., Heintz, C. and Autio, J. 1978. Signal detection difference
testing of colas using a modified R-index approach. IRCS
Medic. Sci. 6: 222.
Stevens, S.S. 1960. The psychophysics of sensory function. Amer.
Scientist 48: 226.
Ms received 6/13/78; revised 8/31/78; accepted 9/8/78.