
JOURNAL OF EXPERIMENTAL SOCIAL PSYCHOLOGY 16, 497-507 (1980)

A Note on the Analysis of Designs in Which Subjects Receive Each Stimulus Only Once
DAVID A. KENNY
University of Connecticut

AND

ELIOT R. SMITH
University of California at Riverside

Received December 6, 1979

In social psychological experiments, the manipulations of interest are often presented to subjects along with, or as part of, some stimulus. An example would be a manipulation of the verbal label associated with a stimulus photograph in a person perception study. Ordinarily it is not possible for stimuli to be completely crossed with treatments because subjects cannot be exposed to any stimulus more than once. In such designs it is wise to counterbalance the assignment of stimuli to treatment conditions, but this gives rise to difficulties in the analysis of the experimental data. Data from such designs have frequently been misanalyzed in the published literature. This paper presents a method of constructing a counterbalancing scheme to simplify the analysis, and appropriate methods of analysis for both the simplified and the general case. It is emphasized that stimuli are generally best treated as a random factor in such designs, permitting increased generalizability, but the case where the stimulus factor is fixed is also considered.

Researchers in social psychology often must embed a manipulation of some kind within a stimulus context. For example, in the field of person perception researchers have recognized the inadequacy of designs that ask subjects to respond to two- or three-word stimuli like a black or a handicapped person. An increased concern with the realism of experimental tasks has led researchers to use more complex stimuli (for example, a paragraph-long description of a person) as contexts in which the
This research was supported in part by National Science Foundation Grant BNS 7826672 and in part by the University of California Academic Senate. Requests for reprints should be sent to David A. Kenny, Department of Psychology, University of Connecticut, Box U-26, Storrs, CT 06268.
0022-1031/80/050497-11$02.00/0
Copyright © 1980 by Academic Press, Inc. All rights of reproduction in any form reserved.

KENNY AND SMITH

manipulation of interest (for example, race or handicap) can be embedded. In other areas, attitude-change researchers embed manipulations of source credibility within a persuasive communication; attribution researchers manipulate consensus or distinctiveness information in the context of a sentence describing some event; and so on. Besides advantages in the areas of realism or meaningfulness of experimental tasks for subjects, the use of multiple stimuli can provide both increased power and improved generalizability. Power is gained by averaging over multiple stimuli in the same way that power is enhanced by averaging over more subjects. Generalizability is obtained by replicating treatment effects across different stimuli. In Table 1, we have diagrammed five possible designs from among which researchers might choose to estimate the size and consistency (statistical significance) of treatment (manipulation) effects. For simplicity of presentation, each design in the table has four subjects, four stimuli, and two treatments. The designs involve the collection of different subsets of data; thus designs 2 through 5 can be viewed as design 1 with missing data. A given design may cause the researcher to modify the model (because of carryover effects, heterogeneity of covariance, or position effects), but these factors are not considered in this paper.
TABLE 1
FIVE DESIGNS FOR SUBJECTS, STIMULI, AND TREATMENTS

            Design 1               Design 2
            Stimuli                Stimuli
Subjects    1    2    3    4       1    2    3    4
   1        B^a  B    B    B       1^b  -    -    -
   2        B    B    B    B       -    1    -    -
   3        B    B    B    B       -    -    2^c  -
   4        B    B    B    B       -    -    -    2

            Design 3               Design 4
            Stimuli                Stimuli
Subjects    1    2    3    4       1    2    3    4
   1        1    1    1    1       1    1    2    2
   2        1    1    1    1       1    1    2    2
   3        2    2    2    2       1    1    2    2
   4        2    2    2    2       1    1    2    2

            Design 5
            Stimuli
Subjects    1    2    3    4
   1        1    1    2    2
   2        1    1    2    2
   3        2    2    1    1
   4        2    2    1    1

a Both levels of the treatment. b Level 1 of the treatment. c Level 2 of the treatment. A dash marks a subject-stimulus combination that is not observed.
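As a reading aid, the five layouts in Table 1 can be written as small lookup rules and checked against the properties discussed in the text. The Python sketch below is an illustration only: all names are hypothetical, and the levels placed on the diagonal of design 2 (1, 1, 2, 2) are an assumption based on one reading of the table.

```python
# Illustrative encoding of the five designs in Table 1 (4 subjects x 4 stimuli,
# 2 treatment levels). All names here are hypothetical, chosen for this sketch.
SUBJECTS = range(1, 5)
STIMULI = range(1, 5)


def half(x):
    """Return 1 for units 1-2 and 2 for units 3-4."""
    return 1 if x <= 2 else 2


# Each rule maps (subject, stimulus) to the set of treatment levels observed.
DESIGNS = {
    1: lambda su, st: {1, 2},                               # fully crossed
    2: lambda su, st: {half(su)} if su == st else set(),    # assumed diagonal
    3: lambda su, st: {half(su)},                           # subjects nested
    4: lambda su, st: {half(st)},                           # stimuli nested
    5: lambda su, st: {1 if half(su) == half(st) else 2},   # counterbalanced
}


def levels_seen_by_subject(design, su):
    """Treatment levels a given subject experiences across all stimuli."""
    return set().union(*(DESIGNS[design](su, st) for st in STIMULI))


def levels_seen_by_stimulus(design, st):
    """Treatment levels a given stimulus appears under across all subjects."""
    return set().union(*(DESIGNS[design](su, st) for su in SUBJECTS))


def repeats_a_stimulus(design):
    """True if some subject meets the same stimulus under both levels."""
    return any(len(DESIGNS[design](su, st)) > 1
               for su in SUBJECTS for st in STIMULI)


for d in DESIGNS:
    crossed_su = all(levels_seen_by_subject(d, su) == {1, 2} for su in SUBJECTS)
    crossed_st = all(levels_seen_by_stimulus(d, st) == {1, 2} for st in STIMULI)
    print(d, repeats_a_stimulus(d), crossed_su, crossed_st)
```

Each printed row gives the design number, whether any subject sees a stimulus twice, and whether treatment is crossed with subjects and with stimuli.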

RECEIVING STIMULI ONLY ONCE

Design 1, the fully crossed design (subjects by stimuli by treatments), has no missing data and hence has the highest power. Design 2 confounds stimulus with subject in that each subject receives a different stimulus. Design 3 has stimuli crossed with treatments, but subjects are nested within treatments. It has two potential advantages: First, subjects are not aware of the experimental variable, since it is not varied within subject. Second, this design is called for when the experimental treatment is difficult to change once it has been manipulated (e.g., subject's level of arousal). Design 4 crosses subjects with treatments but nests stimuli within treatments. Such a design is useful if stimuli cannot be crossed with treatments (e.g., stimulus person within a level of sex). Design 5 counterbalances the nesting of stimuli within treatments. Thus, unlike designs 3 and 4, the treatment effect here is not confounded with either subject or stimulus.

Although design 1 is, in principle, preferable to the other designs, it may not be usable in a particular situation because, for substantive reasons, it may be impossible to present each stimulus to a subject more than once. One example is a person memory experiment, in which the presentation of a stimulus person more than once to a subject would confound the memory-dependent variable. As a second example, consider a study of person perception, where the stimuli are photographs of persons and the treatments are manipulations of the description of each person. Obviously one cannot expose any subject to the same stimulus photograph with two possibly conflicting descriptions, and this rules out design 1. Designs 2-5 meet this restriction, but they are not all equally advantageous. Design 5 provides the most efficient estimate of treatment effects because treatments are crossed with both subjects and stimuli.¹ For these reasons design 5 is in relatively common use among social psychologists. McArthur (1972) used it to test Kelley's (1967) theory. However, we know of no text that discusses this design, and that is the concern of this paper.

CONSTRUCTING THE DESIGN

We refer to design 5 as the counterbalanced design. One can view it as either the counterbalancing across subjects of the stimuli within treatments, or the counterbalancing across stimuli of the subjects within treatments. The construction of a counterbalancing scheme will be described below with several examples and some general principles. However, to begin with, in constructing a counterbalanced design, two choices
1 Crossing treatment with both subject and stimulus generally results in higher power to detect treatment effects, since the interactions of treatment with subjects and stimuli (which will become the denominators for F-tests of treatment effects in this case) are generally smaller than the main effects of subjects or stimuli (which would be the F-test denominators if subjects or stimuli were nested within treatments, as in Designs 2, 3, and 4).


must be made. First, for each subject, how many stimuli should be in each treatment condition? Second, should stimulus be considered a fixed or random factor in the design? The first question refers to the number of replications (hence the power) and the second refers to the target of generalization. Most studies that employ the counterbalanced design have used only a single stimulus for each treatment level. Unfortunately, this strategy both gives low power and precludes treating stimulus as a random factor. If stimulus is treated as a fixed factor, then one can only generalize the results to the particular stimuli employed in the study (cf. Santa, Miller, & Shaw, 1979). Thus the results of a study of attitude change would be limited to the specific attitude topics that were chosen; the results of a person perception study to a particular set of photographs, etc. Because of this limitation, it has become traditional within cognitive psychology to treat stimulus as a random factor (Clark, 1973). With very few exceptions, however, social psychologists have chosen to treat stimuli as fixed. Perhaps the major reason for this is that treating the stimulus factor as random substantially reduces the probability of obtaining a significant effect. This reduced power is logically and inevitably associated with the increase in generalizability. However, it seems to us that social psychologists should be willing to pay the price of larger experiments and fewer significant effects to win increased generalizability. Once the decisions as to the number of stimuli per condition and the fixed or random nature of the stimulus factor have been made, one must construct a counterbalancing scheme. Table 2 provides the simplest
TABLE 2
HYPOTHETICAL EXAMPLE OF THE COUNTERBALANCED DESIGN

                                      Stimulus set (B)
                                      B1                       B2
Subject group (A)                     Stimuli 1, 2, ..., k     Stimuli k+1, k+2, ..., 2k
A1  Subjects 1, 2, ..., n             T1                       T2
A2  Subjects n+1, n+2, ..., 2n        T2                       T1

Note. Ti indicates level i of the experimental factor.

TABLE 3
EXPECTED MEAN SQUARES FOR THE DESIGN IN TABLE 2^a

Source            df                 Su    St    T×Su   T×St   Su×St   T
A                 1                  x                  x^b    x^b
Su/A^c            2(n - 1)           x                         x^b
B                 1                        x     x             x
St/B^d            2(k - 1)                 x                   x
A × B (= T)       1                              x      x^b    x^b     x
A × St/B^d        2(k - 1)                              x      x
B × Su/A          2(n - 1)                       x             x^b
Su/A × St/B^d     4(n - 1)(k - 1)                              x

a Derived under the assumption that A and B are randomly formed groups of subjects and stimuli, respectively; hence σ²_A = σ²_B = 0. Su = subjects and St = stimuli. An x in the table indicates that the source of variation at the head of the column is a component of the expected mean square of the effect shown in that row.
b Equals zero if stimulus is a fixed factor.
c The symbol / refers to nesting. That is, Su/A is subjects nested within levels of A.
d Not estimable if only one stimulus per treatment level, k = 1.

example of the counterbalanced design. There is a dichotomous experimental factor, and the stimuli and subjects have been randomly subdivided into two sets. The subsets of subjects have been denoted as A1 and A2 and those of stimuli as B1 and B2. Subjects in group A1 receive level 1 on the experimental factor with stimuli in set B1 and level 2 with B2; subjects in group A2 receive the opposite pattern (level 1 with B2 and level 2 with B1).

ANALYZING THE DESIGN

With the counterbalanced design, as with any ANOVA design, the construction of appropriate F or quasi-F ratios rests on the examination of the expected mean squares for the design. General rules for deriving expected mean squares can be found in many ANOVA texts. Table 3 indicates the components of the expected mean squares for the design in Table 2. Note that no unusual assumptions have been made in deriving these values, except that subjects and stimuli have been randomly assigned to subsets. There are two aspects of the expected mean squares in Table 3 to which we wish to direct the reader's attention. First, the main effect of treatments is equivalent to the A × B interaction. Thus, any computer program which handles crossed and nested designs can analyze this particular design; one does not need a specialized program. As will be seen below, the counterbalanced design can be constructed to have this convenient property whenever the treatment(s) have a 2^p structure. Second, the treatment effect can be tested with MS_{B × Su/A} (see footnote 2) as an
2 In this and the following formulas, Su represents the subject factor, St the stimuli, and the slash represents nesting. Thus, Su/A is subjects nested within level of factor A.

error term if stimulus is fixed, or by the following quasi-F ratios if stimulus is random:

\[
F = \frac{MS_{A \times B} + MS_{Su/A \times St/B}}{MS_{B \times Su/A} + MS_{A \times St/B}}
\]

or

\[
F = \frac{MS_{A \times B}}{MS_{B \times Su/A} + MS_{A \times St/B} - MS_{Su/A \times St/B}} .
\]
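For readers who want to compute the first of these ratios by hand, the Python sketch below forms it from the four mean squares and attaches approximate degrees of freedom. The degrees-of-freedom formula is the usual Satterthwaite approximation, which is an assumption of this sketch rather than something specified by the formulas above; the function names and the example numbers are hypothetical.

```python
def satterthwaite_df(mean_squares, dfs):
    """Approximate df for a sum of independent mean squares (Satterthwaite)."""
    total = sum(mean_squares)
    return total ** 2 / sum(ms ** 2 / df for ms, df in zip(mean_squares, dfs))


def quasi_f(ms_axb, ms_resid, ms_bsu, ms_ast, n, k):
    """First quasi-F ratio for the Table 2 design, with approximate df.

    ms_axb   : MS for A x B (the treatment effect), df = 1
    ms_resid : MS for Su/A x St/B, df = 4(n - 1)(k - 1)
    ms_bsu   : MS for B x Su/A, df = 2(n - 1)
    ms_ast   : MS for A x St/B, df = 2(k - 1)
    """
    f = (ms_axb + ms_resid) / (ms_bsu + ms_ast)
    df_num = satterthwaite_df([ms_axb, ms_resid], [1, 4 * (n - 1) * (k - 1)])
    df_den = satterthwaite_df([ms_bsu, ms_ast], [2 * (n - 1), 2 * (k - 1)])
    return f, df_num, df_den


# Made-up mean squares, purely to show the call.
f, df_num, df_den = quasi_f(10.0, 2.0, 3.0, 3.0, n=5, k=3)
print(f, df_num, df_den)
```

The second ratio is not included because its denominator involves a subtraction and can be zero or negative in a given sample, which a practical routine would need to guard against.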

A quasi-F ratio is approximately distributed as F (with adjusted degrees of freedom) and is often necessary to test a factor when the design contains two or more random factors (Clark, 1973). Some researchers have mistakenly analyzed the design in Table 2 by ignoring the stimulus factor, that is, summing or averaging over the stimuli within each treatment condition. If the counterbalancing scheme suggested in this paper is used, and if the stimulus factor is actually fixed rather than random,3 then the consequence of ignoring stimuli in the analysis is simply lower power. In other circumstances, and particularly when the stimulus factor is actually random, the consequences are much more severe. Santa et al. (1979) have noted that ignoring stimuli or treating stimuli as fixed when in fact it is random can have very undesirable consequences: an inflated value for F. The inflation can be so large that the actual probability of obtaining the observed F can be 40 to 50 times larger than the nominal alpha level, even with only 10 subjects. . . . Moreover, the inflation will be more severe as the number of subjects is increased (1979, p. 39). Thus, researchers should not ignore stimulus as a factor in the design, lest they obtain inappropriately significant Fs (commit Type I errors). Moreover, it is just as incorrect to take account of stimuli but ignore subjects in the analysis as McArthur (1972) apparently did.
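The size of this inflation can be illustrated with a small Monte Carlo sketch. It generates data for the layout of Table 2 with no treatment effect but with a treatment × stimulus interaction, then compares the F obtained when stimuli are treated as fixed (treatment tested against B × Su/A) with the first quasi-F ratio. The variance values, sample sizes, seed, and all function names are assumptions chosen for illustration and do not come from any study.

```python
import random
from statistics import fmean, median


def simulate(n, k, sd_tst, rng):
    """Null data for the Table 2 layout: y[a][i][b][j] is the response of
    subject i in group a to stimulus j in set b (treatment 1 when a == b)."""
    v = [[rng.gauss(0, sd_tst) for _ in range(k)] for _ in range(2)]  # T x St
    return [[[[(1 if a == b else -1) * v[b][j] + rng.gauss(0, 1)
               for j in range(k)] for b in range(2)]
             for i in range(n)] for a in range(2)]


def mean_squares(y):
    """Mean squares needed to test the treatment (A x B) effect."""
    n, k = len(y[0]), len(y[0][0][0])
    grp, sub, sets, stim = range(2), range(n), range(2), range(k)
    m_ab = [[fmean(y[a][i][b][j] for i in sub for j in stim) for b in sets]
            for a in grp]
    m_a = [fmean(m_ab[a]) for a in grp]
    m_b = [fmean(m_ab[a][b] for a in grp) for b in sets]
    g = fmean(m_a)
    m_su = [[fmean(y[a][i][b][j] for b in sets for j in stim) for i in sub]
            for a in grp]
    m_sub = [[[fmean(y[a][i][b]) for b in sets] for i in sub] for a in grp]
    m_st = [[fmean(y[a][i][b][j] for a in grp for i in sub) for j in stim]
            for b in sets]
    m_ast = [[[fmean(y[a][i][b][j] for i in sub) for j in stim] for b in sets]
             for a in grp]
    ss_t = n * k * sum((m_ab[a][b] - m_a[a] - m_b[b] + g) ** 2
                       for a in grp for b in sets)
    ss_bsu = k * sum((m_sub[a][i][b] - m_ab[a][b] - m_su[a][i] + m_a[a]) ** 2
                     for a in grp for i in sub for b in sets)
    ss_ast = n * sum((m_ast[a][b][j] - m_ab[a][b] - m_st[b][j] + m_b[b]) ** 2
                     for a in grp for b in sets for j in stim)
    ss_res = sum((y[a][i][b][j] - m_sub[a][i][b] - m_ast[a][b][j]
                  + m_ab[a][b]) ** 2
                 for a in grp for i in sub for b in sets for j in stim)
    return {"T": ss_t,                                   # df = 1
            "BxSu/A": ss_bsu / (2 * (n - 1)),
            "AxSt/B": ss_ast / (2 * (k - 1)),
            "resid": ss_res / (4 * (n - 1) * (k - 1))}


def f_ratios(ms):
    f_fixed = ms["T"] / ms["BxSu/A"]          # stimuli treated as fixed
    quasi_f = (ms["T"] + ms["resid"]) / (ms["BxSu/A"] + ms["AxSt/B"])
    return f_fixed, quasi_f


rng = random.Random(1980)
runs = [f_ratios(mean_squares(simulate(10, 4, 2.0, rng))) for _ in range(200)]
median_fixed = median(f for f, _ in runs)
median_quasi = median(q for _, q in runs)
print(f"median F, stimuli fixed: {median_fixed:.2f}")
print(f"median quasi-F:          {median_quasi:.2f}")
```

Because the simulated treatment effect is zero, a well-behaved test statistic should typically sit near 1; the comparison of the two medians shows how far the fixed-stimulus F departs from that under these assumed variances.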
THE GENERAL CASE

The basic design in Table 2 can be enlarged in three ways. There may be more than one treatment factor. There may also be factors that define subgroups of subjects (for example, a between-subjects experimental treatment or subject sex). Finally, the stimuli may be divided into groups (for instance, there may be person descriptions of different lengths, or photographs of males and females). The addition of such factors increases the complexity of the design, but the same strategy can be employed.
3 Stimuli are fixed if they are not chosen from a larger set of potential stimuli to which one wishes to generalize; instead, the small set of stimuli actually used in the experiment is the focus of interest.


The basic strategy is to enumerate the number of treatment conditions to be measured for each subject. If there are q such conditions then kq stimuli are required for the experiment, where k is a positive integer (ideally larger than one). The kq stimuli are divided randomly into q sets of k stimuli. One now chooses a q by q latin square for the design plan. For certain latin squares where q is a power of 2, one can create dummy variables for the stimulus set and subject factors such that their interaction tests a given treatment effect, rendering unnecessary a specialized computer program to analyze the latin square design. This dummy variable approach is illustrated in the next section.

If the dummy variable approach cannot be employed and if a program to analyze a latin square is not available,⁴ the following approach can be used. For simplicity assume the treatment has three levels, which would require a 3 × 3 latin square. Let us denote the subject sets as A (with three levels) and the stimulus sets as B (also three levels). To perform a quasi-F test on the treatment effect T one needs four mean squares: T, T × Su/A, T × St/B, and Su/A × St/B. Three different ANOVA runs on the data suffice to yield these mean squares. Treating the data as simply an A × B design (ignoring T) yields the Su/A × St/B mean square. Treating the data as T × A (ignoring B) yields the T and T × Su/A mean squares. Finally, treating the data as T × B (ignoring A) yields T (again) and T × St/B. From these four mean squares one can form either the simple F test for T (treating stimuli as fixed), MS_T / MS_{T × Su/A}, or the appropriate quasi-F ratio (where stimuli are random).

An Example

An experiment designed and analyzed by this method is described in Smith and Miller (1979), and will be briefly discussed here as an example. The experiment was a replication of the study by McArthur (1972) on the effects of information on causal attributions, with the addition of response time as a new dependent variable.
Following the theoretical model of Kelley (1967), McArthur varied three types of information (consensus, distinctiveness, and consistency) pertaining to events presented as sentences, and obtained subjects' attributions as to what caused the events. A sample sentence might be "Sue is afraid of the dog." The three informational items were manipulated by additional sentences: high (low) consensus: "Almost everybody (nobody) else is afraid of the dog"; high (low) distinctiveness: "Sue is not afraid of almost any other dog (is afraid of almost every other dog)"; high (low) consistency: "In the past Sue has almost always (never) been afraid of this dog." So a complete stimulus presentation to the subject in the high consensus/high distinctiveness/low consistency condition might be the following: "Sue is afraid of the dog. Almost everyone else is afraid of the dog. Sue is not afraid of almost any other dog. In the past Sue has almost never been afraid of this dog."

4 The ANOVA procedure of the SAS statistical package is one widely available program that will handle latin squares.

In this study, 32 sentences are the basic stimuli, while the three informational factors applied to the sentences are manipulated orthogonally. (There are thus 32 × 2^3, or 256, different stimuli in total.) There is also one grouping factor on the sentences: some involve verbs classified as manifest (i.e., overt actions) and others as latent (opinions and emotions, as in the example). There are no subgroups of subjects in this study. The experimenters wished to estimate the effects on attributions of causality for four factors (consensus, distinctiveness, consistency, and verb type) and all of their interactions across the hypothetical population of sentences from which the set of sentences in the study was drawn. The design plan is shown in Table 4. The 32 sentences were randomly separated into eight sets, each containing four sentences (two of each verb type). Also, the 24 subjects were randomly assigned to eight groups. Table 4 shows the experimental condition for sentences in each sentence set for each group of subjects. For instance, for subject group 3, sentence set 2, the stimuli are presented with low consensus, high distinctiveness, and high consistency. Note that the design plan is actually a latin square
TABLE 4
DESIGN PLAN FOR SMITH AND MILLER (1979) STUDY

                Subject group
Sentence set    1     2     3     4     5     6     7     8
     1          111   112   121   122   211   212   221   222
     2          112   111   122   121   212   211   222   221
     3          121   122   111   112   221   222   211   212
     4          122   121   112   111   222   221   212   211
     5          211   212   221   222   111   112   121   122
     6          212   211   222   221   112   111   122   121
     7          221   222   211   212   121   122   111   112
     8          222   221   212   211   122   121   112   111

Dummy variables
Subject group   E: 1, 2, 3, 4 vs 5, 6, 7, 8
                F: 1, 2, 5, 6 vs 3, 4, 7, 8
                G: 1, 3, 5, 7 vs 2, 4, 6, 8
Sentence set    H: 1, 2, 3, 4 vs 5, 6, 7, 8
                I: 1, 2, 5, 6 vs 3, 4, 7, 8
                J: 1, 3, 5, 7 vs 2, 4, 6, 8

Note. The three numbers in each cell of the matrix designate the levels of consensus, distinctiveness, and consistency respectively (1 = low, 2 = high).


since each cell of the 2 × 2 × 2 design for the experimental factors is in each row and each column. One should note also that it is actually a combination of four 4 × 4 latin squares and that within each of the 4 × 4 squares are four 2 × 2 squares. (See Appendix A for detailed instructions on the construction of the square.) This special latin square facilitates the estimation of the effects of experimental factors and their interactions. Any other latin square could be chosen, but then the dummy variable strategy described below could not be employed.

For this counterbalanced design, the experimental factors are equivalent to the interaction of the stimulus sets by the subject groups. To aid in the estimation and testing of the factors, three dummy variables are created for subject group and three for stimulus set. Table 4 defines these dummy variables. They are assigned so that the interaction of the first subject group factor (E) and the first stimulus set factor (H) yields the consensus manipulation, the interaction of the second subject factor by the second stimulus factor yields distinctiveness, and the interaction of the third subject factor by the third stimulus factor yields consistency. The design, therefore, contains nine factors altogether: the three dummy subject factors (E, F, G), the three dummy stimulus factors (H, I, J), verb type (V), subject (Su), and stimulus (St). Subjects are nested within cells of E × F × G, and stimuli within cells of V × H × I × J. Subjects and stimuli are crossed. Table 5 gives a translation for the effects of the three experimental factors; for example, the consistency main effect is the G × J interaction.

In Smith and Miller's study, neither the effect of subject group, F(7, 16) = .68, nor that of stimulus set, F(7, 16) = .36, was reliable. Thus, there was no evidence of a violation of the model's assumptions of random assignment of subjects and stimuli to groups.
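The translation from dummy variables to experimental factors can be checked mechanically. The Python sketch below enters the design plan of Table 4 as data, codes the six dummy variables with the contrasts listed there, and confirms that each subject-by-stimulus product reproduces one manipulated factor. The +1/-1 coding and all identifiers are assumptions made for this illustration.

```python
# Design plan from Table 4. The plan is symmetric in subject group and sentence
# set, so a cell is looked up as PLAN[group - 1][sentence_set - 1]. Digits give
# the levels of consensus, distinctiveness, consistency (1 = low, 2 = high).
PLAN = [
    "111 112 121 122 211 212 221 222".split(),
    "112 111 122 121 212 211 222 221".split(),
    "121 122 111 112 221 222 211 212".split(),
    "122 121 112 111 222 221 212 211".split(),
    "211 212 221 222 111 112 121 122".split(),
    "212 211 222 221 112 111 122 121".split(),
    "221 222 211 212 121 122 111 112".split(),
    "222 221 212 211 122 121 112 111".split(),
]

# Contrast codes: +1 for the units listed first in Table 4, -1 for the rest.
FIRST = {"E": {1, 2, 3, 4}, "F": {1, 2, 5, 6}, "G": {1, 3, 5, 7}}
FIRST.update({"H": FIRST["E"], "I": FIRST["F"], "J": FIRST["G"]})


def code(dummy, unit):
    """Contrast code of a subject group (E, F, G) or sentence set (H, I, J)."""
    return 1 if unit in FIRST[dummy] else -1


def factor_code(group, sset, position):
    """+1 if the factor at `position` (0, 1, 2) is at level 1, else -1."""
    return 1 if PLAN[group - 1][sset - 1][position] == "1" else -1


# (subject dummy, stimulus dummy, position of the factor in each cell label)
PAIRS = [("E", "H", 0), ("F", "I", 1), ("G", "J", 2)]
matches = all(
    code(su, g) * code(st, s) == factor_code(g, s, pos)
    for su, st, pos in PAIRS
    for g in range(1, 9)
    for s in range(1, 9)
)
print("dummy-variable interactions reproduce Table 4:", matches)
```

Under this coding, the higher-order terms follow by multiplication; for instance the product of the E × H and F × I codes gives the contrast for the consensus × distinctiveness interaction.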
SUMMARY

These proposed approaches to the design and analysis problems faced by researchers in this type of situation are not complete. Problems remain
TABLE 5
TRANSLATION SCHEME FOR THE DESIGN IN TABLE 4

Term                       Represents factor
E × H                      Consensus (Cs)
F × I                      Distinctiveness (D)
G × J                      Consistency (Cy)
E × F × H × I              Cs × D
E × G × H × J              Cs × Cy
F × G × I × J              D × Cy
E × F × G × H × I × J      Cs × D × Cy


in the areas of treatment of missing data and of order effects (Smith and Miller randomized the order of the 32 stimuli separately for each subject, but this is not often feasible). However, this approach to the construction and analysis of the counterbalanced design should help researchers avoid many of the problems that have plagued them in the past, notably the mistakes of ignoring stimuli (or subjects) in the analysis and of treating stimuli as fixed when in fact it is random. While many of the advantages of the counterbalanced design have been appreciated by social psychologists in the past (as evidenced by its frequent appearance in the literature), its analysis has rarely (if ever) been conducted correctly. With the provision of appropriate analyses, the advantages of this type of design now seem stronger than ever.
APPENDIX A

After dividing the subjects and stimuli randomly into eight sets each, this latin square was constructed by treating each of the three manipulated factors separately, as follows. The first factor, consensus, has its two levels assigned to subject-stimulus combinations in accordance with Table 2; that is, the first half of the subjects (sets 1-4) receive level one of the factor with the first half of the stimuli (sets 1-4), and so on. The first subject factor (E) and the first stimulus factor (H) define the division of the subjects and stimuli into halves; their interaction defines the consensus factor.

Now, to place the levels of the second factor, one simply focuses on one-quarter of the overall design matrix and repeats the procedure above. Considering only subject sets 1-4 and stimulus sets 1-4, for the moment: assign level one of the second factor (distinctiveness) to subject sets 1-2 with stimulus sets 1-2 and also to subject sets 3-4 with stimulus sets 3-4; level 2 of distinctiveness goes to the other combinations. (This is in effect a quarter-size replica of Table 2 fitted into the first quarter of the overall design matrix.) Duplicate this pattern in the other three-quarters to obtain the complete assignment of levels of the distinctiveness factor and note that the second subject factor (F) and the second stimulus factor (I) define distinctiveness by their interaction when one is finished.

The third factor is assigned similarly. Focus on the first 1/16th of the design matrix, subject sets 1-2 and stimulus sets 1-2. A 1/16th size replica of Table 2 again fits here, so that the third factor (consistency) has its first level assigned to subject set 1 with stimulus set 1 and also to subject set 2 with stimulus set 2; its second level goes to the other combinations. Duplicate this pattern in the other 15/16ths of the design matrix, and assign the third subject and stimulus factors G and J as the contrast of odd vs even-numbered subject and stimulus sets. Their interaction will now define the third factor, consistency.
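The construction just described extends naturally to any number p of two-level factors, giving a 2^p × 2^p square. The Python sketch below is one way to express it, assuming sets are numbered from zero internally and that level 1 is assigned whenever the subject set and the stimulus set fall in the same half of the block considered for that factor. The function names are hypothetical.

```python
def build_square(p):
    """Return a 2**p x 2**p plan; cell [g][s] is a string of p levels (1 or 2)."""
    size = 2 ** p
    plan = []
    for g in range(size):            # subject set, 0-based
        row = []
        for s in range(size):        # stimulus set, 0-based
            cell = ""
            for f in range(p):       # f = 0 is the first factor (largest blocks)
                block = 2 ** (p - 1 - f)
                same_half = (g // block) % 2 == (s // block) % 2
                cell += "1" if same_half else "2"
            row.append(cell)
        plan.append(row)
    return plan


def is_latin(plan):
    """Every treatment combination occurs once per row and once per column."""
    size = len(plan)
    rows_ok = all(len(set(row)) == size for row in plan)
    cols_ok = all(len({plan[g][s] for g in range(size)}) == size
                  for s in range(size))
    return rows_ok and cols_ok


square = build_square(3)
print(" ".join(square[2]))   # conditions for subject set 3 across stimulus sets
print(is_latin(square))
```

With p = 3 the output can be compared cell by cell with Table 4; other values of p give the analogous plans for one, two, or more manipulated factors.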


REFERENCES

Clark, H. H. The language-as-fixed-effect fallacy: A critique of language statistics in psychological research. Journal of Verbal Learning and Verbal Behavior, 1973, 12, 335-359.
Kelley, H. H. Attribution theory in social psychology. In D. Levine (Ed.), Nebraska symposium on motivation (Vol. 15). Lincoln: Univ. of Nebraska Press, 1967.
McArthur, L. A. The how and what of why: Some determinants and consequences of causal attributions. Journal of Personality and Social Psychology, 1972, 22, 171-193.
Santa, J. L., Miller, J. J., & Shaw, M. L. Using quasi F to prevent alpha inflation due to stimulus variation. Psychological Bulletin, 1979, 86, 37-46.
Smith, E. R., & Miller, F. D. Attributional information processing: A reaction time model of causal subtraction. Journal of Personality and Social Psychology, 1979, 37, 1723-1731.
