
29

Analysis of Variance
CONTENTS
29.1 Conceptual Background
  29.1.1 Clinical Illustration
  29.1.2 Analytic Principles
29.2 Fisher's F Ratio
29.3 Analysis-of-Variance Table
29.4 Problems in Performance
29.5 Problems of Interpretation
  29.5.1 Quantitative Distinctions
  29.5.2 Stochastic Nonsignificance
  29.5.3 Stochastic Significance
  29.5.4 Substantive Decisions
29.6 Additional Applications of ANOVA
  29.6.1 Multi-Factor Arrangements
  29.6.2 Nested Analyses
  29.6.3 Analysis of Covariance
  29.6.4 Repeated-Measures Arrangements
29.7 Non-Parametric Methods of Analysis
29.8 Problems in Analysis of Trends
29.9 Use of ANOVA in Published Literature
References
The targeted analytic method called analysis of variance, sometimes cited acronymically as ANOVA, was devised (like so many other procedures in statistics) by Sir Ronald A. Fisher. Although often marking the conceptual boundary between elementary and advanced statistics, or between amateur fan and professional connoisseur, ANOVA is sometimes regarded and taught as elementary enough to be used for deriving subsequent simple procedures, such as the t test. Nevertheless, ANOVA is used much less often today than formerly, for reasons to be noted in the discussions that follow.

29.1 Conceptual Background


The main distinguishing feature of ANOVA is that the independent variable contains polytomous categories, which are analyzed simultaneously in relation to a dimensional or ordinal dependent (outcome) variable. Suppose treatments A, B, and C are tested for effects on blood pressure in a randomized trial. When the results are examined, we want to determine whether one of the treatments differs significantly from the others. With the statistical methods available thus far, the only way to answer this question would be to do multiple comparisons for pairs of groups, contrasting results in group A vs. B, A vs. C, and B vs. C. If more ambitious, we could compare A vs. the combined results of B and C, or group B vs. the combined results of A and C, and so on. We could work out various other arrangements, but in each

© 2002 by Chapman & Hall/CRC

instance, the comparison would rely on contrasting two collected groups, because we currently know no other strategy. The analysis of variance allows a single simultaneous comparison for three or more groups. The result becomes a type of screening test that indicates whether at least one group differs significantly from the others, but further examination is needed to find the distinctive group(s). Despite this disadvantage, ANOVA has been a widely used procedure, particularly by professional statisticians, who often like to apply it even when simpler tactics are available. For example, when data are compared for only two groups, a t test or Z test is simpler, and, as noted later, produces exactly the same results as ANOVA. Nevertheless, many persons will do the two-group comparison (and report the results) with an analysis of variance.

29.1.1 Clinical Illustration

Although applicable in experimental trials, ANOVA has been most often used for observational studies. A real-world example, shown in Figure 29.1, contains data for the survival times, in months, of a random sample of 60 patients with lung cancer,1,2 having one of the four histologic categories of WELL (well-differentiated), SMALL (small cell), ANAP (anaplastic), and CYTOL (cytology only). The other variable (the five categories of TNM stage) listed in Figure 29.1 will be considered later. The main analytic question now is whether histology in any of these groups has significantly different effects on survival.

29.1.1.1 Direct Examination

The best thing to do with these data, before any formal statistical analyses begin, is to examine the results directly. In this instance, we can readily determine the group sizes, means, and standard deviations for each of the four histologic categories and for the total. The results, shown in Table 29.1, immediately suggest that the data do not have Gaussian distributions, because the standard deviations are almost all larger than the means. Nevertheless, to allow the illustration to proceed, the results can be further appraised. They show that the well-differentiated and small-cell groups, as expected clinically, have the highest and lowest mean survival times, respectively. Because of relatively small group sizes and non-Gaussian distributions, however, the distinctions may not be stochastically significant.
TABLE 29.1
Summary of Survival Times in Four Histologic Groups of Patients with Lung Cancer in Figure 29.1
Histologic    Group    Mean        Standard
Category      Size     Survival    Deviation
WELL           22       24.43       26.56
SMALL          11        4.45        3.77
ANAP           18       10.87       23.39
CYTOL           9       11.54       13.47
Total          60       14.77       22.29


Again before applying any advanced statistics, we can check these results stochastically by using simple t tests. For the most obvious comparison of WELL vs. SMALL, we can use the components of Formula [13.7] to calculate sp = √{[21(26.56)² + 10(3.77)²]/(21 + 10)} = 21.96; √[(1/nA) + (1/nB)] = √[(1/22) + (1/11)] = .369; and X̄A − X̄B = 24.43 − 4.45 = 19.98. These data could then be entered into Formula [13.7] to produce t = 19.98/[(21.96)(.369)] = 2.47. At 31 d.f., the associated 2P value is about .02. From this distinction, which contrasts the highest and lowest means, we might also expect that all the other paired comparisons will not be stochastically significant. (If you check the calculations, you will find that the appropriate 2P values are all >.05.)
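The pooled-variance arithmetic can be sketched in a few lines of Python. This is an illustrative sketch (the helper name is ours, not the book's); the group sizes, means, and standard deviations are taken from Table 29.1:

```python
from math import sqrt

def pooled_t(n_a, mean_a, sd_a, n_b, mean_b, sd_b):
    """Two-group t test from summary statistics, in the manner of Formula [13.7]."""
    # Pooled standard deviation from the two group variances
    sp = sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2))
    # Standard error of the difference in means
    se = sp * sqrt(1 / n_a + 1 / n_b)
    t = (mean_a - mean_b) / se
    return t, n_a + n_b - 2  # t statistic and degrees of freedom

# WELL vs. SMALL, using the summary values in Table 29.1
t, df = pooled_t(22, 24.43, 26.56, 11, 4.45, 3.77)
```

With unrounded intermediate values, t comes out near the 2.47 quoted in the text (small discrepancies arise only from rounding), with 31 degrees of freedom.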


29.1.1.2 Holistic and Multiple-Comparison Problems

The foregoing comparison indicates a significant difference in mean survival between the WELL and SMALL groups, but does not answer the holistically phrased analytic question, which asked whether histology has significant effects in any of the four groups in the entire collection. Besides, an argument could be made, using distinctions discussed in Section 25.2.1.1, that the contrast of WELL vs. SMALL was only one of the six (4 × 3/2) possible paired comparisons for the four histologic categories. With the Bonferroni correction, the working level of α for each of the six comparisons would be .05/6 = .008. With the latter criterion, the 2P value of about .02 for WELL vs. SMALL would no longer be stochastically significant. We therefore need a new method to answer the original question. Instead of examining six pairs of contrasted means, we can use a holistic approach by finding the grand mean of the data, determining the deviations of each group of data from that mean, and analyzing those deviations appropriately.
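The Bonferroni arithmetic is simple enough to express directly. A brief sketch, using the .05 level and the six comparisons cited in the text:

```python
alpha = 0.05
m = 4                          # number of histologic groups
pairs = m * (m - 1) // 2       # 6 possible paired comparisons
working_alpha = alpha / pairs  # Bonferroni working level, .05/6 = .008

# The WELL vs. SMALL result (2P of about .02) fails this stricter criterion
p_well_vs_small = 0.02
still_significant = p_well_vs_small < working_alpha
```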
OBS    ID    HISTOL   TNMSTAGE   SURVIVE
  1    62    WELL     I           82.3
  2   107    WELL     II           5.3
  3   110    WELL     IIIA        29.6
  4   157    WELL     I           20.3
  5   163    WELL     I           54.9
  6   246    SMALL    I           10.3
  7   271    WELL     IIIB         1.6
  8   282    ANAP     IIIA         7.6
  9   302    WELL     I           28.0
 10   337    CYTOL    I           12.8
 11   344    WELL     II           4.0
 12   352    ANAP     IIIA         1.3
 13   371    WELL     IIIB        14.1
 14   387    SMALL    IIIA         0.2
 15   428    SMALL    II           6.8
 16   466    ANAP     IIIB         1.4
 17   513    ANAP     I            0.1
 18   548    ANAP     IV           1.8
 19   581    ANAP     IV           6.0
 20   605    CYTOL    IV           1.0
 21   609    CYTOL    IV           6.2
 22   628    SMALL    IV           4.4
 23   671    SMALL    IV           5.5
 24   764    SMALL    IV           0.3
 25   784    ANAP     IV           1.6
 26   804    WELL     I           12.2
 27   806    ANAP     IIIB         6.5
 28   815    WELL     I           39.9
 29   852    WELL     IIIB         4.5
 30   855    WELL     II           1.6
 31   891    CYTOL    IIIB         8.1
 32   892    WELL     IIIB        62.0
 33   931    CYTOL    IIIB         8.8
 34   998    WELL     IIIB         0.2
 35  1039    SMALL    IV           0.6
 36  1044    ANAP     II          19.3
 37  1054    WELL     IIIB         0.6
 38  1057    ANAP     I           10.9
 39  1155    ANAP     I            0.2
 40  1192    SMALL    IV          11.2
 41  1223    ANAP     IV           0.9
 42  1228    ANAP     II          27.9
 43  1303    ANAP     IIIB         2.9
 44  1309    ANAP     II          99.9
 45  1317    ANAP     IV           4.7
 46  1355    CYTOL    IIIB         1.8
 47  1361    WELL     IV           1.0
 48  1380    CYTOL    IV          10.6
 49  1405    SMALL    IV           3.7
 50  1444    WELL     II          55.9
 51  1509    SMALL    IV           3.4
 52  1515    WELL     I           79.7
 53  1521    ANAP     IV           1.9
 54  1556    ANAP     IIIB         0.8
 55  1567    SMALL    IV           2.5
 56  1608    CYTOL    I            8.6
 57  1612    WELL     IIIA        13.3
 58  1666    CYTOL    IV          46.0
 59  1702    WELL     II          23.9
 60  1738    WELL     II           2.6

FIGURE 29.1 Printout of data on histologic type, TNM stage, and months of survival in a random sample of 60 patients with primary cancer of the lung. [OBS = observation number in sample; ID = original identification number; HISTOL = histologic type; TNMSTAGE = one of five ordinal anatomic TNM stages for lung cancer; SURVIVE = survival time (mos.); WELL = well-differentiated; SMALL = small cell; ANAP = anaplastic; CYTOL = cytology only.]


Many different symbols have been used to indicate the entities that are involved. In the illustration here, Yij will represent the target variable (survival time) for person i in group j. For example, if WELL is the first group in Figure 29.1, the eighth person in the group has Y8,1 = 4.0. The mean of the values in group j will be Ȳj = ΣYij/nj, where nj is the number of members in the group. Thus, for the last group (cytology) in Table 29.1, n4 = 9, ΣYi,4 = 103.9, and Ȳ4 = 103.9/9 = 11.54. The grand mean, Ḡ, will be Σ(njȲj)/N, where N = Σnj = size of the total group under analysis. From the data in Table 29.1, Ḡ = [(22 × 24.43) + (11 × 4.45) + (18 × 10.87) + (9 × 11.54)]/60 = 885.93/60 = 14.77. We can now determine the distance, Ȳj − Ḡ, between each group's mean and the grand mean. For the ANAP group, the distance is 10.87 − 14.77 = −3.90. For the other three groups, the distances are −3.23 for CYTOL, −10.32 for SMALL, and +9.66 for WELL. This inspection confirms that the means of the SMALL and WELL groups are most different from the grand mean, but the results contain no attention to stochastic variation in the data.
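The grand mean and the group distances can be reproduced from the summaries in Table 29.1. A brief sketch (variable names are illustrative):

```python
# (size, mean) pairs for the four histologic groups, from Table 29.1
groups = {"WELL": (22, 24.43), "SMALL": (11, 4.45),
          "ANAP": (18, 10.87), "CYTOL": (9, 11.54)}

n_total = sum(n for n, _ in groups.values())                    # N = 60
grand_mean = sum(n * m for n, m in groups.values()) / n_total   # 885.93/60

# Distance of each group mean from the grand mean
distances = {name: m - grand_mean for name, (n, m) in groups.items()}
```

The SMALL and WELL entries of `distances` show the largest departures from the grand mean, matching the inspection in the text.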

29.1.2 Analytic Principles

To solve the stochastic challenge, we can use ANOVA, which, like many other classical statistical strategies, expresses real-world phenomena with mathematical models. We have already used such models both implicitly and explicitly. In univariate statistics, the mean, Ȳ, was an implicit model for fitting a group of data from only the values in the single set of data. The measured deviations from that model, Yi − Ȳ, were then converted to the group's basic variance, Σ(Yi − Ȳ)². In bivariate statistics for the associations in Chapters 18 and 19, we used an explicit model based on an additional variable, expressed algebraically as Ŷi = a + bXi. We then compared variances for three sets of deviations: Yi − Ŷi, between the items of data and the explicit model; Yi − Ȳ, between the items of data and the implicit model; and Ŷi − Ȳ, between the explicit and implicit models. The group variances or sums of squares associated with these deviations were called residual (or error) for Σ(Yi − Ŷi)², basic for Σ(Yi − Ȳ)², and model for Σ(Ŷi − Ȳ)².

29.1.2.1 Distinctions in Nomenclature

The foregoing symbols and nomenclature have been simplified for the sake of clarity. In strict statistical reasoning, any set of observed data is regarded as a sample from an unobserved population whose parameters are being estimated from the data. If modeled with a straight line, the parametric population would be cited as Y = α + βX. When the results for the observed data are expressed as Ŷi = a + bXi, the coefficients a and b are estimates of the corresponding α and β parameters. Also in strict reasoning, variance is an attribute of the parametric population. Terms such as Σ(Yi − Ȳ)² or Σ(Ŷi − Ȳ)², which are used to estimate the parametric variances, should be called sums of squares, not group variances. The linguistic propriety has been violated here for two reasons: (1) the distinctions are more easily understood when called variance, and (2) the violations constantly appear in both published literature and computer print-outs. The usage here, although a departure from strict formalism, is probably better than in many discussions elsewhere, where the sums of squares are called variances instead of group variances.

Another issue in nomenclature is syntactical rather than mathematical. In most English prose, between is used for a distinction of two objects, and among for more than two. Nevertheless, in the original description of the analysis of variance, R. A. Fisher used the preposition between rather than among when more than two groups or classes were involved. The term between groups has been perpetuated by subsequent writers, much to the delight of English-prose pedants who may denounce the absence of literacy in mathematical technocracy. Nevertheless, Fisher and his successors have been quite correct in maintaining between.
Its use for the cited purpose is approved by diverse high-echelon authorities, including the Oxford English Dictionary, which states that between has been, from its earliest appearance, extended to more than two.3 [As one of the potential pedants, I was ready to use among in this text until I checked the dictionary and became enlightened.]

29.1.2.2 Partition of Group Variance

The same type of partitioning that was used for group variance in linear regression is also applied in ANOVA. Conceptually, however, the models are
expressed differently. Symbolically, each observation can be labelled Yij, with j representing the group and i, the person (or other observed entity) within the group. The grand mean, Ḡ, is used for the implicit model when the basic group or system variance, Σ(Yi − Ḡ)², is summed for the individual values of Yi in all of the groups. The individual group means, Ȳj, become the explicit models when the total system is partitioned into groups. The residual group variance is the sum of the values of Σ(Yi − Ȳj)² within each of the groups. [In more accurate symbolism, the two cited group variances would be written with double subscripts and summations as ΣΣ(Yij − Ḡ)² and ΣΣ(Yij − Ȳj)².] The model group variance, summed for each group of nj members with group mean Ȳj, is Σnj(Ȳj − Ḡ)². These results for the data in the four groups of Figure 29.1 and Table 29.1 are shown in Table 29.2.
TABLE 29.2
Group-Variance Partitions of Sums of Squares for the Four Histologic Groups in Figure 29.1 and Table 29.1
Group     Basic (Total System)    Model (Between Groups)           Residual (Within Groups)
WELL            16866.67          22(24.43 − 14.77)² = 2052.94            14813.73
SMALL            1313.52          11(4.45 − 14.77)²  = 1171.53              141.99
ANAP             9576.88          18(10.87 − 14.77)² =  273.78             9303.10
CYTOL            1546.32          9(11.54 − 14.77)²  =   93.90             1452.42
Total           29304.61*                              3593.38*           25711.24

* These are the correct totals. They differ slightly from the sum of the collection of individual values, calculated with rounding, in each column.

Except for minor differences due to rounding, the components of Table 29.2 have the same structure noted earlier for simple linear regression in Section 19.2.2. The structure is as follows:

{Basic Group Variance} = {Model Variance between Groups} + {Residual Variance within Groups}

or Syy = SM + SR. The structure is similar to that of the deviations

Total Deviation = Model Deviation + Residual Deviation

which arises when each individual deviation is expressed in the algebraic identity

Yij − Ḡ = (Ȳj − Ḡ) + (Yij − Ȳj)

If Ḡ is moved to the first part of the right side, the equation becomes

Yij = Ḡ + (Ȳj − Ḡ) + (Yij − Ȳj)

and is consistent with a parametric algebraic model that has the form

Yij = μ + αj + εij

In this model, each person's value of Yij consists of three contributions: (1) from the grand parametric mean, μ (which is estimated by Ḡ); (2) from the parametric increment, αj (estimated by Ȳj − Ḡ), between the grand mean and the group mean; and (3) from an error term, εij (estimated by Yij − Ȳj), for the increment between the observed value of Yij and the group mean. For stochastic appraisal of results, the null-hypothesis assumption is that the m groups have the same parametric mean, i.e., μ1 = μ2 = … = μj = … = μm.
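The partition Syy = SM + SR can be confirmed numerically for any grouped data set. A small illustrative sketch (the toy numbers below are invented for demonstration, not taken from the lung-cancer data):

```python
def partition_ss(groups):
    """Return (basic, model, residual) sums of squares for a list of groups."""
    all_values = [y for g in groups for y in g]
    grand = sum(all_values) / len(all_values)
    # Basic (total system) variance: sum of (Yij - G)^2 over all observations
    ss_basic = sum((y - grand) ** 2 for y in all_values)
    # Model variance: sum of nj * (group mean - G)^2
    ss_model = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # Residual variance: sum of (Yij - group mean)^2 within each group
    ss_resid = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    return ss_basic, ss_model, ss_resid

# Toy data: three groups of unequal size
syy, sm, sr = partition_ss([[1.0, 2.0, 3.0], [4.0, 6.0], [5.0, 7.0, 9.0, 11.0]])
```

Whatever the data, the basic sum of squares equals the model plus residual sums, apart from floating-point rounding.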

29.1.2.3 Mean Variances and Degrees of Freedom

When divided by the associated degrees of freedom, each of the foregoing group variances is converted to a mean value. For the basic group variance, the total system contains N = Σnj members, and d.f. = N − 1. For the model variance, the m groups have m − 1 degrees of freedom. For the residual variance, each group has nj − 1 degrees of freedom, and the total d.f. for the m groups is Σ(nj − 1) = N − m. The degrees of freedom are thus partitioned, like the group variances, into an expression that indicates their sum as

N − 1 = (m − 1) + (N − m)

The mean variances, however, no longer form an equal partition. Their symbols, and the associated values in the example here, are as follows:

Mean Group Variance = Syy/(N − 1) = 29304.61/59 = 496.69
Mean Model Variance (between groups) = SM/(m − 1) = 3593.38/3 = 1197.79
Mean Residual Variance (within groups) = SR/(N − m) = 25711.24/56 = 459.13

29.2 Fisher's F Ratio


Under the null hypothesis of no real difference between the groups, i.e., the assumption that they have the same parametric mean, each of the foregoing three mean variances can be regarded as a separate estimate of the true parametric variance. Within the limits of stochastic variation in random sampling, the three mean variances should equal one another. To test stochastic significance, R. A. Fisher constructed a variance ratio, later designated as F, that is expressed as

F = (Mean variance between groups)/(Mean variance within groups)

It can be cited symbolically as

F = [SM/(m − 1)] / [SR/(N − m)]    [29.1]

If only two groups are being compared, some simple algebra will show that Formula [29.1] becomes the square of the earlier Formula [13.7] for the calculation of t (or Z). This distinction is the reason why the F ratio is sometimes used, instead of t (or Z), for contrasting two groups, as noted earlier in Section 13.3.6. The Fisher ratio has a sampling distribution in which the associated 2P value is found for the values of F at the two sets of degrees of freedom, m − 1 and N − m. The three components make the distribution difficult to tabulate completely; it is usually cited as the values of F for each pair of degrees of freedom at fixed values of 2P such as .1, .05, and .01.

In the example under discussion here, the F ratio is 1197.79/459.13 = 2.61. In the Geigy tables4 available for the combination of 3 and 56 degrees of freedom, the required F values are 2.184 for 2P = .1, 2.769 for 2P = .05, and 3.359 for 2P = .025. If only the Geigy values were available, the result would be written as .05 < 2P < .1. In an appropriate computer program, however, the actual 2P value is usually calculated and displayed directly. In this instance, it was .0605.

If 2P is small enough to lead to rejection of the null hypothesis, the stochastic conclusion is that at least one of the groups has a mean significantly different from the others. Because the counter-hypothesis for the F test is always that the mean variance is larger between groups than within them, the null hypothesis can promptly be conceded if the F ratio is < 1. In this instance, because the null hypothesis cannot be rejected at α = .05, we cannot conclude that a significant difference in survival has been stochastically confirmed for the histologic categories. The observed quantitative distinctions seem impressive, however, and would probably attain stochastic significance if the group sizes were larger.
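The F ratio of 2.61 can be reproduced from the raw survival times. A sketch; the four lists below are transcribed, in observation order, from Figure 29.1:

```python
# Survival times (months) by histologic group, transcribed from Figure 29.1
well = [82.3, 5.3, 29.6, 20.3, 54.9, 1.6, 28.0, 4.0, 14.1, 12.2, 39.9,
        4.5, 1.6, 62.0, 0.2, 0.6, 1.0, 55.9, 79.7, 13.3, 23.9, 2.6]
small = [10.3, 0.2, 6.8, 4.4, 5.5, 0.3, 0.6, 11.2, 3.7, 3.4, 2.5]
anap = [7.6, 1.3, 1.4, 0.1, 1.8, 6.0, 1.6, 6.5, 19.3, 10.9, 0.2, 0.9,
        27.9, 2.9, 99.9, 4.7, 1.9, 0.8]
cytol = [12.8, 1.0, 6.2, 8.1, 8.8, 1.8, 10.6, 8.6, 46.0]

groups = [well, small, anap, cytol]
n = sum(len(g) for g in groups)        # N = 60
m = len(groups)                        # m = 4 histologic categories
grand = sum(sum(g) for g in groups) / n

# Model (between-group) and residual (within-group) sums of squares
sm_ss = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
sr_ss = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)

f_ratio = (sm_ss / (m - 1)) / (sr_ss / (n - m))   # Formula [29.1]
```

The result reproduces, within rounding, the F of 2.61 shown in the ANOVA printout of Figure 29.2.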

29.3 Analysis-of-Variance Table


The results of an analysis of variance are commonly presented, in both published literature and computer printouts, with a tabular arrangement that warrants special attention because it is used not only for ANOVA but also for multivariable regression procedures that involve partitioning the sums of squared deviations (SS) that form group variances. In each situation, the results show the partition for the sums of squares of three entities: (1) the total SS before imposition of an explicit model, (2) the SS between the explicit model and the original implicit grand mean, and (3) the residual SS for the explicit model. The last of these entities is often called the unexplained or error variance. Both of these terms are unfortunate, because the mathematical explanation is a statistical phenomenon that may have nothing to do with biologic mechanisms of explanation, and the error represents deviations between observed and estimated values, not mistakes or inaccuracies in the basic data. In certain special arrangements, to be discussed shortly, the deviations receive an additionally improved explanation when the model is enhanced with subdivisions of the main variable or with the incorporation of additional variables.

Figure 29.2 shows the conventional headings for the ANOVA table of the histology example in Figure 29.1. For this one-way analysis, the total results are divided into two rows of components. The number of rows is appropriately expanded when more subgroups are formed (as discussed later) via such mechanisms as subdivisions or inclusion of additional variables.
Dependent Variable: SURVIVE

Source            DF    Sum of Squares     Mean Square     F Value   Pr > F
Model              3     3593.3800000     1197.7933333        2.61   0.0605
Error             56    25711.2333333      459.1291667
Corrected Total   59    29304.6133333

R-Square    C.V.       Root MSE     SURVIVE Mean
0.122622    145.1059   21.427300    14.766667

FIGURE 29.2 Printout of analysis-of-variance table for survival time in the four histologic groups of Figure 29.1.

29.4 Problems in Performance


The mathematical reasoning used in many ANOVA arrangements was developed for an ideal experimental world in which all the compared groups or subgroups had the same size. If four groups were being compared, each group had the same number of members, so that n1 = n2 = n3 = n4. If the groups were further divided into subgroups (such as men and women, or young, middle-aged, and old), the subgroups had the same sizes within each group. These equi-sized arrangements were easily attained for experiments in the world of agriculture, where R. A. Fisher worked and developed his ideas about ANOVA.

Equally sized groups and subgroups are seldom achieved, however, in the realities of clinical and epidemiologic research. The absence of equal sizes may then create a major problem in the operation of computer programs that rely on equal sizes, and that may be unable to manage data for other circumstances. For the latter situations, the computer programs may divert ANOVA into the format of a general linear model, which is essentially a method of multiple regression. One main reason, therefore, why regression methods are replacing ANOVA methods today is that the automated regression methods can more easily process data for unequal-sized groups and subgroups.

29.5 Problems of Interpretation


The results of an analysis of variance are often difficult to interpret for both quantitative and stochastic reasons, as well as for substantive decisions.

29.5.1 Quantitative Distinctions

The results of ANOVA are almost always cited with F ratios and P values that indicate stochastic accomplishments but not quantitative descriptive distinctions. The reader is thus left without a mechanism to decide what has been accomplished quantitatively, while worrying that significant P values may arise mainly from large group sizes.

Although not commonly used, a simple statistical index can provide a quantitative description of the results. The index, called eta squared, was previously discussed in Section 27.2.2 as a counterpart of r² for proportionate reduction of group variance in linear regression. Labeled R-square in the printout of Figure 29.2, the expression is

η² = SM/Syy = {Model (between-group) variance}/{Total system (basic) variance}

For the histologic data in Figure 29.2, this index is 3593.38/29304.61 = 0.12, representing a modest achievement, which barely exceeds the 10% noted earlier (see Section 19.3.3) as a minimum level for quantitative significance in variance reduction.
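Computed from the sums of squares quoted in Figure 29.2, the index is (a brief sketch):

```python
ss_model = 3593.38    # between-group (model) sum of squares, Figure 29.2
ss_total = 29304.61   # total (basic) sum of squares, Figure 29.2

# Eta squared: proportionate reduction of group variance, the R-Square of the printout
eta_squared = ss_model / ss_total
```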

29.5.2 Stochastic Nonsignificance

Another important issue is what to do when a result is not stochastically significant, i.e., P > α. In previous analytic methods, a confidence interval could be calculated around the nonsignificant increment, ratio, or coefficient that described the observed distinction, dO, in the results. If the upper end of this confidence interval excluded a quantitatively significant value (such as δ), the result could be called stochastically nonsignificant. If the confidence interval included δ, the investigator might be reluctant to concede the null hypothesis of no difference.

This type of reasoning would be equally pertinent for ANOVA, but is rarely used because the results seldom receive a descriptive citation. Confidence intervals, although sometimes calculated for the mean of each group, are almost never determined to give the value of eta the same type of upper and lower confidence boundaries that can be calculated around a correlation coefficient in simple linear regression.

In the absence of a confidence interval for eta, the main available descriptive approach is to examine results in individual groups or in paired comparisons. If any of the results seem quantitatively significant, the investigator, although still conceding the null hypothesis (because P > α), can remain suspicious that a significant difference exists, but has not been confirmed stochastically. For example, in Figure 29.2, the P value of 0.06 would not allow rejection of the null hypothesis that all group means are equal. Nevertheless, the modestly impressive value of 0.12 for eta squared and the large increment noted earlier between the WELL and SMALL group means suggest that the group sizes were too small for stochastic confirmation of what is probably a quantitatively significant distinction.

29.5.3 Stochastic Significance

If P < α, the analysis has identified something that is stochastically significant, and the next step is to find where it is located. As noted earlier, the search involves a series of paired comparisons. A system containing m groups will allow m(m − 1)/2 paired comparisons when each group's mean is contrasted against the mean of every other group. With m additional paired comparisons between each group and the total of the others, the total number of paired comparisons will be m(m + 1)/2. For example, the small-cell histologic group in Table 29.1 could be compared against each of the three other groups and also against their total. A particularly ingenious (or desperate) investigator might compare a single group or paired groups against pairs (or yet other combinations) of the others. This plethora of activities produces the multiple-comparison problem discussed in Chapter 25, as well as the multiple eponymous and striking titles (such as Tukey's honestly significant difference5) that have been given to the procedures proposed for examining and solving the problem.
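The counts of possible comparisons follow directly from m. A brief sketch (the helper name is illustrative):

```python
def paired_comparisons(m):
    """Counts of paired comparisons available among m groups."""
    pairwise = m * (m - 1) // 2    # each group mean vs. every other group mean
    with_totals = pairwise + m     # plus each group vs. the total of the others
    return pairwise, with_totals   # with_totals equals m(m + 1)/2

# The four histologic groups of Table 29.1
pairwise, with_totals = paired_comparisons(4)
```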

29.5.4 Substantive Decisions

Because the foregoing solutions all depend on arbitrary mathematical mechanisms, investigators who are familiar with the substantive content of the data usually prefer to avoid the polytomous structure of the analysis of variance. For example, a knowledgeable investigator might want to compare only the SMALL vs. WELL groups with a direct 2-group contrast (such as a t test) in the histologic data, avoiding the entire ANOVA process. An even more knowledgeable investigator, recognizing that survival can be affected by many factors (such as TNM stage and age) other than histologic category, might not want to do any type of histologic appraisal unless the other cogent variables have been suitably accounted for. For all these reasons, ANOVA is a magnificent method of analyzing data if you are unfamiliar with what the data really mean or represent. If you know the substantive content of the research, however, and if you have specific ideas to be examined, you may want to use a simpler and more direct way of examining them.

29.6 Additional Applications of ANOVA


From a series of mathematical models and diverse arrangements, the analysis of variance has a versatility, analogous to that discussed earlier for chi square, that for many years made ANOVA the most commonly used statistical procedure for analyzing complex data. In recent years, however, the ubiquitous availability of computers has led to the frequent replacement of ANOVA by multiple regression procedures, whose results are often easier to understand. Besides, ANOVA can mathematically be regarded as a subdivision of the general-linear-model strategies used in multivariable regression analysis. Accordingly, four of the many other applications of ANOVA are outlined here only briefly, mainly so that you will have heard of them in case you meet them (particularly in older literature). Details can be found in many statistical textbooks. The four procedures to be discussed are multi-factor arrangements, nested analyses, the analysis of covariance (ANCOVA), and repeated-measures arrangements (including the intraclass correlation coefficient).

29.6.1 Multi-Factor Arrangements

The procedures discussed so far are called one-way analyses of variance, because only a single independent variable (i.e., histologic category) was examined in relation to survival time. In many circumstances, however, two or more independent variables can be regarded as factors affecting the dependent variable. When these additional factors are included, the analysis is called two-way (or two-factor), three-way (or three-factor), etc. For example, if the two factors of histologic category and TNM stage are considered simultaneously, the data for the 60 patients in Figure 29.1 would be arranged as shown in Table 29.3. The identification of individual survival times would require triple subscripts: i for the person, j for the row, and k for the column.


TABLE 29.3
Two-Way Arrangement of Individual Data for Survival Time (in Months) of Patients with Lung Cancer
Histologic Category: Well
  Stage I:    82.3, 20.3, 54.9, 28.0, 12.2, 39.9, 79.7
  Stage II:   5.3, 4.0, 1.6, 55.9, 23.9, 2.6
  Stage IIIA: 29.6, 13.3
  Stage IIIB: 1.6, 14.1, 4.5, 62.0, 0.2, 0.6
  Stage IV:   1.0
  Mean for row category: 24.43

Histologic Category: Small
  Stage I:    10.3
  Stage II:   6.8
  Stage IIIA: 0.2
  Stage IV:   4.4, 5.5, 0.3, 0.6, 11.2, 3.7, 3.4, 2.5
  Mean for row category: 4.45

Histologic Category: Anap
  Stage I:    0.1, 10.9, 0.2
  Stage II:   19.3, 27.9, 99.9
  Stage IIIA: 7.6, 1.3
  Stage IIIB: 1.4, 6.5, 2.9, 0.8
  Stage IV:   1.8, 6.0, 1.6, 0.9, 4.7, 1.9
  Mean for row category: 10.87

Histologic Category: Cytol
  Stage I:    12.8, 8.6
  Stage IIIB: 8.1, 8.8, 1.8
  Stage IV:   1.0, 6.2, 10.6, 46.0
  Mean for row category: 11.54

Mean for column category: Stage I, 27.7; Stage II, 24.72; Stage IIIA, 10.40; Stage IIIB, 8.72; Stage IV, 5.96. Grand mean: 14.77.

29.6.1.1 Main Effects

In the mathematical model of the two-way arrangement, the categorical mean for each factor (Histology and TNM Stage) makes a separate contribution, called the main effect, beyond the grand mean. The remainder (or unexplained) deviation for each person is called the residual error. Thus, a two-factor model for the two independent variables would express the observed results as

Yijk = Ḡ + (Ȳj − Ḡ) + (Ȳk − Ḡ) + (Yijk − Ȳj − Ȳk + Ḡ)    [29.2]

The Ḡ term here represents the grand mean. The next two terms represent the respective deviations of each row mean (Ȳj) and each column mean (Ȳk) from the grand mean. The four components in the last term, for the residual deviation of each person, are constructed as residuals that maintain the algebraic identity. The total sum of squares in the system will be Σ(Yijk − Ḡ)², with N − 1 degrees of freedom. There will be two sums of squares for the model, cited as Σnj(Ȳj − Ḡ)² for the row factor, and as Σnk(Ȳk − Ḡ)² for the column factor. The residual sum of squares will be the sum of all the values of (Yijk − Ȳj − Ȳk + Ḡ)². Figure 29.3 shows the printout of pertinent calculations for the data in Table 29.3. In the lower half of Figure 29.3, the 4-category histologic variable has 3 degrees of freedom, and its Type I SS (sum of squares) and mean square, respectively, are the same 3593.38 and 1197.79 shown earlier. The 5-category TNM-stage variable has 4 degrees of freedom and corresponding values of 3116.39 and 779.10. The residual error group variance in the upper part of the table is now calculated differently, as the corrected


Dependent Variable: SURVIVE

Source            DF    Sum of Squares     Mean Square     F Value   Pr > F
Model              7     6709.7729638      958.5389948        2.21   0.0486
Error             52    22594.8403695      434.5161610
Corrected Total   59    29304.6133333

R-Square    C.V.       Root MSE     SURVIVE Mean
0.228966    141.1629   20.845051    14.766667

Source      DF     Type I SS       Mean Square     F Value   Pr > F
HISTOL       3    3593.3800000    1197.7933333        2.76   0.0515
TNMSTAGE     4    3116.3929638     779.0982410        1.79   0.1443

FIGURE 29.3 Printout for 2-way ANOVA of data in Figure 29.1 and Table 29.3.

total sum of squares minus the sum of Type I squares, which is a total of 6709.77 for the two factors in the model. Since those two factors have 7 (= 3 + 4) degrees of freedom, the mean square for the model is 6709.77/7 = 958.54, and the d.f. in the error variance is 59 − 7 = 52. The mean square for the error variance becomes 22594.84/52 = 434.52. When calculated for this two-factor model, the F ratio of mean squares is 2.21, which now achieves a P value (marked Pr > F) just below .05. If the level is set at .05, this result is significant, whereas it was not so in the previous analysis for histology alone. The label Type I SS is used because ANOVA calculations can also produce three other types of sums of squares (marked II, III, and IV when presented) that vary with the order in which factors are entered or removed in a model, and with consideration of the interactions discussed in the next section. As shown in the lower section of Figure 29.3, an F-ratio value can be calculated for each factor when its mean square is divided by the error mean square. For histology, this ratio is 1197.79/434.52 = 2.76. For TNM stage, the corresponding value in the printout is 1.79. The corresponding 2P values are just above .05 for histology and .14 for TNM stage.
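As a concrete check, the sketch below first verifies the partition of Equation [29.2] on a small hypothetical balanced layout (not the data of Table 29.3), and then reproduces the mean-square and F-ratio arithmetic of the Figure 29.3 printout from the values shown there.

```python
# Partition of Equation [29.2] on a tiny balanced two-way layout.
# (Hypothetical data: keys are (row, column) category indices.)
data = {
    (0, 0): [10.0, 12.0], (0, 1): [14.0, 16.0],
    (1, 0): [20.0, 22.0], (1, 1): [24.0, 26.0],
}
obs = [(j, k, y) for (j, k), ys in data.items() for y in ys]
N = len(obs)
G = sum(y for _, _, y in obs) / N                      # grand mean

def axis_means(axis):
    """Mean and count for each category of the chosen factor."""
    out = {}
    for level in sorted({key[axis] for key in data}):
        vals = [y for j, k, y in obs if (j, k)[axis] == level]
        out[level] = (sum(vals) / len(vals), len(vals))
    return out

row_means, col_means = axis_means(0), axis_means(1)
ss_total = sum((y - G) ** 2 for _, _, y in obs)        # d.f. = N - 1
ss_rows = sum(n * (m - G) ** 2 for m, n in row_means.values())
ss_cols = sum(n * (m - G) ** 2 for m, n in col_means.values())
ss_resid = sum((y - row_means[j][0] - col_means[k][0] + G) ** 2
               for j, k, y in obs)

# Arithmetic of the Figure 29.3 printout (values copied from the figure).
ss_model = 3593.38 + 3116.3929638          # Type I SS: HISTOL + TNMSTAGE
df_model, df_total = 3 + 4, 59
ss_error = 29304.6133333 - ss_model
df_error = df_total - df_model             # 52
f_model = (ss_model / df_model) / (ss_error / df_error)
```

In the balanced layout the row, column, and residual sums of squares add exactly to the total; in unbalanced data such as Table 29.3, the sequential Type I decomposition takes over that role.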

29.6.1.2 Interactions

In linear models, each factor is assumed to have its own separate additive effect. In biologic reality, however, the conjunction of two factors may have an antagonistic or synergistic effect beyond their individual actions, so that the whole differs from the sum of the parts. For example, increasing weight and increasing blood pressure may each lead to increasing mortality, but their combined effect may be particularly pronounced in persons who are at the extremes of obesity and hypertension. Statisticians use the term interactions for these conjunctive effects; and the potential for interactions is often considered whenever an analysis contains two or more factors. To examine these effects in a two-factor analysis, the model for $Y_{ijk}$ is expanded to contain an interaction term. It is calculated, for the mean of each cell of the conjoined categories, as the deviation of the observed cell mean from the value predicted additively by the pertinent row and column means. In the expression of the equation for $Y_{ijk}$, the first three terms of Equation [29.2] are the same: $\bar{G}$, for the grand mean; $\bar{Y}_j - \bar{G}$ for each row; and $\bar{Y}_k - \bar{G}$ for each column. Because the observed mean in each cell will be $\bar{Y}_{jk}$, the interaction effect will be the deviation estimated as $\bar{Y}_{jk} - \bar{Y}_j - \bar{Y}_k + \bar{G}$. The remaining residual effect, used for calculating the residual sum of squares, is $Y_{ijk} - \bar{Y}_{jk}$. For each sum of squares, the degrees of freedom are determined appropriately for the calculations of mean squares and F ratios. The calculation of interaction effects can be illustrated with an example from the data of Table 29.3 for the 7-member cell in the first row, first column. The grand mean is 14.77; the entire WELL histologic category has a mean of 24.43; and TNM stage I has a mean of 27.71. The mean of the seven values in the cited cell is (82.3 + 20.3 + … + 79.7)/7 = 45.33. According to the algebraic equation, $\bar{G}$ = 14.77; in the first row, $\bar{Y}_j - \bar{G}$ = 24.43 − 14.77 = 9.66; and in the first column, $\bar{Y}_k - \bar{G}$ = 27.71 − 14.77 = 12.94. The interaction effect in the cited cell will be estimated as 45.33 − 24.43 − 27.71 + 14.77 = 7.96. The estimated value of the residual for each of the seven $Y_{ijk}$ values in the cited cell will be $Y_{ijk} - \bar{Y}_{jk} = Y_{ijk} - 45.33$.
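The worked example above reduces to a few lines of arithmetic; the numerical values below are quoted directly from the text.

```python
# Interaction deviation for the WELL-histology / TNM-stage-I cell.
grand = 14.77        # grand mean
row_mean = 24.43     # mean of the WELL histologic category
col_mean = 27.71     # mean of TNM stage I
cell_mean = 45.33    # mean of the 7 observations in the cell

row_effect = row_mean - grand                      # main effect of the row
col_effect = col_mean - grand                      # main effect of the column
interaction = cell_mean - row_mean - col_mean + grand
```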



Figure 29.4 shows the printout of the ANOVA table when an interaction model is used for the two-factor data in Table 29.3. In Figure 29.4, the sums of squares (marked Type I SS) and mean squares for histology and TNM stage are the same as in Figure 29.3, and they also have the same degrees of freedom. The degrees of freedom for the interaction are tricky to calculate, however. In this instance, because some of the cells of Table 29.3 are empty or have only 1 member, we first calculate degrees of freedom for the residual sum of squares, $\sum(Y_{ijk} - \bar{Y}_{jk})^2$. In each pertinent cell, located at (j, k) coordinates in the table, the degrees of freedom will be $n_{jk} - 1$. Working across and then downward through the cells in Table 29.3, the sum of the $n_{jk} - 1$ values will be 6 + 5 + 1 + 5 + 7 + 2 + 2 + 1 + 3 + 5 + 1 + 2 + 3 = 43. (The values are 0 for the four cells with one member each and also for the 3 cells with no members.) This calculation shows that the model accounts for 59 − 43 = 16 d.f.; and as the two main factors have a total of 7 d.f., the interaction factor contributes 9 d.f. to the model, as shown in the last row of Figure 29.4.
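The degrees-of-freedom bookkeeping can be checked directly. The cell sizes below are those implied by the $n_{jk} - 1$ values quoted in the paragraph (thirteen multi-member cells plus four one-member cells; the three empty cells contribute nothing).

```python
# d.f. bookkeeping for the interaction model of Figure 29.4.
cell_sizes = [7, 6, 2, 6, 8, 3, 3, 2, 4, 6, 2, 3, 4, 1, 1, 1, 1]
N = sum(cell_sizes)                         # total patients
residual_df = sum(n - 1 for n in cell_sizes)
model_df = (N - 1) - residual_df
interaction_df = model_df - 3 - 4           # remove the main-effect d.f.
```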
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             16   13835.482381     864.717649       2.40   0.0114
Error             43   15469.130952     359.747231
Corrected Total   59   29304.613333

R-Square 0.472126    C.V. 128.4447    Root MSE 18.967004    SURVIVE Mean 14.766667

Source             DF   Type I SS       Mean Square     F Value   Pr > F
HISTOL              3   3593.3800000    1197.7933333       3.33   0.0282
TNMSTAGE            4   3116.3929638     779.0982410       2.17   0.0890
HISTOL*TNMSTAGE     9   7125.7094171     791.7454908       2.20   0.0408

FIGURE 29.4 Two-way ANOVA, with interaction component, for results in Table 29.3 and Figure 29.3. [Printout from SAS PROC GLM computer program.]

Calculated with the new mean square error term in Figure 29.4, the F values produce 2P values below .05 for the model, for the histology factor, and for the histology-TNM-stage interaction. The 2P value is about .09 for the TNM-stage main effect. The difficult challenge of interpreting three-way and more complex interactions is considered elsewhere2 in discussions of multivariable analysis.

29.6.2 Nested Analyses

The groups of a single factor in ANOVA can sometimes be divided into pertinent subgroups. For example, the three treatments A, B, and C might each have been given in two sets of doses, low and high, so that six subgroups could be analyzed, two for each treatment. The results can then be evaluated with a procedure called a hierarchical or nested analysis. The variation in the total sum of squares is then apportioned among the six subgroups and the three main groups, and the analysis is planned accordingly.
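A minimal sketch of that apportionment, with hypothetical data (three treatments, two dose subgroups each, two patients per subgroup), is:

```python
from collections import defaultdict

# Hypothetical nested layout: treatments A, B, C, each at low and high dose.
data = {
    ("A", "low"): [5.0, 7.0], ("A", "high"): [9.0, 11.0],
    ("B", "low"): [6.0, 8.0], ("B", "high"): [10.0, 12.0],
    ("C", "low"): [7.0, 9.0], ("C", "high"): [11.0, 13.0],
}
all_vals = [v for vals in data.values() for v in vals]
G = sum(all_vals) / len(all_vals)          # grand mean

by_treatment = defaultdict(list)
for (treat, _dose), vals in data.items():
    by_treatment[treat].extend(vals)

def mean(vs):
    return sum(vs) / len(vs)

# between-treatment sum of squares
ss_treat = sum(len(vs) * (mean(vs) - G) ** 2 for vs in by_treatment.values())
# dose subgroups nested within treatments
ss_dose_within = sum(
    len(vals) * (mean(vals) - mean(by_treatment[t])) ** 2
    for (t, _d), vals in data.items()
)
```

The between-subgroup variation thus splits into a between-treatment component and a dose-within-treatment component, each of which can be tested against the within-subgroup error.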

29.6.3 Analysis of Covariance

An analysis of covariance (acronymically designated as ANCOVA) can be done for at least two reasons. The first is to adjust for the action of a second factor suspected of being a confounder that affects both the dependent variable and the other factor under analysis. The second reason is to allow appropriate analyses of a ranked independent variable that is expressed in either a dimensional or ordinal scale. This ranking is ignored when the ordinary ANOVA procedure relies on nominal categories for the independent variable. Thus, in the analyses shown in Figures 29.3 and 29.4, the polytomous categories of TNM stage were managed as though they were nominal. To

2002 by Chapman & Hall/CRC

allow maintenance of the ranks, TNM stage could be declared a covariate, which would then be analyzed as though it had a dimensional scale. The results of the covariance analysis are shown in Figure 29.5. Note that TNM stage now has only 1 degree of freedom, thus giving the model a total of 4 d.f., an F value of 3.61, and a P value of .0111, despite a decline of R-square from .229 in Figure 29.3 to .208 in Figure 29.5. The histology variable, which had P = .052 in Figure 29.3, now has P = .428; and TNM stage, with P = .144 in Figure 29.3, has now become highly significant at P = .0012. These dramatic changes indicate what can happen when the rank sequence of polytomous variables is either ignored or appropriately analyzed.
Source            DF   Sum of Squares   Mean Square    F Value   Pr > F
Model              4    6087.9081999   1521.9770500       3.61   0.0111
Error             55   23216.7051334    422.1219115
Corrected Total   59   29304.6133333

R-Square 0.207746    C.V. 139.1350    Root MSE 20.545606    SURVIVE Mean 14.766667

Source      DF   Type I SS       Mean Square     F Value   Pr > F
TNMSTAGE     1   4897.3897453   4897.3897453      11.60   0.0012
HISTOL       3   1190.5184546    396.8394849       0.94   0.4276

FIGURE 29.5 Printout of Analysis of Covariance for data in Figure 29.3, with TNM stage used as ranked variable.

In past years, the effect of confounding or ranked covariates was often formally adjusted in an analysis of covariance, using a complex set of computations and symbols. Today, however, the same adjustment is almost always done with a multiple regression procedure. The adjustment process in ANCOVA is actually a form of regression analysis in which the related effects of the covariate are determined by regression and then removed from the error variance. The group means of the main factor are also adjusted to correspond to a common value of the covariate. The subsequent analysis is presumably more powerful in detecting the effects of the main factor, because the confounding effects have been removed. The process and results are usually much easier to understand, however, when done with multiple linear regression.2

29.6.4 Repeated-Measures Arrangements

Repeated measures is the name given to analyses in which the same entity has been observed repeatedly. The repetitions can occur with changes over time, perhaps after interventions such as treatment, or with examinations of the same (unchanged) entity by different observers or systems of measurement.

29.6.4.1 Temporal Changes

The most common repeated-measures situation is an ordinary crossover study, where the same patients receive treatments A and B. The effects of treatment A vs. treatment B in each person can be subtracted and thereby reduced to a single group of increments, which can be analyzed with a paired t test, as discussed in Section 7.8.2.2. The same analysis of increments can be used for the before-and-after measurements of the effect in patients receiving a particular treatment, such as the results shown earlier for blood glucose in Table 7.4. Because the situations just described can easily be managed with paired t tests, the repeated-measures form of ANOVA is usually reserved for situations in which the same entity has been measured at three or more time points. The variables that become the main factors in the analysis are the times and the groups (such as treatment). Interaction terms can be added for the effects of groups × times.
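For the two-time-point case, the reduction to increments and the paired t statistic can be sketched as follows (the before-and-after values are hypothetical, not the data of Table 7.4):

```python
import math

# Before-and-after measurements for 5 hypothetical patients.
before = [7.1, 6.8, 7.5, 7.0, 6.9]
after = [6.5, 6.6, 7.0, 6.4, 6.7]

d = [a - b for a, b in zip(after, before)]   # one increment per patient
n = len(d)
dbar = sum(d) / n                            # mean increment
sd = math.sqrt(sum((x - dbar) ** 2 for x in d) / (n - 1))
t = dbar / (sd / math.sqrt(n))               # paired t, with n - 1 d.f.
```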



Four major problems, for which consensus solutions do not yet exist, arise when the same entity is measured repeatedly over time:
1. Independence. The first problem is violation of the assumption that the measurements are independent. The paired t test manages this problem by reducing the pair of measurements to their increment, which becomes a simple new variable. This strategy may not always be suitably employed with more than two sets of repeated measurements.

2. Incremental components. A second problem is the choice of components for calculating incremental changes for each person. Suppose t0 is an individual baseline value, and the subsequent values are t1, t2, and t3. Do we always measure increments from the baseline value, i.e., t1 − t0, t2 − t0, and t3 − t0, or should the increments be listed successively as t1 − t0, t2 − t1, t3 − t2?

3. Summary index of response. If a treatment is imposed after the baseline value at t0, what is the best single index for summarizing the post-therapeutic response? Should it be the mean of the post-treatment values, the increment between t0 and the last measurement, or a regression line for the set of values?

4. Neglect of trend. This problem is discussed further in Section 29.8. As noted earlier, an ordinary analysis of variance does not distinguish between unranked nominal and ranked ordinal categories in the independent polytomous variable. If the variable represents serial points in time, their ranking may produce a trend, but it will be neglected unless special arrangements are used in the calculations.

29.6.4.2 Intraclass Correlations

Studies of observer or instrument variability can also be regarded as a type of repeated measures, for which the results are commonly cited with an intraclass correlation coefficient (ICC). As noted in Section 20.7.3, the basic concept was developed as a way of assessing agreement for measurements of a dimensional variable, such as height or weight, between members of the same class, such as brothers in a family. To avoid the inadequacy of a correlation coefficient, the data were appraised with a repeated-measures analysis of variance. To avoid decisions about which member of a pair should be listed as the first or second measurement, all possible pairs were listed twice, with each member as the first measurement and then as the second. The total sum of squares could be partitioned into one sum for variability between the individuals being rated, i.e., the subjects (SSS), and another sum of squares due to residual error (SSE). The intraclass correlation was then calculated as
$$R_I = \frac{SSS - SSE}{SSS + SSE}$$

The approach was later adapted for psychometric definitions of reliability. The appropriate means for the sums of squares were symbolized as $s_c^2$ for variance in the subjects and $s_e^2$ for the corresponding residual errors. Reliability was then defined as

$$R_I = s_c^2 / (s_c^2 + s_e^2)$$
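The two expressions are algebraically parallel, as a short sketch with hypothetical components shows:

```python
# Intraclass correlation from sums of squares, and its reliability form.
# (Hypothetical values, chosen so the two forms agree.)
SSS, SSE = 90.0, 10.0
R_from_ss = (SSS - SSE) / (SSS + SSE)       # sums-of-squares form
s_c2, s_e2 = 8.0, 2.0                       # subject and error variance components
R_reliability = s_c2 / (s_c2 + s_e2)        # psychometric reliability form
```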

Using the foregoing symbols, when each of a set of n persons is measured by each of a set of r raters, the variance of a single observation, $s^2$, can be partitioned as

$$s^2 = s_c^2 + s_r^2 + s_e^2$$

where $s_r^2$ is the mean of the appropriate sums of squares for the raters. These variances can be arranged into several formulas for calculating $R_I$. The different arrangements depend on the models used for the sampling and the interpretation.6 In a worked example cited by Everitt,7 vital capacity was measured by four raters for each of 20 patients. The total sum of squares for the 80 observations, with d.f. = 79, was divided into three sets of sums of squares: (1) for the four

observers, with d.f. = 3; (2) for the 20 patients, with d.f. = 19; and (3) for the residual error, with d.f. = 3 × 19 = 57. The formula used by Everitt for calculating the intraclass correlation coefficient was
$$R_I = \frac{n(s_c^2 - s_e^2)}{n s_c^2 + r s_r^2 + (nr - n - r) s_e^2}$$
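The degrees-of-freedom partition in Everitt's worked example follows directly from the design (20 patients, 4 raters), and can be checked in a few lines:

```python
# d.f. partition for a two-way raters-by-patients layout.
n_patients, n_raters = 20, 4
total_df = n_patients * n_raters - 1     # 80 observations
rater_df = n_raters - 1
patient_df = n_patients - 1
error_df = rater_df * patient_df         # residual
```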

A counterpart formula, using SSR to represent sums of squares for raters, is

$$R_I = \frac{SSS - SSE}{SSS + SSE + 2(SSR)}$$

The intraclass correlation coefficient (ICC) can be used when laboratory measurements of instrument variability are expressed in dimensional data. Nevertheless, as discussed in Chapter 20, most laboratories prefer to use simpler pair-wise and other straightforward statistical approaches that are easier to understand and interpret than the ICC. The simpler approaches may also have mathematical advantages that have been cited by Bland and Altman,8 who contend that the ICC, although appropriate for repetitions of the same measurement, is unsatisfactory when dealing with measurements by two different methods where there is "no ordering of the repeated measures and hence no obvious choice of X or Y." Other disadvantages ascribed to the ICC are that it depends on the range of measurement and is not related to the actual scale of measurement or to the size of error that might be clinically allowable. Instead, Bland and Altman recommend their "limits of agreement" method, which was discussed throughout Section 20.7.1. The method relies on examining the increments in measurement for each subject. The mean difference then indicates bias, and the standard deviation is used to calculate a 95% descriptive zone for the limits of agreement. A plot of the differences against the mean value of each pair will indicate whether the discrepancies in measurement diverge as the measured values increase. For categorical data, concordance is usually expressed (see Chapter 20) with other indexes of variability, such as kappa, which yields the same results as the intraclass coefficient in pertinent situations.
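A minimal sketch of the limits-of-agreement calculation, using hypothetical paired measurements of the same quantity by two methods:

```python
import math

# Paired measurements by two methods (hypothetical values).
method_a = [100.0, 102.0, 98.0, 105.0, 99.0]
method_b = [101.0, 104.0, 99.0, 104.0, 101.0]

diffs = [a - b for a, b in zip(method_a, method_b)]
n = len(diffs)
bias = sum(diffs) / n                                  # mean difference
sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
lower, upper = bias - 1.96 * sd, bias + 1.96 * sd      # 95% limits of agreement
```

A plot of each pair's difference against its mean would then reveal whether the discrepancies widen as the measured values increase.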

29.7 Non-Parametric Methods of Analysis

The mathematical models of ANOVA require diverse assumptions about Gaussian distributions and homoscedastic (i.e., similar) variances. These assumptions can be avoided by converting the dimensional data to ranks and analyzing the values of the ranks. The Kruskal-Wallis procedure, which is the eponym for a one-way ANOVA using ranked data, corresponds to a Wilcoxon-Mann-Whitney U test for 3 or more groups. The Friedman procedure, which refers to a two-way analysis of ranked data, was proposed almost 60 years ago by Milton Friedman, who later became more famous in economics than in statistics.
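For tie-free data, the Kruskal-Wallis statistic H is easy to compute from pooled ranks; the three groups below are hypothetical:

```python
# Kruskal-Wallis H for three hypothetical groups (no tied values).
groups = [[1.2, 3.4, 5.6], [2.3, 4.5, 6.7], [7.8, 8.9, 9.1]]

pooled = sorted(v for g in groups for v in g)
rank = {v: i + 1 for i, v in enumerate(pooled)}   # ranks 1..N (tie-free case)
N = len(pooled)
H = (12.0 / (N * (N + 1))) * sum(
    sum(rank[v] for v in g) ** 2 / len(g) for g in groups
) - 3 * (N + 1)
# H is then referred to a chi-square distribution with (groups - 1) d.f.
```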

29.8 Problems in Analysis of Trends

If a variable has ordinal grades, the customary ANOVA procedure will regard the ranked categories merely as nominal, and will not make provision for the possible or anticipated trend associated with different ranks. The problem occurs with an ordinal variable, such as TNM stage in Figure 29.1, because the effect of an increasing stage is ignored. The neglect of a ranked effect can be particularly important when the independent variable (or factor) is time, for which the effects might be expected to occur in a distinct temporal sequence. This problem in repeated-measures ANOVA evoked a denunciation by Sheiner,9 who contended that the customary ANOVA methods were wholly inappropriate for many studies of the time effects of pharmacologic agents.


The appropriate form of analysis can be carried out, somewhat in the manner of the chi-square test for linear trend in an array of proportions (see Chapter 27), by assigning arbitrary coding values (such as 1, 2, 3, 4) to the ordinal categories. The process is usually done more easily and simply, however, as a linear regression analysis.
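The coding idea can be sketched as an ordinary least-squares slope across the coded categories. The codes 1 to 4 follow the text; the outcome means are hypothetical, loosely echoing the stage means of Table 29.3:

```python
# Least-squares slope of group means across arbitrary ordinal codes.
codes = [1, 2, 3, 4]              # coding values for the ordinal grades
means = [27.7, 24.7, 10.4, 8.7]   # hypothetical outcome mean per grade

n = len(codes)
xbar = sum(codes) / n
ybar = sum(means) / n
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(codes, means))
sxx = sum((x - xbar) ** 2 for x in codes)
slope = sxy / sxx                 # a negative slope indicates a declining trend
```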

29.9 Use of ANOVA in Published Literature

To find examples of ANOVA in published medical literature, the automated Colleague Medical Database was searched for papers, in English, of human-subject research that appeared in medical journals during 1991-95, and in which analysis of variance was mentioned in the abstract-summary. From the list of possibilities, 15 were selected to cover a wide array of journals and topics. The discussion that follows is a summary of results in those 15 articles. A one-way analysis of variance was used to check the rate of disappearance of ethanol from venous blood in 12 subjects who drank the same dose of alcohol in orange juice on four occasions.10 The authors concluded that the variation between subjects exceeded the variations within subjects. Another classical one-way ANOVA was done to examine values of intestinal calcium absorption and serum parathyroid hormone levels in three groups of people: normal controls and asthmatic patients receiving either oral or inhaled steroid therapy.11 A one-way ANOVA compared diverse aspects of functional status in two groups of patients receiving either fluorouracil or saline infusions for head and neck cancer.12 In a complex but essentially one-way ANOVA, several dependent variables (intervention points, days of monitoring, final cardiovascular function) were related to subgroups defined by APACHE II severity scores in a surgical intensive care unit.13 (The results were also examined in a regression analysis.) In another one-way analysis of variance, preference ratings for six different modes of teaching and learning were evaluated14 among three groups, comprising first-year, second-year, and fourth-year medical students in the United Arab Emirates. The results were also examined for the preferences of male vs. female students. In a two-way ANOVA, neurologic dysfunction at age four years was related15 to two main factors: birth weight and location of birth in newborn intensive care units of either Copenhagen or Dublin.
Multifactor ANOVAs were applied,16 in 20 patients with conjunctival malignant melanoma, to the relationship between 5-year survival and the counts of cells positive for proliferating cell nuclear antigen, predominant cell type, maximum tumor depth, and site of tumor. The result, showing that patients with low counts had better prognoses, was then confirmed with a Cox proportional hazards regression analysis. (The latter approach would probably have been best used directly.) Repeated-measures ANOVA was used in the following studies: to check the effect of oat bran consumption on serum cholesterol levels at four time points;17 to compare various effects (including blood pressure levels and markers of alcohol consumption) in hypertensive men randomized to either a control group or to receive special advice about methods of reducing alcohol consumption;18 to assess the time trend of blood pressure during a 24-hour monitoring period in patients receiving placebo or an active antihypertensive agent;19 and to monitor changes at three time points over 6 months in four indexes (body weight, serum osmolality, serum sodium, and blood urea nitrogen/creatinine ratios) for residents of a nursing home.20 The intraclass correlation coefficient was used in three other studies concerned with reliability (or reproducibility) of the measurements performed in neuropathic tests,21 a brief psychiatric rating scale,22 and a method of grading photoageing in skin casts.23

References
1. Feinstein, 1990d.
2. Feinstein, 1996.
3. Oxford English Dictionary, 1971.
4. Lentner, 1982.
5. Tukey, 1968.
6. Shrout, 1979.
7. Everitt, 1989.
8. Bland, 1990.
9. Sheiner, 1992.
10. Jones, 1994.
11. Luengo, 1991.
12. Browman, 1993.
13. Civetta, 1992.
14. Paul, 1994.
15. Ellison, 1992.
16. Seregard, 1993.
17. Saudia, 1992.
18. Maheswaran, 1992.
19. Tomei, 1992.
20. Weinberg, 1994.
21. Dyck, 1991.
22. Hafkenscheid, 1993.
23. Fritschi, 1995.
