Introduction
In this section we generalize the concept of t-tests to the multivariate situation. In
the two-sample problem, we will assume equality of covariance matrices. Later we will
develop tests for equality of covariance matrices and for other patterns on the covariance
matrix. Just as the t-test generalizes in the univariate case to the analysis of variance,
we will make that generalization in the multivariate case. We will see that
SAS does not have a PROC specifically devoted to the multivariate t-test, so we will
develop an IML program. Later we will see how to use the multivariate analysis of
variance procedure to perform the test.
Hotelling T2 Statistic
The problem is as follows: given a random sample of size $n$ from a $p$-variate normal
distribution, $N(\mu, D)$, that is, the data matrix $X$, we want to test the hypothesis

$$H\colon\ \mu = \mu_0.$$

For $p = 1$, the familiar test statistic is

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}.$$
To motivate the statistic, note that this problem is similar to the classification problem:
did these data come from a population with mean $\mu_0$, or from a population with
some other mean? The decision is based on the distance from $\bar{x}$ to $\mu_0$. Equivalently, it is based on the ratio
of the likelihood functions for $\mu = \mu_0$ and $\mu = \bar{x}$. This leads to the Hotelling statistic

$$T^2 = n(\bar{x} - \mu_0)^T \hat{D}^{-1}(\bar{x} - \mu_0)$$

where $\hat{D}$ is the sample covariance matrix; under $H$, $\frac{(n-p)}{p(n-1)}T^2$ has the $F(p,\ n-p)$ distribution.
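The statistic is straightforward to compute in any matrix language; here is a minimal Python/numpy sketch (the function name and simulated data are illustrative, not part of the original notes):

```python
import numpy as np

def hotelling_t2_one_sample(X, mu0):
    """One-sample Hotelling T^2 for H: mu = mu0.

    Returns (T^2, F) where F = ((n - p) / (p (n - 1))) * T^2 is referred
    to an F(p, n - p) distribution.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    xbar = X.mean(axis=0)
    Xc = X - xbar
    D_hat = Xc.T @ Xc / (n - 1)          # sample covariance matrix
    d = xbar - np.asarray(mu0, dtype=float)
    t2 = float(n * d @ np.linalg.solve(D_hat, d))
    f_stat = (n - p) / (p * (n - 1)) * t2
    return t2, f_stat
```

For $p = 1$ this reduces to the square of the univariate t-statistic above.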
Example
Recall the SWEAT data from a previous exercise and suppose the question is raised as to
whether these 20 females were selected from a population of females whose mean vector
is known to be $\mu_0^T = (4\ \ 50\ \ 10)$ on the three variables, $x_1$ = sweat rate, $x_2$ = sodium
content and $x_3$ = potassium content. From the 20 observations, we compute

$$\bar{x}^T = (4.64\ \ 45.4\ \ 9.97), \qquad \hat{D} = \begin{pmatrix} 2.9 & 10.0 & -1.8 \\ 10.0 & 199.8 & -5.6 \\ -1.8 & -5.6 & 3.6 \end{pmatrix}.$$
Invariance
Since the variances in this example are quite different, it is natural to ask whether we should
have standardized the data before performing the test. The answer is that it does not
make any difference. In fact, if $C$ is any non-singular matrix of size $p$ and $d$ is any $p$-vector,
we can compute $y = Cx + d$ and the $T^2$ statistic based on $y$ will be identical to the
one based on $x$.
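This invariance is easy to check numerically. The following sketch (numpy assumed; the non-singular $C$ and shift $d$ are arbitrary choices for illustration) recomputes the statistic after the transformation $y = Cx + d$:

```python
import numpy as np

def t2_stat(X, mu0):
    """One-sample Hotelling T^2 statistic."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    Xc = X - xbar
    D_hat = Xc.T @ Xc / (n - 1)
    d = xbar - mu0
    return float(n * d @ np.linalg.solve(D_hat, d))

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
mu0 = np.zeros(3)

C = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 1.0]])        # any non-singular p x p matrix
d = np.array([5.0, -2.0, 7.0])         # any p-vector

Y = X @ C.T + d                        # rows are y_i = C x_i + d
mu0_y = C @ mu0 + d                    # the hypothesis transforms the same way

# t2_stat(X, mu0) and t2_stat(Y, mu0_y) agree to machine precision
```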
To see why $T^2$ arises from the likelihood ratio, let $A(\mu) = \sum_{i=1}^{n}(x_i - \mu)(x_i - \mu)^T$
and note that $A(\mu_0) = A(\bar{x}) + n(\bar{x} - \mu_0)(\bar{x} - \mu_0)^T$. Consider the determinant

$$|A(\mu_0)| = \left|A(\bar{x}) + n(\bar{x} - \mu_0)(\bar{x} - \mu_0)^T\right| = |A(\bar{x})|\left|1 + n(\bar{x} - \mu_0)^T A(\bar{x})^{-1}(\bar{x} - \mu_0)\right|.$$

It follows that

$$\frac{|A(\bar{x})|}{|A(\mu_0)|} = \frac{1}{1 + \dfrac{T^2}{n-1}}$$

so the likelihood ratio is a decreasing function of $T^2$, and rejecting for small values of
the likelihood ratio is the same as rejecting for large values of $T^2$.
Union-Intersection Test
Let $a$ be an arbitrary $p$-vector and consider $y = a^T x \sim N(a^T\mu,\ a^T D a)$. Then consider the
hypothesis $H(a)\colon\ E[y] = \mu_{y0}$ based on the data $Y = Xa$. The t-statistic for this univariate
test is

$$t(a) = \frac{\bar{y} - \mu_{y0}}{s_y/\sqrt{n}}.$$

Noting that $\mu_{y0} = a^T\mu_0$, $\bar{y} = a^T\bar{x}$ and $s_y^2 = a^T\hat{D}a$, we see that

$$t^2(a) = \frac{n\, a^T(\bar{x} - \mu_0)(\bar{x} - \mu_0)^T a}{a^T\hat{D}a}$$

and we would accept the hypothesis $H(a)$ if $t^2(a) \le F(\alpha,\ 1,\ n-1)$. Since $\mu = \mu_0$ is
equivalent to $a^T\mu = a^T\mu_0$ for every vector $a$, acceptance of the hypothesis $H\colon\ \mu = \mu_0$
is equivalent to acceptance of the hypothesis $H(a)$ for every vector $a$. This implies that
the test should be based on $\max_a t^2(a)$. Maximizing the numerator subject to $a^T\hat{D}a = 1$
leads to the eigenvalue problem

$$\left[n\hat{D}^{-1}(\bar{x} - \mu_0)(\bar{x} - \mu_0)^T - \lambda I\right]a = 0.$$

Since the matrix has rank one, there is only one non-zero eigenvalue and eigenvector, given by

$$\lambda = n(\bar{x} - \mu_0)^T\hat{D}^{-1}(\bar{x} - \mu_0), \qquad a = \hat{D}^{-1}(\bar{x} - \mu_0).$$

Thus the maximum is given by the $T^2$ statistic and we are led to the same test.
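The union-intersection argument can be verified numerically: the squared univariate statistic $t^2(a)$ never exceeds $T^2$, and attains it at $a = \hat{D}^{-1}(\bar{x} - \mu_0)$. A sketch with simulated data (numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(25, 4)) + 0.3       # simulated data, true mean shifted
mu0 = np.zeros(4)

n, p = X.shape
xbar = X.mean(axis=0)
Xc = X - xbar
D_hat = Xc.T @ Xc / (n - 1)

def t2_a(a):
    """Squared univariate t-statistic for y = a^T x, testing E[y] = a^T mu0."""
    num = n * float(a @ (xbar - mu0)) ** 2
    return num / float(a @ D_hat @ a)

T2 = float(n * (xbar - mu0) @ np.linalg.solve(D_hat, xbar - mu0))
a_star = np.linalg.solve(D_hat, xbar - mu0)   # the maximizing direction

# t2_a(a_star) equals T2; any other direction a gives a smaller value
```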
Confidence Regions
Recall the univariate confidence interval

$$\mu\colon\ \bar{x} \pm t(\alpha/2,\ n-1)\, s/\sqrt{n}.$$

The analog of that in the multivariate case is the confidence region given by the $p$-dimensional
ellipsoid centered at $\bar{x}$ given by

$$n(\mu - \bar{x})^T\hat{D}^{-1}(\mu - \bar{x}) \le \frac{p(n-1)}{(n-p)}\, F(\alpha,\ p,\ n-p).$$
Thus, we accept as reasonable any vector $\mu$ that lies in this ellipsoid. Since this is hard
to visualize in more than two dimensions, a common practice is to write confidence
intervals on the individual components or on linear functions of them. Since we are writing
several intervals, the confidence coefficient based on a single interval is no longer
applicable, and alternatives have been suggested. For example, if we are writing $k$
intervals, a simple suggestion is to use $t\!\left(\frac{\alpha}{2k},\ n-1\right)$ in the above univariate expression
with $s$ replaced by $\sqrt{\hat{d}_{ii}}$. This is known as the Bonferroni method. In addition to the $p$
components of $\mu$, we may be interested in other linear functions, say $a^T\mu$, and $k$ can be
quite large, making these intervals very wide. An alternative, suggested by the Union-Intersection
principle, is to use the intervals

$$a^T\mu\colon\ a^T\bar{x} \pm \sqrt{\frac{a^T\hat{D}a}{n}\,\frac{p(n-1)}{(n-p)}\, F(\alpha,\ p,\ n-p)}.$$

These intervals, called simultaneous confidence intervals, although wider than the simple
t-statistic intervals, more correctly reflect the confidence coefficient. Since we often
write many such intervals, they are generally better than the Bonferroni intervals, which are
given by

$$a^T\mu\colon\ a^T\bar{x} \pm t\!\left(\frac{\alpha}{2k},\ n-1\right)\sqrt{\frac{a^T\hat{D}a}{n}}.$$
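Both families of component-wise intervals can be sketched as follows (numpy and scipy assumed; the function name is mine). The sketch writes one interval per component, so $k = p$:

```python
import numpy as np
from scipy import stats

def mean_intervals(X, alpha=0.05):
    """Bonferroni and T^2 simultaneous intervals for the components of mu.

    Returns (bonferroni, simultaneous), each a (p, 2) array of endpoints.
    """
    n, p = X.shape
    xbar = X.mean(axis=0)
    Xc = X - xbar
    D_hat = Xc.T @ Xc / (n - 1)
    se = np.sqrt(np.diag(D_hat) / n)       # standard errors sqrt(d_ii / n)

    k = p                                   # one interval per component
    bonf_mult = stats.t.ppf(1 - alpha / (2 * k), n - 1)
    simu_mult = np.sqrt(p * (n - 1) / (n - p) * stats.f.ppf(1 - alpha, p, n - p))

    bonf = np.column_stack([xbar - bonf_mult * se, xbar + bonf_mult * se])
    simu = np.column_stack([xbar - simu_mult * se, xbar + simu_mult * se])
    return bonf, simu
```

With $k = p$ small, the Bonferroni intervals are the narrower of the two; the simultaneous intervals win only when many linear functions are examined.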
More generally, we may consider hypotheses of the form $H\colon\ M\mu = m$, where $M$ is a matrix
of full row rank. For example, with $p = 4$, let

$$M = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \end{pmatrix}, \qquad m = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$

Thus,

$$M\mu = \begin{pmatrix} \mu_1 - \mu_2 \\ \mu_2 - \mu_3 \\ \mu_3 - \mu_4 \end{pmatrix}$$

and we see that this matrix is testing the hypothesis that all elements of the mean vector
are the same. The $T^2$ statistic generalizes as follows:

$$T^2 = n(M\bar{x} - m)^T\left(M\hat{D}M^T\right)^{-1}(M\bar{x} - m).$$

The concept of simultaneous confidence intervals extends directly. Thus we can write
intervals on linear functions of the form $a^T M\mu$.
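A sketch of the generalized statistic (numpy assumed; the $M$ below is the $p = 4$ equal-means example from the text):

```python
import numpy as np

def t2_contrast(X, M, m):
    """T^2 = n (M xbar - m)^T (M D_hat M^T)^{-1} (M xbar - m)."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    Xc = X - xbar
    D_hat = Xc.T @ Xc / (n - 1)
    d = M @ xbar - m
    return float(n * d @ np.linalg.solve(M @ D_hat @ M.T, d))

# differencing matrix for p = 4: tests mu_1 = mu_2 = mu_3 = mu_4
M = np.array([[1.0, -1.0, 0.0, 0.0],
              [0.0, 1.0, -1.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])
m = np.zeros(3)
```

Equivalently, this is the plain one-sample $T^2$ computed from the transformed data $Y = XM^T$ with hypothesized mean $m$.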
Often, the $p$ responses on a subject are taken over time. That is, the $p$ columns of the $X$
matrix represent observations taken at various points in time. In this case, we may ask if
there is a difference in response at different points in time. The hypothesis of equal
means discussed above would be appropriate. If this hypothesis is rejected, we might ask
if there is a linear trend. That is, is the rate of change from one time to the next constant?
Assuming that the time periods are equally spaced, this says that the difference between
consecutive means is constant. The hypothesis of a linear trend is described (for $p = 4$)
as
$$H_0\colon\ \mu_2 - \mu_1 = \mu_3 - \mu_2 = \mu_4 - \mu_3$$

or, equivalently,

$$H_0\colon\ \mu_3 - 2\mu_2 + \mu_1 = 0, \qquad \mu_4 - 2\mu_3 + \mu_2 = 0.$$

In the form $H\colon\ M\mu = m$, this corresponds to

$$M = \begin{pmatrix} 1 & -2 & 1 & 0 \\ 0 & 1 & -2 & 1 \end{pmatrix}, \qquad m = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$
A plot of the sample means over time helps to visualize the situation. If the hypothesis of
a linear trend is accepted, we would see that these means lie roughly on a straight line,
and we might be interested in the equation of that line. This suggests fitting a linear
regression model to the sample means. Thus, we would consider the linear model

$$\bar{x}_i = b_0 + b_1 t_i + e_i$$

where $t_i$ denotes the $i$th time period. Ordinary linear regression is not appropriate in this
case, since the $\bar{x}_i$ are not independent. Recall that, if $\bar{x}$ denotes the vector of means, then
$\mathrm{Var}(\bar{x}) = D/n$. We thus consider a weighted least squares estimator. To describe this,
let $B$ denote the matrix whose first column is all ones and whose second column is
$(t_1, t_2, \ldots, t_p)^T$. Then the estimator is given by

$$b = \begin{pmatrix} b_0 \\ b_1 \end{pmatrix} = \left(B^T\hat{D}^{-1}B\right)^{-1}B^T\hat{D}^{-1}\bar{x}.$$
Clearly, one might consider a more complex relation, for example, a quadratic in time.
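A sketch of the weighted (generalized) least squares fit (numpy assumed; the function name is mine). Note that the factor $1/n$ in $\mathrm{Var}(\bar{x}) = D/n$ cancels out of the estimator, so $\hat{D}^{-1}$ can be used directly as the weight matrix:

```python
import numpy as np

def gls_trend(X, t):
    """Weighted least squares fit of xbar_i = b0 + b1 * t_i.

    Weights come from D_hat^{-1}; the 1/n in Var(xbar) = D/n cancels.
    """
    n, p = X.shape
    xbar = X.mean(axis=0)
    Xc = X - xbar
    D_hat = Xc.T @ Xc / (n - 1)

    B = np.column_stack([np.ones(p), np.asarray(t, dtype=float)])
    W = np.linalg.inv(D_hat)
    b = np.linalg.solve(B.T @ W @ B, B.T @ W @ xbar)
    return b                              # (b0, b1)
```

When the sample means lie exactly on a line, the fit recovers the line regardless of the weights; for a quadratic in time, one would simply append a column of $t_i^2$ to $B$.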
Two-Sample Problem
Suppose we have data from two normal populations with the same covariance matrix $D$
but possibly different mean vectors, $\mu_1$ and $\mu_2$. We wish to test the hypothesis

$$H\colon\ \mu_1 = \mu_2.$$

Let the data matrices be $X_1$ of size $n_1 \times p$ and $X_2$ of size $n_2 \times p$. Define the sample mean
vectors, $\bar{x}_1$ and $\bar{x}_2$, and sample covariance matrices $\hat{D}_1$ and $\hat{D}_2$. Since we are assuming the
covariance matrices are equal, we compute the 'pooled' covariance matrix

$$\hat{D} = \frac{(n_1 - 1)\hat{D}_1 + (n_2 - 1)\hat{D}_2}{n_1 + n_2 - 2}.$$

The test statistic is

$$T^2 = \frac{n_1 n_2}{n_1 + n_2}(\bar{x}_1 - \bar{x}_2)^T\hat{D}^{-1}(\bar{x}_1 - \bar{x}_2)$$

and the hypothesis is rejected if

$$\frac{(n_1 + n_2 - p - 1)}{p(n_1 + n_2 - 2)}\, T^2 \ge F(\alpha,\ p,\ n_1 + n_2 - p - 1).$$
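A sketch of the two-sample computation (numpy assumed; the function name is mine):

```python
import numpy as np

def hotelling_t2_two_sample(X1, X2):
    """Two-sample Hotelling T^2 with pooled covariance.

    Returns (T^2, F) where F is referred to F(p, n1 + n2 - p - 1).
    """
    n1, p = X1.shape
    n2, _ = X2.shape
    xb1, xb2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - xb1).T @ (X1 - xb1) / (n1 - 1)
    S2 = (X2 - xb2).T @ (X2 - xb2) / (n2 - 1)
    Dp = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)   # pooled covariance
    d = xb1 - xb2
    t2 = float((n1 * n2) / (n1 + n2) * d @ np.linalg.solve(Dp, d))
    f_stat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f_stat
```

For $p = 1$ this reduces to the square of the pooled two-sample t-statistic.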
As in the one-sample case, we may test hypotheses of the form $H\colon\ M(\mu_1 - \mu_2) = 0$,
where $M$ is a $q \times p$ matrix. The statistic is

$$T^2 = \frac{n_1 n_2}{n_1 + n_2}\left(M(\bar{x}_1 - \bar{x}_2)\right)^T\left(M\hat{D}M^T\right)^{-1}M(\bar{x}_1 - \bar{x}_2)$$

and the simultaneous confidence intervals are

$$a^T M(\mu_1 - \mu_2)\colon\ a^T M(\bar{x}_1 - \bar{x}_2) \pm \sqrt{\frac{(n_1 + n_2)}{n_1 n_2}\, a^T M\hat{D}M^T a\ \frac{q(n_1 + n_2 - 2)}{(n_1 + n_2 - q - 1)}\, F(\alpha,\ q,\ n_1 + n_2 - q - 1)}.$$

For example, with $a^T = (1\ 0\ \cdots\ 0)$, we are interested in the first row of $M$. With
$a^T = (1\ {-1}\ 0\ \cdots\ 0)$ we are interested in the difference of the first two rows, etc.
Recall the concept of repeated measurements discussed earlier. That is, we now assume
that we have data on two populations taken over time and are interested in examining the
relation beyond simply asking if the mean vectors are the same. Several hypotheses are
of interest. Again these are motivated by a plot of the sample means for the two groups
as a function of time. This plot is sometimes called a "profile". Examining this plot
suggests three questions that we might ask.
1. Are the profiles for the two groups similar in the sense that the line segments of
adjacent tests are parallel?
2. If the profiles are parallel, are they at the same level?
3. If the profiles are parallel, are the means of the tests different?
Another way to look at the Profile Plot is to display the means for the two groups in a
two-way table as follows:

                            Time
                1            2            3            4           Mean
Group 1    $\mu_{11}$   $\mu_{12}$   $\mu_{13}$   $\mu_{14}$   $\bar{\mu}_{1.}$
      2    $\mu_{21}$   $\mu_{22}$   $\mu_{23}$   $\mu_{24}$   $\bar{\mu}_{2.}$
Mean       $\bar{\mu}_{.1}$   $\bar{\mu}_{.2}$   $\bar{\mu}_{.3}$   $\bar{\mu}_{.4}$
To answer the first question, we note that the parallelism condition is that the
difference of the means in consecutive time periods is the same in each group. Thus, we
want to test the hypothesis

$$H\colon\ M(\mu_1 - \mu_2) = 0, \qquad M = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \end{pmatrix}.$$
Letting $\mu_{ij}$ denote the mean response in group $i$, $i = 1, 2$, at time $j$, $j = 1, 2, \ldots, p$, suppose
we want to check that the average effect over time is the same in the two groups. The
hypothesis of interest is

$$H_0\colon\ \sum_{j=1}^{p}\mu_{1j} = \sum_{j=1}^{p}\mu_{2j}.$$

In our general notation, this corresponds to letting $M = (1, 1, \ldots, 1)$ and testing the
hypothesis

$$H_0\colon\ M(\mu_1 - \mu_2) = 0.$$
In analysis of variance terminology, this is known as the "marginal means, main effect
test". Note that we can test this hypothesis even if the parallelism hypothesis is rejected,
but the interpretation is not as strong.
Question 3. Are the means of the tests different? (Time Main Effect)
In terms of the two-way table, this is the hypothesis that the marginal time means,
$\mu_{.j} = \sum_{i=1}^{2}\mu_{ij}$, are all equal. It can be written as

$$H_0\colon\ M(\mu_1 + \mu_2) = 0$$

with

$$M = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \end{pmatrix}.$$

The statistic is

$$T^2 = \frac{n_1 n_2}{n_1 + n_2}\left(M(\bar{x}_1 + \bar{x}_2)\right)^T\left(M\hat{D}M^T\right)^{-1}M(\bar{x}_1 + \bar{x}_2)$$

and the hypothesis is rejected if

$$T^2 \ge \frac{q(n_1 + n_2 - 2)}{(n_1 + n_2 - q - 1)}\, F(\alpha,\ q,\ n_1 + n_2 - q - 1).$$
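The three profile questions can be sketched as a set of $T^2$ computations with the appropriate $M$ (numpy assumed; the function name is mine, and the time-effect test uses $M(\bar{x}_1 + \bar{x}_2)$ as above):

```python
import numpy as np

def profile_tests(X1, X2):
    """Profile-analysis sketch: parallelism, equal levels, and time effect."""
    n1, p = X1.shape
    n2, _ = X2.shape
    xb1, xb2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - xb1).T @ (X1 - xb1) / (n1 - 1)
    S2 = (X2 - xb2).T @ (X2 - xb2) / (n2 - 1)
    Dp = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)   # pooled covariance
    c = n1 * n2 / (n1 + n2)

    # (p-1) x p differencing matrix: rows (1, -1, 0, ...), (0, 1, -1, ...), ...
    M = np.eye(p - 1, p) - np.eye(p - 1, p, k=1)

    # 1. parallelism: M(mu1 - mu2) = 0
    d = M @ (xb1 - xb2)
    t2_par = float(c * d @ np.linalg.solve(M @ Dp @ M.T, d))

    # 2. equal levels: (1,...,1)(mu1 - mu2) = 0
    one = np.ones(p)
    t2_lev = float(c * (one @ (xb1 - xb2)) ** 2 / (one @ Dp @ one))

    # 3. time main effect: M(mu1 + mu2) = 0
    s = M @ (xb1 + xb2)
    t2_time = float(c * s @ np.linalg.solve(M @ Dp @ M.T, s))

    return t2_par, t2_lev, t2_time
```

Each statistic is then referred to its scaled $F$ distribution with the appropriate $q$ ($q = p - 1$ for tests 1 and 3, $q = 1$ for test 2).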
In the next section we will extend this concept to more than two groups.