
LECTURE 10

INFERENCE ON MEAN VECTORS

Introduction

In this section we will generalize the concept of t-tests to the multivariate situation. In
the two-sample problem, we will assume equality of covariance matrices. Later we will
develop tests for equality of covariance matrices and for other patterns on the covariance
matrix. In the univariate case, the concept of t-tests generalizes to the analysis of
variance, and we will make that generalization in the multivariate case as well. We will
see that SAS does not have a PROC specifically devoted to the multivariate t-test, so we
will develop an IML program. Later we will see how to use the multivariate analysis of
variance procedure to perform the test.

Hotelling T2 Statistic

(This concept is discussed in DJ Section 10.2 page 408.)

The problem is as follows: Given a random sample of size n from a p-variate normal
distribution, N(μ, D), that is, the data matrix X, we want to test the hypothesis

    H₀: μ = μ₀

where μ₀ is a specified p-vector.

For motivation, recall the special case p = 1. In that case we computed

    t = (x̄ − μ₀) / (s/√n)

and rejected the hypothesis if |t| > t(α/2, n−1). Equivalently, we could compute F = t²
and reject the hypothesis if F > F(α, 1, n−1). Note that we can write

    t² = (x̄ − μ₀)² / (s²/n) = n(x̄ − μ₀)(s²)⁻¹(x̄ − μ₀)

In the multivariate case, the test statistic is given by

    T² = n(x̄ − μ₀)ᵀ D̂⁻¹ (x̄ − μ₀)

noting that x̄ and μ₀ are vectors of length p. The hypothesis is rejected if

    T² > [(n−1)p / (n−p)] F(α, p, n−p)

or equivalently

    [(n−p) / ((n−1)p)] T² > F(α, p, n−p)

To motivate the statistic, note that this problem is similar to the classification problem.
That is, did these data come from a population with mean μ₀ or from a population with
some other mean? The decision is based on the distance from x̄ to μ₀. Equivalently, it is
based on the ratio of the likelihood functions for μ = μ₀ and μ = x̄.

Example

Recall the SWEAT data from a previous exercise and suppose the question is raised as to
whether these 20 females were selected from a population of females whose mean vector
is known to be μ₀ᵀ = (4  50  10) on the three variables, x₁ = sweat rate, x₂ = sodium
content and x₃ = potassium content. From the 20 observations, we compute

    x̄ᵀ = (4.64  45.40  9.97)

and

    D̂ = [  2.9   10.0   -1.8 ]
         [ 10.0  200.0   -5.6 ]
         [ -1.8   -5.6    3.6 ]

It follows that T² = 9.74 and [(n−p)/((n−1)p)] T² = 2.9. With α = 0.10, we see that
F(0.10, 3, 17) = 2.44 and the hypothesis is rejected.
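The lecture develops this computation in SAS IML; as a cross-check, here is a minimal sketch in Python (NumPy/SciPy assumed) using the rounded summary statistics printed above, so the results match the text only approximately.

```python
import numpy as np
from scipy import stats

# Rounded summary statistics from the text (n = 20 observations, p = 3 variables)
n, p = 20, 3
xbar = np.array([4.64, 45.40, 9.97])          # sample mean vector
mu0 = np.array([4.0, 50.0, 10.0])             # hypothesized mean vector
Dhat = np.array([[ 2.9,  10.0, -1.8],         # sample covariance matrix
                 [10.0, 200.0, -5.6],
                 [-1.8,  -5.6,  3.6]])

d = xbar - mu0
T2 = n * d @ np.linalg.solve(Dhat, d)         # Hotelling T^2
F = (n - p) / ((n - 1) * p) * T2              # scaled to an F statistic
crit = stats.f.ppf(1 - 0.10, p, n - p)        # F(0.10, 3, 17)

print(f"T^2 = {T2:.2f}, F = {F:.2f}, critical value = {crit:.2f}")
# With these rounded inputs T^2 comes out near 9.7 and F near 2.9,
# exceeding the critical value 2.44, so the hypothesis is rejected.
```

The small discrepancy from the text's T² = 9.74 comes from rounding D̂ to the digits shown above.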

Invariance

Since the variances in this example are quite different, it is natural to ask if we should
have standardized the data before performing the test. The answer is that it does not
make any difference. In fact, if C is any non-singular matrix of size p and d is any p-
vector, we can compute y = Cx + d and the T² statistic based on y will be identical to the
one based on x.
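This invariance is easy to verify numerically. The sketch below (simulated data, NumPy assumed) computes T² before and after an affine transformation y = Cx + d; note that the hypothesized mean must be transformed the same way.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
X = rng.normal(size=(n, p))                   # any data matrix
mu0 = np.zeros(p)

def hotelling_T2(X, mu0):
    n = X.shape[0]
    d = X.mean(axis=0) - mu0
    Dhat = np.cov(X, rowvar=False)
    return n * d @ np.linalg.solve(Dhat, d)

# Transform the data: y = C x + d with a non-singular C
C = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0]])
dvec = np.array([5.0, -2.0, 7.0])
Y = X @ C.T + dvec

T2_x = hotelling_T2(X, mu0)
T2_y = hotelling_T2(Y, C @ mu0 + dvec)        # mu0 transforms like the data
print(T2_x, T2_y)                             # identical up to rounding
```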

Development of the T2 Statistic

For those interested, here are two developments of the T2 statistic.

Likelihood Ratio Test

Given n observations on a p-variate normal distribution, the likelihood function is
given by

    f(X; μ, D) = (2π)^(−np/2) |D|^(−n/2) exp[−(1/2) tr(D⁻¹ A(μ))]

where A(μ) = (X − jμᵀ)ᵀ(X − jμᵀ) and j denotes the n-vector of ones. Letting
A(x̄) = Xᵀ Sₙ X, where Sₙ = I − (1/n)jjᵀ is the centering matrix, we can write

    A(μ₀) = A(x̄) + n(x̄ − μ₀)(x̄ − μ₀)ᵀ

The likelihood ratio test leads to the ratio

    Λ = |A(x̄)| / |A(μ₀)|

and we reject the hypothesis H₀: μ = μ₀ if Λ is 'small'. To relate this to the T² statistic,
consider the determinant

    |A(μ₀)| = |A(x̄) + n(x̄ − μ₀)(x̄ − μ₀)ᵀ|
            = |A(x̄)| · |1 + n(x̄ − μ₀)ᵀ A(x̄)⁻¹ (x̄ − μ₀)|

It follows that

    Λ = |A(x̄)| / |A(μ₀)| = 1 / (1 + T²/(n−1))

and we reject the hypothesis if T² is large.
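The identity Λ = 1/(1 + T²/(n−1)) can be checked numerically. A small sketch with simulated data (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 15, 2
X = rng.normal(size=(n, p))
mu0 = np.array([0.5, -0.5])

xbar = X.mean(axis=0)
Xc = X - xbar
A_xbar = Xc.T @ Xc                            # A(xbar) = (n-1) * Dhat
d = xbar - mu0
A_mu0 = A_xbar + n * np.outer(d, d)           # A(mu0)

Lambda = np.linalg.det(A_xbar) / np.linalg.det(A_mu0)
Dhat = A_xbar / (n - 1)
T2 = n * d @ np.linalg.solve(Dhat, d)

# Likelihood-ratio identity: Lambda = 1 / (1 + T^2/(n-1))
print(Lambda, 1.0 / (1.0 + T2 / (n - 1)))
```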

Union-Intersection Test

Let a be an arbitrary p-vector and consider y = aᵀx ~ N(aᵀμ, aᵀDa). Then consider the
hypothesis H(a): E[y] = μ_y0, based on the data Y = Xa. The t-statistic for this univariate
test is

    t(a) = (ȳ − μ_y0) / (s_y/√n)

Noting that μ_y0 = aᵀμ₀, ȳ = aᵀx̄ and s_y² = aᵀD̂a, we see that

    t²(a) = n aᵀ(x̄ − μ₀)(x̄ − μ₀)ᵀ a / (aᵀD̂a)

and we would accept the hypothesis if t²(a) ≤ F(α, 1, n−1). Noting that μ = μ₀ is
equivalent to aᵀμ = aᵀμ₀ for every vector a, acceptance of the hypothesis H₀: μ = μ₀
is equivalent to acceptance of the hypothesis H(a) for every vector a. This implies that
we accept H₀ if

    max over a of t²(a) ≤ F(α, 1, n−1)

Thus we seek a to maximize t²(a), or equivalently we consider the problem

    maximize  n aᵀ(x̄ − μ₀)(x̄ − μ₀)ᵀ a   subject to   aᵀD̂a = 1

This leads to the system of equations

    [n D̂⁻¹ (x̄ − μ₀)(x̄ − μ₀)ᵀ − λI] a = 0

Since the matrix has rank one, there is only one non-zero eigenvalue and eigenvector,
given by

    λ = n(x̄ − μ₀)ᵀ D̂⁻¹ (x̄ − μ₀)
    a = D̂⁻¹ (x̄ − μ₀)

Thus the maximum is given by the T² statistic and we are led to the same test.
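A quick numerical check (simulated data, NumPy assumed) that the direction a = D̂⁻¹(x̄ − μ₀) attains the maximum, and that the maximum of t²(a) equals T²:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 25, 3
X = rng.normal(size=(n, p))
mu0 = np.array([0.2, 0.0, -0.2])

xbar = X.mean(axis=0)
Dhat = np.cov(X, rowvar=False)
d = xbar - mu0

def t2(a):
    """Squared univariate t statistic for the projected data y = X a."""
    return n * (a @ d) ** 2 / (a @ Dhat @ a)

T2 = n * d @ np.linalg.solve(Dhat, d)         # Hotelling T^2
a_max = np.linalg.solve(Dhat, d)              # the maximizing direction

print(t2(a_max), T2)                          # equal: the maximum of t2(a) is T^2
print(t2(np.array([1.0, 0.0, 0.0])) <= T2)    # any other direction does no better
```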

Confidence Regions and Intervals

It is usually more informative to give a confidence interval on a parameter than simply an
accept/reject decision. In the univariate case, we recall the familiar confidence interval
for μ:

    μ: x̄ ± t(α/2, n−1) √(s²/n)

The analog of that in the multivariate case is the confidence region given by the p-
dimensional ellipsoid centered at x̄:

    n(μ − x̄)ᵀ D̂⁻¹ (μ − x̄) ≤ [p(n−1)/(n−p)] F(α, p, n−p)

Thus, we accept as reasonable any vector μ that lies in this ellipsoid. Since this is hard
to visualize in more than two dimensions, a common practice is to write confidence
intervals on the individual components or linear functions of them. Since we are writing
several intervals, the confidence coefficient based on a single interval is no longer
applicable and alternatives have been suggested. For example, if we are writing k
intervals, a simple suggestion is to use t(α/(2k), n−1) in the above univariate expression
with s² replaced by d̂ᵢᵢ, the i-th diagonal element of D̂. This is known as the Bonferroni
method. In addition to the p components of μ we may be interested in other linear
functions, say aᵀμ, and k can be quite large, making these intervals very wide. An
alternative, suggested by the Union-Intersection principle, is to use the intervals

    aᵀμ: aᵀx̄ ± √[(1/n)(aᵀD̂a)] √[(p(n−1)/(n−p)) F(α, p, n−p)]

These intervals, called simultaneous confidence intervals, although wider than the simple
t-statistic intervals, more correctly reflect the confidence coefficient. Since we often
write many such intervals, they are generally better than the Bonferroni intervals, which
are given by

    aᵀμ: aᵀx̄ ± t(α/(2k), n−1) √[(1/n)(aᵀD̂a)]
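As an illustration, the sketch below computes both kinds of component intervals for the rounded sweat-data summaries used earlier (NumPy/SciPy assumed).

```python
import numpy as np
from scipy import stats

# Rounded sweat-data summaries from the earlier example
n, p, alpha = 20, 3, 0.10
xbar = np.array([4.64, 45.40, 9.97])
Dhat = np.array([[ 2.9,  10.0, -1.8],
                 [10.0, 200.0, -5.6],
                 [-1.8,  -5.6,  3.6]])

se = np.sqrt(np.diag(Dhat) / n)               # standard errors of the components

# Bonferroni half-widths for k = p component intervals
k = p
t_bon = stats.t.ppf(1 - alpha / (2 * k), n - 1)
bon = t_bon * se

# Simultaneous (T^2-based) half-widths
c = np.sqrt(p * (n - 1) / (n - p) * stats.f.ppf(1 - alpha, p, n - p))
sim = c * se

for i in range(p):
    print(f"mu_{i+1}: Bonferroni {xbar[i]:.2f} +/- {bon[i]:.2f}, "
          f"simultaneous {xbar[i]:.2f} +/- {sim[i]:.2f}")
# The simultaneous intervals come out wider than the Bonferroni intervals here,
# as the text notes is typical for a small number of intervals.
```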

GENERAL LINEAR HYPOTHESES

As a generalization of the hypothesis H₀: μ = μ₀ we consider the hypothesis

    H₀: Mμ = m

where M is a matrix of size q × p of rank q and m is a q-vector, usually zero.
We will encounter several M matrices.

Testing for Equality of Elements of .

The initial hypothesis H₀: μ = μ₀ is a special case of this with M = I and m = μ₀.

Another hypothesis is that all of the elements of μ are equal. With p = 4, consider the
matrix

    M = [ 1  -1   0   0 ]           [ 0 ]
        [ 0   1  -1   0 ]   and m = [ 0 ]
        [ 0   0   1  -1 ]           [ 0 ]

Thus,

    Mμ = [ μ₁ − μ₂ ]
         [ μ₂ − μ₃ ]
         [ μ₃ − μ₄ ]

and we see that this matrix is testing the hypothesis that all elements of the mean vector
are the same. The T² statistic generalizes as follows:

    T² = n(Mx̄ − m)ᵀ (MD̂Mᵀ)⁻¹ (Mx̄ − m)

and we reject the hypothesis if

    T² > [q(n−1)/(n−q)] F(α, q, n−q)

The concept of simultaneous confidence intervals extends directly. Thus we can write
intervals on linear functions of the form aᵀMμ:

    aᵀMμ: aᵀMx̄ ± √[(1/n)(aᵀMD̂Mᵀa)] √[(q(n−1)/(n−q)) F(α, q, n−q)]

For example, with M as given above and aᵀ = (1  -1  0), we have aᵀM = (1  -2  1  0) and
hence the function aᵀMμ = μ₁ − 2μ₂ + μ₃.
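For a concrete instance, the sketch below tests the equality-of-elements hypothesis on the rounded sweat-data summaries with the consecutive-difference M (NumPy/SciPy assumed). Given the very different scales of the three variables, the hypothesis is soundly rejected.

```python
import numpy as np
from scipy import stats

# Rounded sweat-data summaries; test H0: mu_1 = mu_2 = mu_3 via M mu = 0
n, p = 20, 3
xbar = np.array([4.64, 45.40, 9.97])
Dhat = np.array([[ 2.9,  10.0, -1.8],
                 [10.0, 200.0, -5.6],
                 [-1.8,  -5.6,  3.6]])

M = np.array([[1.0, -1.0,  0.0],              # consecutive differences
              [0.0,  1.0, -1.0]])
m = np.zeros(2)
q = M.shape[0]

d = M @ xbar - m
T2 = n * d @ np.linalg.solve(M @ Dhat @ M.T, d)
crit = q * (n - 1) / (n - q) * stats.f.ppf(1 - 0.05, q, n - q)
print(f"T^2 = {T2:.1f}, critical value = {crit:.2f}")
```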

The Analysis of Repeated Measurements

Often, the p responses on a subject are taken over time. That is, the p columns of the X
matrix represent observations taken at various points in time. In this case, we may ask if
there is a difference in response at different points in time. The hypothesis of equal
means discussed above would be appropriate. If this hypothesis is rejected, we might ask
if there is a linear trend. That is, is the rate of change from one time to the next constant?
Assuming that the time periods are equally spaced, this says that the difference between
consecutive means is constant. The hypothesis of a linear trend is described (for p = 4)
as

    H₀: μ₂ − μ₁ = μ₃ − μ₂ = μ₄ − μ₃

This is equivalent to the hypothesis

    H₀: μ₃ − 2μ₂ + μ₁ = 0
        μ₄ − 2μ₃ + μ₂ = 0

and the appropriate matrix is given by

    M = [ 1  -2   1   0 ]           [ 0 ]
        [ 0   1  -2   1 ]   and m = [ 0 ]

A plot of the sample means over time helps to visualize the situation. If the hypothesis of
a linear trend is accepted, we would see that these means lie roughly on a straight line
and we might be interested in the equation of that line. This suggests fitting a linear
regression model to the sample means. Thus, we would consider the linear model

    x̄ᵢ = b₀ + b₁tᵢ + eᵢ

where tᵢ denotes the i-th time period. Ordinary linear regression is not appropriate in this
case, since the x̄ᵢ are not independent. Recall that, if x̄ denotes the vector of means, then
Var(x̄) = D/n. We thus consider a weighted least squares estimator. To describe this,
let B denote the matrix whose first column is all ones and whose second column is
(t₁, t₂, …, t_p)ᵀ. Then the estimator is given by

    (b₀, b₁)ᵀ = (Bᵀ D̂⁻¹ B)⁻¹ Bᵀ D̂⁻¹ x̄

Clearly, one might consider a more complex relation, for example, a quadratic in time.
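The weighted least squares computation can be sketched as follows (NumPy assumed). The means and covariance matrix below are made-up illustrative values, not the text's data.

```python
import numpy as np

# Hypothetical example: p = 4 equally spaced time points
t = np.array([1.0, 2.0, 3.0, 4.0])
xbar = np.array([10.1, 12.0, 13.8, 16.1])     # sample mean at each time (made up)
Dhat = np.array([[4.0, 2.0, 1.0, 0.5],        # sample covariance matrix (made up)
                 [2.0, 4.0, 2.0, 1.0],
                 [1.0, 2.0, 4.0, 2.0],
                 [0.5, 1.0, 2.0, 4.0]])

# Design matrix B: column of ones and the time points
B = np.column_stack([np.ones_like(t), t])

# Weighted least squares: b = (B' Dhat^-1 B)^-1 B' Dhat^-1 xbar
Dinv_B = np.linalg.solve(Dhat, B)
Dinv_x = np.linalg.solve(Dhat, xbar)
b = np.linalg.solve(B.T @ Dinv_B, B.T @ Dinv_x)
print(f"intercept b0 = {b[0]:.3f}, slope b1 = {b[1]:.3f}")
```

Using `np.linalg.solve` rather than explicitly inverting D̂ is the standard numerically stable way to apply the weight matrix.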

Refer to RRH-DJ-EX-10.5 to illustrate this analysis.

Note comment on page 421


THE TWO-SAMPLE PROBLEM

Suppose we have data from two normal populations with the same covariance matrix D
but possibly different mean vectors, μ₁ and μ₂. We wish to test the hypothesis

    H₀: μ₁ = μ₂

Let the data matrices be X₁ of size n₁ × p and X₂ of size n₂ × p. Define the sample mean
vectors, x̄₁ and x̄₂, and sample covariance matrices D̂₁ and D̂₂. Since we are assuming the
covariance matrices are equal, we compute the 'pooled' covariance matrix

    D̂ = [(n₁ − 1)D̂₁ + (n₂ − 1)D̂₂] / (n₁ + n₂ − 2)

For testing the hypothesis H₀: μ₁ − μ₂ = δ (here with δ = 0), the test statistic is given by

    T² = [n₁n₂/(n₁ + n₂)] (x̄₁ − x̄₂)ᵀ D̂⁻¹ (x̄₁ − x̄₂)

The hypothesis is rejected if

    T² > [p(n₁ + n₂ − 2)/(n₁ + n₂ − p − 1)] F(α, p, n₁ + n₂ − p − 1)

or equivalently, if

    [(n₁ + n₂ − p − 1)/(p(n₁ + n₂ − 2))] T² > F(α, p, n₁ + n₂ − p − 1).
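The two-sample computation can be sketched as follows (simulated data for illustration, not the text's; NumPy/SciPy assumed).

```python
import numpy as np
from scipy import stats

# Hypothetical two-sample illustration with simulated data
rng = np.random.default_rng(3)
n1, n2, p = 15, 18, 3
X1 = rng.normal(loc=[0.0, 0.0, 0.0], size=(n1, p))
X2 = rng.normal(loc=[1.0, 0.0, 0.0], size=(n2, p))

xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
D1 = np.cov(X1, rowvar=False)
D2 = np.cov(X2, rowvar=False)

# Pooled covariance matrix
Dpool = ((n1 - 1) * D1 + (n2 - 1) * D2) / (n1 + n2 - 2)

d = xbar1 - xbar2
T2 = (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(Dpool, d)

alpha = 0.05
crit = (p * (n1 + n2 - 2) / (n1 + n2 - p - 1) *
        stats.f.ppf(1 - alpha, p, n1 + n2 - p - 1))
print(f"T^2 = {T2:.2f}, critical value = {crit:.2f}")
```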

In general, we may consider a hypothesis of the form

    H₀: M(μ₁ − μ₂) = δ

where M is a matrix of size q × p of rank q. The test statistic is

    T² = [n₁n₂/(n₁ + n₂)] [M(x̄₁ − x̄₂)]ᵀ (MD̂Mᵀ)⁻¹ [M(x̄₁ − x̄₂)]

and the hypothesis is rejected if

    T² > [q(n₁ + n₂ − 2)/(n₁ + n₂ − q − 1)] F(α, q, n₁ + n₂ − q − 1)

Simultaneous confidence intervals on functions of the form aᵀM(μ₁ − μ₂) are given by

    aᵀM(μ₁ − μ₂): aᵀM(x̄₁ − x̄₂) ± √[((n₁ + n₂)/(n₁n₂)) aᵀMD̂Mᵀa]
                                    × √[(q(n₁ + n₂ − 2)/(n₁ + n₂ − q − 1)) F(α, q, n₁ + n₂ − q − 1)]

For example, with aᵀ = (1  0  0), we are interested in the first row of M. With
aᵀ = (1  -1  0  0) we are interested in the difference of the first two rows, etc.

Two Group Repeated Measures: Profile Analysis

Recall the concept of repeated measurements discussed earlier. That is, we now assume
that we have data on two populations taken over time and are interested in examining the
relation beyond simply asking if the mean vectors are the same. Several hypotheses are
of interest. Again these are motivated by a plot of the sample means for the two groups
as a function of time. This plot is sometimes called a "profile". Examining this plot
suggests three questions that we might ask.

(REFER TO PROFILE PLOT)

1. Are the profiles for the two groups similar in the sense that the line segments of
adjacent tests are parallel?
2. If the profiles are parallel, are they at the same level?
3. If the profiles are parallel, are the means of the tests different?

Another way to look at the Profile Plot is to display the means for the two groups in a
two-way table as follows:

                         Time
                 1     2     3     4
    Group 1     μ₁₁   μ₁₂   μ₁₃   μ₁₄   μ₁.
          2     μ₂₁   μ₂₂   μ₂₃   μ₂₄   μ₂.
                μ.₁   μ.₂   μ.₃   μ.₄

Question 1. Parallelism Hypothesis: Group by Time Interaction

To answer the first question, we note that the parallelism condition is that the difference
of the means in consecutive time periods is the same in each group. Thus, we want to
test the hypothesis

    H₀: μ₁₁ − μ₁₂ = μ₂₁ − μ₂₂
        μ₁₂ − μ₁₃ = μ₂₂ − μ₂₃
        μ₁₃ − μ₁₄ = μ₂₃ − μ₂₄

The appropriate matrix in this case (for p = 4) is

    M = [ 1  -1   0   0 ]
        [ 0   1  -1   0 ]
        [ 0   0   1  -1 ]

and again we test the hypothesis H₀: M(μ₁ − μ₂) = 0. In analysis of variance
terminology, this is known as the "group by time interaction" hypothesis.

Question 2. Equal Average Over Time: Group Main Effect

Let μᵢⱼ denote the mean response in group i, i = 1, 2, at time j, j = 1, 2, …, p. Suppose
we want to check that the average effect over time is the same in the two groups. The
hypothesis of interest is

    H₀: Σⱼ₌₁ᵖ μ₁ⱼ = Σⱼ₌₁ᵖ μ₂ⱼ

In our general notation, this corresponds to letting M = (1, 1, …, 1) and testing the
hypothesis

    H₀: M(μ₁ − μ₂) = 0

In analysis of variance terminology, this is known as the "marginal means, main effect
test". Note that we can test this hypothesis even if the parallelism hypothesis is rejected,
but the interpretation is not as strong.

Question 3. Are the means of the tests different? Time Main Effect

In this case, the hypothesis is stated as

    H₀: Σᵢ₌₁² μᵢⱼ = Σᵢ₌₁² μᵢⱼ*   for all j ≠ j*

or, in matrix notation, the hypothesis is written as

    H₀: M(μ₁ + μ₂) = 0

with

    M = [ 1  -1   0   0 ]
        [ 0   1  -1   0 ]
        [ 0   0   1  -1 ]

The test statistic is given by

    T² = [n₁n₂/(n₁ + n₂)] [M(x̄₁ + x̄₂)]ᵀ (MD̂Mᵀ)⁻¹ [M(x̄₁ + x̄₂)]

and the hypothesis is rejected if

    T² > [q(n₁ + n₂ − 2)/(n₁ + n₂ − q − 1)] F(α, q, n₁ + n₂ − q − 1)
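The three profile-analysis questions can be sketched in one place as follows (simulated data with parallel but shifted profiles, not the text's example; NumPy/SciPy assumed).

```python
import numpy as np
from scipy import stats

# Hypothetical profile-analysis illustration with p = 4 time points
rng = np.random.default_rng(4)
n1, n2, p = 20, 20, 4
X1 = rng.normal(loc=[10, 12, 14, 16], size=(n1, p))
X2 = rng.normal(loc=[11, 13, 15, 17], size=(n2, p))  # parallel, shifted profile

xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
Dpool = ((n1 - 1) * np.cov(X1, rowvar=False) +
         (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)

def two_sample_T2(v, M, alpha=0.05):
    """T^2 and critical value for H0: M applied to the given mean combination = 0."""
    q = M.shape[0]
    d = M @ v
    T2 = (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(M @ Dpool @ M.T, d)
    crit = (q * (n1 + n2 - 2) / (n1 + n2 - q - 1) *
            stats.f.ppf(1 - alpha, q, n1 + n2 - q - 1))
    return T2, crit

M_diff = np.array([[1., -1., 0., 0.],         # consecutive-difference matrix
                   [0., 1., -1., 0.],
                   [0., 0., 1., -1.]])

# Question 1 (parallelism): differences applied to mu1 - mu2
T2_par, crit_par = two_sample_T2(xbar1 - xbar2, M_diff)
# Question 2 (group main effect): row of ones applied to mu1 - mu2
T2_grp, crit_grp = two_sample_T2(xbar1 - xbar2, np.ones((1, p)))
# Question 3 (time main effect): differences applied to mu1 + mu2
T2_time, crit_time = two_sample_T2(xbar1 + xbar2, M_diff)

print(f"parallelism:  T^2 = {T2_par:.2f} vs {crit_par:.2f}")
print(f"group effect: T^2 = {T2_grp:.2f} vs {crit_grp:.2f}")
print(f"time effect:  T^2 = {T2_time:.2f} vs {crit_time:.2f}")
```

With these simulated profiles the group and time effects are large, while the parallelism statistic should usually fall below its critical value, since the true profiles are parallel.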

Refer to RRH-DJ-EX-10.6 to illustrate this analysis.

In the next section we will extend this concept to more than two groups.
