JAYANTA KUMAR GHOSH 2)
Indian Statistical Institute, Calcutta
PRANAB KUMAR SEN 3)
University of North Carolina, Chapel Hill
Summary.
test statistic does not hold for testing homogeneity (i.e., no mixture)
against mixture alternatives.
developed.
62E20, 62F05.
1) This is one of the three examples presented by the first author at the
Neyman-Kiefer Conference.
3) Work partially supported by the National Heart, Lung and Blood Institute,
Contract NIH-NHLBI-71-2243-L from the National Institutes of Health.
1. Introduction.
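$$(1-\pi)\,g(x,\theta^{(1)}) + \pi\,g(x,\theta^{(2)}), \qquad 0 < \pi < 1.$$
$\pi \neq 0$, $\pi \neq 1$ and $\theta^{(1)} \neq \theta^{(2)}$, then the equality
$$(1-\pi)\,g(x,\theta^{(1)}) + \pi\,g(x,\theta^{(2)}) = (1-\pi')\,g(x,\theta^{(3)}) + \pi'\,g(x,\theta^{(4)}) \tag{1.1}$$
implies
$$\pi = \pi',\ \theta^{(1)} = \theta^{(3)},\ \theta^{(2)} = \theta^{(4)} \quad\text{or}\quad \pi = 1-\pi',\ \theta^{(1)} = \theta^{(4)},\ \theta^{(2)} = \theta^{(3)},$$
and
$$g(x,\theta) = g(x,\theta') \quad\text{implies}\quad \theta = \theta'.$$
(Both here and in (1.1) the relations between two densities hold almost
everywhere with respect to the dominating $\sigma$-finite measure $\mu$.)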
= g(x,e) ,
e(l)
f,
HI
(1.2)
TI
+ 0,
TI
+ 6(2).
ensures
that
H and
O
HI
have
+1
The real
f(x,TI,8(1) .8(2
H
1
$(\pi,\theta^{(1)},\theta^{(2)})$ and $(1-\pi,\theta^{(2)},\theta^{(1)})$ give
the same density; it will be seen in Section 5 that this kind of nonuniqueness is not hard to take care of.
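The two kinds of nonuniqueness at issue can be checked numerically. The following is a minimal sketch, assuming the component family is $N(\theta,1)$ and using hypothetical helper names (`g`, `mixture`) and arbitrary parameter values chosen only for illustration: swapping the labels of the two components leaves the mixture density unchanged, and under homogeneity several distinct parameter triples give the same single-component density.

```python
import math

def g(x, theta):
    """N(theta, 1) density; an illustrative choice of component family."""
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2.0 * math.pi)

def mixture(x, pi, theta1, theta2):
    """Two-point mixture (1 - pi) * g(x, theta1) + pi * g(x, theta2)."""
    return (1.0 - pi) * g(x, theta1) + pi * g(x, theta2)

# Arbitrary evaluation points (illustrative only).
xs = [-2.0, -0.5, 0.0, 1.3, 4.0]

# Label switching: (pi, theta1, theta2) and (1 - pi, theta2, theta1)
# describe the same density.
label_switch_gap = max(
    abs(mixture(x, 0.3, -1.0, 2.0) - mixture(x, 0.7, 2.0, -1.0)) for x in xs
)

# Homogeneity: with pi = 0 the second location is irrelevant ...
null_gap_pi0 = max(
    abs(mixture(x, 0.0, 0.5, t2) - g(x, 0.5))
    for x in xs for t2 in (-3.0, 0.0, 7.0)
)

# ... and with equal locations the mixing proportion is irrelevant.
null_gap_equal = max(
    abs(mixture(x, p, 0.5, 0.5) - g(x, 0.5))
    for x in xs for p in (0.1, 0.5, 0.9)
)

print(label_switch_gap, null_gap_pi0, null_gap_equal)
```

All three gaps are zero up to floating-point rounding. The first is the harmless relabelling; the other two are the loss of identifiability under the null hypothesis that drives the rest of the paper.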
true density
is
g(x,8(0,8(0)
= 8(0)
or
8(2)
= 8(0)
However, if
TI
8 (1)
and
or
TI
= 1 and
this fact is to observe that we can pass to the one dimensional space of
$H_0$ by specifying only one co-ordinate at a time -- and not two -- in the
three dimensional space of $H_1$.
= (1-TI)8(0) + TI8(1),
A
1
A
2
TI
H '
O
= ~),
and
A
3
(with
= {Min(TI,1-TI)}{2A 2-8(0)-8(1)}.
tures.
and
convenient to replace
TI
by
in the mixture
f(x,8 ,8 ,8 ) =
0 1 2
(1)
(8 ,8 ,8 ),
0 1 2
(1.3)
against the strongly identifiable mixture alternatives.
Note that if $H_0$ is true the parameters are still not identifiable.
Here is an example of
Here
N(e,l)
e.
that the parameter space is the three dimensional Euclidean space and all
points non-identifiable from the true value $\theta^0$ lie on a curve $\Gamma$.
The best that one can hope for is that the maximum of the likelihood will
eventually be attained in a neighbourhood of this curve. Actually, Redner
(1981) has observed that essentially Wald's proof under Wald's conditions
(sans identifiability) guarantees this; Redner calls it convergence of the
mle in the topology of the quotient space obtained by collapsing $\Gamma$ into
a single point.
e,
namely
eo
gl(x,e~),
the first
and
e ,
l
ez
eo
=0
Of course there is no
and
to which
eZ
can converge.
eZ
cannot
H
is
O
eO
and
However,
where
z)}
and
T(e)
W=
e~
under
H
O
treatment does not follow from Chernoff (1954) or Feder (1968), because
they were able to exploit the existence of a consistent solution of the
likelihood equation in the identifiable case.
likelihood ratio test via approximation by Bayes tests also breaks down.
G on
82
and
=b
general exponentials.
do not prove it here) that the likelihood ratio test is not asymptotically
locally minimax for these examples -- these are thus new instances of the
failure of the principle of maximising the likelihood.
For the case of (not strongly) identified mixtures, a result analogous
to Theorem 2.1 is obtained for the likelihood ratio test of the hypothesis
in (1.2).
$\|\theta^{(1)}-\theta^{(2)}\| > \epsilon$, where $\epsilon > 0$ is a fixed quantity.
G{(8-A )/a 2 }
mixing distributions
where
of which one needs only that the third moment about mean is zero.
In our set-up of two point mixtures, this would correspond to assuming
so that
8
= IT
k2,
18(1)_8(0)\ = O.
N(~,o
).
The
would hold if
and
a
in a compact set.
(~,o)
is unknown but
# 0
TI
Note that although we have confined ourselves to the case of mixtures, our
main conclusions hold for other cases of non-identifiable parameters.
2.
and
Let
(8 ,8 ,8 )
0
1 2
=
n
i.i.d. observations.
Suppose
Ho
of
probabilities will be computed in this and the next section under this
assumption but this will not be displayed in the notation.
We now sketch an argument leading to Theorem 2.1, introducing notations
as we go along.
as the necessary assumptions (A1 through A5) are collected in the next
section.
The assumptions are similar to those in the classical case but have to be
at various places.
T (-)
o2
Assumption A (vide
introduced
or its closure.
The
The
as an open rectangle in
As
RP .
Among other things the assumptions of the next section guarantee that
all quantities introduced below are well-defined.
We now begin by rescaling the parameters through
8
+ 11 0 / In
where
eO
0
eO
ln
I + l1 l /
e2
eO
11 2
Let
L (e)
n
(11 0
,n 1 ,11 2)
be denoted as
Note that
written simply as
V (11)
n
Vn (O,0,112)
V (0).
n
Let
11 2
11 =
nO
(n )
eo
be the
lXp
and let
components of
8 .
1
indicates dependence on
lowed below.
nZ
or lack of it.
nZ in Uno Z
Let
and
(p+l)X(p+l)
-rOo (n )
z
I(nZ)
r01(nz)T
U
nl
I .. 's
By Assumption A1.
log f(X ,8)
1
1J
Expanding
v (n)
n
n2
nl
is
o (1)
p
on bounded sets of
and
Sup L (8)
0<8 <1 n
-
er-
(0)
n2 ,
Sup A (n) + 0 (1)
p
n
ner->0
(2.2)
n1 ERP
GlEel
(The proof is similar to the classical case but one has to ensure uniformity
in $\eta_2$).
By the well-known Kuhn-Tucker-Lagrange theorem [viz., McCormick (1967)]
the supremum of
A (n)
is
(2.3)
if
(2.4)
I-1U T
~ nl 11 nl
~f
...
(2.5)
Similarly
L (H )
def
Sup L (8)
n
(Z.6)
eo =0
8 Ee
l l
Hence,
(n z) = Z{L (n z) def
n
L (H )}
n
= 0 (1)
(2.7)
(Z.8)
An = Sup L n (n z)
n2
W
where
nZ
T (e)
n
process
(Z.9)
L (H )
= Sup
Assume
cess
ez =
[b,c].
taking values in
T(e)
on
C[b,c]
(under
C[b,c]
T (e)
1
T (n
and
ZZ )
The covariance
scalar:
where
8~.
J(nZl,n ZZ )
Note that
is the covariance of
Var(Tl(n Z
1 V n Z.
lO
Since
(n Zl )
A11
and
lO
oY (e)
n
(n ZZ )
where
under
is
Theorem 2.1.
converges in distribution to
assumes only
distinct values.
k-l
oT(e).
A
dim 0
+ dim 02 + 1
Tn (8 2 )
and
An (8 )
2
are given
by a finite set
0
2
i = 1, .. ,m
1
m
8 )" .)8
2
2
H .
0'
the dispersion
is true.
o
i
T (8 )'s;
n 2
form in
3.
$\chi^2$-distribution.
T 's
limiting
nol '
801
0
nolo
(8 ,8 ),
0
1
Let
01 =
d
--
(O,e 0l )
ae 0 ,
= - -,
D.
ae lj
j =
(i)
is an open set of
interval
[b, c]
of
RP ,
1
R
and
unless other-
0
(e ol
,8 )
2
l, . ,p,
a closed bounded
ii)
f(x,8)
is continuous in
E(D
01
= 0,
log f)
(iv)
E{
Sup
II 8 ol-8~111 <0
8 E8
2 2
as
0,
j , j'
= 0, 1, . , p
uniformly on bounded
A2.
nol-plane.
nol-sets
E(H(Xl
W(X1,o)
<
nl
i. e ., bounded
in (2.1) is
o (1)
p
Moreover,
while cal-
L (8).
neighbourhoods in the
81
is continuous on
82 ,
N of
01
such that
IW(X ,8 2 )1 ~ H(X ) V 8 2
l
l
and
00.
L (8) - V (0),
Sup
8
we get
01
E[O,l]XN
Sup
(8) -
0l
(1)
(3.1)
E:[0,1]XN
8 .
2
uniformly in
A3.
L (e) =
For each
and radius
[0,1] x 01
and
U = U(e 1,0)
such that if
then
By A3 and continuity of
Let
U = {e
ol
I 18l-8~1 I
; 0 < eo < 0,
U n [0,1] x N by sets
U(8
-1
0l
,01)
<
oJ.
L:t/J(X ,U ,8 ),
i j 2
= l, ,m,
82
02
Ul' ,U
m
to con-
clude that
=
uniformly in
e
2
(1)
At the third and final stage, note that, by Taylor's theorem and A1,
L (8)
n
where
IJij(T)
(p+1)x (p+l)
1(8 )
2
A4.
is continuous in
greater than
A5.
EIDo log
> 0
V 82 ;
o x 8Z
uniformly in
We now use
82
and
a,y > O.
some
A5 ensures tightness of
no
(.).
is also
U
n1
K and
(1).
n > n ,
o
(Sup U (n ) + Iu 11 < K ,
no Z
n
" n
P the
E:
> 1-
U0 x H2
(p+1)X(p+l)
[Jij(T)] > ' ,
n > no'
< Vn(O)
where
>
n ,
A (n)
n
Rn1
if
if
T)
the supremum of
(over
Uo x HZ
K and
L (8),
n
'.
i.e.,
of (2.1) is
0p(l)
I Inoll I
and
depends only on
> M and
v (n)
n
(over
IIn ol ll
~M.
o)
> 1-
E:
and that of
The proof of the similar result (2.6) follows along similar lines
from A1 through A4 which are of course much stronger than what we need
for (2.6).
for
E(~(X8
and
<
8 > 1.
for some
00
T (e)
8 .
1
It is weakly continuous in
T()
is continuous in
8
1
o
1
and
el .
tion of
as
T(e)
under
'"
01
be such that
8 ,
1
then
lim P
is continuous in
is consistent for
8
1
under
H
o
8
1
Peter Bickel, $\lim P_{\theta_1}\{\lambda_n \geq t(\hat\theta_1,\alpha)\} = \alpha$. Thus the test which rejects $H_0$
if $\lambda_n \geq t(\hat\theta_1,\alpha)$ would be asymptotically similar provided the conditions
assumed here hold.
e2
is a
finite set.
4.
A (8 2),
n
We shall cal-
A (02)
where
responding to a fixed
n = (n o ,n l ,n2 ).
T (b)
n
under
eO1
cor-
b
and
of
8
K
where
T (b)
is defined in (2.4),
Let $Z_n^* = V_n(\eta) - V_n(0)$. Then by (2.1), $Z_n^*$ is asymptotically
$N(-\tfrac{1}{2}\eta_{01} I(\eta_2)\eta_{01}^T,\ \eta_{01} I(\eta_2)\eta_{01}^T)$.
first lemma on contiguity [cf. Hajek and Sidak (1967, p. 204)], this shows
$K_n$ is contiguous to
Since
and
T (b)
n
Z*
are asymptotically
is asymptotically
normal under
6~
under
o
6 .
Kn
Moreover,
= Tn(b)I{T (bO}+
and
(1)
K ,
under
K
n
is contiguous.
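The contiguity step relies on the first lemma on contiguity, which applies when the log likelihood ratio is asymptotically normal with mean equal to minus half its variance. A short check of why that particular mean is the relevant one, writing $\sigma^2$ for the limiting variance (a symbol introduced here only for illustration):

```latex
% If Z ~ N(mu, sigma^2), the moment generating function at 1 gives
\[
  E\,e^{Z} \;=\; \exp\!\Bigl(\mu + \tfrac{1}{2}\sigma^{2}\Bigr),
\]
% so the limiting likelihood ratio e^{Z} has expectation one exactly when
\[
  \mu \;=\; -\tfrac{1}{2}\sigma^{2}.
\]
% No mass is lost in the limit, which is the condition for contiguity.
% Under the contiguous alternative the same statistic is then
% asymptotically N(+sigma^2/2, sigma^2).
```

In the present setting $\sigma^2$ plays the role of the quadratic form in the local parameter that appears as the variance of $Z_n^*$.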
T (b)
n
iance unity.
A (b)
Under
and mean
Z*
n
and
T (b)
under
is
p= {I
where
00
(b)}
01
j
-~
[n I
o
and
00
(n 2 )Cov(U (b),U (n 2 )) +n
no
no
0
and
no
J=
1J
Un I'J
depends only on
P 01
L I. (n )Cov(U l"U
n J
no
and
(n 2 ))]
Note
By the
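Then
$$\lim P_{\theta_1^0}\{\lambda_n(b) \geq x\} = \begin{cases} 1 - \Phi(\sqrt{x}) & \text{if } x > 0, \\ 1 & \text{if } x = 0, \end{cases}$$
$$\lim P_{K_n}\{\lambda_n(b) \geq x\} = \begin{cases} 1 - \Phi(\sqrt{x} - \rho(\eta_0,\eta_2)) & \text{if } x > 0, \\ 1 & \text{if } x = 0. \end{cases}$$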
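These limiting tail probabilities are those of a one-sided squared normal variable. As a hedged illustration, assume that for fixed $b$ the statistic behaves in the limit like $T^2 I\{T \geq 0\}$, with $T$ normal with unit variance and mean $\rho$ ($\rho = 0$ under the null hypothesis). The sketch below compares a Monte Carlo estimate of the tail with $1 - \Phi(\sqrt{x} - \rho)$. The function names, sample size, and seed are hypothetical choices, not part of the paper.

```python
import math
import random

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tail_formula(x, rho=0.0):
    """P(T^2 * 1{T >= 0} >= x) for T ~ N(rho, 1)."""
    if x <= 0.0:
        return 1.0  # the statistic is never negative
    return 1.0 - Phi(math.sqrt(x) - rho)

def simulate_tail(x, rho=0.0, n_draws=100_000, seed=0):
    """Monte Carlo estimate of the same tail probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        t = rng.gauss(rho, 1.0)
        stat = t * t if t >= 0.0 else 0.0  # one-sided squared statistic
        if stat >= x:
            hits += 1
    return hits / n_draws

for x in (0.5, 1.0, 2.7):
    print(x, simulate_tail(x), tail_formula(x))
print("shifted:", simulate_tail(1.0, rho=1.0), tail_formula(1.0, rho=1.0))
```

Under the null the distribution puts mass one half at zero, which is why the tail jumps from $1$ at $x = 0$ to values below $1/2$ for $x > 0$.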
where
no >
~}
{ 'fin
of size
{~o}
n
a < .5.
under
is asymptotically
S({~n},e~,no,nl,n2)
and
~o
n
based on
inf
inf
n ER1 S({~n},e~.no,nl,b) ~ n ERl S({~~}.e~,nl,n2,b)
1
r
for every
no > 0 and
~o
if we show
p(no,b)
V n Z ~ b.
$g_\theta = g(x,\theta) = A(\theta)\exp\{\theta x\}h(x),$
Let
Let
1/!(8)
11
ge
'Cov(g'
a
gb
--)
g
a
a < b
J,
be
be fixed elements of
J.
Ie =
a '
elo = a.
with
We assume
Lemma 4.1.
Proof.
~(e)
~(e) ~ ~(b)
if
e > b.
Note that
=0
~(a)
Also
J.
is finite on
~(e)
<
can be expressed as
g
g'
/{I 11 ~
- o
I l(b)
~ - I11}gedlJ
g
g
a
a
Since
(4.1)
~(b)
K,
say
sign changes and if there are two, they must be from positive to negative
and negative to positive.
~(e)
such that
~(8')
<
~(b)
If there exists
a,b,8'
provided we choose
K such that
5.
interval and
its closure.
< 8
< 1
and
182-8112:
H :
Suppose
H
o
g(x,6),
be a family of densities
Let
Let
0.
< 80
~~.
where
Without
In the sequel
We make the
Let
0 n O = 0.
1
2
that
8
61
8~
O2
and
0 ,0
1 2
(Since
< e
< 0
and
82 -< 80 - (-0)
n,Vn(n),An(n)
I81-8~1
< 0
and
82 > 8
-
(0 < E).
0
etc. as in Section 2.
L (8)
0(1)
and
A (n)
= (0,0).
Hence
o1
8 ,
2
may be
Sup L (8)
n
8
01
82
we may
uniformly in
L (8)
n
Hence
+ (-c)).
As in Section 3,
(1)
Va'
o<
E -
attained at
no'
n ol
62
2- 8 1 +
if
E -
Kn
Kn
-~
2- 8 2 2- 8 1
+ 0 or
-~
where
0(2) = {8
2
L (8)
n
under
L (8)
in Section 2 with
H
o2
A (n)
n
H:
The supremum of
= V (0) +
(1)
~ 8~ +d.
8 2 2- 8 1 -
under
H
a
0(2)
2
or
Assume Bl.
8~
..
o =
0(2)
22
ACKNOWLEDGEMENT
Thanks are due to the referee whose comments clarified many issues
and led to a better presentation.
REFERENCES
Bickel, P.J. and Wichura, M.J. (1971).
Ann. Math.
Chernoff, H. (1954).
Ann.
Bounds on moments
On mixtures of distributions:
Sankhya
43,
245-290.
Hajek, J. and Sidak, Z. (1967).
Academic Press,
New York.
Karlin, S. (1968).
Stanford, California.
McCormick, S.P. (1967).
Moran, P.A.P. (1973).
Nonlinear Programming.
Ann. Statist.,
~,
224-227.