
Lecture Notes 5

Statistical Models

(Chapter 6.) A statistical model $\mathcal{P}$ is a collection of probability distributions (or a collection of densities). An example of a nonparametric model is
$$\mathcal{P} = \left\{ p : \int (p''(x))^2 \, dx < \infty \right\}.$$

A parametric model has the form
$$\mathcal{P} = \Big\{ p(x; \theta) : \theta \in \Theta \Big\}$$
where $\Theta \subset \mathbb{R}^d$. An example is the set of Normal densities $\{ p(x; \theta) = (2\pi)^{-1/2} e^{-(x-\theta)^2/2} \}$.

For now, we focus on parametric models. Later we consider nonparametric models.

Statistics

Let $X_1, \ldots, X_n \sim p(x; \theta)$. Let $X^n \equiv (X_1, \ldots, X_n)$. Any function $T = T(X_1, \ldots, X_n)$ is itself a random variable, which we will call a statistic.
Some examples are:
1. order statistics: $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$,
2. sample mean: $\bar{X} = \frac{1}{n} \sum_i X_i$,
3. sample variance: $S^2 = \frac{1}{n-1} \sum_i (X_i - \bar{X})^2$,
4. sample median: the middle value of the order statistics,
5. sample minimum: $X_{(1)}$,
6. sample maximum: $X_{(n)}$.
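To make these concrete, here is a small sketch (Python with NumPy; the sample size, distribution, and seed are arbitrary illustrative choices, not part of the notes) computing each statistic for one simulated sample:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=11)   # a sample X_1, ..., X_n

order_stats = np.sort(x)                      # X_(1) <= ... <= X_(n)
sample_mean = x.mean()                        # (1/n) sum_i X_i
sample_var = x.var(ddof=1)                    # (1/(n-1)) sum_i (X_i - Xbar)^2
sample_median = np.median(x)                  # middle value of the order statistics
sample_min, sample_max = order_stats[0], order_stats[-1]

print(sample_mean, sample_var, sample_median, sample_min, sample_max)
```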

Often, we are interested in the distribution of T .


Example 1 If $X_1, \ldots, X_n \sim \Gamma(\alpha, \beta)$, then $\bar{X} \sim \Gamma(n\alpha, \beta/n)$.

Proof. The mgf is
$$M_{\bar{X}}(t) = E[e^{t\bar{X}}] = E\left[ e^{\sum_i X_i t/n} \right] = \prod_{i=1}^n E[e^{X_i (t/n)}] = [M_{X_1}(t/n)]^n = \left( \frac{1}{1 - \beta t/n} \right)^{n\alpha} = \left( \frac{1}{1 - (\beta/n) t} \right)^{n\alpha}.$$
This is the mgf of $\Gamma(n\alpha, \beta/n)$. $\square$
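A quick Monte Carlo check of Example 1, as a sketch (the parameter values are arbitrary; NumPy and SciPy parametrize the Gamma by shape and scale, matching the $\Gamma(\alpha, \beta)$ convention above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, alpha, beta, reps = 5, 2.0, 3.0, 100_000

# X_i ~ Gamma(shape=alpha, scale=beta); compute Xbar in each of `reps` repetitions
xbar = rng.gamma(shape=alpha, scale=beta, size=(reps, n)).mean(axis=1)

# Compare simulated Xbar against Gamma(shape=n*alpha, scale=beta/n)
ks = stats.kstest(xbar, stats.gamma(a=n * alpha, scale=beta / n).cdf)
print(ks.statistic)   # close to 0 => the two distributions agree
```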


Example 2 If $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$ then $\bar{X} \sim N(\mu, \sigma^2/n)$.

Example 3 If $X_1, \ldots, X_n$ iid Cauchy(0,1),
$$p(x) = \frac{1}{\pi (1 + x^2)}$$
for $x \in \mathbb{R}$, then $\bar{X} \sim$ Cauchy(0,1).
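Examples 2 and 3 contrast sharply: the Normal sample mean concentrates as $n$ grows, while the Cauchy sample mean does not concentrate at all. A simulation sketch (settings are my own) illustrates this:

```python
import numpy as np

rng = np.random.default_rng(2)
for n in (10, 100, 1000):
    # sample means of n iid Cauchy(0,1) draws, repeated 2000 times
    xbar = rng.standard_cauchy(size=(2000, n)).mean(axis=1)
    q1, q3 = np.percentile(xbar, [25, 75])
    print(n, q3 - q1)   # IQR stays near 2, the IQR of Cauchy(0,1), for every n
```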


Example 4 If $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$ then
$$\frac{(n-1) S^2}{\sigma^2} \sim \chi^2_{n-1}.$$

The proof is based on the mgf.
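As a numerical sanity check of Example 4 (a sketch with arbitrary parameter values, comparing the simulated statistic against the $\chi^2_{n-1}$ cdf):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, mu, sigma, reps = 8, 1.0, 3.0, 100_000

x = rng.normal(mu, sigma, size=(reps, n))
stat = (n - 1) * x.var(axis=1, ddof=1) / sigma**2   # (n-1) S^2 / sigma^2

print(stats.kstest(stat, stats.chi2(df=n - 1).cdf).statistic)   # close to 0
```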


Example 5 Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the order statistics, which means that the sample $X_1, X_2, \ldots, X_n$ has been ordered from smallest to largest:
$$X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}.$$
Now,
$$\begin{aligned}
F_{X_{(k)}}(x) = P(X_{(k)} \le x) &= P(\text{at least } k \text{ of the } X_1, \ldots, X_n \le x) \\
&= \sum_{j=k}^n P(\text{exactly } j \text{ of the } X_1, \ldots, X_n \le x) \\
&= \sum_{j=k}^n \binom{n}{j} [F_X(x)]^j [1 - F_X(x)]^{n-j}.
\end{aligned}$$
Differentiate to find the pdf:
$$p_{X_{(k)}}(x) = \frac{n!}{(k-1)!\,(n-k)!}\, [F_X(x)]^{k-1}\, p(x)\, [1 - F_X(x)]^{n-k}.$$
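The pdf formula can be checked by simulation. For Uniform(0,1) samples, $F_X(x) = x$ and $p(x) = 1$, so the formula reduces to the Beta($k$, $n-k+1$) density; a sketch (my own choices of $n$ and $k$):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, k, reps = 5, 2, 200_000

# k-th order statistic of n Uniform(0,1) draws, in each of `reps` repetitions
xk = np.sort(rng.uniform(size=(reps, n)), axis=1)[:, k - 1]

# For Uniform(0,1) the order-statistic pdf is the Beta(k, n-k+1) density
print(stats.kstest(xk, stats.beta(k, n - k + 1).cdf).statistic)   # close to 0
```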

Sufficiency

We continue with parametric inference. In this section we discuss data reduction as a formal concept.

3.1 Sufficient Statistics

Suppose that $X_1, \ldots, X_n \sim p(x; \theta)$. $T$ is sufficient for $\theta$ if the conditional distribution of $X_1, \ldots, X_n \mid T$ does not depend on $\theta$. Thus, $p(x_1, \ldots, x_n \mid t; \theta) = p(x_1, \ldots, x_n \mid t)$.
Intuitively, this means that you can replace $X_1, \ldots, X_n$ with $T(X_1, \ldots, X_n)$ without losing information. (This is not quite true, as we'll see later. But for now, you can think of it this way.)

Example 6 $X_1, \ldots, X_n \sim$ Poisson($\theta$). Let $T = \sum_{i=1}^n X_i$. Then,
$$p_{X^n \mid T}(x^n \mid t) = P(X^n = x^n \mid T(X^n) = t) = \frac{P(X^n = x^n \text{ and } T = t)}{P(T = t)}.$$
But
$$P(X^n = x^n \text{ and } T = t) = \begin{cases} 0 & T(x_1, \ldots, x_n) \ne t \\ P(X_1 = x_1, \ldots, X_n = x_n) & T(x_1, \ldots, x_n) = t. \end{cases}$$
Hence,
$$P(X^n = x^n) = \prod_{i=1}^n \frac{e^{-\theta} \theta^{x_i}}{x_i!} = \frac{e^{-n\theta} \theta^{\sum_i x_i}}{\prod_i (x_i!)} = \frac{e^{-n\theta} \theta^t}{\prod_i (x_i!)}.$$
Now, $T(x^n) = \sum_i x_i = t$ and so
$$P(T = t) = \frac{e^{-n\theta} (n\theta)^t}{t!}$$
since $T \sim$ Poisson($n\theta$). Thus,
$$\frac{P(X^n = x^n)}{P(T = t)} = \frac{t!}{\prod_i (x_i!)\, n^t},$$
which does not depend on $\theta$. So $T = \sum_i X_i$ is a sufficient statistic for $\theta$. Other sufficient statistics are: $T = 3.7 \sum_i X_i$, $T = (\sum_i X_i, X_4)$, and $T(X_1, \ldots, X_n) = (X_1, \ldots, X_n)$.
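The defining property, that the conditional distribution given $T$ is free of $\theta$, can also be seen by simulation. Conditional on $T = t$, each $X_i$ is Binomial($t$, $1/n$); the following sketch (values arbitrary) estimates the conditional pmf of $X_1$ for two very different $\theta$:

```python
import numpy as np

rng = np.random.default_rng(5)

def cond_pmf_x1(theta, n=3, t=4, reps=400_000):
    """Empirical pmf of X_1 given T = sum_i X_i = t, for iid Poisson(theta) data."""
    x = rng.poisson(theta, size=(reps, n))
    x1 = x[x.sum(axis=1) == t, 0]          # keep only samples with T = t
    return np.bincount(x1, minlength=t + 1) / len(x1)

# The conditional pmf is the same (up to Monte Carlo noise) for different theta:
print(cond_pmf_x1(0.5))
print(cond_pmf_x1(3.0))   # both approximate the Binomial(t, 1/n) pmf
```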

3.2 Sufficient Partitions

It is better to describe sufficiency in terms of partitions of the sample space.


Example 7 Let $X_1, X_2, X_3 \sim$ Bernoulli($\theta$). Let $T = \sum_i X_i$.

x^n          t       p(x|t)
(0, 0, 0)    t = 0   1
(0, 0, 1)    t = 1   1/3
(0, 1, 0)    t = 1   1/3
(1, 0, 0)    t = 1   1/3
(0, 1, 1)    t = 2   1/3
(1, 0, 1)    t = 2   1/3
(1, 1, 0)    t = 2   1/3
(1, 1, 1)    t = 3   1

The sample space has 8 elements; $T$ takes 4 values, so the partition induced by $T$ has 4 elements.

1. A partition $B_1, \ldots, B_k$ is sufficient if $f(x \mid X \in B)$ does not depend on $\theta$.
2. A statistic $T$ induces a partition. For each $t$, $\{x : T(x) = t\}$ is one element of the partition. $T$ is sufficient if and only if the partition is sufficient.
3. Two statistics can generate the same partition: for example, $\sum_i X_i$ and $3 \sum_i X_i$ (see the sketch after this list).
4. If we split any element $B_i$ of a sufficient partition into smaller pieces, we get another sufficient partition.
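Point 3 is easy to verify by enumerating the induced partitions; a minimal sketch (the helper `partition` is my own construction):

```python
from itertools import product

def partition(T, points):
    """Group sample points into blocks by the value of the statistic T."""
    blocks = {}
    for x in points:
        blocks.setdefault(T(x), []).append(x)
    return sorted(sorted(block) for block in blocks.values())

pts = list(product([0, 1], repeat=3))      # sample space for three Bernoullis
T1 = lambda x: sum(x)
T2 = lambda x: 3 * sum(x)
print(partition(T1, pts) == partition(T2, pts))   # True: same partition
```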
Example 8 Let $X_1, X_2, X_3 \sim$ Bernoulli($\theta$). Then $T = X_1$ is not sufficient. Look at its partition:

x^n          t       p(x|t)
(0, 0, 0)    t = 0   (1-θ)^2
(0, 0, 1)    t = 0   θ(1-θ)
(0, 1, 0)    t = 0   θ(1-θ)
(0, 1, 1)    t = 0   θ^2
(1, 0, 0)    t = 1   (1-θ)^2
(1, 0, 1)    t = 1   θ(1-θ)
(1, 1, 0)    t = 1   θ(1-θ)
(1, 1, 1)    t = 1   θ^2

The sample space has 8 elements; the partition induced by $T$ has only 2 elements. Since $p(x \mid t)$ still depends on $\theta$, the partition is not sufficient.
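The same conclusion numerically, as a sketch (the helper and the two $\theta$ values are illustrative):

```python
def p_x_given_t(theta):
    """Conditional pmf of (X2, X3) given X1 = t, for X_i iid Bernoulli(theta).

    (X2, X3) is independent of X1, so this is p(x | t) for T = X1."""
    return {(x2, x3): theta**(x2 + x3) * (1 - theta)**(2 - x2 - x3)
            for x2 in (0, 1) for x3 in (0, 1)}

print(p_x_given_t(0.2))
print(p_x_given_t(0.8))   # different values => T = X1 is not sufficient
```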

3.3 The Factorization Theorem

Theorem 9 $T(X^n)$ is sufficient for $\theta$ if the joint pdf/pmf of $X^n$ can be factored as
$$p(x^n; \theta) = h(x^n)\, g(t; \theta).$$
Example 10 Let $X_1, \ldots, X_n \sim$ Poisson($\theta$). Then
$$p(x^n; \theta) = \frac{e^{-n\theta} \theta^{\sum_i x_i}}{\prod_i (x_i!)} = \frac{1}{\prod_i (x_i!)}\; e^{-n\theta} \theta^{\sum_i x_i}.$$

Example 11 $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$. Then
$$p(x^n; \mu, \sigma) = \left( \frac{1}{2\pi\sigma^2} \right)^{n/2} \exp\left( -\frac{\sum_i (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2}{2\sigma^2} \right).$$

(a) If $\sigma$ known:
$$p(x^n; \mu) = \underbrace{\left( \frac{1}{2\pi\sigma^2} \right)^{n/2} \exp\left( -\frac{\sum_i (x_i - \bar{x})^2}{2\sigma^2} \right)}_{h(x^n)}\; \underbrace{\exp\left( -\frac{n(\bar{x} - \mu)^2}{2\sigma^2} \right)}_{g(T(x^n); \mu)}.$$
Thus, $\bar{X}$ is sufficient for $\mu$.


(b) If $(\mu, \sigma^2)$ unknown, then $T = (\bar{X}, S^2)$ is sufficient. So is $T = (\sum_i X_i, \sum_i X_i^2)$.
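Since the Normal likelihood depends on the data only through $(\sum_i x_i, \sum_i x_i^2)$, two different samples sharing that statistic must have identical likelihoods at every $(\mu, \sigma)$. A sketch (the two samples are hand-picked to share $T$ without being permutations of each other):

```python
import numpy as np

def normal_loglik(x, mu, sigma):
    """Log-likelihood of an iid N(mu, sigma^2) sample."""
    return (-0.5 * len(x) * np.log(2 * np.pi * sigma**2)
            - np.sum((x - mu) ** 2) / (2 * sigma**2))

x = np.array([0.0, 3.0, 3.0])   # sum = 6, sum of squares = 18
y = np.array([1.0, 1.0, 4.0])   # same T = (6, 18), not a permutation of x

for mu, sigma in [(0.0, 1.0), (2.0, 0.5), (-1.0, 3.0)]:
    print(normal_loglik(x, mu, sigma) - normal_loglik(y, mu, sigma))   # always 0
```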

3.4 Minimal Sufficient Statistics (MSS)

We want the greatest reduction in dimension.


Example 12 $X_1, \ldots, X_n \sim N(0, \sigma^2)$. Some sufficient statistics are:
$$T(X_1, \ldots, X_n) = (X_1, \ldots, X_n)$$
$$T(X_1, \ldots, X_n) = (X_1^2, \ldots, X_n^2)$$
$$T(X_1, \ldots, X_n) = \left( \sum_{i=1}^m X_i^2,\; \sum_{i=m+1}^n X_i^2 \right)$$
$$T(X_1, \ldots, X_n) = \sum_{i=1}^n X_i^2.$$

$T$ is a Minimal Sufficient Statistic (MSS) if the following two statements are true:
1. $T$ is sufficient, and
2. if $U$ is any other sufficient statistic, then $T = g(U)$ for some function $g$.
In other words, $T$ generates the coarsest sufficient partition.
Suppose $U$ is sufficient and $T = H(U)$ is also sufficient. Then $T$ provides a greater reduction than $U$ unless $H$ is a one-to-one transformation, in which case $T$ and $U$ are equivalent.
Example 13 $X \sim N(0, \sigma^2)$. $X$ is sufficient. $|X|$ is sufficient. $|X|$ is MSS. So are $X^2$, $X^4$, $e^{X^2}$.

Example 14 Let $X_1, X_2, X_3 \sim$ Bernoulli($\theta$). Let $T = \sum_i X_i$ and let $U$ be the statistic tabulated below.

x^n          t       p(x|t)   u         p(x|u)
(0, 0, 0)    t = 0   1        u = 0     1
(0, 0, 1)    t = 1   1/3      u = 1     1/3
(0, 1, 0)    t = 1   1/3      u = 1     1/3
(1, 0, 0)    t = 1   1/3      u = 1     1/3
(0, 1, 1)    t = 2   1/3      u = 73    1/2
(1, 0, 1)    t = 2   1/3      u = 73    1/2
(1, 1, 0)    t = 2   1/3      u = 91    1
(1, 1, 1)    t = 3   1        u = 103   1

Note that U and T are both sufficient but U is not minimal.

3.5 How to Find a Minimal Sufficient Statistic

Theorem 15 Define
$$R(x^n, y^n; \theta) = \frac{p(y^n; \theta)}{p(x^n; \theta)}.$$
Suppose that $T$ has the following property:

$R(x^n, y^n; \theta)$ does not depend on $\theta$ if and only if $T(y^n) = T(x^n)$.

Then $T$ is a MSS.

Example 16 $Y_1, \ldots, Y_n$ iid Poisson($\theta$). Then
$$\frac{p(y^n; \theta)}{p(x^n; \theta)} = \frac{e^{-n\theta} \theta^{\sum_i y_i} / \prod_i (y_i!)}{e^{-n\theta} \theta^{\sum_i x_i} / \prod_i (x_i!)} = \theta^{\sum_i y_i - \sum_i x_i}\, \frac{\prod_i (x_i!)}{\prod_i (y_i!)},$$
which is independent of $\theta$ iff $\sum_i y_i = \sum_i x_i$. This implies that $T(Y^n) = \sum_i Y_i$ is a minimal
sufficient statistic for $\theta$.
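Theorem 15's ratio criterion is easy to probe numerically. A sketch for the Poisson case (sample values arbitrary; `gammaln(k + 1)` computes $\log k!$):

```python
import numpy as np
from scipy.special import gammaln

def poisson_logratio(x, y, theta):
    """log p(y; theta) - log p(x; theta) for iid Poisson(theta) samples."""
    return ((y.sum() - x.sum()) * np.log(theta)
            + gammaln(x + 1).sum() - gammaln(y + 1).sum())

x = np.array([1, 2, 3])
y = np.array([0, 2, 4])   # same sum as x
z = np.array([1, 1, 1])   # different sum

thetas = [0.5, 1.0, 4.0]
print([poisson_logratio(x, y, t) for t in thetas])   # constant in theta
print([poisson_logratio(x, z, t) for t in thetas])   # varies with theta
```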
The minimal sufficient statistic is not unique. But, the minimal sufficient partition is unique.
Example 17 Cauchy. Here
$$p(x; \theta) = \frac{1}{\pi (1 + (x - \theta)^2)}.$$
Then
$$\frac{p(y^n; \theta)}{p(x^n; \theta)} = \frac{\prod_{i=1}^n \{ 1 + (x_i - \theta)^2 \}}{\prod_{j=1}^n \{ 1 + (y_j - \theta)^2 \}}.$$
The ratio is a constant function of $\theta$ if $T(y^n) = T(x^n)$, where
$$T(Y^n) = (Y_{(1)}, \ldots, Y_{(n)})$$
is the vector of order statistics. It is technically harder to show that the ratio is constant only if $T(y^n) = T(x^n)$, but it can be done using theorems about polynomials. Having shown this, one can conclude that the
order statistics are the minimal sufficient statistic for $\theta$.
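A numerical sketch of the Cauchy ratio criterion (sample values arbitrary): permuting a sample leaves the log ratio identically zero, while changing even one value makes it vary with $\theta$.

```python
import numpy as np

def cauchy_logratio(x, y, theta):
    """log p(y; theta) - log p(x; theta) for iid Cauchy(theta, 1) samples."""
    return (np.log1p((x - theta) ** 2).sum()
            - np.log1p((y - theta) ** 2).sum())

x = np.array([0.3, -1.2, 2.5])
y = np.array([2.5, 0.3, -1.2])   # a permutation: same order statistics
z = np.array([0.3, -1.2, 2.6])   # different order statistics

thetas = [-2.0, 0.0, 1.5]
print([cauchy_logratio(x, y, t) for t in thetas])   # all exactly 0
print([cauchy_logratio(x, z, t) for t in thetas])   # changes with theta
```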

What Sufficiency Really Means

If T is sufficient, then T contains all the information you need from the data to compute the
likelihood function. It does not contain all the information in the data. We will define
the likelihood function in the next set of notes.
