
BST 401 Probability Theory

Xing Qiu Ha Youn Lee

Department of Biostatistics and Computational Biology


University of Rochester

October 15, 2009

Qiu, Lee BST 401


Outline



Expectation (I)

If Y = φ(X), then EY = ∫_Ω φ(X) dP.
Special case: φ(X) = X^k. EX^k is called the k-th moment of X.
E|X|^k: the k-th absolute moment of X (remember the L^p norm?).
E(X − EX)^k: the k-th central moment of X.
E|X − EX|^k: the k-th absolute central moment of X.
The second central moment (which is also the second absolute central moment) of X is called the variance of X.
Denote m = EX. Then var(X) = E(X − m)² = E(X² − 2mX + m²) = E(X²) − m².
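A quick numerical sketch (an illustration added here, not from the original slides): the identity var(X) = E(X²) − (EX)² holds exactly for the sample analogues of these moments, for any choice of distribution.

```python
import numpy as np

# Check var(X) = E(X^2) - (EX)^2 on the empirical measure of a sample.
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)  # any distribution works

m = x.mean()                       # sample analogue of EX
second_moment = (x**2).mean()      # sample analogue of E(X^2)
var_direct = ((x - m)**2).mean()   # sample analogue of E(X - EX)^2

# The two sides agree up to floating-point rounding.
assert abs(var_direct - (second_moment - m**2)) < 1e-8
```

The agreement is exact (not just approximate in n) because the identity is pure algebra: it holds for expectations under any probability measure, including the empirical one.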

Expectation (II)

The existence of a higher-order moment implies the existence of all lower-order moments.
A special case of Chebyshev’s inequality (m = EX, σ = standard deviation):
P(|X − m| > kσ) ≤ 1/k².
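A small empirical sketch of Chebyshev’s inequality (added here for illustration, not part of the original slides), using a deliberately skewed distribution to show the bound is distribution-free:

```python
import numpy as np

# Empirical check of P(|X - m| > k*sigma) <= 1/k^2 on a skewed sample.
rng = np.random.default_rng(1)
x = rng.standard_gamma(3.0, size=200_000)  # non-normal on purpose
m, sigma = x.mean(), x.std()

for k in (2.0, 3.0, 5.0):
    tail = np.mean(np.abs(x - m) > k * sigma)  # empirical tail probability
    assert tail <= 1.0 / k**2
```

For well-behaved distributions the actual tail is usually far below 1/k²; Chebyshev trades tightness for complete generality (only a finite second moment is required).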


Covariance/correlation

cov(X, Y) = E(X − EX)(Y − EY).
corr(X, Y) = cov(X, Y)/(σ_X σ_Y).
Why is the correlation always in [−1, 1]? Cauchy–Schwarz.
Variance of a sum:
var(X_1 + X_2 + · · · + X_n) = Σ_i var(X_i) + 2 Σ_{i<j} cov(X_i, X_j).
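A numerical sketch of the variance-of-a-sum formula (an added illustration, not from the original slides), using three correlated Gaussian components:

```python
import numpy as np

# Verify var(X1 + X2 + X3) = sum_i var(Xi) + 2 * sum_{i<j} cov(Xi, Xj)
# on the sample covariances (the identity is exact for them as well).
rng = np.random.default_rng(2)
xs = rng.multivariate_normal(
    mean=[0.0, 0.0, 0.0],
    cov=[[1.0, 0.5, 0.2], [0.5, 2.0, -0.3], [0.2, -0.3, 1.5]],
    size=50_000,
)
total = xs.sum(axis=1)

C = np.cov(xs, rowvar=False)           # 3x3 sample covariance matrix
lhs = total.var(ddof=1)                # sample variance of the sum
rhs = C.diagonal().sum() + 2 * (C[0, 1] + C[0, 2] + C[1, 2])
assert abs(lhs - rhs) < 1e-8

# Cauchy-Schwarz guarantees the sample correlation stays in [-1, 1]:
r = np.corrcoef(xs[:, 0], xs[:, 1])[0, 1]
assert -1.0 <= r <= 1.0
```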

Independence (I)

Two events A, B are independent if P(A ∩ B) = P(A)P(B).
Two random variables X, Y are independent if for any A, B ∈ B, P(X ∈ A, Y ∈ B) = P(X ∈ A)P(Y ∈ B).
Two σ-algebras F and G are independent if every event in one is independent of every event in the other.
The second definition is a special case of the third, because X induces a σ-algebra F_X = X⁻¹(B) on Ω. Note that F_X ⊆ F (it is coarser).
For more than two variables X_1, X_2, . . .: we have to make sure that for all B_1, B_2, . . . ∈ B, the events X_1⁻¹(B_1), X_2⁻¹(B_2), . . . are independent.
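The product rule for random variables can be checked empirically (an added sketch, not from the original slides): for an independent die and coin, the joint frequency of two events factorizes into the product of the marginal frequencies.

```python
import numpy as np

# For independent discrete X, Y, check that
# P(X in A, Y in B) ~= P(X in A) * P(Y in B) on a simulated sample.
rng = np.random.default_rng(3)
x = rng.integers(0, 6, size=400_000)   # fair six-sided die (0..5)
y = rng.integers(0, 2, size=400_000)   # independent fair coin

in_A = np.isin(x, [0, 1, 2])           # event A = {X in {0, 1, 2}}
in_B = (y == 1)                        # event B = {Y = 1}

joint = np.mean(in_A & in_B)           # empirical P(X in A, Y in B)
product = in_A.mean() * in_B.mean()    # product of empirical marginals
assert abs(joint - product) < 0.01     # equal up to Monte Carlo error
```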

Independence (II)

Distribution functions and independence: X_1, X_2, . . . are independent iff F(x_1, x_2, . . .) = F_1(x_1)F_2(x_2) · · ·.
The ⇒ part is just a special case of the definition. The ⇐ part relies on the Carathéodory extension theorem.
In terms of density functions: the same multiplication rule.
If X_1, X_2, . . . are independent, then f_1(X_1), f_2(X_2), . . . are independent. This is because f_i(X_i) can only induce a σ-algebra that is coarser than F_{X_i}. Corollary (4.5) is a slightly stronger version of this proposition.
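The CDF factorization can be illustrated numerically (an added sketch, not from the original slides): for an independent normal/exponential pair, the empirical joint CDF matches the product of the empirical marginal CDFs at any test point.

```python
import numpy as np

# For independent X, Y the joint CDF factorizes: F(a, b) = F_X(a) * F_Y(b).
# Check this at a few points on a simulated sample.
rng = np.random.default_rng(4)
x = rng.normal(size=300_000)        # X ~ N(0, 1)
y = rng.exponential(size=300_000)   # Y ~ Exp(1), independent of X

for a, b in [(0.0, 1.0), (1.0, 0.5), (-1.0, 2.0)]:
    joint = np.mean((x <= a) & (y <= b))       # empirical F(a, b)
    marg = np.mean(x <= a) * np.mean(y <= b)   # F_X(a) * F_Y(b)
    assert abs(joint - marg) < 0.01            # Monte Carlo tolerance
```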

Independence and Expectation

Being independent implies being uncorrelated (EXY = EX · EY), but not vice versa.
This is because covariance is a summary statistic, a sort of “average dependence”: uncorrelatedness is one constraint, while independence imposes many. Example: X, Y are two discrete random variables. X ⊥ Y means P(X = t, Y = s) = P(X = t)P(Y = s) for every t, s, so there are many, many constraint equations. On the other hand, EXY = EX EY is just one constraint equation:
Σ_{t,s} t s P(X = t, Y = s) = (Σ_t t P(X = t)) (Σ_s s P(Y = s)).
The convolution formula gives the distribution of the sum of two independent r.v.s.
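Two added sketches (illustrations, not from the original slides): the classic uncorrelated-but-dependent pair X ~ N(0,1), Y = X², and the convolution formula applied to the pmfs of two fair dice.

```python
import numpy as np

# Counterexample: uncorrelated but dependent.
# With X ~ N(0,1) and Y = X^2, cov(X, Y) = E[X^3] = 0,
# yet Y is a deterministic function of X.
rng = np.random.default_rng(5)
x = rng.normal(size=500_000)
y = x**2
assert abs(np.cov(x, y)[0, 1]) < 0.03   # sample covariance ~ 0

# The dependence shows up in joint probabilities: {Y <= 1} is exactly
# {|X| <= 1}, so P(|X| <= 1, Y <= 1) = P(|X| <= 1) ~ 0.68, far above
# the product of the marginals P(|X| <= 1) * P(Y <= 1) ~ 0.47.
p_joint = np.mean((np.abs(x) <= 1) & (y <= 1))
p_prod = np.mean(np.abs(x) <= 1) * np.mean(y <= 1)
assert p_joint - p_prod > 0.1           # clearly not independent

# Convolution formula: the pmf of the sum of two independent discrete
# r.v.s is the convolution of their pmfs (here, two fair dice).
die = np.full(6, 1 / 6)
pmf_sum = np.convolve(die, die)         # supported on {2, ..., 12}
assert abs(pmf_sum[5] - 6 / 36) < 1e-12  # P(sum = 7) = 6/36
```

The single failed factorization above is enough to disprove independence, even though every covariance-based check passes.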
