David Gamarnik
LECTURE 2. Random variables and measurable functions. Strong Law of Large Numbers (SLLN). Scary stuff continued ...
Outline of Lecture
Random variables and measurable functions. Extension Theorem. Borel-Cantelli Lemma and SLLN
set $A \subset \Omega$ with $A \neq \emptyset$ and $A \neq \Omega$. We have that $A$ is measurable with respect to $\mathcal{F}_2$, but $X^{-1}(A) = A$ is not measurable with respect to $\mathcal{F}_1$.

(b) (Figure.) Say $\Omega = [0,1]^2$ and $X : \Omega \to \mathbb{R}$ is defined by $X(\omega) = \omega_1 + \omega_2$. We claim that $X$ is a random variable when $\Omega$ is equipped with the Borel $\sigma$-field. Here is the proof. For every real value $x$ consider the set $A = \{\omega = (\omega_1, \omega_2) : \omega_1 + \omega_2 > x\}$. We will prove that $A$ is measurable (belongs to the Borel $\sigma$-field of $[0,1]^2$). Then we will take the complement of $A$, and this will prove that $X$ is a random variable. Consider the countable set of pairs of rationals $(r_1, r_2)$ such that $r_1 + r_2 > x$. For each of them find $n = n(r_1, r_2)$, the smallest integer which is large enough so that the rectangle
$$B(r_1, r_2; 1/n) = \left\{(\omega_1, \omega_2) \in [0,1]^2 : |\omega_1 - r_1| \le \frac{1}{n}, \ |\omega_2 - r_2| \le \frac{1}{n}\right\}$$
lies entirely in $A$ (this is possible by the strict inequality $r_1 + r_2 > x$). Observe that every pair $(\omega_1, \omega_2)$ satisfying $\omega_1 + \omega_2 > x$ lies in one of these rectangles: take rationals $r_1, r_2$ close enough to $\omega_1, \omega_2$ so that $r_1 + r_2 > x$ still holds. Thus $A$ is the union $\cup_{r_1, r_2} B(r_1, r_2; 1/n(r_1, r_2))$ of the countable collection of such rectangles and therefore belongs to the Borel $\sigma$-field of $[0,1]^2$.

(c) Say $\Omega = C[0, \infty)$ equipped with the Borel $\sigma$-field, and $X : \Omega \to \mathbb{R}$ is the function which maps every continuous function $f(t)$ into $\max_{0 \le t \le 1} |f(t)|$. Then $X$ is a random variable on $\Omega$. Indeed, for every $x$, the preimage $X^{-1}((-\infty, x])$ is the set of all functions $f$ such that $\max_{0 \le t \le 1} |f(t)| \le x$. But this is exactly the set $B(0, x, 1)$ used in Definition 1.5 of Lecture 1. The sets of this type generate the Borel $\sigma$-field and, in particular, belong to it. Thus $X$ is measurable.

The concept of random variables naturally leads to the concept of a probability distribution.

Definition 1.2. (Figure.) Given a probability space $(\Omega, \mathcal{F}, P)$ and a random variable $X : \Omega \to \mathbb{R}$, the associated probability distribution is defined to be the function $F : \mathbb{R} \to [0,1]$ given by
$$F(x) = P(\{\omega \in \Omega : X(\omega) \le x\}).$$
When $F(x)$ is a differentiable function of $x$, its derivative $f(x) = F'(x)$ is called the density function.

In other words, $F(x)$ is the probability given to the set of all elementary outcomes which are mapped by $X$ into a value at most $x$. It is the probability distributions which are usually discussed in elementary probability classes. There, one usually defines a probability distribution as a function satisfying certain properties (like it should be non-decreasing and should converge to unity as $x \to \infty$). Here these properties can be derived from the given definition of a probability distribution.

Proposition 1. Prove that $F(x)$ is non-decreasing, non-negative, and $\lim_{x \to -\infty} F(x) = 0$, $\lim_{x \to \infty} F(x) = 1$.

Proof. HW
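To make Definition 1.2 concrete, here is a minimal Monte Carlo sketch (an illustration, not part of the notes; the sampler and function names are my own) that estimates $F(x) = P(\{\omega : X(\omega) \le x\})$ for the random variable $X(\omega) = \omega_1 + \omega_2$ of example (b), with $\omega$ drawn from the uniform measure on $[0,1]^2$:

```python
import random

def empirical_cdf(x, trials=100_000, seed=0):
    """Monte Carlo estimate of F(x) = P({omega : X(omega) <= x}) for
    X(omega) = omega_1 + omega_2 with omega uniform on [0, 1]^2."""
    rng = random.Random(seed)
    hits = sum(rng.random() + rng.random() <= x for _ in range(trials))
    return hits / trials

# Consistent with Proposition 1: the estimates are non-decreasing in x,
# near 0 for x <= 0 and near 1 for x >= 2 (the exact value at x = 1 is 1/2).
for x in [-0.5, 0.5, 1.0, 1.5, 2.5]:
    print(x, empirical_cdf(x))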
The concept of probability distributions allows one to perform probability-related calculations without alluding to the more abstract notion of probability measures. This is not possible, however, when we discuss probability spaces like $C[0, \infty)$.
Having defined random variables and associated probability distributions, we can define further expected values, moments, moment generating functions, etc., in a more formal way than is done in elementary probability classes. We do this only heuristically, highlighting the main ideas.

Definition 1.3. A random variable $X : \Omega \to \mathbb{R}$ is called simple if it takes only finitely many values $x_1, x_2, \ldots, x_m$. The expected value of a simple random variable $X$ is defined to be the quantity
$$E[X] = \sum_{1 \le i \le m} x_i P\{\omega : X(\omega) = x_i\}.$$
What if $X$ is not simple? How do we define its expected value? The idea is to approximate $X$ by a sequence of simple random variables. For simplicity assume that $X$ takes only values in the interval $[0, A]$ for some $A > 0$. That is, $X : \Omega \to [0, A]$. Now consider $X_n(\omega) = \frac{k}{n}$ if $X(\omega) \in (\frac{k-1}{n}, \frac{k}{n}]$. Then $X_n$ is a simple random variable. It can be shown that the sequence of the corresponding expected values $E[X_n]$ converges. Its limit is called the expected value $E[X]$ of $X$. It is also sometimes written as $\int_\Omega X(\omega) \, dP(\omega)$. This definition of expected value satisfies all the properties of expected values one studies in elementary probability courses, for example the fact $E[X^2] \ge (E[X])^2$, the Markov inequality, the Chebyshev inequality, Jensen's inequality, etc.
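As a sanity check on this construction, the following sketch (my own illustration; the particular $X$ and the uniform measure on $\Omega = [0,1]$ are hypothetical choices) builds the simple approximations $X_n$ and estimates $E[X_n]$ for growing $n$:

```python
import math
import random

def X(omega):
    # a hypothetical bounded random variable on Omega = [0, 1] with the
    # uniform measure: X(omega) = omega**2, so that E[X] = 1/3
    return omega ** 2

def X_n(omega, n):
    """The simple random variable X_n = k/n when X lies in ((k-1)/n, k/n]."""
    return math.ceil(X(omega) * n) / n

def expected_value(n, trials=200_000, seed=0):
    # E[X_n] estimated by averaging X_n over sampled outcomes omega
    rng = random.Random(seed)
    return sum(X_n(rng.random(), n) for _ in range(trials)) / trials

for n in [1, 2, 10, 100]:
    print(n, expected_value(n))  # approaches E[X] = 1/3 from above as n grows
```

Since $X \le X_n \le X + 1/n$ pointwise, the estimates are squeezed toward $E[X]$, which is exactly why the limit defining $E[X]$ exists.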
Defining a probability measure on an entire $\sigma$-field directly is usually impossible. It turns out that if we are given some collection of sets $\mathcal{A}$ and we can define $P$ on $\mathcal{A}$ only, then there is a unique extension of the function $P$ onto the entire $\sigma$-field, provided some restrictions are satisfied.

1.2.1. Extension Theorem

Theorem 1.5 (Extension Theorem). Given a sample space $\Omega$ and a collection $\mathcal{A}$ of subsets of $\Omega$ such that for every $A \in \mathcal{A}$ its complement $\Omega \setminus A$ is also in $\mathcal{A}$, and for every finite sequence $A_1, \ldots, A_m \in \mathcal{A}$ the union $\cup_{1 \le j \le m} A_j$ is also in $\mathcal{A}$. Suppose $P : \mathcal{A} \to [0,1]$ is such that
(a) $P(\Omega) = 1$,
(b) $P(\cup_{j=1}^{\infty} A_j) \le \sum_{j=1}^{\infty} P(A_j)$, whenever $A_j \in \mathcal{A}$ and $\cup_{j=1}^{\infty} A_j \in \mathcal{A}$,
(c) $P(\cup_{j=1}^{\infty} A_j) = \sum_{j=1}^{\infty} P(A_j)$, whenever $A_j \in \mathcal{A}$, $\cup_{j=1}^{\infty} A_j \in \mathcal{A}$, and $A_i$, $i = 1, 2, \ldots$ are mutually exclusive.
Then the function $P$ uniquely extends to a probability measure $P : \mathcal{F}(\mathcal{A}) \to [0,1]$ defined on the $\sigma$-field generated by $\mathcal{A}$.

Remark. Note that the requirement on $\mathcal{A}$ is to be a collection of sets with properties very similar to those of a $\sigma$-field. The only difference is that we do not require every infinite union of sets in $\mathcal{A}$ to be in $\mathcal{A}$ as well.

1.2.2. Examples and applications

Uniform probability measure. Consider $\Omega = [0,1]$ and let $\mathcal{A}$ be the set of finite unions of non-intersecting intervals (open, closed, or half-open): $[a_1, b_1) \cup [a_2, b_2] \cup \cdots \cup (a_m, b_m)$. It is easy to check that $\mathcal{A}$ satisfies the conditions of the ET. Consider the function $P : \mathcal{A} \to [0,1]$ which maps every such union of intervals to the value $\sum_{1 \le i \le m} (b_i - a_i)$ (that is, the total length of these intervals). It can be checked that this also satisfies the conditions of the ET (we skip the proof). Thus, by the ET, there exists a unique extension of the function $P$ to a probability measure on the entire Borel $\sigma$-field, since this $\sigma$-field is generated by intervals. This probability measure is called the uniform probability measure on $[0,1]$.

Other types of continuous distributions. What about other distributions like Normal, Exponential, etc.? The proper definition of these probability measures is introduced similarly. For example, the standard Normal distribution is defined as the probability space $(\mathbb{R}, \mathcal{B}, P)$, where $\mathcal{B}$ is the Borel $\sigma$-field on $\mathbb{R}$ and $P$ assigns to each interval $[a, b]$ the value $\int_a^b \frac{1}{\sqrt{2\pi}} e^{-\frac{t^2}{2}} \, dt$. Then each non-intersecting collection of intervals $[a_i, b_i]$, $1 \le i \le m$ is assigned the value which is the sum of the corresponding integrals. Again, the set of finite collections of non-intersecting intervals satisfies the conditions of the ET, and applying the ET we obtain that the probability measure $P$ is defined on the entire Borel $\sigma$-field $\mathcal{B}$.

1.2.3. i.i.d. sequences

i.i.d. coin tosses. Let $\Omega = \{0, 1\}^\infty$. Recall that the product $\sigma$-field is the $\sigma$-field generated by cylinder-type sets $A(\bar\omega)$. Let $\mathcal{A}$ be the set of finite unions of such sets $\cup_{1 \le j \le k} A(\bar\omega^j)$. Again, it can be checked that $\mathcal{A}$ satisfies the conditions of the ET. For every finite sequence $\bar\omega = (\omega_1, \ldots, \omega_m)$ and the corresponding set $A(\bar\omega)$ we set $P(A(\bar\omega))$ simply to be $\frac{1}{2^m}$ (the probability of a particular sequence of 0/1 observations in the first $m$ coin tosses is $\frac{1}{2^m}$). For example, the probability of first four zeros is $\frac{1}{2^4}$. Then, for every union of non-intersecting sets $\cup_{1 \le j \le k} A(\bar\omega^j)$, where $\bar\omega^j$ has length $m_j$, we set the corresponding value to $\sum_{1 \le j \le k} \frac{1}{2^{m_j}}$.
The conditions of the ET again can be checked, but we skip the proof. Then, by the ET, there is a unique extension of $P$ to the entire product $\sigma$-field of $\Omega$. This is what we call a sequence of i.i.d. unbiased coin tosses, also known as a sequence of i.i.d. Bernoulli random variables with parameter $1/2$. The phrase i.i.d., in proper probabilistic terms, means the probability space $(\Omega, \mathcal{F}, P)$ constructed above.

General i.i.d.-type distributions. We have formally defined the i.i.d. Bernoulli sequence. What about general i.i.d. sequences? They are defined similarly, by considering infinite products and cylinder-type sets. First we set $\Omega = \mathbb{R}^\infty$. On it we consider the product $\sigma$-field $\mathcal{F}$. Define $\mathcal{A}$ to be the set of finite unions of cylinder-type sets. Recall that a cylinder set $A$ is a set of the form $A = [a_1, b_1] \times (a_2, b_2) \times \cdots \times [a_m, b_m) \times \mathbb{R} \times \mathbb{R} \times \cdots$, a product of closed, open, or half-open intervals. Recall also that cylinder sets generate, by definition, the product $\sigma$-field $\mathcal{F}$. Suppose we have a probability space $(\mathbb{R}, \mathcal{B}, P)$ defined on $\mathbb{R}$ and its Borel $\sigma$-field $\mathcal{B}$ (for example, $P$ corresponds to the standard Normal distribution). Then for every cylinder set $A$ as above we define $P(A) = \prod_{1 \le j \le m} P([a_j, b_j])$ (each factor being the $P$-measure of the corresponding interval). Again we check that $\mathcal{A}$ and $P$ satisfy the conditions of the ET (we skip the proof). Thus there is a unique extension of $P$ to the entire product $\sigma$-field $\mathcal{F}$ of $\mathbb{R}^\infty$, since $\mathcal{A}$ generates this $\sigma$-field. Then we define $X_m(\omega) = \omega_m$ for every $\omega \in \mathbb{R}^\infty$. We note that $X_m$ is a random variable, as it is a measurable function from $\mathbb{R}^\infty$ into $\mathbb{R}$. The sequence $X_1, X_2, \ldots$ is a stochastic process which we call an i.i.d. sequence of random variables. Essentially we have embedded a sequence of random variables $\{X_m\}$ into a single probability space $(\mathbb{R}^\infty, \mathcal{F}, P)$.

Is this definition consistent with the elementary definition of i.i.d.? Recall that the elementary definition of an i.i.d. sequence is that $P(X_1 \le x_1, \ldots, X_m \le x_m) = \prod_{1 \le j \le m} P(X_j \le x_j)$. Is this true in our case? Note
$$P(X_1 \le x_1, \ldots, X_m \le x_m) = P\{\omega \in \mathbb{R}^\infty : \omega_1 \in (-\infty, x_1], \ldots, \omega_m \in (-\infty, x_m]\} = P\{(-\infty, x_1] \times \cdots \times (-\infty, x_m] \times \mathbb{R} \times \cdots\} = \prod_{1 \le j \le m} P((-\infty, x_j]),$$
where the last equality follows from how we defined $P$ on cylinder sets. But the product of these probabilities is exactly $\prod_{1 \le j \le m} P(X_j \le x_j)$. Thus the identity checks.
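The following sketch (an illustration, not part of the notes; taking the one-dimensional measure $P$ to be standard Normal is my assumption) checks this product identity by simulation for the first two coordinates of the product measure:

```python
import random

def check_product_identity(x1, x2, trials=200_000, seed=0):
    """Monte Carlo check that P(X1 <= x1, X2 <= x2) equals
    P(X1 <= x1) * P(X2 <= x2) when the coordinates omega_1, omega_2
    are sampled independently from a standard Normal distribution."""
    rng = random.Random(seed)
    joint = m1 = m2 = 0
    for _ in range(trials):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)  # independent coordinates
        m1 += a <= x1
        m2 += b <= x2
        joint += (a <= x1) and (b <= x2)
    return joint / trials, (m1 / trials) * (m2 / trials)

# The two returned numbers should nearly coincide (about 0.5 * 0.841 here).
print(check_product_identity(0.0, 1.0))
```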
Given a sequence of events $A_m$, $m \ge 1$, the event that infinitely many of the $A_m$ occur can be written as (check the validity of this identity)
$$A_{\text{i.o.}} = \bigcap_{m \ge 1} \bigcup_{j \ge m} A_j.$$

Lemma 1.6 (Borel-Cantelli Lemma). Given a probability space $(\Omega, \mathcal{F}, P)$ and an infinite sequence of events $A_m$, $m \ge 1$, suppose $\sum_m P(A_m) < \infty$. Then $P(A_{\text{i.o.}}) = 0$. In words we say: the probability that the $A_m$ happen infinitely often is equal to zero.

Proof. Define $B_m = \cup_{j \ge m} A_j$. Then $A_{\text{i.o.}} = \cap_m B_m$ and $B_1 \supset B_2 \supset B_3 \supset \cdots$. Using Proposition 4 part (b) (applied to complement sets) we obtain $P(A_{\text{i.o.}}) = \lim_m P(B_m)$. But since $\sum_m P(A_m) < \infty$, the tail parts of the sum satisfy $\lim_m \sum_{j \ge m} P(A_j) = 0$. Moreover, $P(B_m) = P(\cup_{j \ge m} A_j) \le \sum_{j \ge m} P(A_j)$. Therefore $\lim_m P(B_m) = 0$. We conclude $P(A_{\text{i.o.}}) = 0$.
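A quick simulation makes the lemma tangible. In the sketch below (my own illustration), the events are $A_n = \{U_n \le 1/n^2\}$ for independent uniform $U_n$, so that $\sum_n P(A_n) = \sum_n 1/n^2 < \infty$; on almost every run the events stop occurring after a small index:

```python
import random

def last_occurrence(N=10_000, seed=None):
    """Simulate independent events A_n with P(A_n) = 1/n**2 for n <= N
    and return the largest n for which A_n occurs (0 if none).
    Since sum 1/n**2 < infinity, the Borel-Cantelli Lemma says that
    with probability one only finitely many A_n occur."""
    rng = random.Random(seed)
    last = 0
    for n in range(1, N + 1):
        if rng.random() <= 1.0 / n**2:
            last = n
    return last

# In repeated runs the last index is almost always small.
print([last_occurrence(seed=s) for s in range(10)])
```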
Theorem 1.7 (SLLN). Consider an i.i.d. sequence of random variables $X_n$, $n = 1, 2, \ldots$ corresponding to some probability measure $(\mathbb{R}, \mathcal{B}, P)$. Suppose $E[|X_1|] < \infty$. Then almost surely
$$\lim_{n} \frac{\sum_{1 \le i \le n} X_i}{n} = E[X_1].$$
Formally, define
$$A = \left\{\omega \in \mathbb{R}^\infty : \lim_n \frac{\sum_{1 \le i \le n} X_i(\omega)}{n} = E[X_1]\right\}.$$
Then $P(A) = 1$.

Proof. The proof of this fundamental result in probability theory is complicated (see for example [2]). Here, for simplicity, we consider the special case when the random variable $X_1$ has a finite fourth moment. Namely,
$$E[|X_1|^4] = \int_\Omega |X_1(\omega)|^4 \, dP(\omega) < \infty.$$
Let us center the random variables $X_i$ in the following way: $Y_i = X_i - E[X_i]$. Then the $Y_i$ have zero expected value. Since
$$\frac{\sum_{1 \le i \le n} Y_i}{n} = \frac{\sum_{1 \le i \le n} X_i}{n} - E[X_1],$$
it suffices to show that $\frac{\sum_{1 \le i \le n} Y_i}{n}$ converges almost surely to zero. Fix $\epsilon > 0$ and define the event
$$A_n(\epsilon) = \left\{\omega \in \mathbb{R}^\infty : \left|\frac{\sum_{1 \le i \le n} Y_i(\omega)}{n}\right| > \epsilon\right\}.$$
By the Markov inequality applied to the fourth power,
$$P(A_n(\epsilon)) = P\left(\left(\frac{\sum_{1 \le i \le n} Y_i}{n}\right)^4 > \epsilon^4\right) \le \frac{E\left[\left(\sum_{1 \le i \le n} Y_i\right)^4\right]}{n^4 \epsilon^4}.$$
When we expand $E[(\sum_{1 \le i \le n} Y_i)^4]$ we note that only the terms of the form $E[Y_i^4]$ and $E[Y_i^2 Y_j^2]$ are non-zero, since the expected value of $Y_i$ is zero and the sequence is i.i.d. Also, by independence, $E[Y_i^2 Y_j^2] = (E[Y_1^2])^2$. There are $n$ terms of the first kind and $3n(n-1)$ of the second (each unordered pair $i \ne j$ appears with multinomial coefficient $\binom{4}{2} = 6$). We obtain a bound
$$\frac{n E[Y_1^4] + 3n(n-1)(E[Y_1^2])^2}{n^4 \epsilon^4} \le \frac{3n^2 \left[E[Y_1^4] + (E[Y_1^2])^2\right]}{n^4 \epsilon^4} = \frac{3\left(E[Y_1^4] + (E[Y_1^2])^2\right)}{n^2 \epsilon^4}.$$
This expression is finite by our assumption of finiteness of the fourth moment. Since the sum
$$\sum_{n \ge 1} \frac{3\left(E[Y_1^4] + (E[Y_1^2])^2\right)}{n^2 \epsilon^4} < \infty,$$
applying the Borel-Cantelli Lemma we conclude that the probability that $A_n(\epsilon)$ happens for infinitely many $n$ is zero. In other words, for almost every $\omega \in \mathbb{R}^\infty$ there exists $n_0 = n_0(\omega, \epsilon)$ such that for all $n > n_0$ we have $\left|\frac{\sum_{1 \le i \le n} Y_i(\omega)}{n}\right| \le \epsilon$. Applying this to $\epsilon = 1/k$, $k = 1, 2, \ldots$, and using the fact that a countable union of events with probability zero has probability zero, we conclude that for almost every $\omega$,
$$\lim_n \frac{\sum_{1 \le i \le n} Y_i(\omega)}{n} = 0.$$
This concludes the proof.
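To see the theorem in action, here is a short simulation (illustrative only; the uniform distribution is a hypothetical choice with finite fourth moment, matching the special case treated in the proof) tracking $|\frac{\sum_{1 \le i \le n} X_i}{n} - E[X_1]|$ along one sample path:

```python
import random

def running_average_gap(n_values, seed=0):
    """Track |(X_1 + ... + X_n)/n - E[X_1]| along one sample path of an
    i.i.d. sequence; here X_i is uniform on [0, 1], so E[X_1] = 1/2 and
    all moments (in particular the fourth) are finite."""
    rng = random.Random(seed)
    total, out, i = 0.0, {}, 0
    for n in sorted(n_values):
        while i < n:          # extend the same path up to n samples
            total += rng.random()
            i += 1
        out[n] = abs(total / n - 0.5)
    return out

# The gap shrinks along the path, illustrating almost-sure convergence.
print(running_average_gap([10, 100, 10_000, 1_000_000]))
```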
BIBLIOGRAPHY
1. R. Durrett, Probability: Theory and Examples, 2nd edition, Duxbury Press, 1996.
2. G. R. Grimmett and D. R. Stirzaker, Probability and Random Processes, Oxford Science Publications, 1985.