Lecture 3
The Gaussian Probability Distribution Function

Introduction
● The Gaussian probability distribution is perhaps the most used distribution in all of science.
It is sometimes called the bell-shaped curve or the normal distribution.
Unlike the binomial and Poisson distributions, the Gaussian is a continuous distribution:
\[ p(y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}} \]
μ = mean of distribution (also at the same place as mode and median)
σ² = variance of distribution
y is a continuous variable (-∞ ≤ y ≤ ∞)
Probability (P) of y being in the range [a, b] is given by an integral:
\[ P(a \le y \le b) = \int_a^b p(y)\,dy = \frac{1}{\sigma\sqrt{2\pi}} \int_a^b e^{-\frac{(y-\mu)^2}{2\sigma^2}}\,dy \]
Karl Friedrich Gauss 1777-1855

The integral for arbitrary a and b cannot be evaluated analytically.
The value of the integral has to be looked up in a table (e.g. Appendixes A and B of Taylor).
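In practice the table lookup can be replaced by the error function, which most languages provide. A minimal sketch in Python, assuming the standard-library `math.erf`; the helper name `gaussian_prob` is ours and does not come from the lecture or from Taylor:

```python
import math

def gaussian_prob(a, b, mu=0.0, sigma=1.0):
    """P(a <= y <= b) for a Gaussian with mean mu and standard deviation sigma."""
    # Cumulative distribution written with the error function:
    # Phi(y) = 0.5 * (1 + erf((y - mu) / (sigma * sqrt(2))))
    def cdf(y):
        return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))
    return cdf(b) - cdf(a)

# Area within one standard deviation of the mean.
print(gaussian_prob(-1.0, 1.0))
```

The same function handles any μ and σ, since only the scaled distance (y - μ)/σ enters the integral.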
Plot of Gaussian pdf:
\[ p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]
[Figure: Gaussian curve, p(x) versus x]
R. Kass/S06
P416 Lec 3
The integrals with limits [-∞, ∞] can be evaluated, see Barlow p. 37.
The total area under the curve is normalized to one by the σ√(2π) factor:
\[ \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(y-\mu)^2}{2\sigma^2}}\,dy = 1 \]
● We often talk about a measurement being a certain number of standard deviations (σ) away from the mean (μ) of the Gaussian.
We can associate a probability for a measurement to be more than nσ from the mean (|y - μ| > nσ) just by calculating the area outside of this region.
nσ       Prob. of exceeding ±nσ
0.67     0.5
1        0.32
2        0.05
3        0.003
4        0.00006
It is very unlikely (< 0.3%) that a measurement taken at random from a Gaussian pdf will be more than ±3σ from the true mean of the distribution.
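The probabilities of exceeding ±nσ quoted above can be reproduced from the complementary error function. A short sketch, assuming Python's `math.erfc`; the function name `prob_exceeding` is only illustrative:

```python
import math

def prob_exceeding(n):
    """Two-sided probability that a Gaussian measurement is more than n sigma from the mean."""
    # Area in both tails: 1 - erf(n / sqrt(2)) = erfc(n / sqrt(2))
    return math.erfc(n / math.sqrt(2.0))

for n in (0.67, 1, 2, 3, 4):
    print(f"{n:>5}  {prob_exceeding(n):.5f}")
```

Using `erfc` directly, rather than 1 - `erf`, keeps precision for large n where the tail area is tiny.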
[Figure: Gaussian with μ = 0 and σ = 1, P(y) versus y. The shaded area gives the probability: 95% of the area lies within 2σ of the mean, and only 5% of the area lies outside 2σ.]
Relationship between Gaussian and Binomial distribution

The Gaussian distribution can be derived from the binomial (or Poisson) assuming:
  ◆ p is finite
  ◆ N is very large
  ◆ we have a continuous variable rather than a discrete variable
● An example illustrating the small difference between the two distributions under the above conditions:
  ◆ Consider tossing a coin 10,000 times.
    p(head) = 0.5
    N = 10,000
● For a binomial distribution:
mean number of heads = μ = Np = 5000
standard deviation = σ = [Np(1 - p)]^(1/2) = 50
The probability to be within ±1σ for this binomial distribution is:
\[ P = \sum_{m=5000-50}^{5000+50} \frac{10^4!}{(10^4-m)!\,m!}\; 0.5^{m}\; 0.5^{10^4-m} = 0.69 \]
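For a Gaussian distribution:
\[ P(\mu-\sigma \le y \le \mu+\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_{\mu-\sigma}^{\mu+\sigma} e^{-\frac{(y-\mu)^2}{2\sigma^2}}\,dy \approx 0.68 \]
See Taylor 10.4.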
Both distributions give about the same probability!
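The coin-toss comparison can be checked directly. The sketch below assumes Python 3.8 or later (for `math.comb`) and sums the binomial terms exactly with integers, because 0.5 raised to the 10,000th power underflows in floating point; the variable names are ours:

```python
import math

N = 10_000          # number of coin tosses
mu = N // 2         # mean number of heads, N*p with p = 0.5
sigma = 50          # [N p (1 - p)]^(1/2)

# Exact binomial sum over m = mu - sigma ... mu + sigma.
# With p = 0.5 every term carries the same factor 0.5**N, so add the
# integer coefficients first and divide once at the end.
ways = sum(math.comb(N, m) for m in range(mu - sigma, mu + sigma + 1))
p_binomial = ways / 2**N

# Gaussian area within one sigma of the mean.
p_gaussian = math.erf(1.0 / math.sqrt(2.0))

print(f"binomial: {p_binomial:.3f}  Gaussian: {p_gaussian:.3f}")
```

The small remaining difference comes from the discrete sum including both end points, 4950 and 5050, in full.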
Why is the Gaussian pdf so applicable? Central Limit Theorem

A crude statement of the Central Limit Theorem:
Things that are the result of the addition of lots of small effects tend to become Gaussian.
A more exact statement:
Let Y1, Y2, ..., Yn be an infinite sequence of independent random variables
each with the same probability distribution.
Suppose that the mean (μ) and variance (σ²) of this distribution are both finite.
For any numbers a and b:
\[ \lim_{n\to\infty} P\left[a < \frac{Y_1 + Y_2 + \cdots + Y_n - n\mu}{\sigma\sqrt{n}} < b\right] = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-\frac{1}{2}y^2}\,dy \]
Actually, the Ys can be from different pdfs!
The C.L.T. tells us that under a wide range of circumstances the
probability distribution that describes the sum of random variables
tends towards a Gaussian distribution as the number of terms in the sum → ∞.
How close to ∞ does n have to be?
Alternatively we can write the CLT in a different form:
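\[ \lim_{n\to\infty} P\left[a < \frac{\bar{Y}-\mu}{\sigma/\sqrt{n}} < b\right] = \lim_{n\to\infty} P\left[a < \frac{\bar{Y}-\mu}{\sigma_m} < b\right] = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-\frac{1}{2}y^2}\,dy \]
Here Ȳ = (Y1 + Y2 + ... + Yn)/n is the average of the Ys.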
σ_m = σ/√n is sometimes called the error in the mean (more on that later).
For CLT to be valid:
The μ and σ of the pdf must be finite.
No one term in the sum should dominate the sum.
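The 1/√n scaling of the error in the mean can be seen in a quick simulation. This is only a sketch under our own assumptions: uniform [0, 1) numbers from Python's `random` module, an arbitrary seed and number of trials, and a helper name, `spread_of_sample_mean`, that does not appear in the lecture:

```python
import random
import statistics

def spread_of_sample_mean(n, trials=5000, seed=1):
    """Standard deviation of the mean of n uniform [0, 1) numbers, estimated from many trials."""
    rng = random.Random(seed)
    means = [sum(rng.random() for _ in range(n)) / n for _ in range(trials)]
    return statistics.stdev(means)

sigma = (1.0 / 12.0) ** 0.5      # standard deviation of a single uniform number
for n in (4, 16, 64):
    # Columns: n, simulated spread of the mean, prediction sigma / sqrt(n)
    print(n, spread_of_sample_mean(n), sigma / n ** 0.5)
```

Each fourfold increase in n should roughly halve the simulated spread, matching σ/√n.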
● A random variable is not the same as a random number.
A random variable is any rule that associates a number with each outcome in S
(Devore, Probability and Statistics for Engineering and the Sciences).
Here S is the set of possible outcomes.
● Recall that if y is described by a Gaussian pdf with mean (μ) of zero and σ = 1 then the
probability that a < y < b is given by:
\[ P(a < y < b) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-\frac{1}{2}y^2}\,dy \]
The CLT is true even if the Ys are from different pdfs as long as the means
and variances are defined for each pdf !
See Appendix of Barlow for a proof of the Central Limit Theorem.
Example: Generate a Gaussian distribution using uniform random numbers.

  ◆ Random number generator gives numbers distributed uniformly in the interval [0,1]
    ■ μ = 1/2 and σ² = 1/12
Procedure:
a) Take 12 numbers (r1, r2, ..., r12) from your computer's random number generator (ran(iseed))
b) Add them together
c) Subtract 6
Get a number that looks as if it is from a Gaussian pdf!

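\[ P\left[a < \frac{Y_1 + Y_2 + \cdots + Y_n - n\mu}{\sigma\sqrt{n}} < b\right] = P\left[a < \frac{\sum_{i=1}^{12} r_i - 12\cdot\frac{1}{2}}{\sqrt{\frac{1}{12}}\,\sqrt{12}} < b\right] \]
\[ P\left[-6 < \sum_{i=1}^{12} r_i - 6 < 6\right] = \frac{1}{\sqrt{2\pi}} \int_{-6}^{6} e^{-\frac{1}{2}y^2}\,dy \]
Here n = 12 is treated as close to ∞.
[Figure: histograms of
A) 5000 random numbers
B) 5000 pairs (r1 + r2) of random numbers
C) 5000 triplets (r1 + r2 + r3) of random numbers
D) 5000 12-plets (r1 + r2 + ... + r12) of random numbers
E) 5000 12-plets (r1 + r2 + ... + r12 - 6) of random numbers, compared with a Gaussian with μ = 0 and σ = 1]
Thus the sum of 12 uniform random numbers minus 6 is distributed as if it came
from a Gaussian pdf with μ = 0 and σ = 1.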
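Steps a) to c) of the procedure can be sketched in a few lines of Python. Here `random.random()` stands in for the lecture's ran(iseed); the seed, the sample size of 5000 (chosen to match the histograms) and the name `gaussian_like` are our assumptions:

```python
import random
import statistics

def gaussian_like(rng):
    """Sum 12 uniform [0, 1) numbers and subtract 6 (steps a to c)."""
    return sum(rng.random() for _ in range(12)) - 6.0

rng = random.Random(416)      # arbitrary seed so the run is repeatable
sample = [gaussian_like(rng) for _ in range(5000)]

print("mean  :", statistics.mean(sample))    # should be close to 0
print("stdev :", statistics.stdev(sample))   # should be close to 1

# Fraction within one sigma; about 0.68 for a true Gaussian.
within = sum(abs(y) <= 1.0 for y in sample) / len(sample)
print("within 1 sigma:", within)
```

Note that this construction can never produce a value outside [-6, 6], so it only imitates a Gaussian; the far tails are cut off.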
Example:
A watch makes an error of at most ±1/2 minute per day.
After one year, what's the probability that the watch is accurate to within ±25 minutes?
  ◆ Assume that the daily errors are uniform in [-1/2, 1/2].
    ■ For each day, the average error is zero and the standard deviation is 1/√12 minutes.
    ■ The error over the course of a year is just the sum of the daily errors.
    ■ Since the daily errors come from a uniform distribution with a well defined mean and variance,
      the Central Limit Theorem is applicable:
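\[ \lim_{n\to\infty} P\left[a < \frac{Y_1 + Y_2 + \cdots + Y_n - n\mu}{\sigma\sqrt{n}} < b\right] = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-\frac{1}{2}y^2}\,dy \]
The upper limit corresponds to +25 minutes:
\[ b = \frac{Y_1 + Y_2 + \cdots + Y_n - n\mu}{\sigma\sqrt{n}} = \frac{25 - 365\times 0}{\sqrt{\frac{1}{12}}\,\sqrt{365}} = 4.5 \]
The lower limit corresponds to -25 minutes:
\[ a = \frac{Y_1 + Y_2 + \cdots + Y_n - n\mu}{\sigma\sqrt{n}} = \frac{-25 - 365\times 0}{\sqrt{\frac{1}{12}}\,\sqrt{365}} = -4.5 \]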
The probability to be within ±25 minutes is:
\[ P = \frac{1}{\sqrt{2\pi}} \int_{-4.5}^{4.5} e^{-\frac{1}{2}y^2}\,dy \approx 0.999993 \]
This integral is ≈ 1 to about 7 parts in 10^6!
The probability to be off by more than 25 minutes is just: 1 - P ≈ 1 - 0.999993 = 7×10^-6

There is only about a 7 in a million chance that the watch will be off by more than 25 minutes in a year!
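The watch numbers can be checked with the error function, counting the area in both tails (too fast and too slow). A minimal sketch with our own variable names; it keeps the unrounded limit of about 4.53 rather than 4.5, so the last digit depends on that rounding:

```python
import math

n_days = 365
sigma_day = math.sqrt(1.0 / 12.0)          # std. dev. of one day's error, uniform in [-1/2, 1/2]
sigma_year = sigma_day * math.sqrt(n_days)  # std. dev. of the summed error after one year

z = 25.0 / sigma_year                       # the integral runs from -z to +z
p_within = math.erf(z / math.sqrt(2.0))     # area between -z and +z
p_outside = math.erfc(z / math.sqrt(2.0))   # area in both tails together

print(f"z = {z:.2f}, P(within 25 min) = {p_within:.6f}, P(outside) = {p_outside:.1e}")
```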
● Example: The daily income of a card shark has a uniform distribution in the interval [-$40, $50].
  ◆ What is the probability that s/he wins more than $500 in 60 days?
Let's use the CLT to estimate this probability:
\[ \lim_{n\to\infty} P\left[a < \frac{Y_1 + Y_2 + \cdots + Y_n - n\mu}{\sigma\sqrt{n}} < b\right] = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-\frac{1}{2}y^2}\,dy \]
The probability distribution of daily income is uniform, p(y) = 1.
p(y) needs to be normalized in computing the average daily winning (μ) and its standard deviation (σ).
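\[ \mu = \frac{\int_{-40}^{50} y\,p(y)\,dy}{\int_{-40}^{50} p(y)\,dy} = \frac{\frac{1}{2}\left[50^2 - (-40)^2\right]}{50 - (-40)} = 5 \]
\[ \sigma^2 = \frac{\int_{-40}^{50} y^2\,p(y)\,dy}{\int_{-40}^{50} p(y)\,dy} - \mu^2 = \frac{\frac{1}{3}\left[50^3 - (-40)^3\right]}{50 - (-40)} - 25 = 675 \]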
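The lower limit of the winnings is $500:
\[ a = \frac{Y_1 + Y_2 + \cdots + Y_n - n\mu}{\sigma\sqrt{n}} = \frac{500 - 60\times 5}{\sqrt{675}\,\sqrt{60}} = \frac{200}{201} \approx 1 \]
The upper limit is the maximum that the shark could win ($50/day for 60 days):
\[ b = \frac{Y_1 + Y_2 + \cdots + Y_n - n\mu}{\sigma\sqrt{n}} = \frac{3000 - 60\times 5}{\sqrt{675}\,\sqrt{60}} = \frac{2700}{201} = 13.4 \]
\[ P = \frac{1}{\sqrt{2\pi}} \int_{1}^{13.4} e^{-\frac{1}{2}y^2}\,dy \approx \frac{1}{\sqrt{2\pi}} \int_{1}^{\infty} e^{-\frac{1}{2}y^2}\,dy = 0.16 \]
16% chance to win more than $500 in 60 days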
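The whole card shark estimate fits in a short script. This sketch uses the standard closed forms for a uniform pdf on [lo, hi], mean (lo + hi)/2 and variance (hi - lo)²/12, in place of the integrals; the variable names and the helper `cdf` are ours:

```python
import math

lo, hi = -40.0, 50.0      # daily income is uniform on [lo, hi] dollars
n_days = 60
target = 500.0            # winnings to exceed

mu = (lo + hi) / 2.0                    # mean daily income: 5
var = (hi - lo) ** 2 / 12.0             # variance of daily income: 675
sigma_total = math.sqrt(var * n_days)   # sigma * sqrt(n), about 201

a = (target - n_days * mu) / sigma_total        # lower limit, about 1
b = (n_days * hi - n_days * mu) / sigma_total   # upper limit, about 13.4

def cdf(z):
    """Cumulative distribution of a Gaussian with mean 0 and sigma 1."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p_win = cdf(b) - cdf(a)
print(f"a = {a:.2f}, b = {b:.1f}, P = {p_win:.2f}")
```

Because b is so far out in the tail, replacing the upper limit by ∞ changes nothing at this precision.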