Lecture Stats


and Economics

Module 1: Probability Theory and Statistical Inference

Spring 2010

Lecture 3: Continuous probability distributions

Priyantha Wijayatunga, Department of Statistics, Umeå University

priyantha.wijayatunga@stat.umu.se

These materials are adapted from copyrighted lecture slides (© 2009 W.H. Freeman and Company) from the homepage of the book The Practice of Business Statistics: Using Data for Decisions, Second Edition, by Moore, McCabe, Duckworth and Alwan.

Continuous probability distributions

Probability density

Sampling distributions

Distributions

Let X denote the # of days a student comes to class (in a week).

The probability distribution is

$$P(X = x) = p(x) = \begin{cases} 0.1 & \text{if } x = 1 \\ 0.2 & \text{if } x = 2 \\ 0.2 & \text{if } x = 3 \\ 0.3 & \text{if } x = 4 \\ 0.2 & \text{if } x = 5 \end{cases}$$

Then:

1) What is the probability that a student comes to class more than 3 days?

2) What is the probability that a student comes to class 2 or 3 days?
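Both questions reduce to adding up the probabilities of the values that make up the event. A minimal Python sketch of that bookkeeping follows; the names `pmf` and `prob` are illustrative and not part of the lecture.

```python
from fractions import Fraction

# Probability distribution from the slide: x = number of days attended per week.
pmf = {
    1: Fraction(1, 10),
    2: Fraction(2, 10),
    3: Fraction(2, 10),
    4: Fraction(3, 10),
    5: Fraction(2, 10),
}

# A valid probability distribution must sum to 1.
assert sum(pmf.values()) == 1


def prob(event):
    """Probability that X takes one of the values in `event`."""
    return sum(p for x, p in pmf.items() if x in event)


p_more_than_3 = prob({4, 5})  # P(X > 3) = p(4) + p(5)
p_2_or_3 = prob({2, 3})       # P(X = 2 or X = 3) = p(2) + p(3)

print(float(p_more_than_3), float(p_2_or_3))
```

Exact fractions are used here only to avoid floating-point rounding when checking that the probabilities sum to 1.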

Continuous Probability Distributions

A continuous random variable X takes all values in an interval.

Example: There is an infinity of numbers between 0 and 1 (e.g., 0.001, 0.4, 0.0063876).

The probability distribution of a continuous random variable is described by a density curve (also called a density function or probability density).

The probability of any event is the area under the density curve for the values of X that make up the event.

This is a uniform density curve for the variable X.

The probability that X falls between 0.3 and 0.7 is

the area under the density curve for that interval:

$P(0.3 \le X \le 0.7) = (0.7 - 0.3) \times 1 = 0.4$

Density function of X:

$$f(x) = \begin{cases} 1 & \text{for } 0 \le x \le 1 \\ 0 & \text{for } x < 0 \text{ or } x > 1 \end{cases}$$

Intervals

All continuous probability distributions assign probability 0 to every

individual outcome. Only intervals can have a positive probability, represented

by the area under the density curve for that interval.

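The probability of a single value is zero: $P(X = 1) = (1 - 1) \times 1 = 0$ (the density curve has height 1).

The probability of an interval is the same whether boundary values are included or excluded:

$P(0 \le X \le 0.5) = (0.5 - 0) \times 1 = 0.5$

$P(0 < X < 0.5) = (0.5 - 0) \times 1 = 0.5$

$P(X < 0.5 \text{ or } X > 0.8) = P(X < 0.5) + P(X > 0.8) = 1 - P(0.5 < X < 0.8) = 0.7$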

If all possible outcomes are equally likely (for example, every value from 0 to 1 is equally likely), the density curve is flat and probabilities are areas of rectangles under it:

$P(0.3 \le X \le 0.7) = 0.4$

Similarly, $P(X < 0.5 \text{ or } X > 0.8) = 0.5 + 0.2 = 0.7$

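If the outcomes are equally likely for any value between two numbers a and b, where a < b (that is, the random variable X can take any value between a and b), then X has a uniform distribution with density

$$f(x) = \begin{cases} \dfrac{1}{b - a} & \text{if } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$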

Example: The time (in minutes) that a student takes to solve a math problem is known to be any number between 10 and 20, with equal chances. Find the probability that a student takes more than 6 but less than 12 minutes to solve a given math problem.
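For a uniform density the probability of an interval is the length of the part of that interval lying inside [a, b], multiplied by the height 1/(b - a). The sketch below applies this to the solving-time question; the helper name `uniform_prob` is an assumption made for illustration.

```python
def uniform_prob(lo, hi, a, b):
    """P(lo < X < hi) for X uniformly distributed on [a, b].

    The density is 1/(b - a) on [a, b] and 0 elsewhere, so only the part
    of (lo, hi) that overlaps [a, b] contributes any area.
    """
    if not a < b:
        raise ValueError("need a < b")
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (b - a)


# Solving time is uniform on [10, 20] minutes. Times between 6 and 10
# have density 0, so only the stretch from 10 to 12 counts.
p_time = uniform_prob(6, 12, a=10, b=20)

# Cross-check against the unit-interval example: P(0.3 < X < 0.7).
p_unit = uniform_prob(0.3, 0.7, a=0, b=1)

print(p_time, round(p_unit, 10))
```

Under these assumptions the solving-time probability is (12 - 10)/(20 - 10) = 0.2.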

The shaded area under a density curve shows the proportion, or %, of individuals in a population with values of X between x1 and x2.

Because the probability of drawing one individual at random depends on the frequency of this type of individual in the population, the probability is also the shaded area under the curve.

[Figure: density curve with the shaded area showing the % of individuals with X such that x1 < X < x2.]

Example: Students' scores in a recent year had the normal distribution with mean μ = 18.6 and standard deviation σ = 5.9.

What is the probability that a randomly chosen student scores 21 or higher?
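This question can be checked numerically with Python's standard-library `statistics.NormalDist`, which stands in here for Table A. It is a sketch, not the lecture's own method, and the variable names are illustrative.

```python
from statistics import NormalDist

# Scores modelled as N(mean = 18.6, standard deviation = 5.9), as stated in the question.
scores = NormalDist(mu=18.6, sigma=5.9)

# P(X >= 21) is the area to the right of 21: one minus the area to the left.
p_21_or_higher = 1 - scores.cdf(21)

# The same calculation by hand starts from the z-score of 21.
z = (21 - 18.6) / 5.9

print(round(z, 2), round(p_21_or_higher, 3))
```

The result is roughly 0.34, so about a third of students score 21 or higher under this model.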

Normal probability distributions

The probability distribution of many random variables is a normal distribution. It shows what values the random variable can take and is used to assign probabilities to those values.

Example: Probability distribution of women's heights. Here, since we chose a woman randomly, her height, X, is a random variable.

To calculate probabilities with the normal distribution, we standardize the random variable (z-score) and use Table A.

Normal distributions

Normal or Gaussian distributions are a family of symmetrical, bell-shaped density curves defined by a mean μ (mu) and a standard deviation σ (sigma): N(μ, σ).

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}}$$

e = 2.71828..., the base of the natural logarithm

π = pi = 3.14159...
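The Normal density can be written out directly in code. The sketch below evaluates it at the mean for a few standard deviations and compares one value with the standard library's `NormalDist.pdf`; the function name `normal_pdf` and the chosen values of μ and σ are illustrative assumptions.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Height of the N(mu, sigma) density curve at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Same mean, different standard deviations: a larger sigma spreads the
# curve out, so its peak (the height at the mean) is lower.
peaks = {sigma: normal_pdf(15, mu=15, sigma=sigma) for sigma in (2, 4, 6)}

# Cross-check one value against the standard library.
library_value = NormalDist(mu=15, sigma=2).pdf(15)

print({s: round(h, 4) for s, h in peaks.items()}, round(library_value, 4))
```

Note that the curve's height is a density, not a probability; probabilities are areas under the curve.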

Here means are the same (μ = 15) while standard deviations are different (σ = 2, 4, and 6).

Here means are different (μ = 10, 15, and 20) while standard deviations are the same (σ = 3).

[Figure: Normal curve with mean μ = 64.5 and an inflection point marked.]

Because all Normal distributions share the same properties, we can standardize our data to transform any Normal curve N(μ, σ) into the standard Normal curve N(0, 1).

For example: N(64.5, 2.5) => N(0, 1)

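Standardizing: calculating z-scores

A z-score measures the number of standard deviations that a data value x is from the mean μ.

$$z = \frac{x - \mu}{\sigma}$$

When x is 1 standard deviation larger than the mean, then z = 1:

for $x = \mu + \sigma$, $z = \dfrac{\mu + \sigma - \mu}{\sigma} = \dfrac{\sigma}{\sigma} = 1$

When x is 2 standard deviations larger than the mean, then z = 2:

for $x = \mu + 2\sigma$, $z = \dfrac{\mu + 2\sigma - \mu}{\sigma} = \dfrac{2\sigma}{\sigma} = 2$

When x is smaller than the mean, z is negative.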

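Example: Women's heights follow the N(μ, σ) = N(64.5", 2.5") distribution. What percent of women are shorter than 67 inches tall?

mean μ = 64.5"

standard deviation σ = 2.5"

x (height) = 67"

$$z = \frac{x - \mu}{\sigma} = \frac{67 - 64.5}{2.5} = \frac{2.5}{2.5} = 1$$

So x = 67" is 1 standard deviation above the mean (z = 1), and the area we want is the area under the curve to the left of z = 1.

Because of the 68-95-99.7 rule, we can conclude that the percent of women shorter than 67" should be approximately 0.68 + half of (1 - 0.68) = 0.84, or 84%.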
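The standardizing step and the 68-95-99.7 estimate can be compared with the exact area under the standard Normal curve. This is a sketch using the standard library's `NormalDist` in place of Table A; the helper name `z_score` is illustrative.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


# Women's heights: mean 64.5 inches, standard deviation 2.5 inches.
z = z_score(67, mu=64.5, sigma=2.5)

# Area under N(0, 1) to the left of z, i.e. the proportion shorter than 67 inches.
area_left = NormalDist().cdf(z)

# Rule-of-thumb estimate: the middle 68% plus half of the remaining 32%.
rule_estimate = 0.68 + (1 - 0.68) / 2

print(z, round(area_left, 4), round(rule_estimate, 2))
```

The rule-of-thumb value (0.84) and the computed area (about 0.8413) agree to two decimal places.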

What is the probability, if we pick one woman at random, that her height will be

some value X? For instance, between 68 and 70 inches P(68 < X < 70)?

Because the woman is selected at random, X is a random variable.

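$$z = \frac{x - \mu}{\sigma}$$

N(μ, σ) = N(64.5, 2.5)

For x = 68", $z = \dfrac{68 - 64.5}{2.5} = 1.4$; Table A gives an area of 0.9192 to the left of z = 1.4.

For x = 70", $z = \dfrac{70 - 64.5}{2.5} = 2.2$; Table A gives an area of 0.9861 to the left of z = 2.2.

The area under the curve for the interval [68" to 70"] is 0.9861 - 0.9192 = 0.0669.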

Thus, the probability that a randomly chosen woman falls into this range is 6.69%.

P(68 < X < 70) = 6.69%
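The interval probability can be reproduced two ways: directly from the N(64.5, 2.5) distribution, and by the table route (standardize, look up two areas rounded to four decimals, subtract). The sketch below assumes `statistics.NormalDist` as a stand-in for Table A.

```python
from statistics import NormalDist

heights = NormalDist(mu=64.5, sigma=2.5)
std = NormalDist()  # standard Normal, N(0, 1)

# Direct route: area left of 70 minus area left of 68.
p_direct = heights.cdf(70) - heights.cdf(68)

# Table route: standardize, then round each area to 4 decimals as Table A does.
z_low = (68 - 64.5) / 2.5
z_high = (70 - 64.5) / 2.5
area_low = round(std.cdf(z_low), 4)
area_high = round(std.cdf(z_high), 4)
p_table = area_high - area_low

print(round(p_direct, 4), area_low, area_high, round(p_table, 4))
```

Both routes give about 0.067; any small difference comes only from rounding the table entries.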

Using Table A

Table A gives the area under the standard Normal curve to the left of any z value.

.0082 is the area under N(0,1) left of z = -2.40

.0080 is the area under N(0,1) left of z = -2.41

.0069 is the area under N(0,1) left of z = -2.46

For z = 1.00, the area under

the standard Normal curve

to the left of z is 0.8413.

N(μ, σ) = N(64.5", 2.5"), with μ = 64.5", x = 67", z = 1

Area left of z = 1: ≈ 0.84

Area right of z = 1: ≈ 0.16

Conclusion: 84.13% of women are shorter than 67". By subtraction, 1 - 0.8413, or 15.87%, of women are taller than 67".

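Because the Normal distribution is symmetrical, there are 2 ways to calculate the area under the standard Normal curve to the right of a z value:

area right of z = area left of -z

area right of z = 1 - area left of z

Example: for z = -2.33, the area to the left is 0.0099 and the area to the right is 0.9901.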
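The two ways of getting a right-tail area can be checked against each other. A minimal sketch, using `statistics.NormalDist` in place of Table A and the value z = 2.33 as an example:

```python
from statistics import NormalDist

std = NormalDist()  # standard Normal, N(0, 1)
z = 2.33

# Way 1: the total area is 1, so subtract the area to the left of z.
right_by_complement = 1 - std.cdf(z)

# Way 2: by symmetry, the area right of z equals the area left of -z.
right_by_symmetry = std.cdf(-z)

print(round(right_by_complement, 4), round(right_by_symmetry, 4))
```

Both give about 0.0099, which leaves about 0.9901 on the other side.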

To calculate the area between 2 z-values, first get the area under N(0,1)

to the left for each z-value from Table A.

Then subtract the smaller area from the larger area.

area between z1 and z2 = area left of z1 - area left of z2 (with z1 > z2)

A common mistake made by students is to subtract the two z-values, but the Normal curve is not uniform.

The area under N(0,1) for a single value of z is zero. (Try calculating the area to the left of z minus that same area!)

Example: The NCAA requires athletes to score at least 820 on the combined math and verbal SAT exam to compete in their first college year. The SAT scores of 2003 were approximately normal with mean 1026 and standard deviation 209.

What proportion of all students would be NCAA qualifiers (SAT ≥ 820)?

x = 820, μ = 1026, σ = 209

$$z = \frac{x - \mu}{\sigma} = \frac{820 - 1026}{209} = \frac{-206}{209} \approx -0.99$$

Table A: the area under N(0,1) to the left of z = -0.99 is 0.1611, or approx. 16%.

area right of 820 = total area - area left of 820 = 1 - 0.1611 ≈ 84%

Note: The actual data may contain students who scored exactly 820 on the SAT. However, the proportion of scores exactly equal to 820 is 0 for a normal distribution; this is a consequence of the idealized smoothing of density curves.

The NCAA defines a "partial qualifier", eligible to practice and receive an athletic scholarship but not to compete, as an athlete whose combined SAT score is at least 720.

What proportion of all students who take the SAT would be partial qualifiers? That is, what proportion have scores between 720 and 820?

x = 720, μ = 1026, σ = 209

$$z = \frac{x - \mu}{\sigma} = \frac{720 - 1026}{209} = \frac{-306}{209} \approx -1.46$$

Table A: the area under N(0,1) to the left of z = -1.46 is 0.0721, or approx. 7%.

area between 720 and 820 = area left of 820 - area left of 720 = 0.1611 - 0.0721 ≈ 9%

About 9% of all students who take the SAT have scores between 720 and 820.
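Both SAT proportions can be computed without rounding the z-scores first. The sketch below assumes the N(1026, 209) model from the example and uses `statistics.NormalDist` instead of Table A, so the figures differ slightly from the table-based ones.

```python
from statistics import NormalDist

sat = NormalDist(mu=1026, sigma=209)

# Qualifiers: scores of at least 820, i.e. the area to the right of 820.
p_qualifier = 1 - sat.cdf(820)

# Partial qualifiers: scores between 720 and 820,
# i.e. area left of 820 minus area left of 720.
p_partial = sat.cdf(820) - sat.cdf(720)

print(round(p_qualifier, 3), round(p_partial, 3))
```

The results are close to 84% and 9%; the table values differ in the third decimal because Table A rounds z to two places.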

A useful property of normally distributed data is that we can manipulate it and then find answers to questions that involve comparing seemingly non-comparable distributions.

We do this by standardizing the data. All this involves is changing the scale so that the mean is now 0 and the standard deviation is 1. If you do this to different distributions, it makes them comparable.

$$z = \frac{x - \mu}{\sigma} \quad \Rightarrow \quad N(0, 1)$$

Backward normal calculations: We may also want to find

the observed range of values that correspond to a given proportion under the

curve.

For that, we use Table A backward: first find the desired area/proportion in the body of the table, then read the corresponding z-value from the left column and top row.

For an area to the left of 1.25% (0.0125), the z-value is -2.24.

Example: Miles per gallon ratings of 2001 model compact cars follow approximately the N(25.7, 5.88) distribution. How many miles per gallon must a vehicle get to place in the top 10% of all 2001 model compact cars?

1. Use Table A backward: z = 1.28 is the standardized value with area 0.9 to its left and 0.1 to its right.

2. Unstandardize:

$$\frac{x - 25.7}{5.88} = 1.28$$

Solving for x gives x = 25.7 + 1.28 × 5.88 ≈ 33.2 miles per gallon.
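A backward Normal calculation is an inverse lookup followed by unstandardizing, x = μ + zσ. In the sketch below, `NormalDist.inv_cdf` from the standard library plays the role of reading Table A backward; the variable names are illustrative.

```python
from statistics import NormalDist

std = NormalDist()  # standard Normal, N(0, 1)

# Backward lookup: which z has area 0.0125 to its left?
z_small = std.inv_cdf(0.0125)

# Top 10% means area 0.90 to the left of the cutoff.
z_top10 = std.inv_cdf(0.90)

# Unstandardize with the compact-car parameters N(25.7, 5.88).
mu, sigma = 25.7, 5.88
x_cutoff = mu + z_top10 * sigma

print(round(z_small, 2), round(z_top10, 2), round(x_cutoff, 1))
```

The cutoff comes out at about 33.2 miles per gallon, matching the table-based calculation.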

probability tables

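[Figure: standard Normal density curve (Z on the horizontal axis, density on the vertical axis) with the shaded right tail P(Z > 1.87) = 0.03.]

$$P(X > 11.025) = P\left(\frac{X - 10}{\sqrt{0.3}} > \frac{11.025 - 10}{\sqrt{0.3}}\right) = P(Z > 1.87) = 1 - P(Z \le 1.87) = 1 - 0.9693 = 0.0307$$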

One way to assess whether a distribution is indeed approximately normal is to plot the data on a normal quantile plot.

The data points are ranked and the percentile ranks are converted to z-scores with Table A. The z-scores are then used for the x axis, against which the data are plotted on the y axis of the normal quantile plot.

If the distribution is indeed normal, the plot will show a straight line, indicating a good match between the data and a normal distribution.

Systematic deviations from a straight line indicate a non-normal distribution. Outliers appear as points that are far away from the overall pattern of the plot.
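The coordinates of a normal quantile plot can be computed in a few lines. This is a sketch under two assumptions that are not in the lecture: the percentile rank of the i-th smallest of n values is taken as (i - 0.5)/n (one common convention among several), and the data values are made up for illustration.

```python
from statistics import NormalDist


def normal_quantile_points(data):
    """Return (z-score, data value) pairs for a normal quantile plot.

    ASSUMPTION: the percentile rank of the i-th smallest of n values is
    (i - 0.5) / n; other plotting-position conventions exist.
    """
    ordered = sorted(data)
    n = len(ordered)
    std = NormalDist()
    return [
        (std.inv_cdf((i - 0.5) / n), x)
        for i, x in enumerate(ordered, start=1)
    ]


# Hypothetical sample (made-up numbers, not data from the lecture).
sample = [61.0, 62.5, 63.0, 64.0, 64.5, 65.0, 66.0, 67.5, 71.0]
points = normal_quantile_points(sample)

for z, x in points:
    print(f"{z:6.2f}  {x}")
```

Plotting the pairs with z on the x axis and the data on the y axis gives the normal quantile plot; roughly normal data fall close to a straight line.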

Example: the earnings of 15 black female hourly workers at National Bank. This distribution is roughly Normal except for one low outlier.

Example: the salaries of Cincinnati Reds players on opening day of the 2000 season. This distribution is skewed to the right.

As the number of randomly drawn observations in a sample increases, the mean of the sample (x bar) gets closer and closer to the population mean μ.

This is the law of large numbers. It is valid for any population.

Note: It is tempting to expect predictability over a few random observations, but it is wrong. The law of large numbers only applies to really large numbers.
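A small simulation illustrates the law of large numbers. The population here, rolls of a fair six-sided die with mean 3.5, is an assumption chosen for illustration and does not come from the lecture.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable


def running_means_of_die(n_rolls):
    """Mean of the first k rolls of a fair die, for k = 1..n_rolls."""
    total = 0
    means = []
    for k in range(1, n_rolls + 1):
        total += random.randint(1, 6)  # one roll; the population mean is 3.5
        means.append(total / k)
    return means


means = running_means_of_die(100_000)

# The sample mean can wander for small k and settles near 3.5 only for large k.
for k in (10, 100, 1_000, 100_000):
    print(k, round(means[k - 1], 3))
```

With only a handful of rolls the running mean can sit well away from 3.5, which is the point of the note about "really large numbers".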

What is a sampling distribution?

The sampling distribution of a statistic is the distribution of all possible values taken by the statistic when all possible samples of a fixed size n are taken from the population. It is a theoretical idea; we do not actually build it.

The sampling distribution of a statistic is the probability distribution of that statistic.

Sampling distribution of x bar (the sample mean)

We take many random samples of a given size n from a population with mean μ and standard deviation σ.

Some sample means will be above the population mean μ and some will be below, making up the sampling distribution.

[Figure: histogram of some sample averages alongside the sampling distribution of x bar.]

The mean of the sampling distribution is equal to the population mean μ. The standard deviation of the sampling distribution is σ/√n, where n is the sample size.

Mean of a sampling distribution of x bar: There is no tendency for a sample mean to fall systematically above or below μ, even if the distribution of the raw data is skewed. Thus, the mean of the sampling distribution is an unbiased estimate of the population mean μ: it will be correct on average in many samples.

Standard deviation of a sampling distribution of x bar: It is smaller than the standard deviation of the population by a factor of √n. Averages are less variable than individual observations. Also, the results of large samples are less variable than the results of small samples.

For normally distributed populations

When a variable in a population is normally distributed, the sampling distribution of the sample mean for all possible samples of size n is also normally distributed.

If the population is N(μ, σ), then the sample means distribution is N(μ, σ/√n).

[Figure: population distribution and the narrower sampling distribution.]

Central Limit Theorem: When randomly sampling from any population with mean μ and standard deviation σ, when n is large enough, the sampling distribution of x bar is approximately normal: ~ N(μ, σ/√n).

[Figure, four panels: population with strongly skewed distribution; sampling distribution of x bar for n = 2 observations; for n = 10 observations; for n = 25 observations.]

[Figure: density plot of Bin(5, 0.7), and histogram of 1000 sample means of 50-sized samples (horizontal axis: sample mean; vertical axis: density).]

Take 1000 random samples with n = 50 from Bin(5, 0.7) and get their sample means.

The relative frequency distribution is approximately normal (bell shaped), with mean = 3.50164 and sd = 0.1471508.

Theoretical standard deviation: 1.024695/√50 = 0.1449138
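The experiment with 1000 sample means of size 50 from Bin(5, 0.7) can be repeated in a few lines. This is a sketch, not the lecture's original code: the seed and helper names are assumptions, and the simulated mean and sd will differ slightly from the slide's figures because the random draws differ.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable


def draw_binomial(trials=5, p=0.7):
    """One Bin(trials, p) draw: the number of successes in `trials` attempts."""
    return sum(random.random() < p for _ in range(trials))


def sample_mean(n=50):
    """Mean of one random sample of n binomial observations."""
    return mean(draw_binomial() for _ in range(n))


# 1000 samples, each of size 50, as on the slide.
sample_means = [sample_mean() for _ in range(1000)]

# Theory: population mean 5 * 0.7 = 3.5, population sd sqrt(5 * 0.7 * 0.3),
# so the sampling distribution has sd sigma / sqrt(n).
theory_sd = sqrt(5 * 0.7 * 0.3) / sqrt(50)

print(round(mean(sample_means), 3), round(stdev(sample_means), 4), round(theory_sd, 7))
```

The simulated mean should land near 3.5 and the simulated sd near the theoretical 0.1449, in line with the central limit theorem.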

In a large population of adults, the mean IQ is 112 with standard deviation 20.

Suppose 200 adults are randomly selected for a market research campaign.

The sampling distribution of the sample mean IQ is:

B) Approximately normal, mean 112, standard deviation 20

C) Approximately normal, mean 112, standard deviation 1.414

D) Approximately normal, mean 112, standard deviation 0.1
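The standard deviation of the sample mean for this question follows from σ/√n with σ = 20 and n = 200. A short check, with illustrative variable names:

```python
from math import sqrt

sigma = 20  # population standard deviation of IQ
n = 200     # number of adults sampled

# Standard deviation of the sampling distribution of the sample mean.
sd_of_sample_mean = sigma / sqrt(n)

print(round(sd_of_sample_mean, 3))
```

The mean of the sampling distribution stays at the population mean, 112.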

Application

Hypokalemia is diagnosed when blood potassium levels are low, below 3.5 mEq/dl. Let's assume that we know a patient whose measured potassium levels vary daily according to a normal distribution N(μ = 3.8, σ = 0.2).

If only one measurement is made, what is the probability that this patient will be misdiagnosed hypokalemic?

$$z = \frac{x - \mu}{\sigma} = \frac{3.5 - 3.8}{0.2} = -1.5$$

If instead measurements are taken on 4 separate days and averaged, what is the probability of such a misdiagnosis?

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{3.5 - 3.8}{0.2 / \sqrt{4}} = -3$$

Note: Make sure to standardize (z) using the standard deviation for the sampling distribution.
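The two misdiagnosis probabilities can be compared numerically. The sketch below assumes n = 4 measurements for the averaged case (read from the denominator of the second z formula) and uses `statistics.NormalDist` in place of Table A.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 3.8, 0.2  # the patient's potassium levels: N(3.8, 0.2)
cutoff = 3.5          # readings below this are diagnosed as hypokalemia

# Single measurement: standardize with the population sd.
p_single = NormalDist(mu, sigma).cdf(cutoff)

# Mean of n = 4 measurements (assumed n): standardize with sigma / sqrt(n).
n = 4
p_mean_of_4 = NormalDist(mu, sigma / sqrt(n)).cdf(cutoff)

print(round(p_single, 4), round(p_mean_of_4, 4))
```

Averaging four measurements cuts the chance of misdiagnosis from roughly 7% to roughly 0.1%, because the sample mean is less variable than a single observation.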

Income distribution

Let's consider the very large database of individual incomes from the Bureau of Labor Statistics as our population. It is strongly right skewed.

We take 1000 SRSs of 100 incomes, calculate the sample mean for each, and make a histogram of these 1000 means.

We also take 1000 SRSs of 25 incomes, calculate the sample mean for each, and make a histogram of these 1000 means.

Which histogram corresponds to the samples of size 100? 25?

It depends on the population distribution. More observations are

required if the population distribution is far from normal.

distribution from a strong skewness or even mild outliers.

skewness and outliers.

With a large enough sample, even for strange population distributions we can assume a normal sampling distribution of the mean and work with it to solve problems.
