"To be or not to be Normal"
TOPICS
‡ Normal Distributions
‡ Skewness & Kurtosis
‡ Normal Curves and Probability
‡ Z-scores
‡ Confidence Intervals
‡ Hypothesis Testing
‡ The t-distribution
Is this normal?
[Figure: histogram of VAR00001, x-axis from 100.0 to 500.0. Std. Dev = 160.68, Mean = 178.3, N = 6.00]



Value     Frequency   Percent   Valid Percent   Cumulative Percent
70.00     1           16.7      16.7            16.7
100.00    2           33.3      33.3            50.0
150.00    2           33.3      33.3            83.3
500.00    1           16.7      16.7            100.0
Total     6           100.0     100.0

VAR00001
N Valid                   6
N Missing                 0
Mean                      178.3333
Skewness                  2.242
Std. Error of Skewness    .845
Kurtosis                  5.219
Std. Error of Kurtosis    1.741
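The mean and standard deviation reported for VAR00001 can be checked by hand. The sketch below assumes the six cases are exactly those listed in the frequency table (70, 100, 100, 150, 150, 500) and that the reported standard deviation is the sample form, dividing by n - 1.

```python
import statistics

# The six VAR00001 cases, expanded from the frequency table:
# 70 (once), 100 (twice), 150 (twice), 500 (once).
values = [70, 100, 100, 150, 150, 500]

mean = statistics.mean(values)      # arithmetic mean
std_dev = statistics.stdev(values)  # sample standard deviation (n - 1)

print(f"N = {len(values)}, Mean = {mean:.4f}, Std. Dev = {std_dev:.2f}")
```

Under those assumptions the values agree with the reported Mean = 178.3333 and Std. Dev = 160.68. Note how far the single case at 500 pulls the mean above the middle of the data.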
‡ Are your curves normal?
‡ Why do we care about normal curves?
‡ What do normal curves tell us?
Answer:
The curves tell us something about the distribution
of the population.
The curves allow us to make statistical inferences
regarding the probability of some outcomes
within some margin of error.
‡ A distribution is easily depicted in a graph where the height of the line is determined by the frequency of cases for the values beneath it
‡ Most cases cluster near the middle of a distribution if it is close to normal
‡ Bell-shaped distribution or curve
‡ Perfectly symmetrical about the mean
Mean = median = mode
‡ Tails are asymptotic: closer and closer to the horizontal axis but never reach it
Skewness and Sample Distributions
Skewness
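‡ Formula for skewness (the adjusted sample form, which reproduces the 2.242 reported for VAR00001)

$$\text{skewness} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^{3}$$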
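As a minimal sketch, skewness can be computed for the six VAR00001 cases from the frequency table. The function name `adjusted_skewness` is hypothetical, and the choice of the adjusted sample formula is an assumption, made because it reproduces the reported value.

```python
import statistics

# The six VAR00001 cases from the frequency table.
values = [70, 100, 100, 150, 150, 500]


def adjusted_skewness(data):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s)^3)."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1)
    cubed_z = sum(((x - mean) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * cubed_z


skewness = adjusted_skewness(values)
print(round(skewness, 3))
```

For these six values the result rounds to 2.242, the skewness reported for VAR00001. The positive sign reflects the long right tail created by the single case at 500.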
‡ Beyond skewness, kurtosis tells us when our distribution may have high or low variance, even if normal
‡ The kurtosis value for a normal distribution will equal 3 (or 0 when software reports excess kurtosis). Anything above this is a peaked, leptokurtic value (low variance) and anything below is platykurtic (high variance)
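The kurtosis reported for VAR00001 can be reproduced in the same way. This sketch assumes the reported 5.219 is a bias-adjusted excess kurtosis (normal = 0), an assumption made because that formula matches the reported value. The function name is hypothetical.

```python
import statistics

# The six VAR00001 cases from the frequency table.
values = [70, 100, 100, 150, 150, 500]


def adjusted_excess_kurtosis(data):
    """Bias-adjusted sample excess kurtosis (a normal distribution gives 0)."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1)
    fourth_z = sum(((x - mean) / s) ** 4 for x in data)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth_z - correction


kurtosis = adjusted_excess_kurtosis(values)
print(round(kurtosis, 3))
```

For these six values the result rounds to 5.219, well above the normal-distribution benchmark of 0 on the excess scale.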
Back to normal distributions
‡ The power of normal distributions, or those close to normal, is that we can predict where cases will fall within a distribution probabilistically
‡ For example, what are the odds, given the population parameter of human height, that someone will grow to more than eight feet?
‡ Answer: very unlikely, with a probability close to zero
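The eight-foot question can be sketched as a tail probability under a normal curve. The population mean and standard deviation below are hypothetical placeholders, not figures from the slides, and heights are only roughly normal in practice.

```python
from statistics import NormalDist

# Hypothetical population parameters for adult height, in inches.
# These are illustrative assumptions, not values taken from the slides.
HEIGHT_MEAN_IN = 67.0
HEIGHT_SD_IN = 4.0

height = NormalDist(mu=HEIGHT_MEAN_IN, sigma=HEIGHT_SD_IN)

eight_feet_in = 8 * 12  # 96 inches
# Probability of a case falling above eight feet: the upper-tail area.
p_taller = 1.0 - height.cdf(eight_feet_in)

print(f"P(height > 8 ft) = {p_taller:.3g}")
```

With parameters anywhere near these, eight feet sits many standard deviations above the mean, so the tail area is vanishingly small.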
Sample Distribution
‡ What does Andre the
Giant do to the sample
distribution?

‡ What is the probability of finding someone like Andre in the population?
‡ Are you ready for
more inferential
statistics?
‡ Answer: Oh boy, yes!!
‡ We have answered the question of what
Andre and the Sumo wrestler would do to
the distribution
‡ But what about the probability of finding
someone the same height as Andre in the
population?
‡ What is the probability of finding someone
the same height as Dr Peña or Dr
Boehmer?
More on normal curves and
probability

[Figure: normal curve annotated "Dr Boehmer would be here" and "Andre would be here"]
‡ We can standardize how far a case falls from the mean, across different samples, with z-scores
‡ The basic unit of the z-score is the standard deviation

$$z = \frac{x - \bar{x}}{s}$$
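A z-score expresses a case's distance from the mean in standard deviation units. As a minimal sketch, the helper below (a hypothetical name) scores the most extreme VAR00001 case, 500, against the six cases from the frequency table.

```python
import statistics

# The six VAR00001 cases from the frequency table.
values = [70, 100, 100, 150, 150, 500]


def z_score(x, data):
    """Distance of x from the mean of data, in sample standard deviations."""
    return (x - statistics.mean(data)) / statistics.stdev(data)


z_500 = z_score(500, values)
print(f"z for 500 = {z_500:.2f}")
```

The case at 500 lies about two standard deviations above the mean, which is why it stands out in the histogram.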
[Worked z-score example: the answer is expressed in standard deviations]
[Worked z-score example: the answer is expressed as a percentage]
‡ Ever hear a poll report a margin of error? What
is that?
Random Sampling Error = standard deviation / square root of the sample size
Or

$$\sigma_{\bar{x}} = \frac{s}{\sqrt{n}}$$
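The rule above (standard deviation divided by the square root of the sample size) can be sketched directly. The data are the six VAR00001 cases from the frequency table. Treating them as a random sample is an assumption made only for illustration.

```python
import math
import statistics

# The six VAR00001 cases from the frequency table.
values = [70, 100, 100, 150, 150, 500]

s = statistics.stdev(values)  # sample standard deviation
n = len(values)               # sample size

standard_error = s / math.sqrt(n)
print(f"Standard error of the mean = {standard_error:.2f}")
```

Because the sample size sits under a square root, quadrupling it only halves the error.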
Standard Error
‡ We often use the term random sampling error both for the chance to err when sampling and for the error of a specific sample statistic, the mean. For the latter we typically use the term Standard Error
‡ A sample statistic's standard error estimates how far the mean of a sample typically falls from the mean of the population from which it is drawn
Standard Error
Example: What if most humans weighed about the same, and only some millions globally were much heavier?
The random sampling error would be low, since a sample consisting heavily of those heavier humans would be unlikely. There would not be much error in general from sampling because of the low variance
Standard Error
‡ Example continued: Now, when we take a sample, each sample has a mean. If a population has low variance, so should the samples. We should see this reflected in a low standard error in the mean of the sample, the sample statistic
‡ Of course, higher variance in the population also causes higher error in samples taken from it
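The link between population variance and sampling error can be illustrated with a small simulation. Everything numeric here is a hypothetical choice: two normal populations with the same mean but standard deviations of 5 and 50, samples of 30, and 2000 repetitions.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable


def spread_of_sample_means(pop_sd, pop_mean=150.0, sample_size=30, n_samples=2000):
    """Draw many samples and return the standard deviation of their means."""
    means = []
    for _ in range(n_samples):
        sample = [random.gauss(pop_mean, pop_sd) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


# Hypothetical populations: same mean, low versus high variance.
low_variance_error = spread_of_sample_means(pop_sd=5.0)
high_variance_error = spread_of_sample_means(pop_sd=50.0)

print(f"low-variance population:  {low_variance_error:.2f}")
print(f"high-variance population: {high_variance_error:.2f}")
```

The sample means from the low-variance population cluster much more tightly, and each spread should sit close to the population standard deviation divided by the square root of 30.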
Some more notation

                           Mean    Standard deviation
Sample of observed data    x̄       s
Population                 μ       σ
Repeated sampling          μ_x̄     σ_x̄

Remember that if we took an infinite number of samples from a population, the means of these samples would be approximately normally distributed

Hence, the larger the sample, the more likely the sample mean will fall close to the population mean
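Both points about repeated sampling can be sketched with a simulation. The population here is a hypothetical, deliberately skewed one (exponential with mean 1), treated as effectively infinite. The sample sizes of 5 and 100 and the 2000 repetitions are arbitrary illustrative choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

POP_MEAN = 1.0  # mean of the hypothetical exponential population


def sample_means(sample_size, n_samples=2000):
    """Means of many repeated samples from the skewed population."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]


small = sample_means(sample_size=5)
large = sample_means(sample_size=100)

spread_small = statistics.stdev(small)
spread_large = statistics.stdev(large)

# Share of large-sample means landing within 0.2 of the population mean.
share_close = sum(abs(m - POP_MEAN) < 0.2 for m in large) / len(large)

print(f"spread with n=5:   {spread_small:.3f}")
print(f"spread with n=100: {spread_large:.3f}")
print(f"share of n=100 means within 0.2 of the population mean: {share_close:.2f}")
```

Even though the population itself is skewed, the means of larger samples pile up tightly and roughly symmetrically around the population mean.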
‡ We can use what we know about standard deviations from the mean to calculate the range of values within which a sample mean would fall if it were close to the mean of the population
‡ This range is based on the probability that the sample mean falls close to the population mean with a probability of .95, or 5% error
‡ Are you 95% sure?
‡ Social scientists use 95% as a threshold to test whether or not the results are a product of chance
‡ That is, we take 5 out of 100 chances to be wrong
‡ What do you MEAN?
We build a 95% confidence interval so that we can be 95% confident the mean will be within that range

$$\text{CI} = \bar{x} \pm Z \, \sigma_{\bar{x}}$$

where $\bar{x}$ is the sample mean, $Z$ is the Z score related with a 95% CI, and $\sigma_{\bar{x}}$ is the standard error
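A confidence interval of this kind (sample mean plus or minus a Z score times the standard error) can be sketched as follows. The multiplier 1.96 assumes a 95% level, the function name is hypothetical, and the data are the six VAR00001 cases from the frequency table.

```python
import math
import statistics

Z_95 = 1.96  # assumed Z multiplier for a 95% interval


def confidence_interval(data, z=Z_95):
    """Return (low, high): sample mean -/+ z * standard error of the mean."""
    mean = statistics.mean(data)
    standard_error = statistics.stdev(data) / math.sqrt(len(data))
    return mean - z * standard_error, mean + z * standard_error


# The six VAR00001 cases from the frequency table.
values = [70, 100, 100, 150, 150, 500]

ci_low, ci_high = confidence_interval(values)
print(f"95% CI: {ci_low:.2f} to {ci_high:.2f}")
```

The interval is wide because the sample is tiny and contains one extreme case. With only six cases a t-based multiplier would normally replace Z, so this is an illustration of the arithmetic rather than a recommended analysis.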
[Worked example: a CI computed from assumed sample values]
Why do we use 1.96?
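One way to see where the Z multiplier comes from, assuming a 95% confidence level: it is the point on the standard normal curve that leaves 2.5% in each tail. The sketch below looks it up with Python's standard library.

```python
from statistics import NormalDist

confidence = 0.95                # assumed confidence level
tail = (1.0 - confidence) / 2.0  # 2.5% in each tail

# Inverse CDF of the standard normal at 97.5% gives the critical Z value.
z_critical = NormalDist().inv_cdf(1.0 - tail)
print(f"Z for a {confidence:.0%} interval = {z_critical:.2f}")

# Check: the area between -z and +z should be the confidence level.
area = NormalDist().cdf(z_critical) - NormalDist().cdf(-z_critical)
```

Changing `confidence` to 0.99 gives a larger multiplier and therefore a wider interval.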
Calculating a 95% CI
Let's look at the class population distribution of height
Is it a normal or skewed distribution?
Let's build a 95% CI around the mean height of the class
Why do we care about CI?
‡ We use confidence intervals for hypothesis testing
‡ For instance, we want to know if there is an income difference between El Paso and Boston
‡ We want to know whether or not taking a class at Kaplan makes a difference in our GRE scores
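As a minimal sketch of using a confidence interval for a difference-of-means question, the example below builds a 95% interval around the difference between two sample means. The income figures are invented purely for illustration, the function name is hypothetical, and the Z multiplier of 1.96 is an assumption.

```python
import math
import statistics

Z_95 = 1.96  # assumed Z multiplier for a 95% interval

# Hypothetical incomes in thousands of dollars. Invented for illustration only.
el_paso = [31, 35, 38, 40, 42, 45, 47, 50]
boston = [52, 55, 58, 60, 63, 66, 70, 75]


def mean_difference_ci(a, b, z=Z_95):
    """95% CI for mean(b) - mean(a), combining the two standard errors."""
    diff = statistics.mean(b) - statistics.mean(a)
    se_a = statistics.stdev(a) / math.sqrt(len(a))
    se_b = statistics.stdev(b) / math.sqrt(len(b))
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff - z * se_diff, diff + z * se_diff


diff_low, diff_high = mean_difference_ci(el_paso, boston)
# If the interval excludes zero, chance alone is an unlikely explanation.
difference_detected = not (diff_low <= 0.0 <= diff_high)
print(f"95% CI for the difference: {diff_low:.1f} to {diff_high:.1f}")
print(f"difference detected: {difference_detected}")
```

An interval that includes zero would mean the data are consistent with no difference between the two groups.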
Mean Difference testing
[Figure: income levels for El Paso, Las Cruces, and Boston compared with the USA mean]