Statistics Concept Review for Data Analysis

A brief concept review...
Statistics is ...
A population is ...
A sample is ...
A variable is ...
A parameter is ...
Relation between statistics and parameters ...
Difference between statistics and probability ...
Statistical assumptions:
Population and its parameters are (unknown / known)
A sample is used to ... parameters
A brief concept review... (3)

Statistics: the body of knowledge concerned with statistical data and accurate facts.
Statistics: a field of study that uses techniques for data collection, data processing, and data analysis, for drawing conclusions, and for making policies/decisions that are well justified by accurate data and facts.

Population and sample
1. Finite population
2. Infinite population
a. Homogeneous population
b. Heterogeneous population

Sampling techniques
Probability sampling
Simple random sampling
Proportionate stratified random sampling
Disproportionate stratified random sampling
Area sampling
Non-probability sampling
Systematic sampling
Quota sampling
Accidental sampling
Purposive sampling
Saturated sampling
Snowball sampling

DATA?
1. Qualitative data
2. Quantitative data

MEASUREMENT SCALES?
1. Nominal scale
2. Ordinal scale
3. Interval scale
4. Ratio scale
Hypothesis Testing
Niniet Indah Arvitrida, ST, MT

niniet@ie.its.ac.id
Sepuluh Nopember Institute of Technology
INDONESIA
2013
Course 1 Objectives
Students are able to:
Formulate null and alternative hypotheses for applications involving a single population mean or proportion
Formulate a decision rule for testing a hypothesis
Know how to use the test statistic, critical value, and p-value approaches to test the null hypothesis
Know what Type I and Type II errors are
Compute the probability of a Type II error
Review of Estimating Population Value
Point and Interval Estimates
A point estimate is a single number; a confidence interval provides additional information about variability.
[Figure: a number line marking the lower confidence limit, the point estimate, and the upper confidence limit; the distance between the two limits is the width of the confidence interval.]
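Point Estimates
We can estimate a population parameter with a sample statistic (a point estimate):
Mean: population parameter $\mu$, point estimate $\bar{x}$
Proportion: population parameter $p$, point estimate $\bar{p}$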
Confidence Intervals
How much uncertainty is associated with a
point estimate of a population parameter?
An interval estimate provides more
information about a population
characteristic than does a point estimate
Such interval estimates are called
confidence intervals
Estimation Process
Population (mean, $\mu$, is unknown)
Random sample drawn from the population
Sample mean: $\bar{x} = 50$
Conclusion: "I am 95% confident that $\mu$ is between 40 and 60."
Confidence Level
The confidence level is the degree of confidence that the interval will contain the unknown population parameter.
It is stated as a percentage (less than 100%).
Confidence Interval for $\mu$ ($\sigma$ Known)
Assumptions
Population standard deviation $\sigma$ is known
Population is normally distributed
If population is not normal, use large sample
Confidence interval estimate:
$\bar{x} \pm z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$
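Finding the Critical Value
Consider a 95% confidence interval: $1 - \alpha = .95$, so $\alpha/2 = .025$ lies in each tail and $z_{\alpha/2} = \pm 1.96$.
[Figure: standard normal curve with area .025 in each tail. In z units the limits are $-z_{.025} = -1.96$ and $z_{.025} = 1.96$, centered at 0; in x units they are the lower and upper confidence limits, centered at the point estimate.]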
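The table lookup above can also be done numerically. The following is a minimal sketch using Python's standard library; the function name `z_critical_two_sided` is a hypothetical helper introduced here, not part of the original slides.

```python
from statistics import NormalDist

def z_critical_two_sided(confidence: float) -> float:
    """Return z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1.0 - confidence
    # Each tail holds alpha/2, so the area to the left of the upper limit is 1 - alpha/2.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

z_95 = z_critical_two_sided(0.95)
print(round(z_95, 2))  # 1.96
```

The inverse CDF of the standard normal plays the role of the printed z table, so other confidence levels only require a different argument.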
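Margin of Error
Margin of Error (e): the amount added and subtracted to the point estimate to form the confidence interval
Example: Margin of error for estimating $\mu$, $\sigma$ known:
$\bar{x} \pm z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$, so $e = z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$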
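Factors Affecting Margin of Error
$e = z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$
Data variation, $\sigma$: e decreases as $\sigma$ decreases
Sample size, n: e decreases as n increases
Level of confidence, $1 - \alpha$: e increases if $1 - \alpha$ increases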
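To see how each factor moves the margin of error, here is a small sketch that evaluates $e = z_{\alpha/2}\,\sigma/\sqrt{n}$ for a baseline and three variations. The helper name `margin_of_error` and all numeric inputs are illustrative assumptions, chosen only to show the direction of each effect.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma: float, n: int, confidence: float = 0.95) -> float:
    """Margin of error for estimating a mean when sigma is known."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return z * sigma / sqrt(n)

# Illustrative (assumed) inputs: vary one factor at a time against a baseline.
base = margin_of_error(sigma=0.35, n=11)
more_spread = margin_of_error(sigma=0.70, n=11)                      # sigma doubled
bigger_sample = margin_of_error(sigma=0.35, n=44)                    # n quadrupled
more_confident = margin_of_error(sigma=0.35, n=11, confidence=0.99)  # higher confidence

print(more_spread > base, bigger_sample < base, more_confident > base)
```

Doubling $\sigma$ doubles e, while quadrupling n only halves it, because n enters through a square root.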
Example
A sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We know from past testing that the population standard deviation is .35 ohms. Determine a 95% confidence interval for the true mean resistance of the population.
Example (continued)
Solution:
$\bar{x} \pm z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}} = 2.20 \pm 1.96\,(.35/\sqrt{11}) = 2.20 \pm .2068$
$1.9932 \le \mu \le 2.4068$
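The circuit example can be checked with a few lines of code. This is a sketch, not the author's method: `z_confidence_interval` is a hypothetical helper, and it uses the exact normal quantile rather than the rounded table value 1.96, which agrees with the slide to four decimal places.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar: float, sigma: float, n: int, confidence: float = 0.95):
    """Two-sided confidence interval for a mean when sigma is known."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    e = z * sigma / sqrt(n)  # margin of error
    return x_bar - e, x_bar + e

# Values from the circuit example: n = 11, mean 2.20 ohms, sigma = .35 ohms.
lower, upper = z_confidence_interval(x_bar=2.20, sigma=0.35, n=11)
print(f"{lower:.4f} to {upper:.4f}")
```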
Interpretation
We are 95% confident that the true mean resistance is between 1.9932 and 2.4068 ohms.
Although the true mean may or may not be in this interval, 95% of intervals formed in this manner will contain the true mean.
An incorrect interpretation is that there is a 95% probability that this interval contains the true population mean.
(This interval either does or does not contain the true mean; there is no probability for a single interval.)
Student's t Distribution
Note: $t \to z$ as n increases
t-distributions are bell-shaped and symmetric, but have fatter tails than the normal.
[Figure: three curves centered at 0: the standard normal (t with df = $\infty$), t (df = 13), and t (df = 5).]
Approximation for Large Samples
Since t approaches z as the sample size increases, an approximation is sometimes used when $n \ge 30$:
Technically correct: $\bar{x} \pm t_{\alpha/2} \dfrac{s}{\sqrt{n}}$
Approximation for large n: $\bar{x} \pm z_{\alpha/2} \dfrac{s}{\sqrt{n}}$
Now we talk about

HYPOTHESIS
What is a Hypothesis?
A hypothesis is a claim (assumption) about a population parameter:
population mean
Example: The mean monthly cell phone bill of this city is $\mu = \$42$
population proportion
Example: The proportion of adults in this city with cell phones is $p = .68$
The Null Hypothesis, H0
States the assumption (numerical) to be tested
Example: The average number of TV sets in U.S. homes is at least three ($H_0: \mu \ge 3$)
Is always about a population parameter, not about a sample statistic:
Correct: $H_0: \mu \ge 3$
Incorrect: $H_0: \bar{x} \ge 3$
The Null Hypothesis, H0 (continued)
Begin with the assumption that the null hypothesis is true
Similar to the notion of innocent until proven guilty
Always contains the "=", "$\le$" or "$\ge$" sign
May or may not be rejected
The Alternative Hypothesis, HA
Is the opposite of the null hypothesis
e.g.: The average number of TV sets in U.S. homes is less than 3 ($H_A: \mu < 3$)
Never contains the "=", "$\le$" or "$\ge$" sign
May or may not be accepted
Is generally the hypothesis that is believed (or needs to be supported) by the researcher
Hypothesis Testing Process
Claim: the population mean age is 50. (Null Hypothesis: $H_0: \mu = 50$)
Now select a random sample from the population.
Suppose the sample mean age is 20: $\bar{x} = 20$
Is $\bar{x} = 20$ likely if $\mu = 50$?
If not likely, REJECT the null hypothesis.
Reason for Rejecting H0
[Figure: sampling distribution of $\bar{x}$ if $H_0$ is true, centered at $\mu = 50$, with the observed value 20 far out in the lower tail.]
If it is unlikely that we would get a sample mean of this value if in fact $\mu = 50$ were the population mean, then we reject the null hypothesis that $\mu = 50$.
Level of Significance, $\alpha$
Defines unlikely values of sample statistic if null hypothesis is true
Defines rejection region of the sampling distribution
Is designated by $\alpha$ (level of significance)
Typical values are .01, .05, or .10
Is selected by the researcher at the beginning
Provides the critical value(s) of the test
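Level of Significance and the Rejection Region
Level of significance = $\alpha$
Lower tail test: $H_0: \mu \ge 3$, $H_A: \mu < 3$; rejection region of area $\alpha$ in the lower tail
Upper tail test: $H_0: \mu \le 3$, $H_A: \mu > 3$; rejection region of area $\alpha$ in the upper tail
Two tailed test: $H_0: \mu = 3$, $H_A: \mu \ne 3$; rejection regions of area $\alpha/2$ in each tail
[Figure: three sampling distributions centered at 0 with the rejection region shaded; the boundary of each shaded region represents the critical value, and the unshaded area is $1 - \alpha$.]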
Errors in Making Decisions
Type I Error
Reject a true null hypothesis
Considered a serious type of error
The probability of Type I Error is $\alpha$
Called level of significance of the test
Set by researcher in advance
Errors in Making Decisions (continued)
Type II Error
Fail to reject a false null hypothesis
The probability of Type II Error is $\beta$
Outcomes and Probabilities
Possible Hypothesis Test Outcomes
Key: Outcome (Probability)

Decision          | State of Nature: H0 True | State of Nature: H0 False
Do Not Reject H0  | No error ($1 - \alpha$)  | Type II Error ($\beta$)
Reject H0         | Type I Error ($\alpha$)  | No error ($1 - \beta$)
Type I & II Error Relationship
Type I and Type II errors cannot happen at the same time
Type I error can only occur if H0 is true
Type II error can only occur if H0 is false
If Type I error probability ($\alpha$) increases, then Type II error probability ($\beta$) decreases
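Factors Affecting Type II Error
All else equal,
$\beta$ increases when the difference between hypothesized parameter and its true value decreases
$\beta$ increases when $\alpha$ decreases
$\beta$ increases when $\sigma$ increases
$\beta$ increases when n decreases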
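The direction of each effect on the Type II error probability can be checked numerically for a lower tail z test. The sketch below is an illustration under stated assumptions: the helper `beta_lower_tail` is hypothetical, and the baseline reuses the numbers from this deck's own Type II error example ($H_0: \mu \ge 52$, true mean 50, $\sigma = 6$, n = 64, $\alpha = .05$).

```python
from math import sqrt
from statistics import NormalDist

def beta_lower_tail(mu0: float, mu_true: float, sigma: float, n: int, alpha: float) -> float:
    """P(fail to reject H0: mu >= mu0) when the true mean is mu_true (sigma known)."""
    std_error = sigma / sqrt(n)
    # Reject H0 when x-bar falls below this cutoff.
    cutoff = mu0 + NormalDist().inv_cdf(alpha) * std_error
    # Beta is the chance that x-bar lands at or above the cutoff under the true mean.
    return 1.0 - NormalDist(mu_true, std_error).cdf(cutoff)

base = beta_lower_tail(52, 50, 6, 64, 0.05)
smaller_alpha = beta_lower_tail(52, 50, 6, 64, 0.01)
larger_sigma = beta_lower_tail(52, 50, 8, 64, 0.05)
smaller_n = beta_lower_tail(52, 50, 6, 16, 0.05)
smaller_gap = beta_lower_tail(52, 51, 6, 64, 0.05)

# Each variation should push beta above the baseline.
print(all(b > base for b in (smaller_alpha, larger_sigma, smaller_n, smaller_gap)))
```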
Critical Value Approach to Testing
Convert sample statistic (e.g., $\bar{x}$) to test statistic (Z or t statistic)
Determine the critical value(s) for a specified level of significance $\alpha$ from a table or computer
If the test statistic falls in the rejection region, reject H0; otherwise do not reject H0
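Lower Tail Tests
$H_0: \mu \ge 3$
$H_A: \mu < 3$
The cutoff value, $-z_\alpha$ or $\bar{x}_\alpha$, is called a critical value
$\bar{x}_\alpha = \mu - z_\alpha \dfrac{\sigma}{\sqrt{n}}$
[Figure: sampling distribution with a rejection region of area $\alpha$ in the lower tail; reject H0 to the left of $-z_\alpha$ (equivalently $\bar{x}_\alpha$), do not reject H0 to the right.]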
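Upper Tail Tests
$H_0: \mu \le 3$
$H_A: \mu > 3$
The cutoff value, $z_\alpha$ or $\bar{x}_\alpha$, is called a critical value
$\bar{x}_\alpha = \mu + z_\alpha \dfrac{\sigma}{\sqrt{n}}$
[Figure: sampling distribution with a rejection region of area $\alpha$ in the upper tail; do not reject H0 to the left of $z_\alpha$ (equivalently $\bar{x}_\alpha$), reject H0 to the right.]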
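Two Tailed Tests
$H_0: \mu = 3$
$H_A: \mu \ne 3$
There are two cutoff values (critical values): $\pm z_{\alpha/2}$, or equivalently the lower and upper values of $\bar{x}_{\alpha/2}$
$\bar{x}_{\alpha/2} = \mu \pm z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$
[Figure: sampling distribution with rejection regions of area $\alpha/2$ in each tail; reject H0 below $-z_{\alpha/2}$ (lower $\bar{x}_{\alpha/2}$) or above $z_{\alpha/2}$ (upper $\bar{x}_{\alpha/2}$), do not reject H0 in between.]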
Calculating the Test Statistic
Convert sample statistic ($\bar{x}$) to a test statistic (Z or t statistic)
Hypothesis tests for $\mu$ fall into three cases:
$\sigma$ known. The test statistic is: $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
$\sigma$ unknown, large samples. The test statistic is: $t_{n-1} = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$, but is sometimes approximated using a z: $z = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
$\sigma$ unknown, small samples. The test statistic is: $t_{n-1} = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$ (the population must be approximately normal)
Review: Steps in Hypothesis Testing
1. Specify the population value of interest
2. Formulate the appropriate null and alternative hypotheses
3. Specify the desired level of significance
4. Determine the rejection region
5. Obtain sample evidence and compute the test statistic
6. Reach a decision and interpret the result
Hypothesis Testing Example
Test the claim that the true mean number of TV sets in US homes is at least 3.
(Assume $\sigma = 0.8$)
1. Specify the population value of interest
The mean number of TVs in US homes
2. Formulate the appropriate null and alternative hypotheses
$H_0: \mu \ge 3$
$H_A: \mu < 3$ (This is a lower tail test)
3. Specify the desired level of significance
Suppose that $\alpha = .05$ is chosen for this test
Hypothesis Testing Example (continued)
4. Determine the rejection region
This is a one-tailed test with $\alpha = .05$. Since $\sigma$ is known, the cutoff value is a z value:
Reject H0 if $z < -z_\alpha = -1.645$; otherwise do not reject H0
[Figure: standard normal curve with the rejection region of area $\alpha = .05$ to the left of $-z_\alpha = -1.645$.]
5. Obtain sample evidence and compute the test statistic
Suppose a sample is taken with the following results: n = 100, $\bar{x} = 2.84$ ($\sigma = 0.8$ is assumed known)
Then the test statistic is:
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{2.84 - 3}{0.8/\sqrt{100}} = \dfrac{-.16}{.08} = -2.0$
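Hypothesis Testing Example (continued)
6. Reach a decision and interpret the result
[Figure: standard normal curve with the rejection region of area $\alpha = .05$ to the left of -1.645; the test statistic z = -2.0 falls inside it.]
Since z = -2.0 < -1.645, we reject the null hypothesis that the mean number of TVs in US homes is at least 3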

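Hypothesis Testing Example (continued)
An alternate way of constructing rejection region: now expressed in $\bar{x}$, not z units
$\bar{x}_\alpha = \mu - z_\alpha \dfrac{\sigma}{\sqrt{n}} = 3 - 1.645 \dfrac{0.8}{\sqrt{100}} = 2.8684$
[Figure: sampling distribution of $\bar{x}$ with the rejection region of area $\alpha = .05$ to the left of 2.8684; the observed mean 2.84 falls inside it.]
Since $\bar{x} = 2.84 < 2.8684$, we reject the null hypothesis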
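The critical value approach for the TV example can be sketched in code, covering both the z-unit and the $\bar{x}$-unit versions of the cutoff. The function `lower_tail_z_test` is a hypothetical helper written for this illustration; it uses the exact normal quantile, which is why the critical value prints as approximately -1.645.

```python
from math import sqrt
from statistics import NormalDist

def lower_tail_z_test(x_bar: float, mu0: float, sigma: float, n: int, alpha: float):
    """Lower tail z test of H0: mu >= mu0 against HA: mu < mu0 (sigma known)."""
    std_error = sigma / sqrt(n)
    z = (x_bar - mu0) / std_error         # test statistic
    z_crit = NormalDist().inv_cdf(alpha)  # negative cutoff in z units
    x_crit = mu0 + z_crit * std_error     # the same cutoff in x-bar units
    return z, z_crit, x_crit, z < z_crit

# Values from the TV example: n = 100, x-bar = 2.84, sigma = 0.8, alpha = .05.
z, z_crit, x_crit, reject = lower_tail_z_test(2.84, 3.0, 0.8, 100, 0.05)
print(f"z = {z:.2f}, z cutoff = {z_crit:.3f}, x-bar cutoff = {x_crit:.4f}, reject H0: {reject}")
```

Comparing z with the z cutoff and comparing $\bar{x}$ with the $\bar{x}$ cutoff always lead to the same decision, since one is a linear rescaling of the other.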
p-Value Approach to Testing
Convert sample statistic (e.g., $\bar{x}$) to test statistic (Z or t statistic)
Obtain the p-value from a table or computer
Compare the p-value with $\alpha$
If p-value < $\alpha$, reject H0
If p-value $\ge \alpha$, do not reject H0
p-Value Approach to Testing (continued)
p-value: Probability of obtaining a test statistic more extreme ($\le$ or $\ge$) than the observed sample value given H0 is true
Also called observed level of significance
Smallest value of $\alpha$ for which H0 can be rejected
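p-value example
Example: How likely is it to see a sample mean of 2.84 (or something further below the mean) if the true mean is $\mu = 3.0$? ($\alpha = .05$)
$P(\bar{x} \le 2.84 \mid \mu = 3.0) = P\left(z \le \dfrac{2.84 - 3.0}{0.8/\sqrt{100}}\right) = P(z \le -2.0) = .0228$
[Figure: sampling distribution of $\bar{x}$ with the p-value area .0228 to the left of 2.84, which lies below the cutoff 2.8684.]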
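p-value example (continued)
Compare the p-value with $\alpha$
If p-value < $\alpha$, reject H0
If p-value $\ge \alpha$, do not reject H0
Here: p-value = .0228 and $\alpha = .05$
Since .0228 < .05, we reject the null hypothesis
[Figure: the p-value area .0228 to the left of 2.84 lies inside the rejection region of area $\alpha = .05$ to the left of 2.8684.]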
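The same p-value can be obtained from the normal CDF instead of a table. This is a minimal sketch; `lower_tail_p_value` is a hypothetical name introduced for illustration.

```python
from math import sqrt
from statistics import NormalDist

def lower_tail_p_value(x_bar: float, mu0: float, sigma: float, n: int) -> float:
    """P(sample mean <= x_bar) when the true mean equals mu0 and sigma is known."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    return NormalDist().cdf(z)

alpha = 0.05
p_value = lower_tail_p_value(x_bar=2.84, mu0=3.0, sigma=0.8, n=100)
reject = p_value < alpha
print(f"p-value = {p_value:.4f}, reject H0: {reject}")
```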
Example: Upper Tail z Test for Mean ($\sigma$ Known)
A phone industry manager thinks that customer monthly cell phone bills have increased, and now average over \$52 per month. The company wishes to test this claim. (Assume $\sigma = 10$ is known)
Form hypothesis test:
$H_0: \mu \le 52$ (the average is not over \$52 per month)
$H_A: \mu > 52$ (the average is greater than \$52 per month, i.e., sufficient evidence exists to support the manager's claim)
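Example: Find Rejection Region (continued)
Suppose that $\alpha = .10$ is chosen for this test
Find the rejection region:
[Figure: standard normal curve with the rejection region of area $\alpha = .10$ to the right of $z_\alpha = 1.28$; do not reject H0 to the left.]
Reject H0 if z > 1.28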
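Review: Finding Critical Value - One Tail
What is z given $\alpha = 0.10$?
The upper-tail area is .10, so the area between 0 and z is .50 - .10 = .40. The table entry closest to .40 is .3997, which corresponds to z = 1.28.
Standard Normal Distribution Table (Portion)
z     .07    .08    .09
1.1  .3790  .3810  .3830
1.2  .3980  .3997  .4015
1.3  .4147  .4162  .4177
Critical Value = 1.28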
Example: Test Statistic (continued)
Obtain sample evidence and compute the test statistic
Suppose a sample is taken with the following results: n = 64, $\bar{x} = 53.1$ ($\sigma = 10$ was assumed known)
Then the test statistic is:
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{53.1 - 52}{10/\sqrt{64}} = 0.88$
Example: Decision (continued)
Reach a decision and interpret the result:
[Figure: standard normal curve with the rejection region of area $\alpha = .10$ to the right of 1.28; the test statistic z = .88 falls in the do-not-reject region.]
Do not reject H0 since z = 0.88 $\le$ 1.28
i.e.: there is not sufficient evidence that the mean bill is over \$52
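p-Value Solution (continued)
Calculate the p-value and compare to $\alpha$
$P(\bar{x} \ge 53.1 \mid \mu = 52.0) = P\left(z \ge \dfrac{53.1 - 52.0}{10/\sqrt{64}}\right) = P(z \ge 0.88) = .5 - .3106 = .1894$
[Figure: standard normal curve with the p-value area .1894 to the right of z = .88, which is larger than the rejection region of area $\alpha = .10$ to the right of 1.28.]
Do not reject H0 since p-value = .1894 > $\alpha$ = .10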
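For the cell phone bill example, both the critical value and the p-value can be computed together. The sketch below is illustrative; `upper_tail_z_test` is a hypothetical helper, and it uses exact normal quantiles, so the critical value comes out near 1.2816 rather than the table value 1.28.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_z_test(x_bar: float, mu0: float, sigma: float, n: int, alpha: float):
    """Upper tail z test of H0: mu <= mu0 against HA: mu > mu0 (sigma known)."""
    std_normal = NormalDist()
    z = (x_bar - mu0) / (sigma / sqrt(n))     # test statistic
    z_crit = std_normal.inv_cdf(1.0 - alpha)  # cutoff with area alpha to its right
    p_value = 1.0 - std_normal.cdf(z)         # area to the right of z
    return z, z_crit, p_value

# Values from the example: n = 64, x-bar = 53.1, sigma = 10, alpha = .10.
alpha = 0.10
z, z_crit, p_value = upper_tail_z_test(53.1, 52.0, 10.0, 64, alpha)
reject = p_value < alpha
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```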
Example: Two-Tail Test ($\sigma$ Unknown)
The average cost of a hotel room in New York is said to be $168 per night.
A random sample of 25 hotels resulted in $\bar{x} = 172.50$ and $s = 15.40$ (both in dollars).
Test at the $\alpha = 0.05$ level.
(Assume the population distribution is normal)
$H_0: \mu = 168$
$H_A: \mu \ne 168$
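Example Solution: Two-Tail Test
$H_0: \mu = 168$
$H_A: \mu \ne 168$
$\alpha = 0.05$
n = 25
$\sigma$ is unknown, so use a t statistic
Critical Value: $t_{24} = \pm 2.0639$
$t_{n-1} = \dfrac{\bar{x} - \mu}{s/\sqrt{n}} = \dfrac{172.50 - 168}{15.40/\sqrt{25}} = 1.46$
[Figure: t distribution with rejection regions of area $\alpha/2 = .025$ below $-t_{\alpha/2} = -2.0639$ and above $t_{\alpha/2} = 2.0639$; the test statistic 1.46 falls in the do-not-reject region.]
Do not reject H0: not sufficient evidence that true mean cost is different from $168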
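The hotel example can be reproduced with a short calculation. Python's standard library has no t distribution, so this sketch takes the critical value 2.0639 (24 degrees of freedom) from the slide as a given constant; `one_sample_t_statistic` and `T_CRITICAL_24_DF` are names introduced here for illustration only.

```python
from math import sqrt

def one_sample_t_statistic(x_bar: float, mu0: float, s: float, n: int) -> float:
    """t statistic with n - 1 degrees of freedom for a single-sample mean test."""
    return (x_bar - mu0) / (s / sqrt(n))

# Critical value for alpha/2 = .025 and df = 24, as quoted on the slide.
T_CRITICAL_24_DF = 2.0639

t_stat = one_sample_t_statistic(x_bar=172.50, mu0=168.0, s=15.40, n=25)
# Two-tail rule: reject H0 only if the statistic is beyond either cutoff.
reject = abs(t_stat) > T_CRITICAL_24_DF
print(f"t = {t_stat:.2f}, reject H0: {reject}")
```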
Hypothesis Tests for Proportions
Involves categorical values
Two possible outcomes
Success (possesses a certain characteristic)
Failure (does not possess that characteristic)
Fraction or proportion of population in the success category is denoted by p
Proportions (continued)
Sample proportion in the success category is denoted by $\bar{p}$:
$\bar{p} = \dfrac{x}{n} = \dfrac{\text{number of successes in sample}}{\text{sample size}}$
When both np and n(1-p) are at least 5, $\bar{p}$ can be approximated by a normal distribution with mean and standard deviation
$\mu_{\bar{p}} = p$ and $\sigma_{\bar{p}} = \sqrt{\dfrac{p(1-p)}{n}}$
Hypothesis Tests for Proportions
The sampling distribution of $\bar{p}$ is normal, so the test statistic is a z value:
$z = \dfrac{\bar{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}}$
Hypothesis tests for p:
$np \ge 5$ and $n(1-p) \ge 5$: use the z test statistic
$np < 5$ or $n(1-p) < 5$: not discussed in this chapter
Example: z Test for Proportion
A marketing company claims that it receives 8% responses from its mailing. To test this claim, a random sample of 500 were surveyed with 25 responses. Test at the $\alpha = .05$ significance level.
Check:
n p = (500)(.08) = 40
n(1-p) = (500)(.92) = 460
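Z Test for Proportion: Solution
$H_0: p = .08$
$H_A: p \ne .08$
$\alpha = .05$
n = 500, $\bar{p} = .05$
Critical Values: $\pm 1.96$
Test Statistic:
$z = \dfrac{\bar{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}} = \dfrac{.05 - .08}{\sqrt{\dfrac{.08(1 - .08)}{500}}} = -2.47$
[Figure: standard normal curve with rejection regions of area .025 below -1.96 and above 1.96; the test statistic -2.47 falls in the lower rejection region.]
Decision: Reject H0 at $\alpha = .05$
Conclusion: There is sufficient evidence to reject the company's claim of 8% response rate.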
p-Value Solution (continued)
Calculate the p-value and compare to $\alpha$
(For a two sided test the p-value is always two sided)
p-value = .0136:
$P(z \le -2.47) + P(z \ge 2.47) = 2(.5 - .4932) = 2(.0068) = 0.0136$
[Figure: standard normal curve with area .0068 in each tail beyond z = -2.47 and z = 2.47, inside the rejection regions of area $\alpha/2 = .025$ beyond -1.96 and 1.96.]
Reject H0 since p-value = .0136 < $\alpha$ = .05
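The mailing example can be sketched as a two-tail z test for a proportion, including the np and n(1-p) check. The function `proportion_z_test` is a hypothetical helper. Because it does not round z to -2.47 before the lookup, its p-value is roughly .0134, slightly below the table-based .0136.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(successes: int, n: int, p0: float):
    """Two-tail z test of H0: p = p0 using the normal approximation."""
    # The approximation is only used when both expected counts are at least 5.
    if n * p0 < 5 or n * (1 - p0) < 5:
        raise ValueError("normal approximation not appropriate for this sample")
    p_bar = successes / n
    z = (p_bar - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * NormalDist().cdf(-abs(z))  # both tails
    return z, p_value

# Values from the example: 25 responses out of 500, claimed rate 8%.
alpha = 0.05
z, p_value = proportion_z_test(successes=25, n=500, p0=0.08)
reject = p_value < alpha
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```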
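Type II Error
The Type II error probability, $\beta$, is the probability of failing to reject a false H0
Suppose we fail to reject $H_0: \mu \ge 52$ when in fact the true mean is $\mu = 50$
[Figure: sampling distribution centered at 52 with a cutoff between 50 and 52; reject $H_0: \mu \ge 52$ to the left of the cutoff, do not reject to the right.]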
Type II Error (continued)
Suppose we do not reject $H_0: \mu \ge 52$ when in fact the true mean is $\mu = 50$
[Figure: the true distribution of $\bar{x}$ if $\mu = 50$, centered at 50, drawn beside the hypothesized distribution centered at 52; the range of $\bar{x}$ where H0 is not rejected lies to the right of the cutoff.]
Here, $\beta = P(\bar{x} \ge \text{cutoff})$ if $\mu = 50$
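Calculating $\beta$
Suppose n = 64, $\sigma = 6$, and $\alpha = .05$
cutoff $= \bar{x}_\alpha = \mu - z_\alpha \dfrac{\sigma}{\sqrt{n}} = 52 - 1.645 \dfrac{6}{\sqrt{64}} = 50.766$ (for $H_0: \mu \ge 52$)
So $\beta = P(\bar{x} \ge 50.766)$ if $\mu = 50$
Calculating $\beta$ (continued)
$P(\bar{x} \ge 50.766 \mid \mu = 50) = P\left(z \ge \dfrac{50.766 - 50}{6/\sqrt{64}}\right) = P(z \ge 1.02) = .5 - .3461 = .1539$
Probability of type II error: $\beta = .1539$
[Figure: the true distribution of $\bar{x}$ centered at 50 with the area $\beta$ to the right of the cutoff 50.766, where $H_0: \mu \ge 52$ is not rejected.]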
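The two-step calculation of $\beta$ (find the cutoff under H0, then find the area beyond it under the true mean) can be sketched as follows. `type_ii_error_lower_tail` is a hypothetical helper. It works with unrounded values, so $\beta$ comes out near .153 rather than the table-based .1539.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error_lower_tail(mu0: float, mu_true: float, sigma: float, n: int, alpha: float):
    """Cutoff and Type II error probability for H0: mu >= mu0 (sigma known)."""
    std_error = sigma / sqrt(n)
    # Step 1: cutoff in x-bar units, computed under the hypothesized mean.
    cutoff = mu0 + NormalDist().inv_cdf(alpha) * std_error
    # Step 2: probability of landing in the do-not-reject region under the true mean.
    beta = 1.0 - NormalDist(mu_true, std_error).cdf(cutoff)
    return cutoff, beta

# Values from the example: H0: mu >= 52, true mean 50, sigma = 6, n = 64, alpha = .05.
cutoff, beta = type_ii_error_lower_tail(52, 50, 6, 64, 0.05)
print(f"cutoff = {cutoff:.3f}, beta = {beta:.3f}")
```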
Chapter Summary
Addressed hypothesis testing methodology
Performed z test for the mean ($\sigma$ known)
Discussed p-value approach to hypothesis testing
Performed one-tail and two-tail tests
Chapter Summary (continued)
Performed t test for the mean ($\sigma$ unknown)
Performed z test for the proportion
Discussed type II error and computed its probability
