ST Slides

Statistic: refers to measurements from a sample.
Sampling theory: we look at sample characteristics to gain an understanding of the population at hand.
Recognize random variables and random samples.
Important Statistics
Mean: location of a sample.
Variance: spread of a sample.
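Sample mean
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \sum_{i=1}^{n} \frac{X_i}{n}$$
Sample variance
$$S^2 = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n - 1} \qquad (*)$$
or
$$S^2 = \frac{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n - 1)} \qquad (**)$$
Try deriving (**) from (*).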
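As a quick check of (*) and (**), the sketch below evaluates both formulas in Python on a small hypothetical data set. It uses exact fractions so that any disagreement would show up exactly instead of being hidden by floating-point rounding. The function names are illustrative only.

```python
from fractions import Fraction

def sample_mean(xs):
    """Sample mean: sum of the observations divided by n."""
    return sum(xs) / len(xs)

def sample_variance_def(xs):
    """Formula (*): squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    """Formula (**): n * sum(x^2) - (sum x)^2, divided by n(n - 1)."""
    n = len(xs)
    return (n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1))

# Hypothetical illustration data, stored as exact fractions
data = [Fraction(x) for x in (2, 4, 4, 5, 7, 9)]
print(sample_mean(data))                # 31/6
print(sample_variance_def(data))        # 37/6
print(sample_variance_shortcut(data))   # 37/6
```

Both formulas agree exactly. With floating-point data, (*) is usually the safer choice numerically, because the subtraction in (**) can lose precision when the values are large and close together.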
The usual assumption is that a sample comes from a normally distributed population, written as $X \sim N(\mu, \sigma^2)$.
Why?
We use sample statistics to estimate population parameters:
$\bar{X}$ to estimate $\mu$,
$s^2$ to estimate $\sigma^2$.
Since we are now dealing with many possible sample means, we need to specify a distribution for $\bar{X}$.
Theorem 5.1.1:
Let $X_1, X_2, X_3, \ldots, X_n$ be a random sample of size $n$ with $X_i \sim N(\mu, \sigma^2)$. Then
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
What happens if $n$ increases?
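Central Limit Theorem
Let $X_1, X_2, X_3, \ldots, X_n$ be a random sample from a distribution with $E(X_i) = \mu$ and $\operatorname{var}(X_i) = \sigma^2$. If $n$ is sufficiently large ($n \geq 30$), then approximately
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$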
Equivalently, the standardized sample mean
$$Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$$
approximately follows the standard normal distribution $N(0, 1)$.
The normal approximation for $\bar{X}$ is good if $n \geq 30$, for any population.
For $n < 30$, the normal approximation for $\bar{X}$ is only good for a population that is more or less normally distributed.
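A small simulation can make the central limit theorem concrete. The sketch below draws repeated samples from an exponential population with mean 1 and variance 1. This population is used only because it is clearly skewed. The code then checks three things:
- the sample means have mean close to $\mu$;
- their variance is close to $\sigma^2/n$;
- for $n = 30$, roughly 95% of the standardized means fall inside $\pm 1.96$.

The seed, the number of repetitions and the sample sizes are arbitrary assumptions.

```python
import math
import random
import statistics

def sample_means(n, reps=20_000, seed=1):
    """Means of `reps` samples of size n from an exponential population with
    mean 1 and variance 1 (a deliberately skewed, non-normal population)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

def coverage(means, n, mu=1.0, sigma=1.0, z=1.96):
    """Fraction of standardized means Z_n = (xbar - mu)/(sigma/sqrt(n)) with |Z_n| < z.
    For an exact N(0, 1) distribution this fraction would be about 0.95."""
    se = sigma / math.sqrt(n)
    return sum(abs((m - mu) / se) < z for m in means) / len(means)

results = {}
for n in (2, 10, 30):
    means = sample_means(n)
    results[n] = (statistics.fmean(means), statistics.variance(means), coverage(means, n))
    m, v, c = results[n]
    print(f"n = {n:2d}: mean of xbar = {m:.3f}, var of xbar = {v:.4f} "
          f"(sigma^2/n = {1 / n:.4f}), coverage = {c:.3f}")
```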
When $n < 30$, the sample comes from a normal distribution and $\sigma^2$ is unknown, the approximation is done using the t-distribution.
Definition 5.1.1
If $X_i \sim N(\mu, \sigma^2)$, then
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has a t-distribution with $\nu = n - 1$ degrees of freedom.
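Cases
$\sigma$ known (CLT): normal approximation, standardize with $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$.
$\sigma$ unknown, $n \geq 30$ (CLT): normal approximation, standardize with $Z = \frac{\bar{X} - \mu}{S/\sqrt{n}}$.
$\sigma$ unknown, $n < 30$: t-distribution, standardize with $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$.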
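The three cases can be summarised as a small decision function. This is a sketch with hypothetical names. Whenever it chooses the t-distribution, it assumes a normal population, as the cases above require.

```python
import math

def standardize_mean(xbar, mu, n, sigma=None, s=None):
    """Return (distribution, statistic, degrees_of_freedom) for the sample mean.

    sigma known            -> normal, Z = (xbar - mu) / (sigma / sqrt(n))
    sigma unknown, n >= 30 -> normal, Z = (xbar - mu) / (s / sqrt(n))
    sigma unknown, n < 30  -> t with n - 1 df (normal population assumed)
    """
    if sigma is not None:
        return "normal", (xbar - mu) / (sigma / math.sqrt(n)), None
    if s is None:
        raise ValueError("need either sigma or the sample standard deviation s")
    stat = (xbar - mu) / (s / math.sqrt(n))
    if n >= 30:
        return "normal", stat, None
    return "t", stat, n - 1

print(standardize_mean(585, 600, 9, sigma=18))   # ('normal', -2.5, None)
print(standardize_mean(102, 100, 16, s=4))       # ('t', 2.0, 15)
```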
Example 5.1.1:
A factory produces bulbs whose lifetimes are approximately normally distributed with mean 600 hours and standard deviation 18 hours. Find the probability that the average lifetime of a sample of n = 9 bulbs is less than 585 hours.
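Below is a worked sketch of Example 5.1.1 using Theorem 5.1.1. The sample mean of n = 9 lifetimes is normal with standard error $18/\sqrt{9} = 6$ hours, so $z = (585 - 600)/6 = -2.5$. The code uses `statistics.NormalDist` from the Python standard library for the standard normal CDF.

```python
import math
from statistics import NormalDist

mu, sigma, n = 600, 18, 9
se = sigma / math.sqrt(n)           # standard error of the mean: 18 / 3 = 6 hours
z = (585 - mu) / se                 # (585 - 600) / 6 = -2.5
prob = NormalDist().cdf(z)          # P(Z < -2.5)
print(round(z, 2), round(prob, 4))  # -2.5 0.0062
```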
Example 5.1.2:
Suppose a manufacturer is interested in the average daily production of a machine; more specifically, he is interested in the probability of the machine producing on average more than 100 items per day. It is known that the machine's daily production has a normal distribution with mean $\mu$ and variance $\sigma^2$. The manufacturer measured the production of 11 machines, yielding the following data:
115 82 98 126 109 143 136 92 103 127 150
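The example does not spell out which probability is to be computed. The sketch below therefore covers only the step that any approach needs: the sample mean and sample standard deviation of the 11 observations. Because $\sigma^2$ is unknown and $n = 11 < 30$, the cases above point to the t-distribution with $\nu = 10$ degrees of freedom.

```python
import statistics

production = [115, 82, 98, 126, 109, 143, 136, 92, 103, 127, 150]
n = len(production)
xbar = statistics.mean(production)   # sample mean
s = statistics.stdev(production)     # sample standard deviation (n - 1 divisor)
print(n, round(xbar, 2), round(s, 2), "df =", n - 1)
```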
Statistical Inference
Divided into:
Estimation: point estimators and interval estimators.
Tests of hypothesis.
Point Estimator
A single numerical value from the sample that is used to estimate an unknown parameter.
$\bar{X}$ is the point estimate of $\mu$.
$s^2$ is the point estimate of $\sigma^2$.
Interval Estimates: Confidence Interval
An interval in which the true value of the parameter falls with some level of confidence.
Example: a 95% confidence interval for $\mu$ means that we are 95% confident that the value of $\mu$ lies in that interval.
Confidence Interval for $\mu$
Let's start off by building a confidence interval for $\mu$ when the population at hand is normally distributed and $\sigma$ is known.
If $\bar{x}$ is the mean of a random sample of size $n$ from a population with known variance $\sigma^2$, a $(1 - \alpha)100\%$ confidence interval for $\mu$ is given by
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
where $z_{\alpha/2}$ is the z-value leaving an area of $\alpha/2$ to the right.
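Cases in building a C.I. for $\mu$
$\sigma$ known: $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
$\sigma$ unknown, $n \geq 30$: $\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$
$\sigma$ unknown, normal population, $n < 30$: $\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$, with $\nu = n - 1$ degrees of freedom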
Example 5.2.1.2
Measurements of the weights of a random sample of 200 containers made by a certain machine showed a mean of 0.21 kilograms, and it is known that $\sigma = 0.002$ kilograms. Find the 95% confidence interval for the mean weight of all the containers.
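This sketch applies the $\sigma$-known interval to Example 5.2.1.2. The helper `z_interval` is a hypothetical name. `NormalDist().inv_cdf(1 - alpha / 2)` from the Python standard library returns $z_{\alpha/2}$, the z-value leaving $\alpha/2$ to the right.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Two-sided (1 - alpha)100% C.I. for mu: xbar +/- z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z-value leaving alpha/2 to the right
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

low, high = z_interval(0.21, 0.002, 200, 0.95)
print(f"{low:.5f} < mu < {high:.5f}")   # 0.20972 < mu < 0.21028
```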
Possible interpretations from a C.I.
What does the interval signify?
What is the margin of error?
What is the maximum error that can occur in using the sample statistic to estimate the population parameter?
Example
The average calcium content of 36 samples taken from different locations in a river is found to be 1.6 grams per millilitre. Find the 90% confidence interval for the mean calcium content of the river. Assume that the sample standard deviation is 0.2.
What happens when the level of confidence increases?
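For the calcium example the sample is large ($n = 36$), so $s$ stands in for $\sigma$ and the z-interval applies. The sketch below computes the 90% interval. To address the question about increasing confidence, it also repeats the calculation at 95% and 99%; these extra levels are an illustrative choice. The margin $z_{\alpha/2}\, s/\sqrt{n}$ grows with the confidence level, so the interval widens.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sd, n, confidence):
    """Two-sided interval xbar +/- z_{alpha/2} * sd / sqrt(n); returns (low, high, margin)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sd / math.sqrt(n)
    return xbar - margin, xbar + margin, margin

# Calcium example: n = 36, xbar = 1.6, s = 0.2 (large sample, s used in place of sigma)
intervals = {conf: z_interval(1.6, 0.2, 36, conf) for conf in (0.90, 0.95, 0.99)}
for conf, (low, high, margin) in intervals.items():
    print(f"{conf:.0%}: {low:.4f} < mu < {high:.4f}  (margin of error {margin:.4f})")
```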
Example 5.2.1.5
A machine produces containers that are
cylindrical in shape. Nine containers are
randomly chosen and the diameters are
10.01, 9.97, 10.03, 10.04, 9.99, 9.98, 9.99,
10.01, and 10.03 centimetres. Find a 99%
confidence interval for the mean diameter
of containers from this machine, assuming
an approximate normal distribution.
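In Example 5.2.1.5, $\sigma$ is unknown and $n = 9 < 30$, so the t-interval with $\nu = 8$ degrees of freedom applies. Python's standard library has no t quantile function. The sketch therefore hard-codes $t_{0.005} = 3.355$ from a standard t table; if SciPy is available, `scipy.stats.t.ppf(0.995, 8)` gives the same value.

```python
import math
import statistics

diameters = [10.01, 9.97, 10.03, 10.04, 9.99, 9.98, 9.99, 10.01, 10.03]
n = len(diameters)
xbar = statistics.mean(diameters)
s = statistics.stdev(diameters)     # sample standard deviation (n - 1 divisor)

# Assumption: t_{0.005} with v = n - 1 = 8, read from a standard t table
t_crit = 3.355
margin = t_crit * s / math.sqrt(n)
print(f"{xbar - margin:.4f} < mu < {xbar + margin:.4f}")
```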
The C.I.s built so far are two-sided C.I.s. What if you are interested in building a one-sided C.I.?
If $\bar{x}$ is the mean of a random sample of size $n$ from a population with known variance $\sigma^2$, then a $(1 - \alpha)100\%$ one-sided confidence interval for $\mu$ is given by
$$\bar{x} - z_{\alpha}\frac{\sigma}{\sqrt{n}} < \mu < \infty$$
More specifically, the one-sided C.I. given above is bounded from below. Thus, $\bar{x} - z_{\alpha}\frac{\sigma}{\sqrt{n}}$ is known as the lower bound for $\mu$.
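Here is a minimal sketch of the lower confidence bound. The only change from the two-sided interval is that all of $\alpha$ goes into one tail, so $z_{\alpha}$ replaces $z_{\alpha/2}$. The function name is hypothetical, and the usage line borrows the numbers of Example 5.2.1.2 purely for illustration.

```python
import math
from statistics import NormalDist

def lower_confidence_bound(xbar, sigma, n, confidence=0.95):
    """One-sided (1 - alpha)100% lower bound: mu > xbar - z_alpha * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # leaves all of alpha in the right tail
    return xbar - z_alpha * sigma / math.sqrt(n)

# Illustration only: xbar = 0.21, sigma = 0.002, n = 200 as in Example 5.2.1.2
lb = lower_confidence_bound(0.21, 0.002, 200)
print(f"{lb:.5f} < mu < infinity")
```

This bound (about 0.20977) is higher than the two-sided 95% lower limit (about 0.20972), because $z_{0.05} \approx 1.645$ is smaller than $z_{0.025} \approx 1.96$.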
Confidence interval for $p$
In a large sample ($n \geq 30$), if $\hat{p}$ is the proportion of successes in a random sample of size $n$, and $\hat{q} = 1 - \hat{p}$, an approximate $(1 - \alpha)100\%$ confidence interval for the binomial parameter $p$ is given by
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
where $z_{\alpha/2}$ is the z-value leaving an area of $\alpha/2$ to the right.
Example 5.2.1.6
In a random sample of n = 600 families
owning television sets in a city, it is found
that x = 240 subscribed to ASTRO. Find a
95% confidence interval for the actual
proportion of families in this city who
subscribe to ASTRO.
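The sketch below works through Example 5.2.1.6. Here $\hat{p} = 240/600 = 0.4$, $\hat{q} = 0.6$, and the standard error is $\sqrt{\hat{p}\hat{q}/n} = \sqrt{0.24/600} = 0.02$.

```python
import math
from statistics import NormalDist

n, x = 600, 240
p_hat = x / n                               # sample proportion, 0.4
q_hat = 1 - p_hat                           # 0.6
z = NormalDist().inv_cdf(0.975)             # z_{0.025}, about 1.96
margin = z * math.sqrt(p_hat * q_hat / n)   # about 1.96 * 0.02
print(f"{p_hat - margin:.4f} < p < {p_hat + margin:.4f}")   # 0.3608 < p < 0.4392
```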
Confidence interval for $\sigma^2$
If $s^2$ is the variance of a random sample of size $n$ from a normal population, a $(1 - \alpha)100\%$ confidence interval for $\sigma^2$ is given by
$$\frac{(n - 1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2}}$$
where $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ are $\chi^2$-values with $\nu = n - 1$ degrees of freedom, leaving areas of $\alpha/2$ and $1 - \alpha/2$, respectively, to the right.
Example 5.2.1.7
The following are the weights, in grams, of
10 packages of sugar packed by a worker:
454, 451, 458, 450, 451, 459, 458, 459,
452 and 450.
Find a 95% confidence interval for the
variance of all such packages of sugar
packed by this worker, assuming a normal
population.
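This is a sketch of Example 5.2.1.7. The Python standard library has no $\chi^2$ quantile function. The two critical values for $\nu = 9$ are therefore hard-coded from a standard $\chi^2$ table; with SciPy, `scipy.stats.chi2.ppf` would supply them.

```python
import statistics

weights = [454, 451, 458, 450, 451, 459, 458, 459, 452, 450]
n = len(weights)
s2 = statistics.variance(weights)   # sample variance, n - 1 divisor

# Assumption: chi-squared critical values for v = 9 from a standard table.
# chi2_{0.025} leaves 0.025 to the right; chi2_{0.975} leaves 0.975 to the right.
chi2_upper, chi2_lower = 19.023, 2.700

low = (n - 1) * s2 / chi2_upper
high = (n - 1) * s2 / chi2_lower
print(f"s^2 = {s2:.4f}; {low:.3f} < sigma^2 < {high:.3f}")
```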
Hypothesis Testing
A formal rule that tells us if a new procedure is better than an existing one.
Testing involves four steps:
Stating the statistical hypothesis
Calculating the test statistic
Indicating the rejection region
Making the decision
Stating the statistical hypothesis
Statements regarding certain notions or feelings that we might have about the population parameters.
Divided into the null hypothesis ($H_0$) and the alternative hypothesis ($H_1$).
Null hypothesis, $H_0$: a claim (or statement) about a population parameter that is assumed to be true until it is declared false.
Alternative hypothesis, $H_1$: the hypothesis that we will accept if we decide to reject the null hypothesis.
Example 5.2.2.1
Suppose a manufacturer observes that the
existing procedure gives about 4%
defective products. The engineer would
like to implement a new procedure to
reduce the number of defective products.
It was agreed that n=100 products would
be produced using the new procedure. Let
X equal the number of these 100 products
that are defective. State H0 and H1.
Calculating the test statistic
A value computed from sample data.
Suppose that in the previous example we find that only 3 of the 100 products are defective under the new procedure. Thus the test statistic is $X = 3$.
Indicating the rejection region
The values of the test statistic that will imply rejection of the null hypothesis.
In the previous example, we would like to reject $H_0$ and accept $H_1$, which would indicate that the number of defective products is reduced. Since a sample of 100 is taken and 4% of 100 is 4, it is reasonable to reject $H_0$ if $X < 4$. If $X \geq 4$, then we accept $H_0$.
Thus, $X < 4$ is the rejection region (critical region).
Making the decision
$H_0$ is rejected if the test statistic falls in the rejection region.
$H_0$ is accepted if the test statistic falls in the acceptance region.
In the previous example, since the test statistic falls in the rejection region, $H_0$ is rejected (statistical conclusion). What is the layman conclusion that should follow?
Errors in hypothesis testing
We accept the new procedure as an improvement when, in fact, it is not.
We reject the new procedure as an improvement when, in fact, it is.
These are categorized as type I and type II errors, respectively.
P(type I error) = P(reject $H_0$ when $H_0$ is true) = $\alpha$.
P(type II error) = P(accept $H_0$ when $H_0$ is false) = $\beta$.
Calculate the probabilities of type I and type II errors for the previous example.
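This sketch computes the type I and type II error probabilities. It assumes $H_0$: defect rate $= 0.04$ against $H_1$: defect rate $< 0.04$, with the rejection region $X < 4$. Under $H_0$, $X$ is binomial with $n = 100$ and $p = 0.04$, so $\alpha = P(X \le 3)$. The type II error probability depends on the true defect rate under the new procedure, which the example does not give. The value $p = 0.02$ below is purely hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n = 100
# Type I error: X < 4 (i.e. X <= 3) although the defect rate is still 0.04
alpha = binom_cdf(3, n, 0.04)

# Type II error: X >= 4 although the new procedure really is better.
# Assumption: p_alt = 0.02 is a hypothetical true defect rate under the new procedure.
p_alt = 0.02
beta = 1 - binom_cdf(3, n, p_alt)

print(f"alpha = {alpha:.4f}, beta (if p = {p_alt}) = {beta:.4f}")
```

With this rejection region, $\alpha$ comes out at roughly 0.43. Shrinking the rejection region would lower $\alpha$ but raise $\beta$.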
Possible situations when testing a hypothesis
                $H_0$ true          $H_0$ false
Accept $H_0$    Correct decision    Type II error
Reject $H_0$    Type I error        Correct decision
In hypothesis testing, the null and alternative hypotheses can be stated in the following manner:
                    Two-sided     One-sided (left side)   One-sided (right side)
Symbol in $H_0$     $=$           $=$                     $=$
Symbol in $H_1$     $\neq$        $<$                     $>$
Rejection region    Both tails    Left tail               Right tail
Examples of stating hypotheses:
A manufacturer of a certain brand of rice cereal claims that the average saturated fat content does not exceed 1.5 grams. State the null and alternative hypotheses to be used in testing this claim.
A real estate agent claims that 60% of all private residences being built today are 3-bedroom homes. State the null and alternative hypotheses to test whether there is a change in this claim.
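Test of hypothesis for population mean, $\mu$ (normal population, $\sigma$ known)
Hypotheses: $H_0: \mu = \mu_0$ against $H_1: \mu > \mu_0$, $H_1: \mu < \mu_0$ or $H_1: \mu \neq \mu_0$.
Test statistic: $z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
Critical region:
for $H_1: \mu > \mu_0$: $z > z_{\alpha}$
for $H_1: \mu < \mu_0$: $z < -z_{\alpha}$
for $H_1: \mu \neq \mu_0$: $|z| > z_{\alpha/2}$
Decision rule: reject $H_0$ if the test statistic falls in the critical region.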

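Test of hypothesis for population mean, $\mu$ (large sample, $\sigma$ unknown)
Hypotheses: $H_0: \mu = \mu_0$ against $H_1: \mu > \mu_0$, $H_1: \mu < \mu_0$ or $H_1: \mu \neq \mu_0$.
Test statistic: $z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$
Critical region:
for $H_1: \mu > \mu_0$: $z > z_{\alpha}$
for $H_1: \mu < \mu_0$: $z < -z_{\alpha}$
for $H_1: \mu \neq \mu_0$: $|z| > z_{\alpha/2}$
Decision rule: reject $H_0$ if the test statistic falls in the critical region.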

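Test of hypothesis for population mean, $\mu$ (small sample, $n < 30$, $\sigma$ unknown)
Hypotheses: $H_0: \mu = \mu_0$ against $H_1: \mu > \mu_0$, $H_1: \mu < \mu_0$ or $H_1: \mu \neq \mu_0$.
Test statistic: $t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$, with $\nu = n - 1$ degrees of freedom
Critical region:
for $H_1: \mu > \mu_0$: $t > t_{\alpha}$
for $H_1: \mu < \mu_0$: $t < -t_{\alpha}$
for $H_1: \mu \neq \mu_0$: $|t| > t_{\alpha/2}$
Decision rule: reject $H_0$ if the test statistic falls in the critical region.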

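Test of hypothesis for population mean, $\mu$, using the p-value approach (normal population, $\sigma$ known)
Hypotheses: $H_0: \mu = \mu_0$ against $H_1: \mu > \mu_0$, $H_1: \mu < \mu_0$ or $H_1: \mu \neq \mu_0$.
p-value:
for $H_1: \mu > \mu_0$: $P\left(Z > \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\right)$
for $H_1: \mu < \mu_0$: $P\left(Z < \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\right)$
for $H_1: \mu \neq \mu_0$: $2P\left(Z > \left|\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\right|\right)$
Decision rule: reject $H_0$ if p-value $< \alpha$.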

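Test of hypothesis for population mean, $\mu$, using the p-value approach (large sample, $\sigma$ unknown)
Hypotheses: $H_0: \mu = \mu_0$ against $H_1: \mu > \mu_0$, $H_1: \mu < \mu_0$ or $H_1: \mu \neq \mu_0$.
p-value:
for $H_1: \mu > \mu_0$: $P\left(Z > \frac{\bar{X} - \mu_0}{s/\sqrt{n}}\right)$
for $H_1: \mu < \mu_0$: $P\left(Z < \frac{\bar{X} - \mu_0}{s/\sqrt{n}}\right)$
for $H_1: \mu \neq \mu_0$: $2P\left(Z > \left|\frac{\bar{X} - \mu_0}{s/\sqrt{n}}\right|\right)$
Decision rule: reject $H_0$ if p-value $< \alpha$.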

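Test of hypothesis for population mean, $\mu$, using the p-value approach (small sample, $n < 30$, $\sigma$ unknown)
Hypotheses: $H_0: \mu = \mu_0$ against $H_1: \mu > \mu_0$, $H_1: \mu < \mu_0$ or $H_1: \mu \neq \mu_0$.
p-value ($T$ has $\nu = n - 1$ degrees of freedom):
for $H_1: \mu > \mu_0$: $P\left(T > \frac{\bar{X} - \mu_0}{s/\sqrt{n}}\right)$
for $H_1: \mu < \mu_0$: $P\left(T < \frac{\bar{X} - \mu_0}{s/\sqrt{n}}\right)$
for $H_1: \mu \neq \mu_0$: $2P\left(T > \left|\frac{\bar{X} - \mu_0}{s/\sqrt{n}}\right|\right)$
Decision rule: reject $H_0$ if p-value $< \alpha$.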

Example 5.2.2.6
A random sample of 100 electronic chips
showed an average lifetime of 2.8 years.
Assuming a population standard deviation
of 0.5 years, does this seem to indicate
that the mean lifetime is greater than 2.7
years? Use a 0.05 level of significance.
Carry out the hypothesis test using both the test statistic approach and the p-value approach.
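The sketch below works Example 5.2.2.6 both ways. The hypotheses are $H_0: \mu = 2.7$ against $H_1: \mu > 2.7$, and $z = (2.8 - 2.7)/(0.5/\sqrt{100}) = 2.0$. `NormalDist` is from the Python standard library.

```python
import math
from statistics import NormalDist

n, xbar, sigma, mu0, alpha = 100, 2.8, 0.5, 2.7, 0.05

z = (xbar - mu0) / (sigma / math.sqrt(n))   # (2.8 - 2.7) / 0.05 = 2.0
z_alpha = NormalDist().inv_cdf(1 - alpha)   # right-tail critical value, about 1.645
p_value = 1 - NormalDist().cdf(z)           # P(Z > z) for H1: mu > 2.7

print(f"z = {z:.3f}, critical value = {z_alpha:.3f}, p-value = {p_value:.4f}")
print("Test statistic approach:", "reject H0" if z > z_alpha else "do not reject H0")
print("p-value approach:", "reject H0" if p_value < alpha else "do not reject H0")
```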
Example
Suppose that 150 MMU students were tested for their Intelligence Quotient (IQ). From the data, the average IQ was 120 with a standard deviation of 11.3. An MMU professor claims that the mean IQ of all students is different from 118. Using a 0.05 level of significance, determine whether the professor is correct using both the test statistic approach and the p-value approach.
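This sketch handles the IQ example. With $n = 150$, the large-sample z-test applies with $s$ in place of $\sigma$. The professor's claim gives the two-sided alternative $H_1: \mu \neq 118$.

```python
import math
from statistics import NormalDist

n, xbar, s, mu0, alpha = 150, 120, 11.3, 118, 0.05

z = (xbar - mu0) / (s / math.sqrt(n))          # large sample: s replaces sigma
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided: z_{alpha/2}, about 1.96
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # 2 P(Z > |z|)
reject = abs(z) > z_crit

print(f"z = {z:.3f}, critical value = {z_crit:.3f}, p-value = {p_value:.4f}")
print("reject H0" if reject else "do not reject H0")
```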
Example 5.2.2.9
Test the hypothesis that the average
diameter of a certain type of battery
produced by a factory is 10 millimetres if
the diameters of a random sample of 10
batteries are 10.1, 9.8, 10.1, 10.5, 10.1,
9.7, 9.9, 10.4, 10.3 and 9.8 millimetres.
Use a 0.01 level of significance and
assume that the distribution of diameters
is normal. Carry out the hypothesis test using both the test statistic approach and the p-value approach.
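This is a sketch of Example 5.2.2.9, with $H_0: \mu = 10$ against $H_1: \mu \neq 10$ and $\nu = 9$ degrees of freedom. The critical value $t_{0.005} = 3.250$ is hard-coded from a standard t table. The p-value needs the t CDF, which the standard library lacks. The sketch therefore substitutes a plain numerical integration of the t density (Simpson's rule) for a library routine such as `scipy.stats.t.sf`. The helper names are hypothetical.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > |t|): 0.5 minus the integral of the density over [0, |t|],
    approximated with Simpson's rule (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

diameters = [10.1, 9.8, 10.1, 10.5, 10.1, 9.7, 9.9, 10.4, 10.3, 9.8]
n, mu0, alpha = len(diameters), 10.0, 0.01
xbar = statistics.mean(diameters)
s = statistics.stdev(diameters)

t = (xbar - mu0) / (s / math.sqrt(n))
t_crit = 3.250                         # assumption: t_{0.005}, v = 9, from a t table
p_value = 2 * t_upper_tail(t, n - 1)   # two-sided: 2 P(T > |t|)

print(f"t = {t:.3f}, critical value = {t_crit}, p-value = {p_value:.3f}")
print("Test statistic approach:", "reject H0" if abs(t) > t_crit else "do not reject H0")
print("p-value approach:", "reject H0" if p_value < alpha else "do not reject H0")
```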
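Test of hypothesis for population proportion, $p$, using the test statistic approach (large sample)
Hypotheses: $H_0: p = p_0$ against $H_1: p > p_0$, $H_1: p < p_0$ or $H_1: p \neq p_0$.
Test statistic: $z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0/n}}$, where $p_0$ is the null proportion value, $q_0 = 1 - p_0$ and $\hat{p}$ is the sample proportion.
Critical region:
for $H_1: p > p_0$: $z > z_{\alpha}$
for $H_1: p < p_0$: $z < -z_{\alpha}$
for $H_1: p \neq p_0$: $|z| > z_{\alpha/2}$
Decision rule: reject $H_0$ if the test statistic falls in the critical region.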
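Test of hypothesis for population proportion, $p$, using the p-value approach (large sample)
Hypotheses: $H_0: p = p_0$ against $H_1: p > p_0$, $H_1: p < p_0$ or $H_1: p \neq p_0$.
p-value:
for $H_1: p > p_0$: $P\left(Z > \frac{\hat{p} - p_0}{\sqrt{p_0 q_0/n}}\right)$
for $H_1: p < p_0$: $P\left(Z < \frac{\hat{p} - p_0}{\sqrt{p_0 q_0/n}}\right)$
for $H_1: p \neq p_0$: $2P\left(Z > \left|\frac{\hat{p} - p_0}{\sqrt{p_0 q_0/n}}\right|\right)$
Decision rule: reject $H_0$ if p-value $< \alpha$.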
Example 5.2.2.10:
A common medicine for relieving serious pain is believed to be only 80% effective. A new medicine is given to a random sample of 100 adults who were suffering from serious pain, and 85 of them received relief. Is this sufficient evidence to conclude that the new medicine is superior to the one commonly prescribed? Use a 0.05 level of significance.
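The sketch below treats Example 5.2.2.10 as $H_0: p = 0.8$ against $H_1: p > 0.8$ (the new medicine is superior). The standard error uses the null value: $\sqrt{p_0 q_0/n} = \sqrt{0.8 \times 0.2/100} = 0.04$.

```python
import math
from statistics import NormalDist

n, x, p0, alpha = 100, 85, 0.80, 0.05
p_hat = x / n
q0 = 1 - p0

z = (p_hat - p0) / math.sqrt(p0 * q0 / n)   # (0.85 - 0.80) / 0.04 = 1.25
z_alpha = NormalDist().inv_cdf(1 - alpha)   # right-tailed test, about 1.645
p_value = 1 - NormalDist().cdf(z)           # P(Z > z)

print(f"z = {z:.2f}, critical value = {z_alpha:.3f}, p-value = {p_value:.4f}")
print("reject H0" if z > z_alpha else "do not reject H0")
```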
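Test of hypothesis for population variance, $\sigma^2$, using the test statistic approach
Hypotheses: $H_0: \sigma^2 = \sigma_0^2$ against $H_1: \sigma^2 > \sigma_0^2$, $H_1: \sigma^2 < \sigma_0^2$ or $H_1: \sigma^2 \neq \sigma_0^2$.
Test statistic: $\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2}$, with $\nu = n - 1$ degrees of freedom, where $s^2$ is the sample variance and $\sigma_0^2$ is the null variance value.
Critical region:
for $H_1: \sigma^2 > \sigma_0^2$: $\chi^2 > \chi^2_{\alpha}$
for $H_1: \sigma^2 < \sigma_0^2$: $\chi^2 < \chi^2_{1-\alpha}$
for $H_1: \sigma^2 \neq \sigma_0^2$: $\chi^2 < \chi^2_{1-\alpha/2}$ or $\chi^2 > \chi^2_{\alpha/2}$
Decision rule: reject $H_0$ if the test statistic falls in the critical region.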
Example 5.2.2.12
In paper manufacturing, a process is
considered out of control if the standard
deviation of the weight of a piece of paper
exceeds 1.25 grams. A random sample of
20 pieces of paper produced during a
routine check yields a standard deviation of
1.9 grams. At the 0.05 level of
significance, is the paper production
process out of control?
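This sketch covers Example 5.2.2.12. "Out of control" means $\sigma > 1.25$, so the test is right-tailed: $H_0: \sigma^2 = 1.25^2$ against $H_1: \sigma^2 > 1.25^2$. The critical value $\chi^2_{0.05} = 30.144$ for $\nu = 19$ is hard-coded from a standard $\chi^2$ table.

```python
n, s, sigma0, alpha = 20, 1.9, 1.25, 0.05

chi2 = (n - 1) * s**2 / sigma0**2   # 19 * 1.9^2 / 1.25^2
chi2_crit = 30.144                  # assumption: chi2_{0.05}, v = 19, from a standard table

print(f"chi2 = {chi2:.3f}, critical value = {chi2_crit}")
print("reject H0 (process out of control)" if chi2 > chi2_crit else "do not reject H0")
```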
Chi-squared Goodness of Fit test (GOF)
Used to test whether a proposed model fits a given scenario well.
In a given scenario, the observed frequencies are compared with the frequencies expected under the proposed model.
Procedures for GOF:
Hypotheses:
$H_0$: the proposed model fits the given scenario well.
$H_1$: the proposed model does not fit the given scenario well.
Test statistic: $\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$, where $o_i$ = observed frequencies, $e_i$ = expected frequencies and $k$ = number of classes.
Decision rule: reject $H_0$ if $\chi^2 > \chi^2_{\alpha}(k - 1)$.
Example 5.3.1.
Tossing a die 180 times yields 26 ones, 32 twos, 25 threes, 24 fours, 35 fives and 38 sixes. Test at the 0.01 level of significance whether the data obtained from the experiment follow a discrete uniform distribution.
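The sketch below applies the goodness-of-fit test to Example 5.3.1. Under a discrete uniform model, each face is expected $180/6 = 30$ times, and the test uses $k - 1 = 5$ degrees of freedom. The critical value $\chi^2_{0.01}(5) = 15.086$ is hard-coded from a standard $\chi^2$ table.

```python
observed = [26, 32, 25, 24, 35, 38]   # counts for faces 1 to 6
n = sum(observed)                     # 180 tosses
expected = [n / 6] * 6                # discrete uniform model: 30 per face

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
chi2_crit = 15.086   # assumption: chi2_{0.01} with k - 1 = 5 df, from a standard table

print(f"chi2 = {chi2:.3f}, critical value = {chi2_crit}")
print("reject H0" if chi2 > chi2_crit else "do not reject H0")
```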
Example 5.3.2.
Suppose the lifetimes X (in hours) of 40 bulbs are recorded as follows:
Class boundaries (hours)   Observed frequency
1.5-2.0
2.0-2.5
2.5-3.0                    11
3.0-3.5                    15
3.5-4.0
4.0-4.5
Test at the 0.01 level of significance whether the lifetimes of the bulbs may be approximated by a normal distribution with $\mu = 3.2$ and $\sigma = 0.5$.
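For Example 5.3.2, the expected frequency of each class is $e_i = 40 \times P(a_i < X < b_i)$ under $N(3.2, 0.5^2)$. The sketch below computes only these expected frequencies; the observed counts come from the table above. `NormalDist` is from the Python standard library.

```python
from statistics import NormalDist

model = NormalDist(mu=3.2, sigma=0.5)   # hypothesised lifetime distribution
n = 40
boundaries = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]

# e_i = n * P(lower < X < upper) for each class
expected = [n * (model.cdf(hi) - model.cdf(lo))
            for lo, hi in zip(boundaries, boundaries[1:])]
for lo, hi, e in zip(boundaries, boundaries[1:], expected):
    print(f"{lo}-{hi}: e = {e:.2f}")
```

Several expected frequencies come out below 5. A common rule of thumb is to merge such neighbouring classes before computing $\chi^2$, which also reduces $k$ and hence the degrees of freedom.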
