
A statistic is a measurement computed from a sample.

Sampling theory:
We look at sample characteristics to gain
understanding of the population at hand.
Recognize random variables and random samples.

Important Statistics
• Mean: location of a sample.
• Variance: spread of a sample.

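Sample mean
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \sum_{i=1}^{n} \frac{X_i}{n}$$

Sample variance
$$S^2 = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n - 1} \qquad (*)$$
or
$$S^2 = \frac{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n - 1)} \qquad (**)$$

Try deriving (**) from (*).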
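As a quick numerical check that (*) and (**) agree, here is a minimal Python sketch. The data list is made up purely for illustration, and the helper names are hypothetical.

```python
def sample_mean(xs):
    """X-bar = (X1 + X2 + ... + Xn) / n."""
    return sum(xs) / len(xs)

def sample_variance_def(xs):
    """Formula (*): sum of squared deviations from X-bar, divided by n - 1."""
    n = len(xs)
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    """Formula (**): [n * sum(X^2) - (sum X)^2] / [n (n - 1)]."""
    n = len(xs)
    return (n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1))

data = [2.0, 4.0, 4.0, 5.0, 7.0]  # illustrative values, not taken from the notes
print(sample_mean(data), sample_variance_def(data), sample_variance_shortcut(data))
```

Both variance functions should give 3.3 (up to floating-point rounding) for this list; the shortcut form avoids computing every deviation, which is why it is handy for hand calculation.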

The usual assumption is that a sample comes from a normally distributed population, written as $X \sim N(\mu, \sigma^2)$. Why?

We use sample statistics to estimate population parameters:
• $\bar{X}$ to estimate $\mu$
• $S^2$ to estimate $\sigma^2$

Since we're now dealing with a number of sample means, we need to specify a distribution for them.
• Theorem 5.1.1: Let $X_1, X_2, X_3, \ldots, X_n$ be a random sample of size n with $X_i \sim N(\mu, \sigma^2)$. Then
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

What happens if n increases?

Central Limit Theorem
• Let $X_1, X_2, X_3, \ldots, X_n$ be a random sample from a distribution with $E(X_i) = \mu$ and $\mathrm{var}(X_i) = \sigma^2$. If n is sufficiently large ($n \ge 30$), then, approximately,
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

The standardized sample mean
$$Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$$
then has approximately the standard normal distribution, $N(0,1)$.
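The CLT can be illustrated with a small simulation. This Python sketch assumes a Uniform(0, 1) population (mean 1/2, variance 1/12, a clearly non-normal choice made only for illustration) and a fixed seed; `simulate_sample_means` is a hypothetical helper, not part of any library.

```python
import random
import statistics

def simulate_sample_means(n, reps=20_000, seed=1):
    """Means of `reps` random samples of size n drawn from Uniform(0, 1)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(reps)]

mu, var = 0.5, 1 / 12  # mean and variance of the Uniform(0, 1) population
for n in (2, 10, 30):
    means = simulate_sample_means(n)
    se = (var / n) ** 0.5  # theoretical standard error sigma / sqrt(n)
    # Share of sample means within 1.96 standard errors of mu (about 0.95 if X-bar is normal)
    inside = sum(abs(m - mu) < 1.96 * se for m in means) / len(means)
    print(f"n={n:2d}  mean={statistics.fmean(means):.4f}  "
          f"var={statistics.variance(means):.5f} (sigma^2/n={var / n:.5f})  "
          f"within 1.96 SE={inside:.3f}")
```

As n grows, the simulated variance of the means should track $\sigma^2/n$, and the share of standardized means falling in $(-1.96, 1.96)$ should sit close to 0.95, as expected under $N(0,1)$.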

• The normal approximation for $\bar{X}$ is good if $n \ge 30$, for any population.
• For $n < 30$, the normal approximation for $\bar{X}$ is only good for a population that is more or less normally distributed.
• When $n < 30$, the sample comes from a normal distribution and $\sigma^2$ is unknown, the approximation is done using the t-distribution.
Definition 5.1.1
• If $X_i \sim N(\mu, \sigma^2)$, then
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has a t-distribution with $\nu = n - 1$ degrees of freedom.

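Cases (CLT)
• $\sigma$ known: normal approximation, standardized by $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
• $\sigma$ unknown, $n \ge 30$: normal approximation, standardized by $Z = \frac{\bar{X} - \mu}{S/\sqrt{n}}$
• $\sigma$ unknown, $n < 30$: t-distribution, standardized by $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$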

Example 5.1.1:
A factory produces bulbs whose lifetimes are approximately normally distributed with mean 600 hours and standard deviation 18 hours. Find the probability that the average lifetime of a sample of n = 9 bulbs is less than 585 hours.
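Because $\sigma$ is known here, Example 5.1.1 is a direct application of Theorem 5.1.1. A minimal Python sketch, assuming the standard-library `statistics.NormalDist` (Python 3.8+) in place of a printed z-table:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 600, 18, 9
se = sigma / sqrt(n)          # standard error of X-bar: 18 / 3 = 6
z = (585 - mu) / se           # (585 - 600) / 6 = -2.5
prob = NormalDist().cdf(z)    # P(Z < -2.5)
print(f"z = {z:.2f}, P(X-bar < 585) = {prob:.4f}")
```

The probability comes out at roughly 0.0062, matching the table value for $P(Z < -2.5)$.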

Example 5.1.2:
Suppose a manufacturer is interested in the average daily production of a machine; more specifically, he is interested in the probability of the machine producing on average more than 100 items per day. It is known that the machine's daily production has a normal distribution with mean $\mu$ and variance $\sigma^2$. The manufacturer measured the production of 11 machines, yielding the following data:
115 82 98 126 109 143 136 92 103 127 150

Statistical Inference
• Divided into:
  ◦ Estimation (point estimator, interval estimator)
  ◦ Tests of hypothesis

Point Estimator
• A single numerical value from the sample that is used to estimate an unknown parameter.
• $\bar{x}$ is the point estimate of $\mu$.
• $s^2$ is the point estimate of $\sigma^2$.

Interval Estimates: Confidence Interval
• An interval in which the true value of the parameter falls with some level of confidence.
• Example: a 95% confidence interval for $\mu$ means that we are 95% confident that the value of $\mu$ lies in that interval.
n

Confidence Interval for $\mu$
• Let's start off by building a confidence interval for $\mu$ when the population at hand is normally distributed and $\sigma$ is known.
• If $\bar{x}$ is the mean of a random sample of size n from a population with known variance $\sigma^2$, a $(1 - \alpha)100\%$ confidence interval for $\mu$ is given by
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
where $z_{\alpha/2}$ is the z-value leaving an area of $\alpha/2$ to the right.

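Cases in building C.I. for $\mu$
• $\sigma$ known: $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
• $\sigma$ unknown, $n \ge 30$: $\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$
• $\sigma$ unknown, $n < 30$, normal population: $\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$, with $\nu = n - 1$ degrees of freedom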

Example 5.2.1.2
Measurements of the weights of a random
sample of 200 containers made by a
certain machine showed a mean of 0.21
kilograms, and it is known that $\sigma$ = 0.002
kilograms. Find the 95% confidence
interval for the mean weight of all the
containers.
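Example 5.2.1.2 falls in the "$\sigma$ known" case. A minimal Python sketch of the two-sided z-interval; `z_interval` is a hypothetical helper, and `statistics.NormalDist().inv_cdf` stands in for a z-table lookup.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sd, n, confidence):
    """Two-sided (1 - alpha)100% interval: xbar +/- z_{alpha/2} * sd / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z-value leaving alpha/2 to the right
    margin = z * sd / sqrt(n)
    return xbar - margin, xbar + margin

low, high = z_interval(0.21, 0.002, 200, 0.95)
print(f"{low:.5f} < mu < {high:.5f}")
```

With $z_{0.025} \approx 1.96$ the margin is about 0.00028 kg, so the interval is roughly 0.20972 < $\mu$ < 0.21028 kilograms.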

Possible interpretations from a C.I.
• What does the interval signify?
• What is the margin of error?
• What is the maximum error that can occur in using the sample statistic to estimate the population parameter?

Example
The average calcium content of 36 samples taken from different locations in a river is found to be 1.6 grams per millilitre. Find the 90% confidence interval for the mean calcium content of the river. Assume that the sample standard deviation is 0.2.

What happens when the level of confidence increases?
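The calcium example ($n = 36 \ge 30$, $\sigma$ unknown) and the question above can be explored together. A minimal Python sketch; `calcium_interval` is a hypothetical helper whose defaults are the example's numbers, with $s$ standing in for $\sigma$.

```python
from math import sqrt
from statistics import NormalDist

def calcium_interval(confidence, xbar=1.6, s=0.2, n=36):
    """z-interval xbar +/- z_{alpha/2} * s / sqrt(n); n >= 30, so s stands in for sigma."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * s / sqrt(n)
    return xbar - margin, xbar + margin

for conf in (0.90, 0.95, 0.99):
    low, high = calcium_interval(conf)
    print(f"{conf:.0%}: {low:.4f} < mu < {high:.4f}  (width {high - low:.4f})")
```

The 90% interval is about 1.5452 < $\mu$ < 1.6548, and the printed widths grow with the confidence level because $z_{\alpha/2}$ grows.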

Example 5.2.1.5
A machine produces containers that are
cylindrical in shape. Nine containers are
randomly chosen and the diameters are
10.01, 9.97, 10.03, 10.04, 9.99, 9.98, 9.99,
10.01, and 10.03 centimetres. Find a 99%
confidence interval for the mean diameter
of containers from this machine, assuming
an approximate normal distribution.
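Example 5.2.1.5 is the small-sample case ($n = 9$, $\sigma$ unknown), so the t-distribution with $\nu = 8$ applies. A minimal Python sketch; the critical value $t_{0.005} = 3.355$ for 8 degrees of freedom is typed in from a standard t-table rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

diameters = [10.01, 9.97, 10.03, 10.04, 9.99, 9.98, 9.99, 10.01, 10.03]
n = len(diameters)
xbar = mean(diameters)
s = stdev(diameters)          # sample standard deviation (divisor n - 1)
t_crit = 3.355                # t_{0.005} with v = n - 1 = 8, read from a t-table
margin = t_crit * s / sqrt(n)
low, high = xbar - margin, xbar + margin
print(f"{low:.4f} < mu < {high:.4f}")
```

This gives $\bar{x} \approx 10.0056$, $s \approx 0.0246$ and roughly 9.9781 < $\mu$ < 10.0330 centimetres.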

The C.I.s built so far are two-sided C.I.s.
• What if you're interested in building a one-sided C.I.?
• If $\bar{x}$ is the mean of a random sample of size n from a population with known variance $\sigma^2$, then a $(1 - \alpha)100\%$ one-sided confidence interval for $\mu$ is given by
$$\bar{x} - z_{\alpha}\frac{\sigma}{\sqrt{n}} < \mu < \infty$$
• More specifically, the one-sided C.I. given above is bounded from below. Thus, $\bar{x} - z_{\alpha}\frac{\sigma}{\sqrt{n}}$ is known as the lower bound for $\mu$.
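The only change from the two-sided case is that all of $\alpha$ goes into one tail, so $z_\alpha$ replaces $z_{\alpha/2}$. A minimal Python sketch; `lower_confidence_bound` is a hypothetical helper, and the usage line simply reuses the numbers of Example 5.2.1.2 for illustration.

```python
from math import sqrt
from statistics import NormalDist

def lower_confidence_bound(xbar, sigma, n, confidence):
    """One-sided bound: xbar - z_alpha * sigma / sqrt(n) < mu < infinity."""
    alpha = 1 - confidence
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # leaves alpha (not alpha/2) to the right
    return xbar - z_alpha * sigma / sqrt(n)

print(lower_confidence_bound(0.21, 0.002, 200, 0.95))
```

With $z_{0.05} \approx 1.645$ the 95% lower bound is about 0.20977, slightly tighter than the lower end of the two-sided 95% interval.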

Confidence interval for p
• In a large sample ($n \ge 30$), if $\hat{p}$ is the proportion of successes in a random sample of size n, and $\hat{q} = 1 - \hat{p}$, an approximate $(1 - \alpha)100\%$ confidence interval for the binomial parameter p is given by
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
where $z_{\alpha/2}$ is the z-value leaving an area of $\alpha/2$ to the right.
2

Example 5.2.1.6
In a random sample of n = 600 families
owning television sets in a city, it is found
that x = 240 subscribed to ASTRO. Find a
95% confidence interval for the actual
proportion of families in this city who
subscribe to ASTRO.
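For Example 5.2.1.6, $\hat{p} = 240/600 = 0.4$. A minimal Python sketch of the large-sample interval, with `statistics.NormalDist` standing in for the z-table:

```python
from math import sqrt
from statistics import NormalDist

n, x = 600, 240
p_hat = x / n                       # sample proportion, 0.4
q_hat = 1 - p_hat                   # 0.6
z = NormalDist().inv_cdf(0.975)     # z_{0.025}, about 1.96, for 95%
margin = z * sqrt(p_hat * q_hat / n)
low, high = p_hat - margin, p_hat + margin
print(f"{low:.4f} < p < {high:.4f}")
```

Here $\sqrt{\hat{p}\hat{q}/n} = 0.02$, so the interval is roughly 0.3608 < p < 0.4392.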

Confidence interval for $\sigma^2$
• If $s^2$ is the variance of a random sample of size n from a normal population, a $(1 - \alpha)100\%$ confidence interval for $\sigma^2$ is given by
$$\frac{(n - 1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2}}$$
where $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ are $\chi^2$-values with $\nu = n - 1$ degrees of freedom, leaving areas of $\alpha/2$ and $1 - \alpha/2$, respectively, to the right.

Example 5.2.1.7
The following are the weights, in grams, of
10 packages of sugar packed by a worker:
454, 451, 458, 450, 451, 459, 458, 459,
452 and 450.
Find a 95% confidence interval for the
variance of all such packages of sugar
packed by this worker, assuming a normal
population.
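A minimal Python sketch of Example 5.2.1.7. The two $\chi^2$ critical values for $\nu = 9$ are typed in from a standard chi-squared table; only the sample variance is computed.

```python
from statistics import variance

weights = [454, 451, 458, 450, 451, 459, 458, 459, 452, 450]
n = len(weights)
s2 = float(variance(weights))   # sample variance, divisor n - 1
# Chi-squared critical values for v = n - 1 = 9, read from a chi-squared table:
chi2_upper = 19.023             # chi^2_{0.025}: leaves 0.025 to the right
chi2_lower = 2.700              # chi^2_{0.975}: leaves 0.975 to the right
low = (n - 1) * s2 / chi2_upper
high = (n - 1) * s2 / chi2_lower
print(f"s^2 = {s2:.4f}, {low:.2f} < sigma^2 < {high:.2f}")
```

This gives $s^2 \approx 15.07$ and roughly 7.13 < $\sigma^2$ < 50.22. Note that the larger table value produces the lower bound.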

Hypothesis Testing
• A formal rule that tells us whether a new procedure is better than an existing one.
• Testing involves the following steps:
  ◦ Stating the statistical hypothesis
  ◦ Calculating the test statistic
  ◦ Indicating the rejection region
  ◦ Making the decision

Stating the statistical hypothesis
• Statements regarding certain notions or feelings that we might have about the population parameters.
• Divided into the null hypothesis (H0) and the alternative hypothesis (H1).
  ◦ Null hypothesis, H0: a claim (or statement) about a population parameter that is assumed to be true until it is declared false.
  ◦ Alternative hypothesis, H1: the hypothesis that we will accept if we decide to reject the null hypothesis.

Example 5.2.2.1
Suppose a manufacturer observes that the
existing procedure gives about 4%
defective products. The engineer would
like to implement a new procedure to
reduce the number of defective products.
It was agreed that n=100 products would
be produced using the new procedure. Let
X equal the number of these 100 products
that are defective. State H0 and H1.

Calculating the test statistic
• A value computed from sample data.
• Suppose that, in the previous example, we find that out of 100 products only 3 are defective under the new procedure. Thus the test statistic is 3.

Indicating the rejection region
• The values of the test statistic that will imply rejection of the null hypothesis.
• In the previous example, we would like to reject H0 and accept H1, so that the number of defective products is reduced. Since a sample of 100 is taken, and 4% of 100 is 4 defective products under the existing procedure, it is reasonable to reject H0 if X < 4. If $X \ge 4$, then we accept H0.
• Thus, X < 4 is the rejection region (critical region).
n

Making the decision
• H0 is rejected if the test statistic falls in the rejection region.
• H0 is accepted if the test statistic falls in the acceptance region.
• In the previous example, since the test statistic (X = 3) falls in the rejection region, H0 is rejected (statistical conclusion). What is the layman conclusion that should follow?
n

Errors in hypothesis testing
• We accept the new procedure as an improvement when, in fact, it is not.
• We reject the new procedure as an improvement when, in fact, it is.
• These are categorized as type I and type II errors, respectively.
• P(type I error) = P(reject H0 when H0 is true) = $\alpha$.
  P(type II error) = P(accept H0 when H0 is false) = $\beta$.
• Calculate the probabilities of type I and type II errors for the previous example.
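For the defective-products example, $X \sim \text{Binomial}(100, p)$, and H0 is taken to correspond to the existing defect rate $p = 0.04$. A type II probability needs one specific value of p under H1, which the example does not give, so the sketch below uses a hypothetical $p = 0.03$ purely for illustration; `binom_cdf` is a hypothetical helper built on `math.comb` (Python 3.8+).

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n = 100
# Type I: reject H0 (X < 4, i.e. X <= 3) although the defect rate is still 0.04
alpha = binom_cdf(3, n, 0.04)
# Type II: accept H0 (X >= 4) although the new rate is lower; 0.03 is a hypothetical value
p_alt = 0.03
beta = 1 - binom_cdf(3, n, p_alt)
print(f"alpha = {alpha:.4f}, beta (p = {p_alt}) = {beta:.4f}")
```

Both error probabilities come out large (about 0.43 and 0.35), which suggests that a sample of 100 with the cut-off X < 4 discriminates poorly between 4% and 3% defective.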

Possible situations when testing hypotheses

              H0 true            H0 false
Accept H0     Correct decision   Type II error
Reject H0     Type I error       Correct decision

In hypothesis testing, the null and alternative hypotheses can be stated in the following manner:

                    Two-sided     One-sided (left)   One-sided (right)
Symbol in H0        =             =                  =
Symbol in H1        ≠             <                  >
Rejection region    Both tails    Left tail          Right tail

Examples on stating hypotheses:
• A manufacturer of a certain brand of rice cereal claims that the average saturated fat content does not exceed 1.5 grams. State the null and alternative hypotheses to be used in testing this claim.
• A real estate agent claims that 60% of all private residences being built today are 3-bedroom homes. State the null and alternative hypotheses to test whether there is a change in this claim.

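Test of hypothesis for population mean, $\mu$
(for normal population and $\sigma$ known)
Test statistic: $z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
• H0: $\mu = \mu_0$ vs H1: $\mu > \mu_0$; critical region: $z > z_{\alpha}$
• H0: $\mu = \mu_0$ vs H1: $\mu < \mu_0$; critical region: $z < -z_{\alpha}$
• H0: $\mu = \mu_0$ vs H1: $\mu \neq \mu_0$; critical region: $|z| > z_{\alpha/2}$
Decision rule: Reject H0 if the test statistic falls in the critical region.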

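Test of hypothesis for population mean, $\mu$
(for large sample and $\sigma$ unknown)
Test statistic: $z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$
• H0: $\mu = \mu_0$ vs H1: $\mu > \mu_0$; critical region: $z > z_{\alpha}$
• H0: $\mu = \mu_0$ vs H1: $\mu < \mu_0$; critical region: $z < -z_{\alpha}$
• H0: $\mu = \mu_0$ vs H1: $\mu \neq \mu_0$; critical region: $|z| > z_{\alpha/2}$
Decision rule: Reject H0 if the test statistic falls in the critical region.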

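Test of hypothesis for population mean, $\mu$
(for small sample ($n < 30$) and $\sigma$ unknown)
Test statistic: $t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$, with $\nu = n - 1$ degrees of freedom
• H0: $\mu = \mu_0$ vs H1: $\mu > \mu_0$; critical region: $t > t_{\alpha}$
• H0: $\mu = \mu_0$ vs H1: $\mu < \mu_0$; critical region: $t < -t_{\alpha}$
• H0: $\mu = \mu_0$ vs H1: $\mu \neq \mu_0$; critical region: $|t| > t_{\alpha/2}$
Decision rule: Reject H0 if the test statistic falls in the critical region.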

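Test of hypothesis for population mean, $\mu$, using the p-value approach
(for normal population and $\sigma$ known)
• H0: $\mu = \mu_0$ vs H1: $\mu > \mu_0$; p-value $= P\left(Z > \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\right)$
• H0: $\mu = \mu_0$ vs H1: $\mu < \mu_0$; p-value $= P\left(Z < \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\right)$
• H0: $\mu = \mu_0$ vs H1: $\mu \neq \mu_0$; p-value $= 2P\left(Z > \frac{|\bar{X} - \mu_0|}{\sigma/\sqrt{n}}\right)$
Decision rule: Reject H0 if p-value $< \alpha$.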
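The three p-value rules above differ only in which tail(s) of $N(0,1)$ they measure. A minimal Python sketch; `z_p_value` and its `alternative` labels are hypothetical names chosen for illustration, and the same function also serves the large-sample case once $s$ replaces $\sigma$ in the z statistic.

```python
from statistics import NormalDist

def z_p_value(z, alternative):
    """p-value for an observed z statistic.

    alternative: "greater" (H1: mu > mu0), "less" (H1: mu < mu0)
    or "two-sided" (H1: mu != mu0).
    """
    phi = NormalDist().cdf
    if alternative == "greater":
        return 1 - phi(z)                  # P(Z > z)
    if alternative == "less":
        return phi(z)                      # P(Z < z)
    if alternative == "two-sided":
        return 2 * (1 - phi(abs(z)))       # 2 P(Z > |z|)
    raise ValueError(f"unknown alternative: {alternative!r}")

print(z_p_value(1.96, "two-sided"))   # close to 0.05
```

H0 is then rejected whenever the returned value is below $\alpha$.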

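Test of hypothesis for population mean, $\mu$, using the p-value approach
(for large sample and $\sigma$ unknown)
• H0: $\mu = \mu_0$ vs H1: $\mu > \mu_0$; p-value $= P\left(Z > \frac{\bar{X} - \mu_0}{s/\sqrt{n}}\right)$
• H0: $\mu = \mu_0$ vs H1: $\mu < \mu_0$; p-value $= P\left(Z < \frac{\bar{X} - \mu_0}{s/\sqrt{n}}\right)$
• H0: $\mu = \mu_0$ vs H1: $\mu \neq \mu_0$; p-value $= 2P\left(Z > \frac{|\bar{X} - \mu_0|}{s/\sqrt{n}}\right)$
Decision rule: Reject H0 if p-value $< \alpha$.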

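Test of hypothesis for population mean, $\mu$, using the p-value approach
(for small sample ($n < 30$) and $\sigma$ unknown; T has $\nu = n - 1$ degrees of freedom)
• H0: $\mu = \mu_0$ vs H1: $\mu > \mu_0$; p-value $= P\left(T > \frac{\bar{X} - \mu_0}{s/\sqrt{n}}\right)$
• H0: $\mu = \mu_0$ vs H1: $\mu < \mu_0$; p-value $= P\left(T < \frac{\bar{X} - \mu_0}{s/\sqrt{n}}\right)$
• H0: $\mu = \mu_0$ vs H1: $\mu \neq \mu_0$; p-value $= 2P\left(T > \frac{|\bar{X} - \mu_0|}{s/\sqrt{n}}\right)$
Decision rule: Reject H0 if p-value $< \alpha$.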

Example 5.2.2.6
A random sample of 100 electronic chips
showed an average lifetime of 2.8 years.
Assuming a population standard deviation
of 0.5 years, does this seem to indicate
that the mean lifetime is greater than 2.7
years? Use a 0.05 level of significance.
Run hypothesis testing using both the test
statistic approach and p-value approach.
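Example 5.2.2.6 is a right-tailed test with $\sigma$ known. A minimal Python sketch running both approaches side by side, with `statistics.NormalDist` standing in for the z-table:

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0, sigma, n, alpha = 2.8, 2.7, 0.5, 100, 0.05
z = (xbar - mu0) / (sigma / sqrt(n))        # test statistic: 0.1 / 0.05 = 2.0
z_alpha = NormalDist().inv_cdf(1 - alpha)   # right-tailed critical value, about 1.645
p_value = 1 - NormalDist().cdf(z)           # P(Z > z)
reject_by_region = z > z_alpha
reject_by_p = p_value < alpha
print(f"z = {z:.2f}, z_alpha = {z_alpha:.3f}, p-value = {p_value:.4f}")
print(f"reject H0 (critical region): {reject_by_region}, reject H0 (p-value): {reject_by_p}")
```

Both approaches agree: $z = 2.0 > 1.645$ and the p-value (about 0.0228) is below 0.05, so H0 is rejected in favour of a mean lifetime greater than 2.7 years.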

Example
Suppose that 150 MMU students were tested for their Intelligence Quotient (IQ). From the data, the average IQ was 120 with a standard deviation of 11.3. An MMU professor claims that he knows the mean IQ of all students is different from 118. Using a 0.05 level of significance, determine whether the professor is correct by using both the test statistic and the p-value approaches.
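The IQ example is a two-sided, large-sample test with $\sigma$ unknown, so $s$ replaces $\sigma$ in the z statistic. A minimal Python sketch:

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0, s, n, alpha = 120, 118, 11.3, 150, 0.05
z = (xbar - mu0) / (s / sqrt(n))               # n >= 30, so s replaces sigma
z_half = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value z_{0.025}, about 1.96
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # 2 P(Z > |z|)
print(f"z = {z:.3f}, critical = +/-{z_half:.3f}, p-value = {p_value:.4f}")
print(f"reject H0: {abs(z) > z_half and p_value < alpha}")
```

Here $z \approx 2.17$ exceeds 1.96 and the p-value is about 0.03, so both approaches reject H0 at the 0.05 level, supporting the professor's claim.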

Example 5.2.2.9
Test the hypothesis that the average
diameter of a certain type of battery
produced by a factory is 10 millimetres if
the diameters of a random sample of 10
batteries are 10.1, 9.8, 10.1, 10.5, 10.1,
9.7, 9.9, 10.4, 10.3 and 9.8 millimetres.
Use a 0.01 level of significance and
assume that the distribution of diameters
is normal. Run hypothesis testing using
both the test statistics and p-value
approach.
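Example 5.2.2.9 is a two-sided t-test with $\nu = 9$. The Python standard library has no t-distribution, so this sketch makes two assumptions: the critical value $t_{0.005} = 3.250$ is typed in from a t-table, and the p-value comes from a hypothetical helper `t_upper_tail` that integrates the t density numerically with Simpson's rule.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

def t_pdf(t, v):
    """Density of Student's t-distribution with v degrees of freedom."""
    c = gamma((v + 1) / 2) / (sqrt(v * pi) * gamma(v / 2))
    return c * (1 + t * t / v) ** (-(v + 1) / 2)

def t_upper_tail(t, v, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the integral of the density over [0, t]
    (composite Simpson's rule; `steps` must be even)."""
    h = t / steps
    total = t_pdf(0.0, v) + t_pdf(t, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, v)
    return 0.5 - total * h / 3

diameters = [10.1, 9.8, 10.1, 10.5, 10.1, 9.7, 9.9, 10.4, 10.3, 9.8]
mu0, alpha = 10.0, 0.01
n = len(diameters)
xbar, s = mean(diameters), stdev(diameters)
t = (xbar - mu0) / (s / sqrt(n))
t_crit = 3.250                                  # t_{0.005} with v = 9, from a t-table
p_value = 2 * t_upper_tail(abs(t), n - 1)       # two-sided p-value
print(f"t = {t:.3f}, reject by region: {abs(t) > t_crit}, "
      f"p-value = {p_value:.3f}, reject by p: {p_value < alpha}")
```

With $\bar{x} = 10.07$ and $s \approx 0.271$, $t \approx 0.82$ lies well inside $\pm 3.250$ and the p-value (roughly 0.44) is far above 0.01, so H0 ($\mu = 10$ mm) is not rejected.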

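Test of hypothesis for population proportion, p, using the test statistic approach (for a large sample)
Test statistic: $z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$, where $p_0$ = null proportion value, $q_0 = 1 - p_0$, $\hat{p}$ = sample proportion
• H0: $p = p_0$ vs H1: $p > p_0$; critical region: $z > z_{\alpha}$
• H0: $p = p_0$ vs H1: $p < p_0$; critical region: $z < -z_{\alpha}$
• H0: $p = p_0$ vs H1: $p \neq p_0$; critical region: $|z| > z_{\alpha/2}$
Decision rule: Reject H0 if the test statistic falls in the critical region.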

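Test of hypothesis for population proportion, p, using the p-value approach (for a large sample)
• H0: $p = p_0$ vs H1: $p > p_0$; p-value $= P\left(Z > \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}\right)$
• H0: $p = p_0$ vs H1: $p < p_0$; p-value $= P\left(Z < \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}\right)$
• H0: $p = p_0$ vs H1: $p \neq p_0$; p-value $= 2P\left(Z > \frac{|\hat{p} - p_0|}{\sqrt{p_0 q_0 / n}}\right)$
Decision rule: Reject H0 if p-value $< \alpha$.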

Example 5.2.2.10:
A common medicine for relieving serious pain is believed to be only 80% effective. A new medicine is given to a random sample of 100 adults who were suffering from serious pain, and 85 of them received relief. Is this sufficient evidence to conclude that the new medicine is superior to the one commonly prescribed? Use a 0.05 level of significance.
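Example 5.2.2.10 is a right-tailed large-sample proportion test with $p_0 = 0.8$. A minimal Python sketch of both approaches:

```python
from math import sqrt
from statistics import NormalDist

n, relieved, p0, alpha = 100, 85, 0.80, 0.05
p_hat = relieved / n                          # sample proportion, 0.85
q0 = 1 - p0
z = (p_hat - p0) / sqrt(p0 * q0 / n)          # (0.85 - 0.80) / 0.04 = 1.25
z_alpha = NormalDist().inv_cdf(1 - alpha)     # right-tailed: H1 is p > 0.80
p_value = 1 - NormalDist().cdf(z)             # P(Z > z)
print(f"z = {z:.2f}, z_alpha = {z_alpha:.3f}, p-value = {p_value:.4f}")
```

Since $z = 1.25 < 1.645$ (equivalently, the p-value of about 0.106 exceeds 0.05), H0 is not rejected: the data do not show that the new medicine is superior.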

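Test of hypothesis for population variance, $\sigma^2$, using the test statistic approach
Test statistic: $\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2}$, where $s^2$ = sample variance and $\sigma_0^2$ = null variance value
• H0: $\sigma^2 = \sigma_0^2$ vs H1: $\sigma^2 > \sigma_0^2$; critical region: $\chi^2 > \chi^2_{\alpha}$
• H0: $\sigma^2 = \sigma_0^2$ vs H1: $\sigma^2 < \sigma_0^2$; critical region: $\chi^2 < \chi^2_{1-\alpha}$
• H0: $\sigma^2 = \sigma_0^2$ vs H1: $\sigma^2 \neq \sigma_0^2$; critical region: $\chi^2 < \chi^2_{1-\alpha/2}$ or $\chi^2 > \chi^2_{\alpha/2}$
Decision rule: Reject H0 if the test statistic falls in the critical region.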

Example 5.2.2.12
In paper manufacturing, a process is
considered out of control if the standard
deviation of the weight of a piece of paper
exceeds 1.25 grams. A random sample of
20 pieces of paper produced during a
routine check yields a standard deviation of
1.9 grams. At the 0.05 level of
significance, is the paper production
process out of control?
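Example 5.2.2.12 is a right-tailed variance test: "out of control" means $\sigma > 1.25$, i.e. $\sigma^2 > 1.25^2$. A minimal Python sketch; the critical value $\chi^2_{0.05} = 30.144$ for $\nu = 19$ is typed in from a chi-squared table.

```python
n, s, sigma0 = 20, 1.9, 1.25
chi2 = (n - 1) * s**2 / sigma0**2     # 19 * 3.61 / 1.5625
chi2_crit = 30.144                    # chi^2_{0.05} with v = 19, from a chi-squared table
out_of_control = chi2 > chi2_crit     # right-tailed: H1 is sigma^2 > 1.25^2
print(f"chi2 = {chi2:.4f}, critical = {chi2_crit}, reject H0: {out_of_control}")
```

The statistic is about 43.90, well above 30.144, so H0 is rejected and the process would be judged out of control.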

Chi-squared Goodness of Fit test (GOF)
• Used to test whether a model fits a given scenario well.
• In a given scenario, frequencies are observed and compared to the frequencies expected under the said model.

Procedures for GOF:
Hypothesis,
H0: The said model fits the given scenario well.
H1: The said model does not fit the given scenario well.
Test statistic:
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$
where $o_i$ = observed frequencies, $e_i$ = expected frequencies, k = number of classes.
Decision rule: Reject H0 if $\chi^2 > \chi^2_{\alpha}(k - 1)$.

Example 5.3.1.
Tossing a die 180 times yields 26
ones, 32 twos, 25 threes, 24 fours, 35
fives and 38 sixes. Test at the 0.01 level of
significance whether the data obtained
from the experiment has a discrete
uniform distribution.
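Under a discrete uniform model each face is expected $180/6 = 30$ times, with $k - 1 = 5$ degrees of freedom. A minimal Python sketch; $\chi^2_{0.01}(5) = 15.086$ is typed in from a chi-squared table.

```python
observed = [26, 32, 25, 24, 35, 38]
n = sum(observed)                  # 180 tosses
k = len(observed)                  # 6 classes (faces)
expected = [n / k] * k             # 30 per face under the discrete uniform model
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
chi2_crit = 15.086                 # chi^2_{0.01} with k - 1 = 5, from a chi-squared table
print(f"chi2 = {chi2:.3f}, critical = {chi2_crit}, reject H0: {chi2 > chi2_crit}")
```

The statistic is $170/30 \approx 5.67 < 15.086$, so H0 is not rejected: the data are consistent with a discrete uniform distribution.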

Example 5.3.2.
Suppose the lifetime (in hours), X, of 40 bulbs is recorded as follows:

Class boundaries (hours)   Observed frequencies
1.5-2.0
2.0-2.5
2.5-3.0                    11
3.0-3.5                    15
3.5-4.0
4.0-4.5

Test at a 0.01 level of significance whether the lifetimes of the bulbs may be approximated using a normal distribution with $\mu = 3.2$ and $\sigma = 0.5$.
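Only two observed frequencies survive in this copy of the table, so the sketch below stops at the expected frequencies under $N(3.2, 0.5^2)$ and a generic $\chi^2$ function to apply once all observed counts are known. It assumes (a common convention, not stated in the notes) that the first and last classes are extended to $-\infty$ and $+\infty$ so the expected counts sum to 40; `expected_frequencies` and `chi_square_statistic` are hypothetical helpers.

```python
from statistics import NormalDist

def expected_frequencies(boundaries, total, dist):
    """Expected counts per class. Assumption: the first and last classes are
    extended to -inf and +inf so that the counts sum to `total`."""
    cdfs = [0.0] + [dist.cdf(b) for b in boundaries[1:-1]] + [1.0]
    return [total * (hi - lo) for lo, hi in zip(cdfs, cdfs[1:])]

def chi_square_statistic(observed, expected):
    """Sum of (o_i - e_i)^2 / e_i over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

boundaries = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
expected = expected_frequencies(boundaries, 40, NormalDist(mu=3.2, sigma=0.5))
print([round(e, 2) for e in expected])
```

The expected counts come out at roughly 0.33, 2.90, 10.55, 15.25, 8.78 and 2.19. Several are below 5, and many texts recommend merging such neighbouring classes before comparing the statistic with $\chi^2_{\alpha}(k - 1)$, where k is then the number of classes after merging.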
