Inference For Two Populations

Lecture 14
Prepared by: Nur Afny C. Andryani
Source: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.; UoW lecture handout; textbook Managerial Statistics
Comparing Two Populations…
Previously we looked at techniques to estimate and test
parameters for one population:
Population Mean $\mu$, Population Variance $\sigma^2$, and
Population Proportion $p$
We will still consider these parameters when we are looking at
two populations; however, our interest will now be:
• The difference between two means, $\mu_1 - \mu_2$.
• The ratio of two variances, $\sigma_1^2 / \sigma_2^2$.
• The difference between two proportions, $p_1 - p_2$.
Difference of Two Means…
In order to test and estimate the difference between two
population means, we draw random samples from each of two
populations. Initially, we will consider independent samples,
that is, samples that are completely unrelated to one another.
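Population 1: parameters $\mu_1$ and $\sigma_1^2$
Sample of size $n_1$: statistics $\bar{x}_1$ and $s_1^2$
(Likewise, we consider $\mu_2$, $\sigma_2^2$, $n_2$, $\bar{x}_2$ and $s_2^2$ for Population 2)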

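Because we are comparing two population means, we use the
statistic $\bar{x}_1 - \bar{x}_2$.
Sampling Distribution of $\bar{x}_1 - \bar{x}_2$
1. $\bar{x}_1 - \bar{x}_2$ is normally distributed if the original populations are
normal –or– approximately normal if the populations are
nonnormal and the sample sizes are large (n1, n2 > 30)
2. The expected value of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$
3. The variance of $\bar{x}_1 - \bar{x}_2$ is $\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$
and the standard error is: $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$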
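As a minimal sketch of the three properties above, the following Python function returns the expected value, variance and standard error of $\bar{x}_1 - \bar{x}_2$ for independent samples. The function name and the population values are hypothetical, chosen only for illustration.

```python
import math


def diff_of_means_distribution(mu1, mu2, var1, var2, n1, n2):
    """Expected value, variance and standard error of x1bar - x2bar
    for independent samples of sizes n1 and n2."""
    expected = mu1 - mu2
    variance = var1 / n1 + var2 / n2
    return expected, variance, math.sqrt(variance)


# Hypothetical population parameters (not taken from the lecture).
expected, variance, std_error = diff_of_means_distribution(
    mu1=100.0, mu2=90.0, var1=64.0, var2=36.0, n1=16, n2=9
)
print(f"E = {expected}, V = {variance}, SE = {std_error:.4f}")
```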

Making Inferences About $\mu_1 - \mu_2$
Since $\bar{x}_1 - \bar{x}_2$ is normally distributed if the original
populations are normal –or– approximately normal if the
populations are nonnormal and the sample sizes are large (n1,
n2 > 30), then:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
is a standard normal (or approximately normal) random
variable. We could use this to build test statistics or confidence
interval estimators for $\mu_1 - \mu_2$
Making Inferences About $\mu_1 - \mu_2$
…except that, in practice, the z statistic is rarely used since the
population variances $\sigma_1^2$ and $\sigma_2^2$ are unknown.
Instead we use a t-statistic. We consider two cases for the
unknown population variances: when we believe they are equal
and, conversely, when we believe they are not equal.
When are variances equal?
How do we know when the population variances are equal?
Since the population variances are unknown, we can’t know for
certain whether they’re equal, but we can examine the sample
variances and informally judge their relative values to
determine whether we can assume that the population
variances are equal or not.
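Test Statistic for $\mu_1 - \mu_2$ (equal variances)
1) Calculate $s_p^2$, the pooled variance estimator, as…
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
2) …and use it here:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
degrees of freedom: $\nu = n_1 + n_2 - 2$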
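CI Estimator for $\mu_1 - \mu_2$ (equal variances)
The confidence interval estimator for $\mu_1 - \mu_2$ when the
population variances are equal is given by:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
where $s_p^2$ is the pooled variance estimator and the degrees of freedom are $\nu = n_1 + n_2 - 2$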
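The equal-variances procedure can be sketched in Python as follows. This is an illustration under stated assumptions rather than the lecture's own implementation: the function names are hypothetical, the summary statistics are made up, and the critical value is passed in by the caller (here 2.0106, which is $t_{.025,48}$ read from a t table).

```python
import math


def pooled_variance(var1, var2, n1, n2):
    """Pooled variance estimator s_p^2 for two independent samples."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)


def pooled_t_statistic(mean1, mean2, var1, var2, n1, n2, hypothesized_diff=0.0):
    """Equal-variances t statistic and its degrees of freedom (n1 + n2 - 2)."""
    sp2 = pooled_variance(var1, var2, n1, n2)
    std_error = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = ((mean1 - mean2) - hypothesized_diff) / std_error
    return t, n1 + n2 - 2


def pooled_confidence_interval(mean1, mean2, var1, var2, n1, n2, t_critical):
    """Interval (x1bar - x2bar) +/- t_critical * sqrt(s_p^2 (1/n1 + 1/n2))."""
    sp2 = pooled_variance(var1, var2, n1, n2)
    half_width = t_critical * math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - half_width, diff + half_width


# Hypothetical summary statistics (not the lecture's data).
t, df = pooled_t_statistic(6.0, 5.0, 1.0, 1.5, 25, 25)
lcl, ucl = pooled_confidence_interval(6.0, 5.0, 1.0, 1.5, 25, 25, t_critical=2.0106)
print(f"t = {t:.3f}, df = {df}, 95% CI = ({lcl:.3f}, {ucl:.3f})")
```

Passing the critical value in as an argument keeps the sketch free of third-party dependencies; a statistics library could supply it instead.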

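Test Statistic for $\mu_1 - \mu_2$ (unequal variances)
The test statistic for $\mu_1 - \mu_2$ when the population variances are
unequal is given by:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
degrees of freedom:
$$\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$
Likewise, the confidence interval estimator is:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$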
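A short Python sketch of the unequal-variances statistic and its approximate degrees of freedom is given below. The function name and the sample values are hypothetical; the formulas are the standard unequal-variances (Welch) expressions.

```python
import math


def unequal_variances_t(mean1, mean2, var1, var2, n1, n2, hypothesized_diff=0.0):
    """Unequal-variances t statistic and its approximate degrees of freedom."""
    a = var1 / n1  # s1^2 / n1
    b = var2 / n2  # s2^2 / n2
    t = ((mean1 - mean2) - hypothesized_diff) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Hypothetical summary statistics (not the lecture's data).
t, df = unequal_variances_t(mean1=50.0, mean2=55.0, var1=16.0, var2=100.0, n1=10, n2=20)
print(f"t = {t:.3f}, df = {df:.1f}")
```

The degrees of freedom are generally not an integer; a common convention is to round to the nearest integer before consulting a t table.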

Which case to use?
Equal variances or unequal variances?
Whenever there is insufficient evidence that the variances are
unequal, it is preferable to perform the
equal variances t-test.
This is so because, for any two given samples:
(number of degrees of freedom for the equal variances case) ≥ (number of degrees of freedom for the unequal variances case)
Larger numbers of degrees of freedom have the same effect as
having larger sample sizes.
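The inequality between the two degrees-of-freedom formulas can be checked numerically. The sketch below uses hypothetical sample variances and sizes, and hypothetical function names, purely to illustrate the comparison.

```python
def pooled_df(n1, n2):
    """Degrees of freedom for the equal-variances case."""
    return n1 + n2 - 2


def unequal_variances_df(var1, var2, n1, n2):
    """Approximate degrees of freedom for the unequal-variances case."""
    a, b = var1 / n1, var2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Hypothetical (s1^2, s2^2, n1, n2) combinations.
cases = [
    (4.0, 4.0, 10, 10),
    (4.0, 40.0, 10, 10),
    (4.0, 40.0, 10, 30),
    (40.0, 4.0, 10, 30),
]
for var1, var2, n1, n2 in cases:
    print(
        f"s1^2={var1}, s2^2={var2}, n1={n1}, n2={n2}: "
        f"equal-variances df = {pooled_df(n1, n2)}, "
        f"unequal-variances df = {unequal_variances_df(var1, var2, n1, n2):.1f}"
    )
```

In these cases the two values coincide only when the sample variances and sample sizes are equal; otherwise the unequal-variances value is smaller.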
Example 13.1…
Do people who eat high-fiber cereal for breakfast consume, on
average, fewer calories for lunch than people who do not eat
high-fiber cereal for breakfast?
What are we trying to show? What is our research hypothesis?
The mean caloric intake of high fiber cereal eaters ($\mu_1$)
is less than that of non-consumers ($\mu_2$), i.e. is $\mu_1 < \mu_2$?
IDENTIFY
Example 13.1…
The mean caloric intake of high fiber cereal eaters ($\mu_1$)
is less than that of non-consumers ($\mu_2$), translates to:
$\mu_1 < \mu_2$ (i.e. $\mu_1 - \mu_2 < 0$)
Thus, H1: $(\mu_1 - \mu_2) < 0$ (phrase H0 and H1 as a “difference of means”)
Hence our null hypothesis becomes:
H0: $(\mu_1 - \mu_2) = 0$
Example 13.1…
A sample of 150 people was randomly drawn. Each person was
identified as a consumer or a non-consumer of high-fiber
cereal. For each person the number of calories consumed at
lunch was recorded.
The data: [summary table not reproduced]
• Independent populations: either you eat high fiber cereal or you don’t
• $n_1 + n_2 = 150$
• Recall H1: $(\mu_1 - \mu_2) < 0$
• There is reason to believe the population variances are unequal…
COMPUTE
Example 13.1…
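Thus, our test statistic (unequal-variances case) is:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
The number of degrees of freedom is:
$$\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$
Hence the rejection region is $t < -t_{\alpha,\nu}$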

COMPUTE
Example 13.1…
Our rejection region: $t < -t_{\alpha,\nu} = -1.658$
Compare
Our test statistic: $t = -2.09$
Since our test statistic (-2.09) is less than our critical value of t
(-1.658), we reject H0 in favor of H1; that is, there is
sufficient evidence to support the claim that high fiber cereal
eaters consume fewer calories at lunch.
INTERPRET
Example 13.1…
…however, we still need to be able to interpret the Excel
output: compare the t statistic with the one-tail critical value,
…or look at the one-tail p-value.
Beware! Excel gives a right-tail critical value,
i.e. 1.6573, whereas our left-tail test needs –1.6573!
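The sign issue can be made explicit in a few lines of Python. This is a small sketch with a hypothetical function name; the numbers are the t statistic (-2.09) and the right-tail critical value (1.6573) quoted in the example.

```python
def reject_left_tail(t_statistic, right_tail_critical):
    """Left-tail test: reject H0 when t falls below the NEGATIVE of the
    right-tail critical value reported by the software."""
    return t_statistic < -abs(right_tail_critical)


# Values quoted in Example 13.1.
print(reject_left_tail(-2.09, 1.6573))  # True: -2.09 < -1.6573
```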
Confidence Interval…
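Suppose we wanted to compute a 95% confidence interval
estimate of the difference between mean caloric intake for
consumers and non-consumers of high-fiber cereals…
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
The resulting interval for $\mu_1 - \mu_2$ runs from –56.86 to –1.56.
That is, we estimate that non-consumers of high fiber cereal eat,
on average, between 1.56 and 56.86 more calories at lunch than consumers.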
IDENTIFY
Example 13.2…
Two methods are being tested for assembling office chairs.
Assembly times are recorded (25 times for each method). At a
5% significance level, do the assembly times for the two
methods differ?
That is, H1: $(\mu_1 - \mu_2) \neq 0$
Hence, our null hypothesis becomes: H0: $(\mu_1 - \mu_2) = 0$
Reminder: since our alternative hypothesis is a “not equals” type, it is a two-tailed
test.
COMPUTE
Example 13.2…
The assembly times for each of the two methods are recorded
and preliminary data is prepared…
The sample variances are similar, hence we will assume that the
population variances are equal…
COMPUTE
Example 13.2…
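Recall, we are doing a two-tailed test, hence the rejection
region will be: $t < -t_{\alpha/2,\nu}$ or $t > t_{\alpha/2,\nu}$
The number of degrees of freedom is: $\nu = n_1 + n_2 - 2 = 25 + 25 - 2 = 48$
Hence our critical values of t (and our rejection region)
become: $t < -t_{.025,48} \approx -2.01$ or $t > t_{.025,48} \approx 2.01$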
COMPUTE
Example 13.2…
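In order to calculate our t-statistic, we need to first calculate
the pooled variance estimator, followed by the t-statistic…
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$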
INTERPRET
Example 13.2…
Since our calculated t-statistic does not fall into the rejection region,
we cannot reject H0 in favor of H1, that is, there is not sufficient
evidence to infer that the mean assembly times differ.
Confidence Interval…
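We can compute a 95% confidence interval estimate for the
difference in mean assembly times as:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
That is, we estimate that the difference in mean assembly times between the two
methods lies between –.36 and .96 minutes. Note: zero
is included in this confidence interval, which is consistent with not rejecting H0.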
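Identifying Factors I…
Factors that identify the equal-variances t-test and estimator of
$\mu_1 - \mu_2$:
• Problem objective: compare two populations
• Data type: interval
• Descriptive measurement: central location
• Experimental design: independent samples
• Population variances: equal
Identifying Factors II…
Factors that identify the unequal-variances t-test and estimator
of $\mu_1 - \mu_2$:
• Problem objective: compare two populations
• Data type: interval
• Descriptive measurement: central location
• Experimental design: independent samples
• Population variances: unequal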
Matched Pairs Experiment…
Previously when comparing two populations, we examined
independent samples.
If, however, an observation in one sample is matched with an
observation in a second sample, this is called a matched pairs
experiment.
To help understand this concept, let’s consider Example 13.4.

Example 13.4…
Is there a difference between starting salaries offered to MBA
grads going into Finance vs. Marketing careers? More precisely,
are Finance majors offered higher salaries than Marketing
majors?
In this experiment, MBAs are grouped by their GPA into 25
groups. Students from the same group (but with different
majors) were selected and their highest salary offer recorded.
Here’s how the data looks…

Example 13.4…
The numbers in black are the original starting salary data; the
numbers in blue were calculated.
Although a student is either in Finance or
in Marketing (i.e. the two groups are independent), the fact that the
data is grouped in this fashion makes it a
matched pairs experiment (i.e. the two
students in group #1 are ‘matched’ by
their GPA range).
The difference of the means is equal to the mean of the differences, hence
we will consider the “mean of the paired differences” as our parameter of interest: $\mu_D$
IDENTIFY
Example 13.4…
Do Finance majors have higher salary offers than Marketing
majors?
Since: $\mu_D = \mu_1 - \mu_2$ (Finance minus Marketing)
We want to research this hypothesis: H1: $\mu_D > 0$
(and our null hypothesis becomes H0: $\mu_D = 0$)

Test Statistic for $\mu_D$
The test statistic for the mean of the population of
differences ($\mu_D$) is:
$$t = \frac{\bar{x}_D - \mu_D}{s_D / \sqrt{n_D}}$$
which is Student t distributed with nD–1 degrees of freedom,
provided that the differences are normally distributed.
Thus our rejection region becomes: $t > t_{\alpha,\,n_D - 1} = t_{.05,24} = 1.711$
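The matched pairs statistic, $t = (\bar{x}_D - \mu_D)/(s_D/\sqrt{n_D})$ with $n_D - 1$ degrees of freedom, can be sketched in Python from raw paired observations. The function name and the five pairs below are hypothetical (they are not the MBA salary data); the sketch also confirms that the mean of the differences equals the difference of the means.

```python
import math
import statistics


def paired_t_statistic(sample1, sample2, hypothesized_mean_diff=0.0):
    """t statistic for the mean of paired differences, with n_D - 1 df."""
    diffs = [a - b for a, b in zip(sample1, sample2)]
    n_d = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation of the differences
    t = (mean_d - hypothesized_mean_diff) / (sd_d / math.sqrt(n_d))
    return t, n_d - 1, mean_d, sd_d


# Hypothetical matched pairs, for illustration only.
group_a = [10, 12, 9, 14, 11]
group_b = [8, 11, 9, 10, 9]
t, df, mean_d, sd_d = paired_t_statistic(group_a, group_b)
print(f"mean difference = {mean_d:.2f}, s_D = {sd_d:.3f}, t = {t:.3f}, df = {df}")
print(abs(mean_d - (statistics.mean(group_a) - statistics.mean(group_b))) < 1e-12)
```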

COMPUTE
Example 13.4…
From the data, we calculate $\bar{x}_D$ and $s_D$…
…which in turn we use
for our t-statistic: $t = \frac{\bar{x}_D - \mu_D}{s_D / \sqrt{n_D}} = 3.81$
…which we compare to our critical value of t: $t_{.05,24} = 1.711$

INTERPRET
Example 13.4…
Since our calculated value of t (3.81) is greater than our critical
value of t (1.711), it falls in the rejection region, hence we
reject H0 in favor of H1; that is, there is overwhelming evidence
(since the p-value = .0004) that Finance majors do obtain
higher starting salary offers than their peers in Marketing.
Compare…
Confidence Interval Estimator for $\mu_D$
We can derive the confidence interval estimator for $\mu_D$
algebraically as:
$$\bar{x}_D \pm t_{\alpha/2}\frac{s_D}{\sqrt{n_D}}$$
In the previous example, what is the 95% confidence interval
estimate of the mean difference in salary offers between the
two business majors?
That is, the mean of the population differences is between
LCL=2,321 and UCL=7,809 dollars.
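As a rough check on Example 13.4, the sketch below recomputes the t statistic and the interval $\bar{x}_D \pm t_{\alpha/2}\, s_D/\sqrt{n_D}$ from summary values. Note the assumptions: the mean difference 5,065 is simply the midpoint of the reported interval, the standard deviation 6,647 is back-solved from the reported t = 3.81 rather than quoted in the slides, and 2.064 is $t_{.025,24}$ from a t table. The function name is hypothetical.

```python
import math


def paired_t_and_interval(mean_diff, sd_diff, n_pairs, t_critical, hypothesized=0.0):
    """Return (t statistic, LCL, UCL) for the mean of paired differences."""
    std_error = sd_diff / math.sqrt(n_pairs)
    t = (mean_diff - hypothesized) / std_error
    half_width = t_critical * std_error
    return t, mean_diff - half_width, mean_diff + half_width


# mean_diff: midpoint of the reported interval, (2,321 + 7,809) / 2.
# sd_diff: ASSUMED value back-solved from the reported t = 3.81.
# t_critical: t_{.025,24} from a t table.
t, lcl, ucl = paired_t_and_interval(5065.0, 6647.0, 25, t_critical=2.064)
print(f"t = {t:.2f}, 95% CI = ({lcl:,.0f}, {ucl:,.0f})")
```

Under these assumed inputs the sketch agrees with the reported t-statistic and confidence limits to the precision shown in the slides.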
Identifying Factors…
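Factors that identify the t-test and estimator of $\mu_D$:
• Problem objective: compare two populations
• Data type: interval
• Descriptive measurement: central location
• Experimental design: matched pairs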
Inference about the ratio of two variances
So far we’ve looked at comparing measures of central location, namely
the means of two populations.
When looking at two population variances, we consider the ratio of the
variances, i.e. the parameter of interest to us is: $\sigma_1^2 / \sigma_2^2$
The sampling statistic $\frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2}$ is F distributed with
$\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.
Inference about the ratio of two variances
Our null hypothesis is always:
H0: $\sigma_1^2 / \sigma_2^2 = 1$
(i.e. the variances of the two populations are equal, hence
their ratio is one)
Therefore, our statistic simplifies to: $F = s_1^2 / s_2^2$

IDENTIFY
Example 13.6…
In example 13.1, we looked at the variances of the samples of
people who consumed high fiber cereal and those who did not
and assumed they were not equal. We can use the ideas just
developed to test if this is in fact the case.
We want to show: H1: $\sigma_1^2 / \sigma_2^2 \neq 1$
(the variances are not equal to each other)
Hence we have our null hypothesis: H0: $\sigma_1^2 / \sigma_2^2 = 1$

CALCULATE
Example 13.6…
Since our research hypothesis is: H1: $\sigma_1^2 / \sigma_2^2 \neq 1$
We are doing a two-tailed test, and our rejection region is:
$F > F_{\alpha/2,\nu_1,\nu_2}$ or $F < F_{1-\alpha/2,\nu_1,\nu_2}$
CALCULATE
Example 13.6…
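Our test statistic is: $F = s_1^2 / s_2^2$
The critical values marked on the F axis are .58 and 1.61; the test statistic
falls below the lower critical value, i.e. in the rejection region.
Hence there is sufficient evidence to reject the null hypothesis
in favor of the alternative; that is, there is a difference in the
variance between the two populations.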
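The two-tailed F-test decision can be sketched as follows, using the simplified statistic $F = s_1^2/s_2^2$. The critical values .58 and 1.61 are the ones shown in the example; the function names and the sample variances are hypothetical, chosen only to illustrate a ratio in the lower rejection region.

```python
def f_statistic(var1, var2):
    """Test statistic F = s1^2 / s2^2 under H0: sigma1^2 / sigma2^2 = 1."""
    return var1 / var2


def reject_two_tailed(f_value, lower_critical, upper_critical):
    """Reject H0 when F falls below the lower or above the upper critical value."""
    return f_value < lower_critical or f_value > upper_critical


# Hypothetical sample variances; critical values as quoted in Example 13.6.
f_value = f_statistic(var1=40.0, var2=100.0)
print(f_value, reject_two_tailed(f_value, lower_critical=0.58, upper_critical=1.61))
```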
CALCULATE
Example 13.6…
If we wanted to determine the 95% confidence interval
estimate of the ratio of the two population variances in
Example 13.1, we would proceed as follows…
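The confidence interval estimator for $\sigma_1^2 / \sigma_2^2$ is:
$$LCL = \left(\frac{s_1^2}{s_2^2}\right)\frac{1}{F_{\alpha/2,\nu_1,\nu_2}}, \qquad UCL = \left(\frac{s_1^2}{s_2^2}\right)F_{\alpha/2,\nu_2,\nu_1}$$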
CALCULATE
Example 13.6…
The 95% confidence interval estimate of the ratio of the two
population variances in Example 13.1 is: LCL = .2388, UCL = .6614
That is, we estimate that $\sigma_1^2 / \sigma_2^2$ lies between .2388 and .6614
Note that one (1.00) is not within this interval, which is consistent with rejecting H0.
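A minimal Python sketch of the interval for $\sigma_1^2/\sigma_2^2$ follows, with LCL equal to the sample variance ratio divided by $F_{\alpha/2,\nu_1,\nu_2}$ and UCL equal to the ratio multiplied by $F_{\alpha/2,\nu_2,\nu_1}$. The function name is hypothetical, and so are the sample variances and the two F critical values, which the caller would normally read from an F table or software.

```python
def variance_ratio_interval(var1, var2, f_crit_v1_v2, f_crit_v2_v1):
    """Confidence limits for sigma1^2 / sigma2^2.

    f_crit_v1_v2 is F_{alpha/2, nu1, nu2}; f_crit_v2_v1 is F_{alpha/2, nu2, nu1}.
    """
    ratio = var1 / var2
    return ratio / f_crit_v1_v2, ratio * f_crit_v2_v1


# Hypothetical inputs, for illustration only.
lcl, ucl = variance_ratio_interval(40.0, 100.0, f_crit_v1_v2=1.6, f_crit_v2_v1=1.7)
print(f"LCL = {lcl:.2f}, UCL = {ucl:.2f}, contains 1: {lcl <= 1.0 <= ucl}")
```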
Identifying Factors
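Factors that identify the F-test and estimator of $\sigma_1^2 / \sigma_2^2$:
• Problem objective: compare two populations
• Data type: interval
• Descriptive measurement: variability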
Difference Between Two Population
Proportions
We will now look at procedures for drawing inferences about
the difference between populations whose data are nominal
(i.e. categorical).
As mentioned previously, with nominal data we calculate
proportions of occurrences of each type of outcome. Thus, the
parameter to be tested and estimated in this section is the
difference between two population proportions: p1–p2.
Statistic and Sampling Distribution…
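To draw inferences about the parameter p1–p2, we take
a sample from each population, calculate the sample proportions and
look at their difference.
With $x_1$ successes in a sample of size $n_1$ from population 1,
$\hat{p}_1 = x_1 / n_1$ (and likewise $\hat{p}_2 = x_2 / n_2$ for population 2).
$\hat{p}_1 - \hat{p}_2$ is an unbiased estimator for p1–p2.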
Sampling Distribution
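The statistic $\hat{p}_1 - \hat{p}_2$ is approximately normally distributed
if the sample sizes are large enough so that $n_1 p_1$, $n_1(1 - p_1)$, $n_2 p_2$ and $n_2(1 - p_2)$ are all at least 5.
Since it’s “approximately normal” we can describe the normal
distribution in terms of mean and variance…
$$E(\hat{p}_1 - \hat{p}_2) = p_1 - p_2, \qquad V(\hat{p}_1 - \hat{p}_2) = \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}$$
…hence this z-variable will also be approximately standard
normally distributed:
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}}$$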
Testing and Estimating p1–p2…
Because the population proportions (p1 & p2) are unknown,
the standard error:
$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$
is unknown. Thus, we have two different estimators for the
standard error of $\hat{p}_1 - \hat{p}_2$, which depend upon the null
hypothesis. We’ll look at these cases on the next slide…
Test Statistic for p1–p2…
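There are two cases to consider…
Case 1: H0: (p1–p2) = 0. Use the pooled proportion estimate $\hat{p} = (x_1 + x_2)/(n_1 + n_2)$:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
Case 2: H0: (p1–p2) = D, with D ≠ 0. Use the separate sample proportions:
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$$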
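The two cases can be folded into one small Python function: when the hypothesized difference is zero it pools the two samples to estimate the standard error, otherwise it uses the separate sample proportions. This is a sketch only; the function name and the counts are hypothetical and are not the soap-packaging data.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2, hypothesized_diff=0.0):
    """z statistic and right-tail p-value for H0: p1 - p2 = hypothesized_diff."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    if hypothesized_diff == 0:
        # Case 1: pooled proportion estimate.
        pooled = (x1 + x2) / (n1 + n2)
        std_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    else:
        # Case 2: separate sample proportions.
        std_error = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z = ((p1_hat - p2_hat) - hypothesized_diff) / std_error
    return z, 1 - NormalDist().cdf(z)


# Hypothetical counts: 120 of 400 versus 90 of 400.
z_case1, p_case1 = two_proportion_z(120, 400, 90, 400)
z_case2, p_case2 = two_proportion_z(120, 400, 90, 400, hypothesized_diff=0.03)
print(f"case 1: z = {z_case1:.2f}, p-value = {p_case1:.4f}")
print(f"case 2: z = {z_case2:.2f}, p-value = {p_case2:.4f}")
```

The p-value returned is for a right-tail alternative (p1–p2 greater than the hypothesized difference), matching the direction used in the examples that follow.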
IDENTIFY
Example 13.8…
A consumer packaged goods (CPG) company is test marketing
two new versions of soap packaging. Version one (bright colors)
is distributed in one supermarket, while version two (simple
colors) is in another. Since the first version is more expensive,
it must outsell the other design; that is, its market share, p1,
must be greater than that of the other soap package design, p2.
That is, we want to know, is p1 > p2? Or, using the language of
statistics:
H1: (p1–p2) > 0
Hence our null hypothesis will be H0: (p1–p2) = 0 [case 1]
IDENTIFY
Example 13.8…
Here is the summary data…
Our null hypothesis is H0: (p1–p2) = 0, i.e. it is a “case 1” type
problem, hence we need to calculate the pooled proportion: $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$
CALCULATE
Example 13.8…
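At a 5% significance level, our rejection region is: $z > z_{\alpha} = z_{.05} = 1.645$
Compare…
The value of our z-statistic is…
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = 2.90$$
Since 2.90 > 1.645, we reject H0 in favor of H1; that is, there is
enough evidence to infer that the brightly colored design is
more popular than the simple design.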
IDENTIFY
Example 13.9…
Suppose in our test marketing of soap packages scenario that
instead of just a difference between the two package versions,
the brightly colored design had to outsell the simple design by
at least 3%
Our research hypothesis now becomes:

H1: (p1–p2) > .03
And so our null hypothesis is: H0: (p1–p2) = .03
Since the right-hand side of the H0 equation is
not zero, it’s a “case 2” type problem
IDENTIFY
Example 13.9…
Same summary data as before:
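Since this is a “case 2” type problem, we don’t need to calculate
the pooled proportion; we can go straight to z:
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - .03}{\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}} = 1.15$$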
INTERPRET
Example 13.9…
Since our calculated z-statistic (1.15) does not fall into our
rejection region ($z > z_{.05} = 1.645$),
there is not enough evidence to infer that the brightly colored
design outsells the other design by 3% or more.
Confidence Intervals…
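The confidence interval estimator for p1–p2 is given by:
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
and as you may suspect, it’s valid when $n_1\hat{p}_1$, $n_1(1 - \hat{p}_1)$, $n_2\hat{p}_2$ and $n_2(1 - \hat{p}_2)$ are all at least 5.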
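The interval for p1–p2, namely the difference in sample proportions plus or minus $z_{\alpha/2}$ times the unpooled standard error, can be sketched in Python using only the standard library. The function names and the counts are hypothetical, and the "at least 5" rule of thumb in the validity check is an assumption about the intended sample-size condition.

```python
import math
from statistics import NormalDist


def proportion_diff_interval(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    std_error = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_crit * std_error, diff + z_crit * std_error


def sample_sizes_large_enough(x1, n1, x2, n2, minimum=5):
    """Rule-of-thumb check: successes and failures in each sample >= minimum."""
    return all(count >= minimum for count in (x1, n1 - x1, x2, n2 - x2))


# Hypothetical counts: 120 of 400 versus 90 of 400.
lcl, ucl = proportion_diff_interval(120, 400, 90, 400)
print(sample_sizes_large_enough(120, 400, 90, 400))
print(f"95% CI for p1 - p2: ({lcl:.4f}, {ucl:.4f})")
```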

COMPUTE
Example 13.10…
Create a 95% confidence interval for the difference between
the two proportions of packaged soap sales from Ex. 13.8:
Identifying Factors…
Factors that identify the z-test and estimator for p1–p2:
• Problem objective: compare two populations
• Data type: nominal
