These materials contain information that is proprietary and confidential to Bank of America. These materials shall not be duplicated. 2005 Bank of America. All rights reserved.
January 3 2005 ver. 4.4 - ActionLegal Copy Service.
Sampling Concepts
How to make sure that your true process population dynamics are represented by your sample data.
How many data points do I need?
Objectives
- Be able to select the proper sampling strategy.
- Be able to calculate the proper sample size for a given confidence level and confidence interval.
- Be able to define confidence level, power factor, alpha risk and beta risk.
- Be able to calculate the proper sample size for a hypothesis test or DOE.
How Much Data Do You Need?
It depends on what questions you are trying to answer.
For example:
Determining Process Capability (Baselining)
- Your focus should be to collect enough baseline data to capture an entire iteration or cycle of the process.
- An iteration should account for the different types of variation seen within the process: cycles, shifts, seasonal trends, product types, volume ranges, cycle times, demographic mixes, etc.
- If historical data are not available, a data collection plan should be instituted to collect the appropriate data.
Hypothesis Testing (comparing means/variances)
- Later in this section we will show you that your focus should be on other statistical characteristics of the sample, such as mean, variance, risk and the level of confidence desired for seeing differences.
Sampling Strategy Review
What must a sample be?
- Random.
- Unbiased.
- Representative.
What kind of sampling can we do?
- Population: Items exist and their characteristics are stable.
- Process: Items continue to be produced and their characteristics may change as the process varies.
Sampling Strategy Review
Sampling Strategies
- Random Sample - Each unit has an equal probability of being selected in a sample (typically used for population studies).
- Rational Subgroup - Each unit is collected at point A in a process every nth hour. Usually multiple sequential units are collected (typically used for process studies).
- Stratified Random Sampling - Randomly sample within a stratified category or group. Sample sizes for each group are generally proportional to the relative size of the group.
- Systematic Sampling - Sample every nth one (Ex: collecting every 4th unit).
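The four strategies above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the unit IDs, the strata names ("small", "large") and all function names are hypothetical, and list position stands in for time in the rational subgroup case.

```python
import random

def simple_random_sample(units, k, seed=0):
    """Random sample: every unit has the same chance of being selected."""
    rng = random.Random(seed)
    return rng.sample(units, k)

def systematic_sample(units, step, start=0):
    """Systematic sample: take every step-th unit, beginning at index `start`."""
    return units[start::step]

def stratified_random_sample(strata, total_k, seed=0):
    """Stratified random sample: sample randomly within each stratum,
    with per-stratum sizes proportional to the stratum's share of the population."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = max(1, round(total_k * len(members) / population_size))
        sample[name] = rng.sample(members, k)
    return sample

def rational_subgroups(units, interval, subgroup_size):
    """Rational subgroups: at every `interval`-th position, collect
    `subgroup_size` sequential units (position is a stand-in for time)."""
    return [units[i:i + subgroup_size] for i in range(0, len(units), interval)]

units = list(range(1, 101))  # hypothetical unit IDs 1..100
strata = {"small": list(range(1, 61)), "large": list(range(61, 101))}

random_pick = simple_random_sample(units, 10)
every_4th = systematic_sample(units, 4, start=3)      # units 4, 8, 12, ...
by_stratum = stratified_random_sample(strata, 10)     # 6 from "small", 4 from "large"
subgroups = rational_subgroups(units, interval=25, subgroup_size=3)
```

The fixed seeds are there only to make the sketch repeatable; in practice the random draws would not be seeded this way.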
Sampling Strategy - New
Cluster Sampling
[Figure: a population drawn as clusters of X's, with one cluster taken as the sample.]
- The population is composed of small groups called clusters.
- There is very little variation in the demographics from one cluster to another.
- Data are gathered in detail within one (or a few) cluster(s), e.g., all the tellers in one representative banking center or all call representatives in one representative call center.
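As a rough sketch of cluster sampling, the snippet below picks one whole cluster and keeps every member of it. The banking center and teller names are hypothetical, and choosing the cluster at random is an assumption; the text only asks that the chosen cluster be representative.

```python
import random

def cluster_sample(clusters, n_clusters, seed=0):
    """Pick `n_clusters` whole clusters and keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters: tellers grouped by banking center.
banking_centers = {
    "center_A": ["teller_1", "teller_2", "teller_3"],
    "center_B": ["teller_4", "teller_5"],
    "center_C": ["teller_6", "teller_7", "teller_8", "teller_9"],
}

picked = cluster_sample(banking_centers, 1)
```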
Sampling Exercise 1
Match the sampling situation with the best sampling strategy.
Strategies: Random, Rational Subgroup, Stratified Random, Systematic, Cluster
1. We want to monitor our item processing associates on a regular basis to see if errors are going up.
2. Without spending a lot of money, we want to know whether the people in Charlotte prefer Bank of America or Wachovia.
3. A DFSS team needs to thoroughly understand our affluent urban customers to better develop products for their needs.
4. For planning purposes our Supply Chain Management needs to know how long it takes to enter into contracts. We deal with many types of suppliers in many different areas of the country.
5. We want to know how accurate our bills are to our corporate customers. It will take about 15 minutes to review each bill for accuracy. Bills are issued monthly, but evenly throughout the month (i.e. not all end-of-month billing).
How Much Data Do You Need?
How many actual data points do you need to collect for your sample? The Green Belt class introduced the following sample size formulas:
Continuous data form:
$$n = \left(\frac{1.96\,s}{\delta}\right)^2$$
Discrete data form:
$$n = \left(\frac{1.96}{\delta}\right)^2 p(1-p)$$
n = sample size
s = standard deviation
p = proportion defective
δ = precision, ±δ, of the estimate at 95% confidence
Sample Size Formula Examples
Continuous Data Example:
We want to be able to know at what level our call center process is running. If our process has a standard deviation = 5.0, what size sample will we need to be able to estimate the mean within 1 minute?
$$n = \left(\frac{1.96 \times 5}{1}\right)^2 = 96.04, \text{ rounded up to } 97$$
Discrete Data Example:
Based on historical data, our defect rate (p) has been running at approximately 0.05. We want to sample the process occasionally and be able to see if it has gone up or down by 0.02. How large of a sample do we need to take?
$$n = \left(\frac{1.96}{0.02}\right)^2 (0.05)(1 - 0.05) = 456.19, \text{ rounded up to } 457$$
Other Common Confidence Factors
For 85% confidence, Z = 1.439
For 90% confidence, Z = 1.645
For 99% confidence, Z = 2.575
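The two simplified formulas and the confidence factors can be checked with a short Python sketch. The function names are illustrative, not part of the course material; the Z factor is computed from the standard normal distribution, so it may differ from the rounded table values in the third decimal place.

```python
import math
from statistics import NormalDist

def z_for_confidence(confidence):
    """Two-sided Z factor for a confidence level, e.g. 0.95 -> about 1.96."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def sample_size_continuous(s, delta, z=1.96):
    """n = (z * s / delta)^2, rounded up to a whole unit."""
    return math.ceil((z * s / delta) ** 2)

def sample_size_discrete(p, delta, z=1.96):
    """n = (z / delta)^2 * p * (1 - p), rounded up to a whole unit."""
    return math.ceil((z / delta) ** 2 * p * (1 - p))

n_continuous = sample_size_continuous(s=5.0, delta=1.0)   # call center example
n_discrete = sample_size_discrete(p=0.05, delta=0.02)     # defect rate example
z_90 = z_for_confidence(0.90)
```

Passing `z=z_for_confidence(0.90)` or another level to either function gives the sample size at that confidence level instead of 95%.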
Sample Size Calculator
Open the Excel file called: SamplCal4.x
This application can be used effectively for determining initial sample sizes required for process studies or descriptive statistics of process information. It should not be used for hypothesis testing. Minitab is the recommended sample size calculator for hypothesis testing or experimental studies.

Estimated Sample Sizes for Continuous Data at 99%, 95% and 90% Confidence Levels
Enter Population Size Here*: 1,000,000
Enter Estimated Standard Deviation Here: 5 (If unknown, use 1/6 of the known range of the data)

Precision (d)   Sample Size 99%   Sample Size 95%   Sample Size 90%
 1              167               97                68
 2               42               25                17
 3               19               11                 8
 4               11                7                 5
 5                7                4                 3
 6                5                3                 2
 7                4                2                 2
 8                3                2                 2
 9                3                2                 1
10                2                1                 1
11                2                1                 1
12                2                1                 1
13                1                1                 1
14                1                1                 1
15                1                1                 1

* For process sampling use the total number of items produced in the time period you wish to characterize. The population size is used to adjust sample size with the Finite Population Correction Factor.
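The spreadsheet's internal calculation is not shown here, so the following is only a sketch of one common form of the finite population correction, n = n0 / (1 + (n0 - 1) / N), where n0 is the uncorrected sample size and N is the population size. The function name and the population of 500 are hypothetical.

```python
import math

def apply_fpc(n0, population_size):
    """Adjust an uncorrected sample size n0 for a finite population of size N.

    Uses the common form n = n0 / (1 + (n0 - 1) / N); this is an assumption,
    not necessarily the formula used in the spreadsheet.
    """
    return math.ceil(n0 / (1 + (n0 - 1) / population_size))

n0 = (1.96 * 5 / 1) ** 2                      # 95% confidence, s = 5, precision = 1
n_large_population = apply_fpc(n0, 1_000_000)  # correction is negligible
n_small_population = apply_fpc(n0, 500)        # correction reduces the sample size
```

With a population of 1,000,000 the correction leaves the 95% value at 97; it only matters when the sample is a sizeable fraction of the population.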
Sampling Exercise 2
Each month we want to monitor the errors made on
deposits in each region. We want to know where we are
within 1%. We believe the error rate is 11.5%. How many
samples should we take in each region, each month?
After sampling for a few months, we have found that the
error rate is only 8.1%. Additionally, our boss wants us to
report it in Sigma Level to the closest tenth. At this level
we calculated that 0.1 Sigma Level is about 1.4%, so that is
the most precision we will need. Now how many samples
should we take?
Sample Sizes for Hypothesis Testing
Right, this is the new stuff with Minitab.
Sample Size Concepts
Smaller sample sizes:
- Less cost
- Quicker data collection
- Higher risks
  - chance of missing an important effect (false negative)
  - chance of declaring an effect important when it is not (false positive)
- Wider confidence intervals
Larger sample sizes:
- Higher costs
- Longer time to get data
- Lower risks (but not zero)
  - smaller real effects are more likely to be declared significant
- Tighter confidence intervals
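The link between sample size and confidence interval width can be made concrete with a small sketch. It assumes the same normal-based interval as the earlier formulas (half-width = Z × s / √n); the standard deviation of 5 and the sample sizes are illustrative values.

```python
import math

def ci_half_width(s, n, z=1.96):
    """Half-width of a normal-based confidence interval for the mean."""
    return z * s / math.sqrt(n)

# Quadrupling the sample size halves the interval width.
half_widths = {n: ci_half_width(5.0, n) for n in (25, 100, 400)}
```

This is why tighter intervals get expensive quickly: halving the width takes four times as much data.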
Risks and Power
Alpha (α):
The risk associated with finding that something is significant when in reality it is not. The risk of chasing an unimportant X. The risk of doing the wrong thing.
Beta (β):
The risk associated with stating that an input or process is not different when it is. This is the risk of missing an important X. The risk of doing nothing.
Power (1-β):
The chance of correctly rejecting the null hypothesis when indeed it should be rejected. Power is the likelihood that you will identify a significant difference when one exists.
Alpha/Beta Risks
Alpha, α, is the risk of finding a difference when there really isn't one.
Beta, β, is the risk of not finding a difference when there really is one.

                 Truth is: H0         Truth is: Ha
Test says: H0    Correct Decision     Type II Error (β)
Test says: Ha    Type I Error (α)     Correct Decision

α Risk - also called: Type I error
A fire alarm sounds, but there is no fire. Or we deny a loan to a credit worthy person.
β Risk - also called: Type II error
The fire alarm is silent, but there is a fire. Or we approve a loan for a non-credit worthy person.
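Alpha Risk Graphically
Truth is: Both samples are from the same population.
[Figure: distribution of Sample 1 with the α risk area marked in its tail.]
- If X̄2 falls outside the α risk area, conclude one population; correct!
- If X̄2 falls in the α risk area, conclude two populations; Type I Error.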
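The alpha risk can also be seen by simulation. The sketch below draws two samples from the same population many times and counts how often a two-sided test wrongly concludes "two populations". It assumes a z-test with known σ for simplicity; the sample size, number of trials and seed are arbitrary illustrative choices.

```python
import math
import random
from statistics import NormalDist

def false_positive_rate(alpha=0.05, n=20, sigma=1.0, trials=2000, seed=1):
    """Fraction of trials in which two samples from the SAME population
    are declared different by a two-sided z-test at the given alpha."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    std_err = sigma * math.sqrt(2 / n)  # std. error of the difference in means
    rejections = 0
    for _ in range(trials):
        mean_1 = sum(rng.gauss(0.0, sigma) for _ in range(n)) / n
        mean_2 = sum(rng.gauss(0.0, sigma) for _ in range(n)) / n
        if abs(mean_1 - mean_2) / std_err > z_crit:
            rejections += 1  # Type I error: the truth is one population
    return rejections / trials

observed_alpha = false_positive_rate()
```

The observed rate should land close to the chosen α of 0.05, since every rejection in this setup is a false positive.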
Beta Risk Graphically
Truth is: Samples are from different populations.
[Figure: overlapping distributions of Sample 1 and Population 2, separated by the difference δ, with the α risk area and the β risk area marked.]
- If X̄2 falls in the β risk area, conclude one population; Type II Error.
- If X̄2 falls outside the β risk area, conclude two populations; correct!
Power (1-β) is the probability of detecting the difference, δ, and is represented by the area in Population 2 less the beta risk area.
Sampling Terminology
n: The number of units making up the sample size. May be expressed differently depending on the situation. For a DOE, n may be the number of experimental runs. In a two sample t-test, n could represent the number of observations for each group.
α: (alpha) Your chance of a false positive, which is the p-value at which you start calling things statistically significant.
β: (beta) Your chance of a false negative.
δ: (delta) The size of the real effect you want to be sure to detect if in fact it is there. Often expressed as a multiple of σ.
σ: (sigma) The standard deviation of the noise variation when factors are held fixed.
Power: Your chance of detecting a real effect, i.e., declaring it to be statistically significant. You want this high. Power = 1-β
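To show how these terms fit together, here is a minimal sketch that computes power from n, δ, σ and α. It assumes a two-sided one-sample z-test with known σ, which is a simplification of the t-based calculations Minitab performs; the function name and the example values are illustrative.

```python
import math
from statistics import NormalDist

def z_test_power(n, delta, sigma, alpha=0.05):
    """Power (1 - beta) of a two-sided one-sample z-test to detect a shift delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta * math.sqrt(n) / sigma  # effect size in standard-error units
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

power_n16 = z_test_power(n=16, delta=1.0, sigma=2.0)
power_n64 = z_test_power(n=64, delta=1.0, sigma=2.0)
beta_n16 = 1 - power_n16                                  # false negative risk
power_no_effect = z_test_power(n=16, delta=0.0, sigma=2.0)
```

With no real effect (δ = 0) the "power" collapses to α, which is just the false positive rate; with a real effect, power rises as n grows.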
Using Minitab to Determine Sample Size
Stat>Power and Sample Size
Minitab can calculate power or sample sizes for:
- 1-sample t & 2-sample t
- 1 Proportion & 2 Proportions
- One-way ANOVA
- 2-level factorial designs
Enter α and σ, plus two of n, δ, or (1-β), and Minitab will solve for the third!
General Sample Size Formula
The sample size formula presented earlier in this module is actually a simplified version of the general formula that is given below:
$$n = \left(\frac{(Z_{\alpha/2} + Z_{\beta})\,s}{\delta}\right)^2 \quad \text{for continuous data}$$
$$n = \left(\frac{Z_{\alpha/2} + Z_{\beta}}{\delta}\right)^2 p(1-p) \quad \text{for discrete data}$$
Note: The formula used earlier assumed Z_β = 0 or β = 0.50. For a hypothesis test, a β = 0.5 implies a power of 50%. You would have a 50% chance of seeing a significant difference if it were actually there.
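The general formula can be sketched in Python to show that β = 0.50 reproduces the earlier simplified results, and how much the sample size grows once real power is requested. The function names are illustrative; Z values are computed from the standard normal distribution rather than taken from a rounded table.

```python
import math
from statistics import NormalDist

def _z_sum(alpha, beta):
    """Z_(alpha/2) + Z_beta from the standard normal distribution."""
    std_normal = NormalDist()
    return std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(1 - beta)

def general_n_continuous(s, delta, alpha=0.05, beta=0.50):
    """n = ((Z_(alpha/2) + Z_beta) * s / delta)^2, rounded up."""
    return math.ceil((_z_sum(alpha, beta) * s / delta) ** 2)

def general_n_discrete(p, delta, alpha=0.05, beta=0.50):
    """n = ((Z_(alpha/2) + Z_beta) / delta)^2 * p * (1 - p), rounded up."""
    return math.ceil((_z_sum(alpha, beta) / delta) ** 2 * p * (1 - p))

n_power_50 = general_n_continuous(s=5.0, delta=1.0)             # beta = 0.50
n_power_90 = general_n_continuous(s=5.0, delta=1.0, beta=0.10)  # 90% power
n_discrete_50 = general_n_discrete(p=0.05, delta=0.02)          # beta = 0.50
```

At β = 0.50 the Z_β term is zero, so the results match the earlier examples (97 and 457); asking for 90% power on the continuous example raises the requirement to 263.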
Sample Size Example 1
We want to see if the average time to process a second
mortgage application is the same for two banking centers. A
2-sample t-test was selected. Our best (planning) estimate
for the average time is around 15 days with a standard
deviation, σ = 2 days.
The sample size must be large enough to provide a 95%
chance of detecting a difference (if it exists) in the average
processing times, as small as 3 days (because a 3 day
difference is of practical significance to us). Using an
alpha risk of 0.05, what sample size would you recommend?
Minitab
Stat>Power and Sample Size>2-Sample t
1. Fill in two of sample size, difference, and power, leaving one empty.
2. Put in Standard Deviation.
3. Click Options.
4. Choose Ha and enter α.
5. OK.
6. OK.
Minitab
Power and Sample Size
2-Sample t Test
Testing mean 1 = mean 2 (versus not =)
Calculating power for mean 1 = mean 2 + difference
Alpha = 0.05  Sigma = 2

             Sample   Target   Actual
Difference     Size    Power    Power
         3       13   0.9500   0.9561
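As a rough cross-check of this output, the sketch below applies a normal approximation for two groups. The factor of 2 (because the difference of two means has variance 2σ²/n) is a standard adjustment that is assumed here; it is not part of the general formula shown in this module. The function name is illustrative.

```python
import math
from statistics import NormalDist

def two_sample_n_per_group(sigma, delta, alpha=0.05, power=0.95):
    """Normal-approximation sample size per group for a two-sided
    two-sample comparison of means."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)  # Z_beta with beta = 1 - power
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

n_approx = two_sample_n_per_group(sigma=2.0, delta=3.0)
```

The approximation gives 12 per group, one fewer than the 13 in the Minitab output. A t-based calculation accounts for the standard deviation being estimated from the data, so it tends to ask for slightly more at small sample sizes.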
Sampling Exercise 3
Comparison of proportions:
Card Services wants to compare the default rate of a private labeled credit card (PLC) against Visa to see if the Visa default rate is really larger. PLC has claimed a default rate of 1 percent or less. The default rate for Visa is roughly 1.5 percent. We want to be at least 90 percent sure (power) to find a significant difference if it exists and are willing to take a 5 percent risk (α).
What is the required sample size (for each credit card)?
What is your false negative risk (β)?
Sampling Exercise 4
Designed experiment:
A Six Sigma team wants to conduct a 2^3 factorial with replicates. Assuming an α = .05, the team needs an 80 percent assurance of detecting at least a 2 percent difference (effect) on Y. Typical experimental variation with factors held fixed (determined from an earlier pilot study) is about 1.5 percent (σ = 1.5%).
How many replicates will you have to run to achieve the 80% detection assurance?
Sampling Exercise 5
Fill in the relationships chart (hint: refer to the general formula for determining the sample size).
[Chart: Cases 1 to 6, with columns including power and n. In each case the other quantities are held constant and the cell marked "?" is to be filled in.]
Sampling Summary
Sample size depends on:
What level of risk you're willing to take.
What size difference you want to detect.
How powerful you want the test to be.
Before collecting data, you should think about the sampling
strategy and sample size requirements to ensure that you
have an appropriate amount of data for drawing
conclusions.
Choosing the right sample size allows us to better manage
our risks of making a wrong decision or missing an
important factor.
Sample size determination should include practical
considerations, like economics, as well.
Black Belt Key Learnings
Does this tool have an application to my current project?
________________________________________________________________________
________________________________________________________________________
This tool can help me answer the following questions:
________________________________________________________________________
________________________________________________________________________
What are the key learnings about this tool and/or subject?
________________________________________________________________________
________________________________________________________________________
How comfortable will I be in training my team on this tool?
________________________________________________________________________
________________________________________________________________________
