
• Setting Expectations
• Calculating measures of central tendency and variation
• Skewness and kurtosis
• Calculating area under the normal curve
• Sorting data
• Histogram
• Pareto chart
• Scatter diagrams
• Bar and pie charts
• Using the Analysis ToolPak for advanced functions

• This is not a training on Six Sigma! The presentation assumes that you
  are already familiar with Six Sigma concepts and are looking for ways
  to apply them using MS Excel.
• The presentation also assumes that you know the basics of MS Excel, so
  it focuses on some advanced analytical features.
• The Excel tips and tools covered here can be used in multiple phases of
  the DMAIC cycle, so the presentation does not follow the DMAIC sequence.
• The training is based on MS Excel 2007. Menus and dialogs differ
  slightly in MS Excel 2003, so expect to adapt the steps there.

In statistics, the central tendency of a data set is a measure of its
"middle" or "expected" value. Several descriptive statistics can serve as
measures of central tendency, including the mean, the median, and the
mode. Other statistics, such as the standard deviation, the variance, and
the range, are measures of spread: they describe how spread out the data
are.

Measures of Central Tendency
• Mean
• Median
• Mode

Measures of Spread
• Standard Deviation
• Variance
• Range

The arithmetic mean (average) of a list of numbers is the sum of all the
items in the list divided by the number of items. To obtain the
arithmetic mean of a data set, use the Excel function AVERAGE.

Syntax
=AVERAGE(number1,number2,...)

The median is the value that separates the higher half of a sample, a
population, or a probability distribution from the lower half. With an
even number of observations there is no single middle value, so the
median is usually taken as the mean of the two middle values. In Excel,
use the MEDIAN function.

Syntax
=MEDIAN(number1,number2,...)

The mode is the value that occurs most frequently in a data set or a
probability distribution. The mode is not necessarily unique, since the
same maximum frequency may be attained at different values.

Syntax
=MODE(number1,number2,...)
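
For readers who want to cross-check the three worksheet functions outside Excel, here is a minimal Python sketch. It assumes the ten raw values from the sorting example later in this presentation as sample data; the variable names are illustrative only, and `statistics.multimode` needs Python 3.8 or later.

```python
import statistics

# Raw sample taken from the sorting example later in this presentation.
data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

mean_value = statistics.mean(data)      # counterpart of =AVERAGE(...)
median_value = statistics.median(data)  # counterpart of =MEDIAN(...)
modes = statistics.multimode(data)      # every value tied for the highest count

print("mean:", mean_value)
print("median:", median_value)
print("modes:", modes)
```

This sample has two values tied for the highest frequency, which illustrates why the mode is not necessarily unique. A single-result function such as MODE reports only one of them.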

In statistics, variance is the expected squared deviation of a variable
from its mean. To estimate the variance from a sample, Excel provides the
VAR function, which divides the sum of squared deviations by n − 1.

Syntax
=VAR(number1,number2,...)

Standard deviation is a measure of the variability or dispersion of a
statistical population, a data set, or a probability distribution. To
calculate the standard deviation in an Excel worksheet, use the STDEV
function.

Syntax
=STDEV(number1,number2,...)

In descriptive statistics, the range is the length of the smallest
interval that contains all the data. In Excel, calculate it by
subtracting the minimum value of the sample from the maximum value.

Syntax
=MAX(A2:A16)-MIN(A2:A16)
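
The three measures of spread can be cross-checked with a short Python sketch. It assumes the same ten-value sample used in the sorting example of this presentation, and it uses the sample (n − 1) versions of variance and standard deviation, which is how VAR and STDEV treat their input.

```python
import statistics

# Sample data borrowed from the sorting example in this presentation.
data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

sample_variance = statistics.variance(data)  # divides by n - 1, like =VAR(...)
sample_stdev = statistics.stdev(data)        # square root of the sample variance, like =STDEV(...)
data_range = max(data) - min(data)           # like =MAX(...)-MIN(...)

print("variance:", round(sample_variance, 4))
print("standard deviation:", round(sample_stdev, 4))
print("range:", data_range)
```

If the data represent an entire population rather than a sample, `statistics.pvariance` and `statistics.pstdev` divide by n instead.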

In probability theory and statistics, skewness is a measure of the
asymmetry of the probability distribution of a real-valued random
variable. It is measured in Six Sigma because real data are rarely
perfectly symmetric.

Syntax
=SKEW(A2:A16)

In probability theory and statistics, kurtosis is a measure of the
"peakedness" of the probability distribution of a real-valued random
variable.

Syntax
=KURT(A2:A16)
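
The sketch below shows one way to compute sample skewness and excess kurtosis in Python. It assumes the bias-corrected sample formulas that, to the best of my understanding, Excel documents for SKEW and KURT; the function names are hypothetical. Skewness needs at least 3 values, kurtosis at least 4, and the values must not all be equal.

```python
import math


def _mean_and_sample_stdev(values):
    """Return the mean and the sample (n - 1) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    stdev = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return mean, stdev


def sample_skewness(values):
    """Bias-corrected sample skewness (assumed to match SKEW)."""
    n = len(values)
    mean, s = _mean_and_sample_stdev(values)
    cubed = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


def sample_excess_kurtosis(values):
    """Bias-corrected sample excess kurtosis (assumed to match KURT)."""
    n = len(values)
    mean, s = _mean_and_sample_stdev(values)
    fourth = sum(((x - mean) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


symmetric = [1, 2, 3, 4, 5]
right_tailed = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]  # sorting-example data

print(sample_skewness(symmetric), sample_excess_kurtosis(symmetric))
print(sample_skewness(right_tailed))
```

A perfectly symmetric sample gives a skewness of zero, while a sample with a long right tail gives a positive value.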

If the mean is 85 days and the standard deviation is 5 days, what is the
yield if the USL is 90 days?

Z = (90 − 85) / 5 = 1
Yield = Pr(x ≤ 90) = Pr(z ≤ 1)
P(z < 1) = P(z > −1) = 1 − 0.15865 = 0.8413, so the yield is about 84.1%.

The area under the curve to the right of the USL is considered the
% defective.

[Figure: normal curve over Days (60 to 120) with the matching Z-scale
(−7 to 7); the area to the left of the USL is the yield.]

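Syntax
=NORMDIST(x,mean,standard_dev,cumulative)

With cumulative set to TRUE, NORMDIST returns the area under the normal
curve to the left of x. For the example above, =NORMDIST(90,85,5,TRUE)
returns approximately 0.8413.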
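
The yield calculation from the 85-day example can be reproduced with Python's standard library. This is a minimal sketch: `statistics.NormalDist` (Python 3.8+) stands in for the cumulative form of NORMDIST, and the variable names are illustrative only.

```python
from statistics import NormalDist

# Process from the example: mean 85 days, standard deviation 5 days.
process = NormalDist(mu=85, sigma=5)

usl = 90                              # upper specification limit, in days
yield_fraction = process.cdf(usl)     # area to the left of the USL
defect_fraction = 1 - yield_fraction  # area to the right of the USL

print(f"yield:  {yield_fraction:.4f}")
print(f"defect: {defect_fraction:.4f}")
```

The cumulative distribution function plays the role of the cumulative = TRUE argument: it returns the probability of observing a value at or below the limit.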


For a pizza delivery center, the mean delivery time is 20 minutes and the
standard deviation is 3.5 minutes. What target delivery time (USL) can
they meet with a probability of 99.78%?

[Figure: normal curve of delivery time; the area to the left of the USL
is the yield.]

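Syntax
=NORMINV(probability,mean,standard_dev)

NORMINV is the inverse of the cumulative NORMDIST: it returns the value
of x below which the given probability lies. For the pizza example,
=NORMINV(0.9978,20,3.5) returns a target of approximately 30 minutes.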
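
The pizza delivery question can also be worked in Python as a cross-check. This sketch assumes `statistics.NormalDist.inv_cdf` (Python 3.8+) as the counterpart of NORMINV; the variable names are illustrative.

```python
from statistics import NormalDist

# Delivery time from the example: mean 20 minutes, standard deviation 3.5 minutes.
delivery_time = NormalDist(mu=20, sigma=3.5)

probability = 0.9978
target_minutes = delivery_time.inv_cdf(probability)  # time met with that probability

# Round trip: the cumulative probability at the target should equal the input.
round_trip = delivery_time.cdf(target_minutes)

print(f"target: {target_minutes:.2f} minutes")
print(f"check:  {round_trip:.4f}")
```

Feeding the result back into the cumulative distribution is a quick way to confirm that the inverse lookup was set up with the right mean and standard deviation.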


• Data in raw form are usually not easy to use for decision making
• Some type of organization is needed
  ▪ Table
  ▪ Graph
• Techniques reviewed here:
  ▪ Ordered array
  ▪ Histograms
  ▪ Bar charts and pie charts
  ▪ Contingency tables

A sorted list of data (an ordered array):
• Shows the range (min to max)
• Provides some signals about variability within the range
• May help identify outliers (unusual observations)
• Is less useful if the data set is large

• Data in raw form (as collected):
  24, 26, 24, 21, 27, 27, 30, 41, 32, 38

• Data in ordered array from smallest to largest:
  21, 24, 24, 26, 27, 27, 30, 32, 38, 41
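
The same ordered array can be produced programmatically. A minimal Python sketch using the raw values above:

```python
# Raw values as collected, from the slide above.
raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

ordered = sorted(raw)  # new list, smallest to largest; raw is left unchanged
smallest, largest = ordered[0], ordered[-1]

print("ordered array:", ordered)
print("range shown by the array:", smallest, "to", largest)
```

Once the data are sorted, the minimum and maximum sit at the two ends of the list, so the range can be read off directly.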

• A graph of the data in a frequency distribution is called a histogram
• The class boundaries (or class midpoints) are shown on the horizontal
  axis
• The vertical axis is either frequency, relative frequency, or
  percentage
• Bars of the appropriate heights are used to represent the number of
  observations within each class

Class                  Class Midpoint   Frequency
10 but less than 20    15               3
20 but less than 30    25               6
30 but less than 40    35               5
40 but less than 50    45               4
50 but less than 60    55               2

[Figure: Histogram: Daily High Temperature. Horizontal axis: class
midpoints (5, 15, 25, 35, 45, 55, More); vertical axis: Frequency. Bar
heights are 0, 3, 6, 5, 4, 2, 0, with no gaps between bars.]

2. Choose Histogram
3. Input the data range and bin range (the bin range is a cell range
   containing the upper class boundaries for each class grouping),
   select Chart Output, and click "OK"
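
The frequency counts behind a histogram can be sketched in Python as follows. This is an illustration under stated assumptions, not a reproduction of the Histogram tool: it uses the ten values from the sorting example (the raw temperature data are not given in this presentation), follows the "X but less than Y" class convention from the table, and the function name is hypothetical.

```python
def class_frequencies(values, lower, width, n_classes):
    """Count values per class, where class i covers
    lower + i*width (inclusive) up to lower + (i+1)*width (exclusive)."""
    counts = [0] * n_classes
    for value in values:
        index = int((value - lower) // width)
        if 0 <= index < n_classes:  # values outside all classes are ignored
            counts[index] += 1
    return counts


# Sample data from the sorting example in this presentation.
data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

counts = class_frequencies(data, lower=10, width=10, n_classes=5)

# Simple text histogram: one '#' per observation in the class.
for i, count in enumerate(counts):
    low = 10 + 10 * i
    print(f"{low} but less than {low + 10}: {'#' * count}")
```

As far as I know, the Histogram tool counts a value in a bin when it is less than or equal to that bin's upper boundary, so a value that falls exactly on a boundary can land in a different class than it does in this sketch.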

• Scatter diagrams are used for bivariate numerical data
• Bivariate data consist of paired observations taken from two numerical
  variables
• The scatter diagram:
  ▪ One variable is measured on the vertical axis and the other variable
    is measured on the horizontal axis

1. Select the Insert tab
2. Select the Scatter dropdown and click on any of the options. If in
   doubt, select the first option (Scatter with only Markers)

Volume per day   Cost per day
23               125
26               140
29               146
33               160
38               167
42               170
50               188
55               195
60               200

[Figure: scatter diagram "Cost per Day vs. Production Volume".
Horizontal axis: Volume per Day (0 to 70); vertical axis: Cost per Day
(0 to 250).]

Microsoft Excel descriptive statistics output, using the house price
data:

House Prices:

$2,000,000
500,000
300,000
100,000
100,000
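
The main descriptive statistics for these five house prices can be recomputed with a short Python sketch. It covers only a subset of what the Descriptive Statistics tool reports, and the dictionary keys are illustrative names.

```python
import statistics

# House prices from the slide above, in dollars.
house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

summary = {
    "mean": statistics.mean(house_prices),
    "median": statistics.median(house_prices),
    "mode": statistics.mode(house_prices),
    "sample stdev": statistics.stdev(house_prices),  # n - 1 divisor
    "range": max(house_prices) - min(house_prices),
    "count": len(house_prices),
}

for name, value in summary.items():
    print(f"{name}: {value:,.0f}")
```

The single $2,000,000 house pulls the mean well above the median, which is why the median is often the more representative measure for skewed data such as prices.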

• Select Data Analysis
• Choose Correlation from the selection menu
• Click OK

• Input the data range and select the appropriate options
• Click OK to get the output
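
For a single pair of variables, the correlation coefficient can be computed by hand as a cross-check. The sketch below assumes the volume and cost data from the scatter diagram example in this presentation and implements the standard Pearson formula; the function name is hypothetical.

```python
import math

# Paired observations from the scatter diagram example.
volume_per_day = [23, 26, 29, 33, 38, 42, 50, 55, 60]
cost_per_day = [125, 140, 146, 160, 167, 170, 188, 195, 200]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(volume_per_day, cost_per_day)
print(f"correlation between volume and cost: {r:.3f}")
```

A value close to +1 is consistent with the upward pattern of the points in the scatter diagram.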

• Select the input ranges from the data
• Select the residuals options. If you are not sure, just select Line
  Fit Plots.

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
             df   SS           MS           F         Significance F
Regression   1    18934.9348   18934.9348   11.0848   0.01039
Residual     8    13665.5652   1708.1957
Total        9    32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580

The regression equation is:
house price = 98.24833 + 0.10977 (square feet)
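
The figures in the regression output are linked by a few simple identities, which the Python sketch below recomputes from the ANOVA and coefficient tables. The variable names are illustrative, and the 2,000 square feet used in the prediction is a hypothetical input that does not come from the source data.

```python
import math

# Values copied from the regression output.
ss_regression = 18934.9348
ss_total = 32600.5
ms_residual = 1708.1957
n_observations = 10
n_predictors = 1
intercept = 98.24833
slope = 0.10977
slope_std_error = 0.03297

r_square = ss_regression / ss_total
adjusted_r_square = 1 - (1 - r_square) * (n_observations - 1) / (
    n_observations - n_predictors - 1
)
# With one predictor, the regression mean square equals the regression sum of squares.
f_statistic = (ss_regression / n_predictors) / ms_residual
standard_error = math.sqrt(ms_residual)
t_stat_slope = slope / slope_std_error


def predict_price(square_feet):
    """House price predicted by the fitted line (same units as the output)."""
    return intercept + slope * square_feet


print(f"R Square:          {r_square:.5f}")
print(f"Adjusted R Square: {adjusted_r_square:.5f}")
print(f"F:                 {f_statistic:.4f}")
print(f"Standard Error:    {standard_error:.5f}")
print(f"t Stat (slope):    {t_stat_slope:.3f}")
print(f"Predicted price at 2000 sq ft (hypothetical): {predict_price(2000):.2f}")
```

Recomputing these values is a quick sanity check when copying regression output into a report, since a transcription error in any one table breaks the identities.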
