Neil W. Polhemus, CTO, StatPoint Technologies, Inc. George H. Dyson, Director of Six Sigma Services
Copyright 2010 by StatPoint Technologies, Inc.
Businesses must create value. This output must be greater than the inputs needed to produce it. If the output meets the customer's needs, the business is effective. If the business creates added value with minimum resources, the business is efficient.
The Role of Six Sigma is to Help a Business Produce the Maximum Value While Using Minimum Resources
(Pyzdek 2003)
DATA!!!
Data drives analysis. Analysis uses statistical models and tools of all kinds:
To extract meaningful information
To uncover signals in the presence of noise
To understand the past
To monitor the present
To forecast the future
This understanding results in reduced costs and savings.
Examples
Product comparisons
Survey analysis
Distribution fitting
Comparison of multiple samples
Outlier detection
Curve fitting
Response surface modeling
Time series forecasting
Event rate modeling
Interactive maps
Price (dollars)
Road-test score (0-100)
Predicted reliability (1-5)
Owner satisfaction (1-5)
Owner cost (1-5)
Safety (1-5)
Fuel economy (overall mpg)
Multivariate Visualization
The data consist of 7 quantitative variables plus one categorical factor. Each car is thus a point in 7-dimensional space with one other feature. How can we visualize what's going on?
1. 2-D scatterplot
2. 3-D scatterplot
3. Scatterplot matrix
4. Bubble chart
5. Parallel coordinates plot
6. Star glyphs
7. Chernoff faces
8. Radar/spider plot
2-D Scatterplot
Useful for plotting 2 dimensions.
Plot of MPG vs Price
3-D Scatterplot
Useful for plotting 3 dimensions.
Plot of MPG vs Price and Road Test, with points coded by Class (Family, Luxury, Upscale)
Bubble Chart
Uses size of bubble to illustrate a third dimension.
Bubble Chart for Reliability
Scatterplot Matrix
Plots all pairs of variables.
Scatterplot matrix of Price, Road Test, Reliability, Owner Satisfaction, Owner Cost, Safety and MPG, with points coded by Class (Family, Luxury, Upscale)
Parallel coordinates plot with one axis per variable (MPG, Price, Road Test, Reliability, Owner Satisfaction, Safety, Owner Cost), each scaled from 0 to 1
Star Glyphs
Each case is shown as a polygon with vertices scaled by the variables.
Glyph vertices: 1/Price, Owner Cost, MPG, Safety, Road Test, Owner Satisfaction, Reliability
Star Glyphs
Variables are scaled so that larger area is better.
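The scaling idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used by the software: it assumes "smaller is better" variables are replaced by their reciprocal (as the 1/Price vertex suggests) and that every variable is then min-max scaled to the range 0 to 1. The function name and the car values are hypothetical.

```python
def scale_for_glyph(rows, smaller_is_better=()):
    """Rescale each variable to [0, 1] so that a larger value is always better.

    rows: dict mapping case name -> dict of variable name -> raw value.
    smaller_is_better: variables replaced by their reciprocal before scaling.
    """
    variables = list(next(iter(rows.values())).keys())
    # Invert variables for which a small raw value is desirable (e.g. Price).
    adjusted = {
        name: {v: (1.0 / vals[v] if v in smaller_is_better else float(vals[v]))
               for v in variables}
        for name, vals in rows.items()
    }
    scaled = {name: {} for name in rows}
    for v in variables:
        column = [adjusted[name][v] for name in rows]
        lo, hi = min(column), max(column)
        span = hi - lo
        for name in rows:
            # A constant column carries no information; place it mid-scale.
            scaled[name][v] = 0.5 if span == 0 else (adjusted[name][v] - lo) / span
    return scaled


# Hypothetical values, for illustration only (not the data behind the slides).
cars = {
    "Car A": {"Price": 25000, "MPG": 30, "Safety": 4},
    "Car B": {"Price": 50000, "MPG": 20, "Safety": 5},
    "Car C": {"Price": 40000, "MPG": 25, "Safety": 3},
}
glyph_radii = scale_for_glyph(cars, smaller_is_better=("Price",))
```

Each scaled value would serve as the distance of one vertex from the center of the glyph, so the cheapest car gets the longest Price spoke.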
Class: Family, Luxury, Upscale
Nissan Altima
Honda Accord
Hyundai Sonata
Chevrolet Malibu
Ford Fusion
Mercury Milan
Ford Taurus
Kia Optima
Chevrolet Impala
Lexus ES
Toyota Avalon
Acura TL
Hyundai Azera
Lincoln MKZ
Buick Lucerne
Volvo S60
Infiniti M35
Audi A6
Acura RL
BMW 535i
Cadillac STS
Cadillac DTS
Lexus GS 300
Volvo S80
Chernoff Faces
Each variable is assigned to a different feature of the faces.
Faces are shown for the same 24 cars listed under Star Glyphs, coded by Class (Family, Luxury, Upscale).
Feature Assignments
Unfortunately, some features have a greater impact than others.
Radar/Spider Plot
Good for comparing a small number of cases.
Radar/Spider Plot for four models: Volkswagen Passat, Chevrolet Impala, Lincoln MKZ, Cadillac STS. Labeled axes include Reliability (0.0-5.0) and Price (25000.0-55000.0).
asked 1000 likely voters: How would you rate the U.S. healthcare system?
Mosaic Plot
Scales the area of each bar according to the counts in the table.
Chi-square Test
Tests for lack of independence between row and column classification.
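As a rough sketch of what the test computes, the Pearson chi-square statistic compares each observed count with the count expected if rows and columns were independent. The function name and the counts below are hypothetical; the P-value step is left out because it needs the chi-square distribution function, which is not in the Python standard library.

```python
def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical 2 x 2 table of counts, for illustration only.
counts = [[10, 20], [20, 10]]
chi2_stat, chi2_dof = chi_square_independence(counts)
```

With 1 degree of freedom, a statistic above about 3.84 corresponds to a P-value below 0.05, so this made-up table would show a significant association.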
Correspondence Analysis
Used to help visualize the important information in two-way tables.
studies.
Distribution fitting is also quite critical in many design problems.
Frequency Histogram
Shows the number of observations in non-overlapping intervals.
Histogram (vertical axis: frequency)
Normal Distribution
The normal distribution is a poor model for this data.
Histogram for Height with the fitted Normal distribution (horizontal axis: Height; vertical axis: frequency, X 1000)
Comparison of Distributions
Fits many distributions and sorts them by goodness of fit.
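A minimal sketch of the idea, assuming the candidates are ranked by maximized log-likelihood (the ranking criterion actually used is not stated here, and other criteria such as Kolmogorov-Smirnov distance are also common). Only two candidates are fitted, the normal and the 2-parameter lognormal, because both have closed-form maximum likelihood estimates. The data values are made up.

```python
import math


def normal_loglik(xs):
    """Maximized log-likelihood of a normal distribution fitted to xs."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n  # maximum likelihood variance
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def lognormal_loglik(xs):
    """Maximized log-likelihood of a 2-parameter lognormal (xs must be > 0)."""
    logs = [math.log(x) for x in xs]
    # The density of X equals the normal density of log(X) divided by x.
    return normal_loglik(logs) - sum(logs)


def rank_fits(xs):
    """Return (name, log-likelihood) pairs, best fit first."""
    fits = {
        "Normal": normal_loglik(xs),
        "Lognormal (2-parameter)": lognormal_loglik(xs),
    }
    return sorted(fits.items(), key=lambda item: item[1], reverse=True)


# Hypothetical right-skewed sample, for illustration only.
sample = [1.0, 1.2, 1.5, 2.0, 2.5, 3.0, 5.0, 8.0, 15.0, 40.0]
ranking = rank_fits(sample)
```

For a strongly right-skewed sample such as this one, the lognormal ends up ahead of the normal in the ranking.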
Lognormal Distribution
The 3-parameter lognormal distribution is much better.
Histogram for Height with fitted distributions: Largest Extreme Value, Lognormal (3-Parameter), Normal (horizontal axis: Height; vertical axis: frequency)
Box-and-Whisker Plots
A very useful plot for comparing samples (from John Tukey).
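Box-and-whisker plot of Length.
Whiskers: extend to the most extreme observations within 1.5 x IQR of the box (for example, the minimum observation within 1.5 x IQR below the 25th percentile).
Outlier (*): an observation more than 1.5 x IQR below the 25th percentile, or more than 1.5 x IQR above the 75th percentile.
Plus sign may be added to show sample mean.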
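The 1.5 x IQR rule can be sketched as follows. This is an illustration, not the software's implementation: quartile conventions differ between packages, and this sketch assumes the "inclusive" method of Python's statistics.quantiles. The function name and the data are hypothetical.

```python
import statistics


def tukey_summary(values, k=1.5):
    """Quartiles, whisker ends and outliers under the k x IQR rule."""
    data = sorted(values)
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "q1": q1,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Hypothetical sample with one extreme value, for illustration only.
lengths = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
summary = tukey_summary(lengths)
```

Here the quartiles are 3.25 and 7.75, so the fences sit at -3.5 and 14.5; the whiskers end at 1 and 9, and the value 100 is flagged as an outlier.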
HSD Intervals
Allow pairwise comparison of all level means.
in better models.
Outlier Plot
Shows each data value with lines at 1, 2, 3 and 4-sigma.
Grubbs Test
Small P-value indicates that the extreme Studentized deviate (ESD) is highly unusual.
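The extreme Studentized deviate itself is simple to compute: it is the largest absolute deviation from the sample mean, divided by the sample standard deviation. The sketch below computes only that statistic; the P-value requires quantiles of Student's t distribution and is omitted. The function name and data are hypothetical.

```python
import statistics


def extreme_studentized_deviate(values):
    """Return (suspect_value, G), where G = max |x - mean| / s."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divisor n - 1)
    suspect = max(values, key=lambda x: abs(x - mean))
    return suspect, abs(suspect - mean) / sd


# Hypothetical sample with one suspicious value, for illustration only.
readings = [1, 2, 3, 4, 100]
suspect, g_stat = extreme_studentized_deviate(readings)
```

For a sample of size n, G can never exceed (n - 1) / sqrt(n), so in very small samples even a gross outlier produces a modest-looking statistic; that is why the P-value, not the raw value of G, should drive the decision.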
Reciprocal X Model
A nonlinear model of the form Y = m/X + b is much better.
Plot of Fitted Model: chlorine = 0.368053 + 1.02553/weeks (horizontal axis: weeks; vertical axis: chlorine)
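Although the model is nonlinear in X, it is linear in 1/X, so it can be fitted by ordinary least squares after transforming the predictor. The sketch below shows this under that assumption; the function name is hypothetical and the data are synthetic and noise-free, not the chlorine data behind the plot.

```python
def fit_reciprocal_x(xs, ys):
    """Least-squares fit of y = b + m / x; returns (b, m)."""
    us = [1.0 / x for x in xs]  # transformed predictor u = 1 / x
    n = len(us)
    u_bar = sum(us) / n
    y_bar = sum(ys) / n
    s_uy = sum((u - u_bar) * (y - y_bar) for u, y in zip(us, ys))
    s_uu = sum((u - u_bar) ** 2 for u in us)
    m = s_uy / s_uu          # slope with respect to 1 / x
    b = y_bar - m * u_bar    # intercept, the limiting value as x grows
    return b, m


# Synthetic, noise-free data generated from y = 0.37 + 1.0 / x.
weeks = [2, 4, 8, 16, 32]
chlorine = [0.37 + 1.0 / w for w in weeks]
b_hat, m_hat = fit_reciprocal_x(weeks, chlorine)
```

The intercept b is the level the response approaches as X becomes large, which is what makes this form attractive for decay toward a floor.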
purposes only.
Problem Statement
Optimize 3 response variables:
Minimize total fleet acquisition cost (Y1)
Maximize climb rate (Y2)
Maximize launch rate (Y3)
Input factors:
X1: Fan Pressure Ratio: 3.9 to 4.7
X2: Overall Pressure Ratio: 34 to 40
X3: Inlet airflow: 240 to 270 pps
Design Plot
Shows the location of the historical data within the factor space.
Experimental Region plot (axes: FPR, 3.9 to 4.7; OPR, 34 to 40; Air Flow, 240 to 270)
Desirability Function
Quantifies the desirability of a joint response (Y1, Y2, Y3). For responses to be minimized:
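The formula that followed is not preserved here. One widely used form, due to Derringer and Suich, is sketched below; it is an assumption that this matches the form used in the analysis. Each response is mapped to a value between 0 and 1, and the joint desirability is the geometric mean of the individual values. The limits, weights and response values are hypothetical.

```python
def desirability_minimize(y, low, high, weight=1.0):
    """1 at or below `low`, 0 at or above `high`, power curve in between."""
    if y <= low:
        return 1.0
    if y >= high:
        return 0.0
    return ((high - y) / (high - low)) ** weight


def desirability_maximize(y, low, high, weight=1.0):
    """0 at or below `low`, 1 at or above `high`, power curve in between."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** weight


def overall_desirability(values):
    """Geometric mean; any single zero makes the joint desirability zero."""
    product = 1.0
    for d in values:
        product *= d
    return product ** (1.0 / len(values))


# Hypothetical responses and limits, for illustration only.
d1 = desirability_minimize(150.0, low=100.0, high=200.0)  # cost-like response
d2 = desirability_maximize(8.0, low=0.0, high=10.0)       # rate-like response
d3 = desirability_maximize(5.0, low=0.0, high=10.0)       # rate-like response
joint = overall_desirability([d1, d2, d3])
```

Using the geometric mean rather than the arithmetic mean means that a completely unacceptable value on one response cannot be offset by good values on the others.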
Optimal Conditions
Found at the levels shown below:
Response Surface
Shows the estimated desirability throughout the experimental region.
Desirability Plot at Air Flow = 254.697 (axes: FPR, OPR, Air Flow; desirability scale from 0.0 to 1.0)
series.
Time series models are used for various purposes:
special models.
Plot of the Customers series
Autocorrelation Function
Estimates the correlation between observations at different lags.
Estimated Autocorrelations for Customers
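A minimal sketch of the usual sample autocorrelation estimate at lag k: the sum of products of deviations k periods apart, divided by the total sum of squared deviations. The function name and the series are hypothetical.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    numerator = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return numerator / denominator


# Hypothetical short series, for illustration only.
values = [1, 2, 3, 4, 5]
r0 = autocorrelation(values, 0)
r1 = autocorrelation(values, 1)
```

For monthly data with a yearly pattern, a large positive estimate at lag 12 is the typical signature of seasonality.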
Seasonal Decomposition
Shows the average value during each season (scaled to 100).
Seasonal Index Plot for Customers
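A simplified sketch of a seasonal index: the average of each season expressed as a percentage of the overall average, so that 100 means a typical period. This is an assumption for illustration; full seasonal decomposition usually first divides the data by a centered moving average to remove trend, which this sketch does not do. The function name and the data are hypothetical.

```python
def seasonal_indices(series, period):
    """Average of each season as a percentage of the overall mean."""
    overall_mean = sum(series) / len(series)
    indices = []
    for season in range(period):
        # Every `period`-th observation belongs to the same season.
        season_values = series[season::period]
        season_mean = sum(season_values) / len(season_values)
        indices.append(100.0 * season_mean / overall_mean)
    return indices


# Hypothetical two years of quarterly data, for illustration only.
quarterly = [10, 20, 30, 40, 10, 20, 30, 40]
indices = seasonal_indices(quarterly, period=4)
```

An index of 160 for the fourth quarter would mean that quarter runs 60% above an average quarter.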
Plot of Customers by Season
Plot of Customers by Season, with one line per Cycle (1996 to 2009)
Automatic Forecasting
Fits many models and automatically selects the best.
ARIMA Model
Selected model is a rather complicated seasonal ARIMA model.
Time Sequence Plot for Customers, ARIMA(2,1,1)x(1,1,2)12 (vertical axis: Customers, X 1000; series shown: actual, forecast, 95.0% limits)
events occur, the process by which those events are generated is called a point process.
The most common type of point process is a Poisson process, in which the times between events are independent and follow a negative exponential distribution.
If the event rate is constant, the process is called homogeneous.
If the event rate changes over time, then the process is called nonhomogeneous.
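A homogeneous Poisson process is easy to simulate from this definition: keep adding independent exponential gaps until the observation window is exhausted. The sketch below assumes a constant rate; the function name, rate and window length are hypothetical.

```python
import random


def simulate_poisson_process(rate, horizon, seed=None):
    """Event times of a homogeneous Poisson process on (0, horizon]."""
    rng = random.Random(seed)
    times = []
    t = 0.0
    while True:
        # Gaps between events are exponential with mean 1 / rate.
        t += rng.expovariate(rate)
        if t > horizon:
            return times
        times.append(t)


# Hypothetical settings: 2 events per unit time, observed for 50 units.
events = simulate_poisson_process(rate=2.0, horizon=50.0, seed=1)
```

A nonhomogeneous process would need the gap distribution to change with t, for example by thinning events generated at the maximum rate.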
Plot of Magnitude versus Date (tick labels 12/31/08 to 12/31/10)
Earthquake events plot, 1/1/08 to 12/31/10
Event Rate
The critical parameter in a point process is the rate of events per unit time (such as earthquakes per year). This parameter is usually called λ.
The rate parameter is also related to the mean time between events.
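For a homogeneous process, the usual estimate of the rate is the number of events divided by the length of the observation window, and the mean time between events is its reciprocal. A minimal sketch, with a hypothetical function name and made-up counts:

```python
def estimate_rate(n_events, observation_time):
    """Return (rate, mean_time_between_events) for a constant-rate process."""
    rate = n_events / observation_time   # events per unit time
    mean_time_between = 1.0 / rate       # reciprocal of the rate
    return rate, mean_time_between


# Hypothetical: 60 events observed over 3 years.
rate_per_year, years_between = estimate_rate(n_events=60, observation_time=3.0)
```

With these made-up numbers the rate is 20 events per year, or one event every 0.05 years on average.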
Plot of Number of events versus date, 1/1/08 to 12/31/10 (legend: Mean)
Trend Test
A small P-value would indicate a significant trend.
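Which trend test is used is not stated here. One common choice for point processes is the Laplace test, sketched below as an assumption: under a constant rate, event times are spread uniformly over the observation window, so their average should be close to the midpoint. The function name and event times are hypothetical.

```python
import math


def laplace_trend_test(event_times, end_time):
    """Laplace test for a process observed on (0, end_time].

    Returns (u, p_value). Positive u suggests an increasing event rate,
    negative u a decreasing one. The P-value is two-sided and uses the
    normal approximation.
    """
    n = len(event_times)
    mean_time = sum(event_times) / n
    u = (mean_time - end_time / 2.0) / (end_time * math.sqrt(1.0 / (12.0 * n)))
    p_value = math.erfc(abs(u) / math.sqrt(2.0))
    return u, p_value


# Hypothetical event times, for illustration only.
steady = [10, 20, 30, 40, 50, 60, 70, 80, 90]
late_cluster = [80, 85, 90, 95, 99]
u_steady, p_steady = laplace_trend_test(steady, end_time=100.0)
u_late, p_late = laplace_trend_test(late_cluster, end_time=100.0)
```

The evenly spread events give a statistic of zero, while the events bunched near the end of the window give a large positive statistic and a small P-value.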
Other Uses
Point process models are also very useful for estimating failure rates.
Events Plot for Aircraft 1 through Aircraft 13 (horizontal axis: Time or distance, 0 to 2500)
Plot of Number of events versus Time or distance for Aircraft 1 through Aircraft 13 (legend: Mean)
extremely useful.
Maps are one important example.