Understanding Operating Data

Tony Cooper, Tony Cooper, LLC

Operating data offer valuable clues and are essential to managing and improving processes. However, they can also be misleading and hide the underlying issues. The principles and statistical tools presented here can help you make effective use of process data.
Understanding the performance of production processes is fundamentally an engineering issue, and operating data extracted from the process can be a powerful stimulus. Statistical skills allow the engineer to create useful information out of these data. By applying the concepts discussed in this article, these skills can be developed and put to good use.

Consider variation

Variation in output is due to an inability to control nuisance variables and to hold setpoints. The measurement system is sometimes incapable of detecting the variation, while at other times the variation is so large that it dominates. Variation can lead to a misunderstanding of the actual reaction stoichiometry, actual mass balance, or actual energy consumption. In such circumstances, a deterministic model provides a poor framework for understanding and managing chemical operations. This complicates the integration of a theoretical understanding with an empirical view of the process.

Because of variation, analysis of data involves not just a single point, but rather a distribution of points. If the process is stable, the sampled data will follow some type of statistical distribution. This is often the normal (or Gaussian) distribution depicted in Figure 1. Regardless of the assumed distribution, statistics are used to estimate parameters of the underlying distribution.

Gather all the available data

Keep in mind the relationship y = f(x), where both y and x are vectors or matrices. A quality product must meet multiple criteria, such as purity and homogeneity. Numerous inputs and process parameters influencing y are xs. Although each y is a synergistic function of multiple xs, statistics are often incorrectly used to empirically build a relationship between one y and one x.

Since the data used to evaluate the operation of chemical processes are typically kept in various locations, the first step in data analysis is often to compile all the accumulated data on both ys and xs. For instance, analytical data on customer-critical ys may be kept in laboratory reports, while x data, such as information on sensors, switches, valves, flowrates, etc., are kept in historical process records. Other important data, which may be kept in other locations, include:
• raw material supplier data, costs, and order dates
• downtimes and maintenance records
• operator assignments
• environmental conditions
• customer feedback.

Investigate the integrity of the data

The lack of integrity in a set of data can provide valuable clues, but can also be misleading. Three common data-integrity issues are missing records, incomplete records, and measurement error.

Missing data can take various forms. For instance, records from certain time periods or for certain batches may be missing. There is a temptation to analyze the remaining data and ignore the information that is not there, assuming someone inadvertently forgot to enter it, a sample was lost, or a computer crashed. The information, however, could have been lost as a result of a significant event. The fact that data are missing is often a serious clue.

A subtler form of missing data involves incomplete
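The compilation step described above usually reduces to joining records from several systems on a shared key, such as a batch ID. A minimal standard-library sketch (the batch IDs, variable names, and values below are hypothetical, not from the article):

```python
# Hypothetical lab results (ys) and process-historian records (xs), keyed by batch ID.
lab_results = {"B001": {"purity": 99.2}, "B002": {"purity": 98.7}}
historian = {"B001": {"steam_flow": 41.0}, "B002": {"steam_flow": 44.5},
             "B003": {"steam_flow": 43.2}}

# Outer join on batch ID: keep every batch seen in either source, so that
# missing records stay visible instead of silently disappearing.
merged = {}
for batch in sorted(set(lab_results) | set(historian)):
    merged[batch] = {**lab_results.get(batch, {}), **historian.get(batch, {})}

# B003 has x data but no lab result -- exactly the kind of gap worth investigating.
missing_y = [b for b, rec in merged.items() if "purity" not in rec]
```

The outer join (rather than an inner join that would drop unmatched batches) reflects the point above: the fact that data are missing is itself a clue.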
are a function of the empirical data, they are said to report the voice of the process. Requirements and specifications, which represent the voice of the customer, do not factor into the control limit calculations.

Statistical control charts have little in common with process controllers (e.g., feedback controllers). The purpose of process controllers is to move variability from the product stream to more benign locations to control the effects of variation. For example, product quality might be maintained by adjusting the flowrate of steam through a jacket to ensure uniform temperature, or by adjusting the amount of water added to a waste stream to keep its concentration within acceptable limits.

Control charts assist in debugging the variation in the process. They do not control the variation, but rather report on whether the variation is likely to be either the result of an assignable event or consistent with the process itself and its many common, but unassignable, causes. Failure to understand the cause of the variation can lead to changes that make the situation worse.

Control charts are a departure from typical quality-control inspections. Inspection is intended to find bad products before they are shipped, while control charts focus on the causes of bad product: in particular, where and when the variation in the process occurs and whether it is consistent or inconsistent. A control chart … with control limits, and mean (or Xbar) and range charts, which consider subgrouping.

As an example, consider the sampling and subgrouping (within and between batches) represented in Figure 3, where three samples were taken from random locations in each of 18 batches. The data are plotted in Figure 4, which shows that three samples contain off-spec product.

▲ Figure 4. A frequency plot shows that three of the batches sampled according to the plan given in Figure 3 contained off-spec product.
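The limits on mean (Xbar) and range charts follow the standard Shewhart computation: the range chart is centered on the average range R̄ with limits D3·R̄ and D4·R̄, and the mean chart is centered on the grand mean with limits ±A2·R̄, where A2, D3, and D4 are tabulated factors for the subgroup size (A2 = 1.023, D3 = 0, D4 = 2.574 for subgroups of three). A minimal sketch of that computation, using made-up batch data rather than the article's:

```python
# Shewhart chart factors for subgroups of size n = 3.
A2, D3, D4 = 1.023, 0.0, 2.574

def xbar_r_limits(subgroups):
    """Return ((LCL, center, UCL) for the mean chart,
              (LCL, center, UCL) for the range chart)."""
    means = [sum(s) / len(s) for s in subgroups]
    ranges = [max(s) - min(s) for s in subgroups]
    grand_mean = sum(means) / len(means)   # centerline of the Xbar chart
    rbar = sum(ranges) / len(ranges)       # centerline of the range chart
    xbar_limits = (grand_mean - A2 * rbar, grand_mean, grand_mean + A2 * rbar)
    r_limits = (D3 * rbar, rbar, D4 * rbar)
    return xbar_limits, r_limits

# Hypothetical subgroups: three samples from each of four batches.
data = [[92.0, 90.5, 93.0], [91.0, 92.5, 90.0],
        [93.5, 92.0, 91.0], [90.5, 91.5, 92.5]]
(xlcl, xc, xucl), (rlcl, rc, rucl) = xbar_r_limits(data)
```

Note that the limits depend only on the data (the voice of the process); the specification never enters the calculation, which is why a process can be comfortably in control and still make off-spec product, as Figures 4 and 5 illustrate.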
These batches need to be quarantined or reworked, but this conventional process management work does not improve the process. The control charts in Figure 5 show that one sample per batch would not have revealed the true issue of considerable variation within each batch. Every sample from every batch had the … the comparison of the sample data to specifications (Figure 4) shows that the process needs improvement. The control charts indicate how to improve the process.

▲ Figure 5. Range and mean control charts (range chart: UCL = 7.45, Avg = 2.89, LCL = 0.00; mean chart: UCL = 95.00, Avg = 92.04, LCL = 89.08) reveal that despite variation within batches, the process is considered to be in control.

To make better product, the engineer needs to consider the ubiquitous variation sources that create variation in every
expensive option. A more elegant way to make the batch more homogeneous is to address the charge pattern and rate.

Basic summary statistics

Statistics are useful for summarizing historical events, such as the average quality of product shipped over a specific time period, but they often lack context. Time series are lost, y = f(x) relationships are …

[Figure: time series of Components C and D, and a scatter plot of Component D vs. Component C annotated with Correlation Coefficient = Σ(x − x_avg)(y − y_avg) / √(Σ(x − x_avg)² · Σ(y − y_avg)²) = 0.71]
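The correlation coefficient annotated on the scatter plot is the usual Pearson estimate: the sum of cross-deviations from the two averages, normalized by the root of the two sums of squared deviations. As a sketch:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient:
    sum((x - x_avg)(y - y_avg)) / sqrt(sum((x - x_avg)^2) * sum((y - y_avg)^2))."""
    n = len(xs)
    x_avg, y_avg = sum(xs) / n, sum(ys) / n
    num = sum((x - x_avg) * (y - y_avg) for x, y in zip(xs, ys))
    den = sqrt(sum((x - x_avg) ** 2 for x in xs)
               * sum((y - y_avg) ** 2 for y in ys))
    return num / den
```

A value of 1 indicates a perfect positive linear relationship, −1 a perfect negative one, and 0 no linear relationship; the 0.71 above indicates a moderately strong positive association between the two components.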
Fundamental multivariate statistics and statistical methods

When both x and y are vectors obtained by taking samples repeatedly over time, multivariate methods are needed to analyze the data. The simplest multivariate methods are interaction plots, disaggregation, and dimensionality-reducing statistics derived from principal component analysis (PCA).

▲ Figure 8. A plot of a principal component over time can be used to home in on the most important information.
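To illustrate the dimensionality reduction behind PCA, the two-variable case can be worked by hand: the principal components are the eigenvectors of the covariance matrix, and each eigenvalue gives the share of total variance its component explains. This is only a sketch of the idea for two variables (the article applies PCA to many), with the closed-form 2×2 eigendecomposition:

```python
from math import atan2, cos, sin, pi, sqrt

def first_pc_2d(xs, ys):
    """First principal component of two variables: returns (unit direction,
    share of total variance explained), via the eigendecomposition of the
    2x2 covariance matrix [[sxx, sxy], [sxy, syy]]."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    half_tr = (sxx + syy) / 2
    disc = sqrt(half_tr ** 2 - (sxx * syy - sxy ** 2))
    l1, l2 = half_tr + disc, half_tr - disc          # eigenvalues, l1 >= l2
    if sxy == 0.0:
        theta = 0.0 if sxx >= syy else pi / 2        # axes already uncorrelated
    else:
        theta = atan2(l1 - sxx, sxy)                 # eigenvector of l1 is (sxy, l1 - sxx)
    return (cos(theta), sin(theta)), l1 / (l1 + l2)

# Two perfectly correlated variables: the first component carries all the
# variance, so reporting one coordinate suffices -- the redundancy argument above.
direction, share = first_pc_2d([1, 2, 3, 4], [2, 4, 6, 8])
```

When the two inputs are perfectly correlated, the explained-variance share is 1, which is exactly why one of the two variables need not be reported.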
Tool 4: Multivariate analysis

When hundreds of variables are being recorded, it is virtually impossible to monitor or evaluate each variable individually. But if two variables are perfectly correlated, it is not necessary to report both, because knowledge of one provides information about the other. Multivariate methods use mathematical techniques to identify redundant or partially redundant information, reducing the number of variables and highlighting the most important data. Multivariate methods are used for two primary reasons:
• to summarize information contained in a large number of variables
• to consider the ratio between variables.

▲ Figure 9. Separate plots show variation in rate and power over time,
▼ Figure 10. …which is clarified by plotting the second principal component over time.

Many of the above-mentioned techniques are incorporated into spreadsheet programs, but …