Abstract
The analysis of historical process data of technological systems plays an important role in process monitoring, modelling and control.
Time-series segmentation algorithms are often used to detect homogeneous periods of operation based on input–output process data. However,
historical process data alone may not be sufficient for the monitoring of complex processes. This paper incorporates the first-principle model
of the process into the segmentation algorithm. The key idea is to use a model-based non-linear state-estimation algorithm to detect the
changes in the correlation among the state-variables. The homogeneity of the time-series segments is measured using a PCA similarity factor
calculated from the covariance matrices given by the state-estimation algorithm. The whole approach is applied to the monitoring of an
industrial high-density polyethylene plant.
© 2005 Elsevier Ltd. All rights reserved.
0098-1354/$ – see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.compchemeng.2005.02.014
2 B. Feil et al. / Computers and Chemical Engineering xxx (2005) xxx–xxx
the variables, vary in time. In process engineering systems this phenomenon can occur when a different product is formed, and/or a different catalyst is applied, or there are significant process faults, etc. The segmentation of only one measured variable is not able to detect such changes. Hence, the segmentation algorithm should be based on multivariate statistical tools.

Hence, the aim of this paper is to develop new algorithms that are able to handle time-varying multivariate data and to detect changes in the correlation structure among variables.

The segmentation algorithms simultaneously determine the parameters of the models and the borders of the segments by minimizing the sum of the costs of the individual segments. Hence, a cost function describing the internal homogeneity of individual segments should be defined. Usually, this cost function is based on the distances between the actual values of the time-series and the values given by a simple function fitted to the data of each segment (Keogh, Chu, Hart, & Pazzani, 2001). Hence, time-series segmentation algorithms, such as methods that apply Principal Component Analysis (PCA) and fuzzy clustering (Nemeth, Abonyi, Feil, & Arva, 2003), are based on input–output process data.

However, historical process data alone may not be sufficient for monitoring complex processes. The currently measured input–output data pairs are often not in a causal relationship because of the dead time and the dynamical behavior of the system. In practice, the state-variables are often not measurable, or are measured only rarely by off-line laboratory tests. To solve these problems, methods can be applied that use delayed measured data besides the current data, e.g. the method proposed in Srinivasan, Wang, Ho, and Lim (2004), which is based on Dynamic Principal Component Analysis.

The main idea of this paper is to apply a non-linear state-estimation algorithm to detect changes in the estimated state-variables and the correlation of their modelling error.

This paper is organized as follows. In Section 2.1, the basic idea of time-series segmentation and the applied algorithm are given. Section 2.2 gives an overview of multivariate segmentation and the measure of internal homogeneity. Section 2.3 proposes three different methods to get information about the changes of multivariate time-series. These approaches are compared in a case study based on a real-life application example in Section 3. Finally, some conclusions are given in Section 4.

2. State-estimation-based segmentation of historical process data

2.1. Time-series segmentation

A time-series, $T = \{x_k \mid 1 \le k \le N\}$, is a finite set of $N$ samples labelled by time points $t_1, \ldots, t_N$, where $x_k = [x_{1,k}, x_{2,k}, \ldots, x_{n,k}]^T$. A segment of $T$ is a set of consecutive time points, $S(a, b) = \{a \le k \le b\}$, $x_a, x_{a+1}, \ldots, x_b$. The $c$-segmentation of time-series $T$ is a partition of $T$ into $c$ non-overlapping segments, $S_T^c = \{S_i(a_i, b_i) \mid 1 \le i \le c\}$, such that $a_1 = 1$, $b_c = N$ and $a_i = b_{i-1} + 1$. In other words, a $c$-segmentation splits $T$ into $c$ disjoint time intervals by segment boundaries $s_1 < s_2 < \ldots < s_c$, where $S_i(s_{i-1} + 1, s_i)$.

Usually the goal is to find homogeneous segments from a given time-series. In order to formalize this goal, a cost function describing the internal homogeneity of individual segments should be defined. This cost function can be any arbitrary function. For example, in (Himberg, Korpiaho, Mannila, Tikanmaki, & Toivonen, 2001; Vasko & Toivonen, 2002) the sum of variances of the variables in the segment was defined as $\mathrm{cost}(S_i(a_i, b_i))$:

$$\mathrm{cost}(S_i(a_i, b_i)) = \frac{1}{b_i - a_i + 1} \sum_{k=a_i}^{b_i} \lVert x_k - v_i \rVert^2 \qquad (1)$$

where $v_i$ is the mean of the segment.

Usually, the cost function, $\mathrm{cost}(S(a, b))$, is defined based on the distances between the actual values of the time-series and the values given by a simple function (constant or linear function, or a polynomial of a higher but limited degree) fitted to the data of each segment. Hence, the segmentation algorithms simultaneously determine the parameters of the models and the borders of the segments, $a_i$, $b_i$, by minimizing the sum of the costs of the individual segments:

$$\mathrm{cost}(S_T^c) = \sum_{i=1}^{c} \mathrm{cost}(S_i) \qquad (2)$$

This cost function can be minimized by dynamic programming, which is computationally intractable for many real datasets (Himberg et al., 2001). Consequently, heuristic optimization techniques such as greedy top-down or bottom-up techniques are frequently used to find good but suboptimal $c$-segmentations (Keogh et al., 2001; Stephanopoulos and Han, 1996):

Sliding window: A segment is grown until it exceeds some error bound. The process repeats with the next data point not included in the newly approximated segment. For example, a linear model is fitted on the observed period and the modelling error is analyzed.
Top-down method: The time-series is recursively partitioned until some stopping criterion is met.
Bottom-up method: Starting from the finest possible approximation, segments are merged until some stopping criterion is met.
Search for inflection points: Searching for primitive episodes located between two inflection points.

Among these heuristic approaches the bottom-up algorithm has been proven to be practically useful. This algorithm begins by creating a fine approximation of the time-series, and goes on to merge the lowest-cost pair of segments iteratively
until a stopping criterion is met. The detailed description of the algorithm can be found in Table 1.

Table 1
Bottom-up segmentation algorithm

Create initial fine approximation.
Find the cost of merging for each pair of segments:
    mergecost(i) = cost(S(a_i, b_{i+1}))
while min(mergecost) < maxerror
    Find the cheapest pair to merge: i = argmin_i(mergecost(i))
    Merge the two segments, update the boundary indices, a_i, b_i, and recalculate the merge costs:
    mergecost(i) = cost(S(a_i, b_{i+1}))
    mergecost(i − 1) = cost(S(a_{i−1}, b_i))
end

2.2. Covariance-based similarity measure

Time-series segmentation is often used to extract internally homogeneous segments from a given time-series. Usually, the cost function describing the internal homogeneity of the individual segments is defined based on the distances between the actual values of the time-series and the values given by a simple univariate function fitted to the data of each segment.

Due to the hidden nature of the process, the measured variables are correlated. In some cases the hidden process, and therefore the correlation among the variables, varies in time. This phenomenon can occur at process transitions or when there is a significant process fault, etc. The segmentation of only one measured variable is not able to detect such changes. Hence, the segmentation algorithm should be based on multivariate statistical tools.

Covariance matrices, $P_k$, describe the relationship between the variables around the $k$th data point, and they can also be used to calculate the cost function based on a covariance matrix similarity measure:

$$\mathrm{cost}(S_i(a_i, b_i)) = \frac{1}{b_i - a_i + 1} \sum_{k=a_i}^{b_i} s_{cov}(P_k, P_{S_i}) \qquad (3)$$

where $P_{S_i}$ is the covariance matrix of the $i$th segment with the borders $a_i$ and $b_i$, which can be calculated by averaging the matrices $P_k \mid a_i \le k \le b_i$.

To compare covariance matrices, a PCA similarity factor, $s_{cov}$, developed by Krzanowski (1979) can be applied. Let us consider the first $p$ eigenvectors of the $P_i$ and $P_j$ covariance matrices, $U_{i,p}$ and $U_{j,p}$, which can be considered the $(n \times p)$ subspaces of two PCA models. The similarity between these subspaces is defined based on the sum of the squares of the cosines of the angles between each principal component of $U_{i,p}$ and $U_{j,p}$:

$$s_{cov}(P_i, P_j) = \frac{1}{p} \sum_{i=1}^{p} \sum_{j=1}^{p} \cos^2 \theta_{i,j} = \frac{1}{p}\, \mathrm{trace}\left(U_{i,p}^T U_{j,p} U_{j,p}^T U_{i,p}\right) \qquad (4)$$

Because the $U_{i,p}$ and $U_{j,p}$ subspaces contain the $p$ most important principal components that account for most of the variance of the state-variables at the $i$th and $j$th time instants, $s_{cov}$ is also a measure of the similarity between the two covariance matrices.

The similarity of the found segments can be displayed as a dendrogram. A dendrogram is a tree-shaped map of the similarities that shows the merging of segments into clusters at various stages of the analysis. The interpretation of the results is intuitive, which is the major reason for using this method to illustrate the results of a hierarchical clustering (see Fig. 5).

2.3. Covariance of the monitored variables

In the previous subsection, it has been shown that the covariance of the monitored process variables can be used to measure the homogeneity of the segments of multivariate time-series. The main problem in applying this approach is how to estimate covariance matrices that contain useful information about the operation of the monitored process.

The most straightforward approach is the recursive estimation of the $P_k$ covariances:

$$P_k = \frac{1}{\alpha_{j,k}} \left[ P_{k-1} - \frac{P_{k-1} \tilde{x}_k \tilde{x}_k^T P_{k-1}}{\alpha_{j,k} + \tilde{x}_k^T P_{k-1} \tilde{x}_k} \right] \qquad (5)$$

where $P_k$ is a matrix proportional to the covariance matrix and $\alpha_j$ is a scalar forgetting factor of the $j$th rule adaptation.

This tool can be directly used to analyze the measured input–output data, $\tilde{x}_k = [u^T, y]^T$; this approach is considered the basis of the first algorithm proposed in the paper (Algorithm 1).

Historical input–output process data alone may not be sufficient for the monitoring of complex processes. Hence, the main idea of this paper is to apply a non-linear state-estimation algorithm to detect changes in the estimated state-variables (Algorithm 2) and the correlation of their modelling error (Algorithm 3).

The proposed algorithms have been developed for the general non-linear model of a dynamical system:

$$x_{k+1} = f(x_k, u_k, v_k) \qquad (6)$$

$$y_k = g(x_k, w_k) \qquad (7)$$

where $v_k$ and $w_k$ are noise variables assumed to be independent of the current and past states, $v_k \sim N(\bar{v}_k, Q_k)$, $w_k \sim N(\bar{w}_k, R_k)$.

The developed algorithm is based on the results of standard state-estimation algorithms, i.e. the estimated state-variables,

$$\hat{x}_k = \bar{x}_k + K_k [y_k - \bar{y}_k] \qquad (8)$$

and their a posteriori covariance matrix,

$$\hat{P}_k = E[(x_k - \hat{x}_k)(x_k - \hat{x}_k)^T] \qquad (9)$$
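The recursive estimate of Eq. (5) can be illustrated with a short sketch. This is not the authors' implementation: the function name `recursive_covariance_update`, the plain nested-list matrix representation and the single scalar forgetting factor `alpha` (used in place of $\alpha_{j,k}$) are assumptions made for illustration. The default value 0.95 is the one reported in Section 3.4.

```python
def recursive_covariance_update(P_prev, x, alpha=0.95):
    """One step of Eq. (5) for a single scalar forgetting factor alpha.

    P_prev is P_{k-1} as a list of rows, x is the sample x~_k as a list.
    Returns P_k = (P_{k-1} - P_{k-1} x x^T P_{k-1} / (alpha + x^T P_{k-1} x)) / alpha.
    """
    n = len(x)
    # Column vector P_{k-1} x and row vector x^T P_{k-1}.
    Px = [sum(P_prev[r][c] * x[c] for c in range(n)) for r in range(n)]
    xP = [sum(x[r] * P_prev[r][c] for r in range(n)) for c in range(n)]
    # Scalar denominator alpha + x^T P_{k-1} x.
    denom = alpha + sum(x[r] * Px[r] for r in range(n))
    return [[(P_prev[r][c] - Px[r] * xP[c] / denom) / alpha for c in range(n)]
            for r in range(n)]


# Hypothetical usage: start from an identity matrix and process two samples.
P = [[1.0, 0.0], [0.0, 1.0]]
for sample in ([1.0, 0.0], [0.5, 0.5]):
    P = recursive_covariance_update(P, sample, alpha=0.95)
```

The sequence of matrices produced this way plays the role of the $P_k$ in the cost function of Eq. (3); the choice of the initial matrix is left open here.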
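The bottom-up procedure of Table 1 can be sketched in a few lines. The version below is a minimal illustration under stated assumptions, not the code used in the paper: it uses the variance cost of Eq. (1) rather than the covariance-based cost of Eq. (3), it uses 0-based inclusive indices, and the names `variance_cost`, `bottom_up`, `max_error` and `init_len` are hypothetical.

```python
def variance_cost(segment):
    """Eq. (1): mean squared distance of the samples from the segment mean."""
    n_vars = len(segment[0])
    mean = [sum(x[d] for x in segment) / len(segment) for d in range(n_vars)]
    return sum(sum((x[d] - mean[d]) ** 2 for d in range(n_vars))
               for x in segment) / len(segment)


def bottom_up(series, max_error, cost=variance_cost, init_len=2):
    """Return the segment borders [(a_1, b_1), ...] as 0-based inclusive indices."""
    # Initial fine approximation: consecutive chunks of init_len samples.
    bounds = [(a, min(a + init_len, len(series)) - 1)
              for a in range(0, len(series), init_len)]

    def merge_cost(i):
        # Cost of the segment obtained by merging segments i and i + 1.
        return cost(series[bounds[i][0]:bounds[i + 1][1] + 1])

    merge_costs = [merge_cost(i) for i in range(len(bounds) - 1)]
    while merge_costs and min(merge_costs) < max_error:
        i = merge_costs.index(min(merge_costs))       # cheapest pair to merge
        bounds[i] = (bounds[i][0], bounds[i + 1][1])  # merge segments i and i + 1
        del bounds[i + 1]
        del merge_costs[i]
        # Only the neighbours of the merged segment need new merge costs.
        if i < len(bounds) - 1:
            merge_costs[i] = merge_cost(i)
        if i > 0:
            merge_costs[i - 1] = merge_cost(i - 1)
    return bounds


# Hypothetical data: a level shift between the fourth and fifth samples.
data = [[0.0], [0.1], [0.0], [0.1], [5.0], [5.1], [5.0], [5.1]]
segments = bottom_up(data, max_error=0.1)
```

Passing a different `cost` callable, for example one built on the similarity measure of Eq. (3), leaves the merging loop unchanged, which is why the cost is kept as a parameter in this sketch.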
Fig. 1. Screeplot for determining the proper number of principal components in case of datasets presented in (a) Section 3.4 and (b) Section 3.5, respectively.
Fig. 2. Determining the number of segments by Algorithm 3 in case of datasets presented in (a) Section 3.4 and (b) Section 3.5, respectively.
As can be seen in Fig. 2, significant reductions are not achieved by using more than five or six segments for either dataset. Similar figures can be obtained by Algorithm 2.

3.4. Monitoring of process transitions

In this study, a set of historical process data covering a 100 h period of operation has been analyzed. These datasets include at least three segments because of a product transition around the 45th hour (see Fig. 3). Based on the relative reduction of error in Fig. 2(a), the algorithm searched for five segments ($c = 5$).

The results depicted in Fig. 3 show that the most reasonable segmentation has been obtained based on the covariance matrices of the state-estimation algorithm (Algorithm 3). The segmentation obtained based on the estimated state-variables is similar: the boundaries of the segment that contains the transition around the 45th hour are nearly the same, and the other segments contain parts of the analyzed dataset with similar properties. In contrast, when only the measured input–output data were used for the segmentation, the algorithm was not able to detect even the process transition.

It has to be noted that Algorithm 3 can be considered more reasonable than Algorithm 2, because one additional parameter has to be chosen in the latter case: the forgetting factor, $\alpha$, in the recursive estimation of the covariance matrices in (5). The result obtained by Algorithm 2 is very sensitive to its choice. A value of $\alpha = 0.95$ seemed to be a good trade-off between robustness and flexibility.

3.5. Detection of changes in the catalyst productivity

Besides the analysis of the process transitions, the time-series of "stable" operations have also been segmented to detect interesting patterns of relatively homogeneous data. For this purpose, Algorithm 3 was chosen from the methods presented above, because it gives good results in case of product changes. One of these results can be seen in Fig. 4, which shows a 120-h long production period without any product changes. Based on the relative reduction of error in Fig. 2(b), the number of segments was chosen to be six ($c = 6$).

The homogeneity of a historical process data set can be characterized by the similarity of the segments, which can be illustrated as a dendrogram (see Fig. 5).

This dendrogram and the borders of the segments give a chance to analyze and to understand the hidden processes of complex systems. In this example, these results confirm that the quality of the catalyst has an important influence on productivity. Around the 20th, 47th, 75th and 90th hours of the presented period of operation, changes between the catalyst feeder bins happened. The segmentation algorithm based on the estimated state-variables was able to detect these changes, which had an effect on the catalyst productivity, but when only the input–output variables were used, segments without any useful information were detected.

It has to be noted that the borders of the segments given by Algorithms 2 and 3 are similar also in this case, but the dendrograms are different. This is because the segments without product transition are much more similar to each other than in the case of the time-series which contains a product transition. So it is a more difficult problem to differentiate segments of operation related to the minor changes of the technology, like the changes of the catalyst productivity. This phenomenon can also be seen in the dendrogram: the values on the ordinate axis are smaller by one or two orders of magnitude in the case of a time-series without product transition. In the case of product transition, not only are the borders of the segments similar, but the shapes of the dendrograms are also nearly the same. This shows that both algorithms are applicable for similar purposes.
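The segment similarities behind such dendrograms rest on the PCA similarity factor of Eq. (4). A minimal sketch follows, assuming the first $p$ eigenvectors of each covariance matrix have already been computed elsewhere and are orthonormal; the function name `pca_similarity` and the list-of-vectors representation are hypothetical choices for illustration.

```python
def pca_similarity(U_i, U_j):
    """Eq. (4): s_cov = trace(U_i^T U_j U_j^T U_i) / p.

    U_i and U_j are lists of p orthonormal eigenvectors, each a list of
    n floats. The trace equals the sum of squared dot products between
    every pair of eigenvectors, i.e. the sum of squared cosines.
    """
    p = len(U_i)
    return sum(sum(a * b for a, b in zip(u, v)) ** 2
               for u in U_i for v in U_j) / p


# Identical subspaces give 1, orthogonal subspaces give 0.
same = pca_similarity([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
                      [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
orthogonal = pca_similarity([[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]])
```

Values between these two extremes indicate partially overlapping subspaces, which is the quantity averaged over a segment in Eq. (3).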
Fig. 3. (a and b) Segmentation based on Algorithm 1; (c and d) segmentation based on Algorithm 2; (e and f) segmentation based on Algorithm 3; (a, c, and e) input variables: $F_{C2}^{in}$, $F_{C4}^{in}$, $F_{C6}^{in}$, $F_{H2}^{in}$, $F_{cat}^{in}$, $T_w^{in}$, $T_w^{out}$; (b, d and f) process outputs and states: $T_R$, $c_{C2}$, $c_{C4}$, $c_{C6}$, $\rho_{slurry}$, $k_{C2}$, $k_{C6}$, $k_{H2}$.
Fig. 5. Similarity of the found segments.

4. Conclusions

This paper presented the synergistic combination of state-estimation and advanced statistical tools for the analysis of multivariate historical process data. The key idea of the presented segmentation algorithm is to detect changes in the correlation among the state-variables based on their a posteriori covariance matrices estimated by a state-estimation algorithm. The PCA similarity factor can be used to analyze these covariance matrices. Although the developed algorithm can be applied with any state-estimation algorithm, the performance of the filter has a huge effect on the segmentation. The applied DD2 filter has been proven to be accurate, and it was straightforward to include a varying number of parameters in the state-vector for simultaneous state and parameter estimation, which was really useful for the

References

Himberg, J., Korpiaho, K., Mannila, H., Tikanmaki, J., & Toivonen, H. T. (2001). Time-series segmentation for context recognition in mobile devices. IEEE international conference on data mining (ICDM'01, San Jose, California), pp. 203–210.
Keogh, E., Chu, S., Hart, D., & Pazzani, M. (2001). An online algorithm for segmenting time series. IEEE International Conference on Data Mining; http://www.citeseer.nj.nec.com/keogh01online.html.
Kivikunnas, S. (1998). Overview of process trend analysis methods and applications. ERUDIT workshop on applications in pulp and paper industry, page CD ROM.
Krzanowski, W. J. (1979). Between-groups comparison of principal components. Journal of the American Statistical Association, 74, 703–707.
Last, M., Klein, Y., & Kandel, A. (2000). Knowledge discovery in time series databases. IEEE Transactions on Systems, Man, and Cybernetics, 31 (1), 160–169.
Nemeth, S., Abonyi, J., Feil, B., & Arva, P. (2003). Fuzzy clustering based segmentation of time-series. Lecture Notes in Computer Science, 2810/2003, 275–285.
Poulsen, N. K., Norgaard, M., & Ravn, O. (2000). New developments in state estimation for nonlinear systems. Automatica, 36 (11), 1627–1638.
Srinivasan, R., Wang, C., Ho, W. K., & Lim, K. W. (2004). Dynamic principal component analysis based methodology for clustering process states in agile chemical plants. Industrial & Engineering Chemistry Research, 43, 2123–2139.
Stephanopoulos, G., & Han, C. (1996). Intelligent systems in process engineering: A review. Computational Chemical Engineering, 20, 743–791.
Vasko, K., & Toivonen, H. T. T. (2002). Estimating the number of segments in time series data using permutation tests. IEEE International Conference on Data Mining, 466–473.
Vincze, Cs., Arva, P., Abonyi, J., & Nemeth, S. (2003). Process analysis and product quality estimation by self-organizing maps with an application to polyethylene production. Computers in Industry, Special Issue on Soft Computing in Industrial Applications, 52 (3), 221–234.
Wang, X. Z. (1999). Data mining and knowledge discovery for process monitoring and control. Springer.
Yamashita, Y. (2000). Supervised learning for the analysis of the process operational data. Computers and Chemical Engineering, 24, 471–474.
Zhang, J., Martin, E. B., & Morris, A. J. (1997). Process monitoring using non-linear statistical techniques. Chemical Engineering Journal, 67, 181–189.