Abstract
Similarities and differences among various subspace identification methods (MOESP, N4SID and CVA) are examined by putting them in a general regression framework. Subspace identification methods consist of three steps: estimating the predictable subspace for multiple future steps, then extracting state variables from this subspace and finally fitting the estimated states to a state space model. The major differences among these subspace identification methods lie in the regression or projection methods used in the first step to remove the effect of the future inputs on the future outputs and thereby estimate the predictable subspace, and in the latent variable methods used in the second step to extract estimates of the states. This paper compares the existing methods and proposes some new variations by examining them in a common framework involving linear regression and latent variable estimation. Limitations of the various methods become apparent when examined in this manner. Simulations are included to illustrate the ideas discussed.
Such comparisons focus on the algorithms rather than on the concepts and ideas behind these methods. In this paper, SIMs are compared by casting them into a general statistical regression framework, in which the fundamental similarities and differences among them are clearly shown. The discussion is limited to the open-loop case for linear time-invariant (LTI) systems. In the next section, a general framework for SIMs is set up. The following two sections discuss the major steps and how the various methods fit into the framework. A simulation example then illustrates the key points, and the last section provides conclusions and some application guidelines.
1. Introduction
Subspace identification methods (SIMs) have become quite popular in recent years. The key idea in SIMs is to estimate the state variables or the extended observability matrix directly from the input and output data. The most influential methods are CVA (Canonical Variate Analysis; Larimore, 1990), MOESP (Multivariable Output Error State Space; Verhaegen and Dewilde, 1992) and N4SID (Numerical algorithms for Subspace State Space System IDentification; Van Overschee and De Moor, 1994). These methods differ so much in their algorithms that it is hard to bring them together and gain insight into the essential ideas and the connections among them. However, some effort has been made to contrast these methods. Viberg (1995) gave an overview of SIMs, classified them into realization-based and direct types, and pointed out the different ways to obtain the system matrices, via either the estimated states or the extended observability matrix. Van Overschee and De Moor (1995) gave a unifying theorem based on a lower order approximation of an oblique projection, in which the different methods are viewed as different choices of row and column weighting matrices for the reduced rank oblique projection. The basic structure of their theorem rests on casting these methods into the N4SID algorithm.
For a past horizon p and a future horizon f, the stacked past and future data vectors at time k are defined as

Up = [u_{k-p}; u_{k-p+1}; ...; u_{k-1}],   Yp = [y_{k-p}; y_{k-p+1}; ...; y_{k-1}]
Uf = [u_k; u_{k+1}; ...; u_{k+f-1}],   Yf = [y_k; y_{k+1}; ...; y_{k+f-1}]
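These stacked data sets can be formed with a block-Hankel construction. The following numpy sketch is purely illustrative (the helper name, the horizons and the random data are assumptions, not from the paper):

```python
import numpy as np

def block_hankel(signal, start, nrows, ncols):
    """Column j stacks samples start+j .. start+j+nrows-1 of `signal` (time x dim)."""
    dim = signal.shape[1]
    H = np.empty((nrows * dim, ncols))
    for j in range(ncols):
        H[:, j] = signal[start + j : start + j + nrows].ravel()
    return H

# illustrative data: 200 samples of a single-input, single-output record
rng = np.random.default_rng(0)
u = rng.standard_normal((200, 1))
y = rng.standard_normal((200, 1))

p = f = 5                       # past and future horizons (assumed values)
N = 200 - p - f + 1             # number of columns (time shifts k)
Up = block_hankel(u, 0, p, N)   # past inputs
Yp = block_hankel(y, 0, p, N)   # past outputs
Uf = block_hankel(u, p, f, N)   # future inputs
Yf = block_hankel(y, p, f, N)   # future outputs
print(Up.shape, Yf.shape)       # (5, 191) (5, 191)
```

Each column of Up is one stacked past-input vector for one value of k, matching the column-wise collection described below.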
The linear combination of terms on the right-hand side of equation (6) summarizes the information in the past history needed to predict the future outputs in (5). Substituting (6) into (5) gives a linear relationship between the future outputs and the past data as well as the future inputs:
Yf = Γf(Ωp − A^p Γp^+ Hp)Up + Hf Uf + Γf A^p Γp^+ Yp + Γf(Ωs,p − A^p Γp^+ Hs,p)Wp − Γf A^p Γp^+ Vp + Hs,f Wf + Vf     (7)
For convenience, all the possible Up for different k are collected as the columns of the past input data set Up; similar notation is used for Yp, Uf, Yf, Wp, Vp, Wf and Vf. Likewise, all the possible xk for different k are collected as the columns of Xk. The relationships between these data sets and the state variables are analyzed in the following multi-step state-space representation, which serves as a general environment for discussing SIMs and their framework. Based on equations (1) and (2) and the above notation, the following multi-step state-space model for the current states and the past and future output data is obtained (Xk−p is the initial state sequence):
Xk = A^p Xk−p + Ωp Up + Ωs,p Wp     (3)

Yp = Γp Xk−p + Hp Up + Hs,p Wp + Vp     (4)

Yf = Γf Xk + Hf Uf + Hs,f Wf + Vf     (5)
where the extended controllability matrices are Ωp = [A^{p−1}B, A^{p−2}B, ..., AB, B] and Ωs,p = [A^{p−1}, A^{p−2}, ..., A, I], the extended observability matrix is Γf = [C^T, (CA)^T, (CA^2)^T, ..., (CA^{f−1})^T]^T, and Hf is the lower block triangular Toeplitz matrix

Hf = [ D          0      0    ...  0
       CB         D      0    ...  0
       CAB        CB     D    ...  0
       ...
       CA^{f−2}B  ...    CAB  CB   D ]
+ r / ( ~ , . , A"F,,+H,,,,)Wp FrA"F,,+V, + H,.jW s + ~+ (7) All the terms involving the past form the basis for FrXk, ItfUf is the effect of the future inputs and can be removed if Hf is known or estimated, and the future noise terms are unpredictable. Only FfXk is predictable from the past data set, and this predictable subspace is the fundamental base for SIMs to estimate state sequence Xk or the observability matrix Ff. With autocorrelated inputs, H f U f is correlated with the past data and therefore part of it can be calculated from the past data if the input autocorrelation remains unchanged. However, it is not part of the causality effect to be modeled in system identification, and therefore should not be taken into account for the prediction of Yf based on the past data. The input autocorrelation may give difficulty in estimation of the predictable subspace and the state variables.
Similarly, Hs,f is the lower block triangular Toeplitz matrix

Hs,f = [ N         0    0   ...  0
         C         N    0   ...  0
         CA        C    N   ...  0
         ...
         CA^{f−2}  ...  CA  C    N ]
Γf and Hf show the effects of the current states and of the future inputs on the future outputs, respectively. Substituting Xk−p from (4) into (3) gives (Γp^+ is the pseudo-inverse):

Xk = A^p Γp^+ Yp + (Ωp − A^p Γp^+ Hp)Up + (Ωs,p − A^p Γp^+ Hs,p)Wp − A^p Γp^+ Vp     (6)
That is, the current state sequence Xk (and therefore Γf Xk) is a linear combination of the past data. Γf Xk is the free evolution of the current outputs (with no future inputs) and is independent of the system matrices. It is the part of the future output space in (5) that can be estimated from the data relationships. System states can be defined as "the minimum amount of information about the past history of a system which is required to predict the future motion" (Åström, 1970).
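Equation (6) can be verified numerically in the noise-free case: the current state computed from the past p inputs and outputs matches the true state exactly. The system matrices below are illustrative assumptions:

```python
import numpy as np

# illustrative observable 2-state SISO system, noise-free
A = np.array([[0.7, 0.2], [0.0, 0.5]])
B = np.array([1.0, 0.5])
C = np.array([1.0, -1.0])
d = 0.0
p = 6

# Gamma_p, Toeplitz H_p and controllability matrix Omega_p from their definitions
Gamma = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(p)])
H = np.zeros((p, p))
for i in range(p):
    H[i, i] = d
    for j in range(i):
        H[i, j] = C @ np.linalg.matrix_power(A, i - j - 1) @ B
Omega = np.column_stack([np.linalg.matrix_power(A, p - 1 - i) @ B for i in range(p)])

# simulate p steps from a random initial state x_{k-p}
rng = np.random.default_rng(2)
u = rng.standard_normal(p)          # u_{k-p} .. u_{k-1}
x = rng.standard_normal(2)          # x_{k-p}
ys = []
for t in range(p):
    ys.append(C @ x + d * u[t])
    x = A @ x + B * u[t]            # loop ends with x = x_k

# equation (6), noise-free: Xk = A^p Gp+ Yp + (Omega_p - A^p Gp+ Hp) Up
Ap = np.linalg.matrix_power(A, p)
Gpi = np.linalg.pinv(Gamma)
xk = Ap @ Gpi @ np.array(ys) + (Omega - Ap @ Gpi @ H) @ u
print(np.allclose(xk, x))           # True
```

The reconstruction is exact because Γp has full column rank for this observable system, so the pseudo-inverse is an exact left inverse.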
SIMs consist of three steps: i) estimate the predictable subspace for multiple future steps; ii) extract state variables from this subspace; iii) then fit the estimated states to a state space model. The major differences among SIMs lie in the first two steps; the third step is the same for all of them. The original MOESP algorithm extracts Γf from the estimated subspace; here MOESP is analyzed based on estimated states that come from exactly the same subspace as Γf (also refer to Van Overschee and De Moor, 1995).
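The three steps can be sketched end to end with an N4SID-style regression for step i). All system matrices, horizons and noise levels below are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)
# illustrative true system: x+ = A x + B u + w,  y = C x + v
A = np.array([[0.8, 0.1], [0.0, 0.6]])
B = np.array([1.0, 0.5])
C = np.array([1.0, -1.0])
T, p, f, n = 2000, 8, 8, 2

u = rng.standard_normal(T)
x = np.zeros(2)
y = np.empty(T)
for t in range(T):
    y[t] = C @ x + 0.05 * rng.standard_normal()
    x = A @ x + B * u[t] + 0.05 * rng.standard_normal(2)

def hankel(s, start, rows, cols):
    return np.array([s[start + j : start + j + rows] for j in range(cols)]).T

N = T - p - f + 1
Yp, Up = hankel(y, 0, p, N), hankel(u, 0, p, N)
Yf, Uf = hankel(y, p, f, N), hankel(u, p, f, N)

# step i) regress Yf on [Yp; Up; Uf]; the part explained by the past
# data estimates the predictable subspace Gamma_f * Xk
Z = np.vstack([Yp, Up, Uf])
coef = Yf @ np.linalg.pinv(Z)
pred = coef[:, : 2 * p] @ Z[: 2 * p]

# step ii) SVD of the predictable subspace -> n state estimates
_, S, Vt = np.linalg.svd(pred, full_matrices=False)
X = np.diag(S[:n]) @ Vt[:n]

# step iii) fit the state equation by least squares
Xk, Xk1 = X[:, :-1], X[:, 1:]
reg = np.vstack([Xk, Uf[0, :-1][None]])
A_est = (Xk1 @ np.linalg.pinv(reg))[:, :n]
# the eigenvalues of A_est should be near eig(A) = {0.8, 0.6}
print(np.sort(np.abs(np.linalg.eigvals(A_est))))
```

The estimated states live in an arbitrary basis, but the eigenvalues of the fitted transition matrix are basis-invariant, so they can be compared directly with those of the true A.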
1. QR decomposition on [Uf; Yf] (MOESP). The original MOESP algorithm uses this projection method to estimate Hf implicitly, and with it the predictable subspace, via QR decomposition on [Uf; Yf]. If the input sequences are autocorrelated, this method regresses part of the state effect away and gives a biased result for the predictable subspace. SVD on this subspace still gives an asymptotically unbiased estimate of Γf; however, the estimate of Xk will be biased.
2. Regression of Yf against [Yp; Up; Uf] (N4SID). Based on (6), Γf Xk in (5) can be estimated by a linear combination of the past inputs Up and past outputs Yp, so it is a natural choice to regress Yf against [Yp; Up; Uf]. Here the regression coefficient for Uf is an estimate of Hf (denoted Ĥf), and the part corresponding to the past data is an estimate of the predictable subspace; this is equivalent to projecting Yf − Ĥf Uf onto the past data. The estimate will have a slight bias if the input signals are autocorrelated, because of the correlation between the past outputs and the past noise terms in (7). This is the method used in N4SID to estimate Hf and the predictable subspace; it is realized by QR decomposition of [Uf; Pio; Yf] (Pio = [Yp; Up]). PO-MOESP (past output (PO) MOESP, 1994) gives similar results.
3. Constructing Hf from the impulse weights of an ARX model
(CVA)
The structure of Hf implies that it can be constructed from the first f impulse response block weights. These impulse weight blocks can be estimated from a simple model, such as an ARX or FIR model, obtained by regressing yk against uk (if D ≠ 0), the past inputs (Up) and the past outputs (Yp). The predictable subspace is then estimated as Yf − Ĥf Uf; it includes all the future noise. This is the method some CVA algorithms use to estimate Hf and the predictable subspace.
4. Regression-out method. Uf can be regressed out of both sides of (7) by projecting onto the orthogonal space of Uf, i.e., by post-multiplying both sides by ΠUf⊥ = I − Uf^T(Uf Uf^T)^−1 Uf. This removes the Uf term from the equation, and the coefficient matrices for the past data in (7) can then be obtained by regressing Yf ΠUf⊥ against Pio ΠUf⊥. The result is equivalent to that from N4SID, as implied in Van Overschee and De Moor (1995). This approach has also been applied to the CVA method (refer to Van Overschee and De Moor, 1995; Carette, 2000); see the next section for more discussion. A similar approach is to regress the past data Pio out of both sides of (7) (projecting onto the orthogonal space of Pio by post-multiplying by ΠPio⊥ = I − Pio^T(Pio Pio^T)^−1 Pio) for the estimation of Hf; this turns out to be equivalent to the approach of N4SID.
5. Instrumental variable method. If there is a variable that is correlated with Uf but has no
correlation with Xk and the future noise, an unbiased Hf can be estimated by the instrumental variable (IV) method based on (5). For autocorrelated inputs, Uf correlates with Xk through its correlation with Up; therefore the part of Uf that has no correlation with the past data also has no correlation with Xk. This part of Uf can be constructed by regressing Up out of Uf and taking the residual as the IV. Once Hf is estimated, the predictable subspace is easily estimated.
All these estimation methods are variants of linear regression: they differ only in their choice of independent and dependent variables, in the regression sequence, and in the degree to which knowledge of the structure of Hf is exploited. The key problem comes from the correlation between Uf and Xk, which arises from autocorrelation of the input sequence in the open loop case. The estimation accuracy (bias and variance) of each method depends on the input signal, the true model structure, and the signal to noise ratio (SNR).
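The IV construction in method 5 can be sketched as follows. With an assumed AR(1) input (illustrative, not from the paper), the residual of Uf after regressing out the past inputs is numerically orthogonal to the past data yet still strongly correlated with Uf:

```python
import numpy as np

rng = np.random.default_rng(4)
T, p, f = 3000, 6, 6
u = np.zeros(T)                          # assumed AR(1) input
e = rng.standard_normal(T)
for t in range(1, T):
    u[t] = 0.8 * u[t - 1] + e[t]

def hankel(s, start, rows, cols):
    return np.array([s[start + j : start + j + rows] for j in range(cols)]).T

N = T - p - f + 1
Up = hankel(u, 0, p, N)                  # past inputs
Uf = hankel(u, p, f, N)                  # future inputs

# the instrument: residual of Uf after regressing Up out of it
Uf_iv = Uf - (Uf @ np.linalg.pinv(Up)) @ Up

print(np.max(np.abs(Uf_iv @ Up.T)) / N)  # ~0: orthogonal to the past inputs
print(np.max(np.abs(Uf_iv @ Uf.T)) / N)  # clearly nonzero: still correlated with Uf
```

The first quantity is zero up to rounding because least squares residuals are exactly orthogonal to the regressors.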
This approach has been proven to give unbiased state variable estimates (for a proof, see Shi, 2001). However, since part of the state signal is removed by regressing Uf out while the noise is kept intact, the data set Yf ΠUf⊥ in general has a worse SNR than Yf − Ĥf Uf.
3. Other Possible Methods
Other LVMs, such as PLS and RRA, are possible choices for state extraction. For example, RRA provides estimates of the states based on the same objective as N4SID, that is, states that maximize the variance explained in the predictable subspace. However, it should give numerically improved estimates, since it directly obtains the n LVs that explain the greatest variance in the predictable subspace, rather than following the two-step procedure of first performing an ill-conditioned least squares (oblique projection) followed by PCA/SVD (see Shi, 2001). Since the objective of PLS is to model the variance in the past data set and in the predictable subspace, as well as their correlation, it will generally not provide minimal order state models; in effect, it will try to provide state vectors for the joint input/output space (refer to Shi and MacGregor, 2000). Combinations of the methods used for predictable subspace estimation and the methods used for state variable estimation lead to a whole set of different subspace identification methods.
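A sketch of CCA-based state extraction via SVD of the whitened cross-covariance is shown below. The function and the toy data are illustrative assumptions; actual CVA algorithms differ in details such as weighting and regularization:

```python
import numpy as np

def cca_states(P, F, n):
    """Canonical variates of past data P against predictable-subspace data F
    (rows = variables, columns = samples); returns n states as linear
    combinations of P, plus the leading canonical correlations."""
    P = P - P.mean(axis=1, keepdims=True)
    F = F - F.mean(axis=1, keepdims=True)
    Spp, Sff, Sfp = P @ P.T, F @ F.T, F @ P.T
    # whiten both blocks (small ridge for numerical safety), then SVD
    Lp = np.linalg.cholesky(Spp + 1e-8 * np.eye(len(Spp)))
    Lf = np.linalg.cholesky(Sff + 1e-8 * np.eye(len(Sff)))
    K = np.linalg.solve(Lf, Sfp) @ np.linalg.inv(Lp).T
    U, s, Vt = np.linalg.svd(K)
    J = Vt[:n] @ np.linalg.inv(Lp)      # first n canonical directions for P
    return J @ P, s[:n]                 # states = canonical variates of the past

# toy usage on random data (illustrative only): F is driven by part of P
rng = np.random.default_rng(5)
P = rng.standard_normal((12, 500))
F = P[:4] + 0.1 * rng.standard_normal((4, 500))
X, cc = cca_states(P, F, 2)
print(X.shape, cc)                      # (2, 500), canonical correlations near 1
```

Because canonical correlations are scale-invariant, the unnormalized sample covariances can be used directly; the returned states are, as in CVA, linear combinations of the past data.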
State estimation by PLS gives relatively degraded results compared with CCA or RRA based on the same predictable subspace estimated by ARX; the states estimated by the other methods are much closer to the true states. Similar conclusions are indicated by the squared multiple correlation R2, which shows how much of the total sum of squares of the true states can be explained by the estimated states (the two states are scaled to unit variance). CCA results based on Pio ΠUf⊥ and Yf ΠUf⊥ are also compared with those based on Pio and Yf − Hf Uf (using the true Hf). The coefficient matrix J1 from the former differs from that of the latter (too large to show), and the estimated states from the former are not exact linear combinations of those estimated by the latter, but they are very close. When the estimated states are used to fit the state space model, the methods can be compared by plotting their estimated impulse responses (Fig. 1) and the corresponding errors (Fig. 2). MOESP gives a poor result for this example. The result from SIM-ARX-PLS has a large error but can be improved to match the others by using more LVs (refer to Shi and MacGregor, 2000). The results from the other SIMs are very close to the true values. The somewhat irregular response of the ARX model is also shown for comparison; all SIMs give a smooth response by fitting the LVs to the state equation.
References
Åström, K.J., Introduction to Stochastic Control Theory, Academic Press, 1970
Burnham, A.J., R. Viveros and J.F. MacGregor, Frameworks for Latent Variable Multivariate Regression, Journal of Chemometrics, V10, pp. 31-45, 1996
Carette, P., personal communication, notes for CVA, 2000
Larimore, W.E., Canonical Variate Analysis in Identification, Filtering and Adaptive Control, Proc. 29th IEEE Conference on Decision and Control, Vol. 1, Honolulu, Hawaii, 1990
Larimore, W.E., Optimal Reduced Rank Modeling, Prediction, Monitoring and Control Using Canonical Variate Analysis, Preprints of ADCHEM, Banff, 1997
Ljung, L. and McKelvey, T., Subspace Identification from Closed Loop Data, Signal Processing, V52, 1996
Shi, R. and J.F. MacGregor, Modeling of Dynamic Systems Using Latent Variable and Subspace Methods, Journal of Chemometrics, V14, pp. 423-439, 2000
Shi, R., Ph.D. Thesis, McMaster University, ON, Canada, 2001
Van Overschee, P. and De Moor, B., N4SID: Subspace Algorithms for the Identification of Combined Deterministic-Stochastic Systems, Automatica, Vol. 30, No. 1, pp. 75-93, 1994
Van Overschee, P. and De Moor, B., A Unifying Theorem for Three Subspace System Identification Algorithms, Automatica, Vol. 31, No. 12, pp. 1853-1864, 1995
Verhaegen, M. and Dewilde, P., Subspace Model Identification, Part 1: The Output-Error State-Space Model Identification Class of Algorithms, International Journal of Control, V56, pp. 1187-1210, 1992
Verhaegen, M., Identification of the Deterministic Part of MIMO State Space Models Given in Innovations Form from Input-Output Data, Automatica, Vol. 30, No. 1, pp. 61-74, 1994

Fig. 1 The estimated impulse responses (True, MOESP, N4SID, CVA)
Table 1 The impulse weights in the estimated Hf

Weight          True      Yf/Uf     Yf/[Pio;Uf]   ARX
w1              0         0.1003    0.0159        0
w2              0.2       0.2716    0.1951        0.1997
w3              0.16      0.2203    0.1585        0.1546
w4              0.128     0.1770    0.1035        0.1086
w5              0.1024    0.1626    0.0728        0.0863
w6              0.0819    0.1613    0.0668        0.0602
w7              0.0655    0.2135    0.0510        0.0610
Sum Abs. Err.   0         0.5688    0.1061        0.0675
Table 2 Comparison between the estimated and the true states

Method                 MOESP    N4SID         CVA      SIM-IV-CCA   SIM-ARX-PLS   SIM-ARX-RRA
Predictable Subspace   Yf/Uf    Yf/[Pio;Uf]   ARX      IV           ARX           ARX
Estimated States       PCA      PCA           CCA      CCA          PLS           RRA
1st CC                 0.8680   0.9993        0.9997   0.9995       0.9500        0.9993
2nd CC                 0.2599   0.9623        0.9613   0.9600       0.9122        0.9618
R2                     0.4031   0.9612        0.9605   0.9590       0.8667        0.9606
Fig. 2 Error on the impulse responses