
Proceedings of the American Control Conference Arlington, VA June 25-27, 2001

A Framework for Subspace Identification Methods

Ruijie Shi and John F. MacGregor, Dept. of Chemical Engineering, McMaster University, Hamilton, ON L8S 4L7, Canada. Email:

Similarities and differences among various subspace identification methods (MOESP, N4SID and CVA) are examined by putting them in a general regression framework. Subspace identification methods consist of three steps: estimating the predictable subspace for multiple future steps, then extracting state variables from this subspace and finally fitting the estimated states to a state space model. The major differences among these subspace identification methods lie in the regression or projection methods used in the first step to remove the effect of the future inputs on the future outputs and thereby estimate the predictable subspace, and in the latent variable methods used in the second step to extract estimates of the states. This paper compares the existing methods and proposes some new variations by examining them in a common framework involving linear regression and latent variable estimation. Limitations of the various methods become apparent when examined in this manner. Simulations are included to illustrate the ideas discussed.

1. Introduction

Subspace identification methods (SIMs) have become quite popular in recent years. The key idea in SIMs is to estimate the state variables or the extended observability matrix directly from the input and output data. The most influential methods are CVA (Canonical Variate Analysis; Larimore, 1990), MOESP (Multivariable Output Error State space; Verhaegen and Dewilde, 1992) and N4SID (Numerical Subspace State-Space System IDentification; Van Overschee and De Moor, 1994). These methods differ so much in their algorithms that it is hard to bring them together and gain insight into their essential ideas and the connections among them. However, some effort has been made to contrast them. Viberg (1995) gave an overview of SIMs, classified them into realization-based and direct types, and pointed out the different ways to obtain the system matrices, via estimated states or via the extended observability matrix. Van Overschee and De Moor (1995) gave a unifying theorem based on a lower-order approximation of an oblique projection, in which the different methods are viewed as different choices of row and column weighting matrices for the reduced-rank oblique projection. The basic structure and idea of their theorem is to cast these methods into the N4SID algorithm. It focuses on the algorithms rather than on the concepts and ideas behind these methods.

In this paper, SIMs are compared by casting them into a general statistical regression framework. The fundamental similarities and differences among these SIMs are clearly shown in this framework. The discussion is limited to the open-loop case of linear time-invariant (LTI) systems. In the next section, a general framework for SIMs is set up. The following two sections discuss the major parts of the framework and how the methods fit into it. A simulation example then illustrates the key points. The last section provides conclusions and some application guidelines.

2. General Statistical Framework for SIMs

2.1 Data Relationships in Multi-step State-space Representation

A linear combined deterministic-stochastic system can be represented in the following state space form:

$$x_{k+1} = A x_k + B u_k + w_k \qquad (1)$$
$$y_k = C x_k + D u_k + N w_k + v_k \qquad (2)$$

where the outputs yk, inputs uk and state variables xk are of dimension l, m and n respectively, and the stochastic variables wk and vk are of proper dimensions and uncorrelated with each other. To capture the dynamics, SIMs relate multiple steps of past data to multiple steps of future data. For an arbitrary time point k taken as the current time point, the past p steps of the input form a vector up, and the current and the future f-1 steps of the input form a vector uf. Similar symbols are used for the output and noise variables (some algorithms assume p = f):

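$$u_p = \begin{bmatrix} u_{k-p} \\ u_{k-p+1} \\ \vdots \\ u_{k-1} \end{bmatrix}, \qquad y_p = \begin{bmatrix} y_{k-p} \\ y_{k-p+1} \\ \vdots \\ y_{k-1} \end{bmatrix}$$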

0-7803-6495-3/01/$10.00 2001 AACC




$$u_f = \begin{bmatrix} u_k \\ u_{k+1} \\ \vdots \\ u_{k+f-1} \end{bmatrix}$$
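As a minimal sketch of how these stacked vectors might be built from recorded data, the following NumPy helper collects them as columns, one column per current time point k. The function name `stack_past_future` and the oldest-first ordering of the past blocks are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def stack_past_future(u, y, p, f):
    """Return (Up, Yp, Uf, Yf) with one column per current time point k.

    u has shape (N, m) or (N,), y has shape (N, l) or (N,).
    Past blocks are ordered oldest first (k-p, ..., k-1);
    future blocks cover k, ..., k+f-1.
    """
    u = np.asarray(u, dtype=float)
    y = np.asarray(y, dtype=float)
    u = u.reshape(u.shape[0], -1)
    y = y.reshape(y.shape[0], -1)
    n_samples = u.shape[0]
    ks = range(p, n_samples - f + 1)  # admissible current time points
    Up = np.column_stack([u[k - p:k].ravel() for k in ks])
    Yp = np.column_stack([y[k - p:k].ravel() for k in ks])
    Uf = np.column_stack([u[k:k + f].ravel() for k in ks])
    Yf = np.column_stack([y[k:k + f].ravel() for k in ks])
    return Up, Yp, Uf, Yf

# Toy data: 10 samples of a scalar input and output
u = np.arange(10.0)
y = 10.0 * np.arange(10.0)
Up, Yp, Uf, Yf = stack_past_future(u, y, p=2, f=3)
print(Up.shape, Uf.shape)  # (2, 6) (3, 6)
```

With 10 samples, p = 2 and f = 3, the admissible current time points are k = 2, ..., 7, which gives six columns.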


For convenience, all the possible up for different k are collected as the columns of Up, the past input data set; similar notation is used for Yp, Uf, Yf, Wp, Vp, Wf and Vf. All the possible xk for different k are collected as the columns of Xk. The relationships between these data sets and the state variables are analyzed in the following multi-step state-space representation, which serves as a general setting for discussing SIMs and their framework. Based on equations (1) and (2) and the above notation, the following multi-step state-space model for the current states and the past and future output data can be obtained (Xk-p is the initial state sequence):

$$X_k = A^p X_{k-p} + \Omega_p U_p + \Omega_{s,p} W_p \qquad (3)$$
$$Y_p = \Gamma_p X_{k-p} + H_p U_p + H_{s,p} W_p + V_p \qquad (4)$$
$$Y_f = \Gamma_f X_k + H_f U_f + H_{s,f} W_f + V_f \qquad (5)$$

where the extended controllability matrices are $\Omega_p = [A^{p-1}B, A^{p-2}B, \ldots, AB, B]$ and $\Omega_{s,p} = [A^{p-1}, A^{p-2}, \ldots, A, I]$, the extended observability matrix is $\Gamma_f = [C^T, (CA)^T, (CA^2)^T, \ldots, (CA^{f-1})^T]^T$, and the lower block triangular Toeplitz matrices Hf and Hs,f are:

$$H_f = \begin{bmatrix} D & 0 & 0 & \cdots & 0 \\ CB & D & 0 & \cdots & 0 \\ CAB & CB & D & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ CA^{f-2}B & CA^{f-3}B & CA^{f-4}B & \cdots & D \end{bmatrix}, \qquad H_{s,f} = \begin{bmatrix} N & 0 & 0 & \cdots & 0 \\ C & N & 0 & \cdots & 0 \\ CA & C & N & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ CA^{f-2} & CA^{f-3} & CA^{f-4} & \cdots & N \end{bmatrix}$$

Γf and Hf show the effects of the current states and the future inputs, respectively, on the future outputs. Solving (4) for Xk-p and substituting the result into (3) gives (Γp+ is the pseudo-inverse of Γp):

$$X_k = A^p \Gamma_p^+ Y_p + (\Omega_p - A^p \Gamma_p^+ H_p) U_p + (\Omega_{s,p} - A^p \Gamma_p^+ H_{s,p}) W_p - A^p \Gamma_p^+ V_p \qquad (6)$$

That is, the current state sequence Xk (and therefore ΓfXk) is a linear combination of the past data. ΓfXk is the free evolution of the current outputs (with no future inputs) and is independent of the system matrices. It is the part of the future output space in (5) that can be estimated from the data relationships. System states can be defined as "the minimum amount of information about the past history of a system which is required to predict the future motion" (Åström, 1970). Here the linear combination of the terms on the right-hand side of equation (6) summarizes the information in the past history that is needed to predict the future outputs in (5). By substituting (6) into (5), a linear relationship between the future outputs and the past data as well as the future inputs is obtained:

$$Y_f = \Gamma_f A^p \Gamma_p^+ Y_p + \Gamma_f (\Omega_p - A^p \Gamma_p^+ H_p) U_p + H_f U_f + \Gamma_f (\Omega_{s,p} - A^p \Gamma_p^+ H_{s,p}) W_p - \Gamma_f A^p \Gamma_p^+ V_p + H_{s,f} W_f + V_f \qquad (7)$$

All the terms involving the past form the basis for ΓfXk; HfUf is the effect of the future inputs and can be removed if Hf is known or estimated; the future noise terms are unpredictable. Only ΓfXk is predictable from the past data set, and this predictable subspace is the fundamental basis on which SIMs estimate the state sequence Xk or the observability matrix Γf. With auto-correlated inputs, HfUf is correlated with the past data, and part of it can therefore be calculated from the past data if the input auto-correlation remains unchanged. However, this is not part of the causal effect to be modeled in system identification, and it should therefore not be taken into account when predicting Yf from the past data. Input auto-correlation may cause difficulty in estimating the predictable subspace and the state variables.

2.2 General Statistical Framework for SIMs

Each SIM looks quite different from the others in concept, computational tools and interpretation. The original MOESP performs a QR decomposition on [Uf; Yf] and then an SVD on part of the R matrix. Part of the singular vector matrix is taken as Γf, from which the A and C matrices are estimated, and B and D are estimated through an LS fit. N4SID projects Yf onto [Yp; Up; Uf] and performs an SVD on the part corresponding to the past data; the right singular vectors are taken as estimates of the state variables and fitted to the state space model. N4SID is interpreted in terms of non-stationary Kalman filters. CVA uses CCA (Canonical Correlation Analysis) to estimate the state variables (called memory) and fits them to the state space model. It is interpreted in terms of the maximum likelihood principle. At the level of the detailed algorithms (refer to the papers), the differences between these SIMs seem so large that it is hard to find the similarities between them. In fact, if the basic ideas behind these methods are examined from the viewpoint of statistical regression, and the computational methods are analyzed in regression terms and related to each other, the methods are found to be very similar and to follow the same framework. The framework consists of three steps:
i) Estimate the predictable subspace ΓfXk by a linear regression method.
ii) Extract state variables from the estimated subspace by a latent variable method.


iii) Fit the estimated states to a state space model.
The major differences among SIMs are in the first two steps; the third step is the same. The original MOESP algorithm extracts Γf from the estimated subspace. Here MOESP is analyzed based on estimated states, which come from exactly the same subspace as Γf (see also Van Overschee and De Moor, 1995).
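The three steps can be sketched end to end on noise-free data. The example below is only an illustration under stated assumptions: the 2-state test system, the helper `regress`, and the use of column-shifted state estimates in step iii are choices made here for brevity, not the procedure of any particular published algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state SISO system used only for illustration
A = np.array([[0.8, 0.1], [0.0, 0.5]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 1.0]])
D = np.array([[0.0]])
n, N, p, f = 2, 400, 5, 5

# Noise-free simulation with a white-noise input
u = rng.standard_normal((N, 1))
x = np.zeros((N + 1, n))
y = np.zeros((N, 1))
for k in range(N):
    y[k] = C @ x[k] + D @ u[k]
    x[k + 1] = A @ x[k] + B @ u[k]

# Stacked past/future data, one column per current time k
ks = np.arange(p, N - f + 1)
Up = np.column_stack([u[k - p:k].ravel() for k in ks])
Yp = np.column_stack([y[k - p:k].ravel() for k in ks])
Uf = np.column_stack([u[k:k + f].ravel() for k in ks])
Yf = np.column_stack([y[k:k + f].ravel() for k in ks])
Pio = np.vstack([Yp, Up])

def regress(Y, Z, rcond=None):
    """Least-squares coefficients K in Y ~ K Z (variables in rows)."""
    return np.linalg.lstsq(Z.T, Y.T, rcond=rcond)[0].T

# Step i: regress Yf on [Pio; Uf]; the past-data part estimates Gamma_f X_k.
# Without noise the regressors are collinear, hence the explicit rank cutoff.
L = regress(Yf, np.vstack([Pio, Uf]), rcond=1e-10)
pred = L[:, :Pio.shape[0]] @ Pio

# Step ii: latent variables by SVD (PCA); keep n of them as state estimates
_, s, Vt = np.linalg.svd(pred, full_matrices=False)
Xhat = np.sqrt(s[:n])[:, None] * Vt[:n]

# Step iii: least-squares fit of [x_{k+1}; y_k] on [x_k; u_k]
lhs = np.vstack([Xhat[:, 1:], y[ks[:-1]].T])
rhs = np.vstack([Xhat[:, :-1], u[ks[:-1]].T])
Theta = regress(lhs, rhs)
A_hat, B_hat = Theta[:n, :n], Theta[:n, n:]
C_hat, D_hat = Theta[n:, :n], Theta[n:, n:]

eig_hat = np.sort(np.linalg.eigvals(A_hat).real)
print(eig_hat)
```

The estimated states are expressed in an arbitrary basis, so A_hat itself differs from A, but in this noise-free setting its eigenvalues should be close to the true poles 0.5 and 0.8.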

3. Estimation of the Predictable Subspace

3.1 Linear Regression for Hf to Estimate ΓfXk

In SIMs, the predictable subspace ΓfXk must first be estimated in order to have a basis for estimating the states Xk or the Γf matrix. The central problem is how to remove the future input effects HfUf from Yf in (5) in order to obtain a better estimate of the predictable subspace ΓfXk. The coefficient matrix Hf is unknown and needs to be estimated. Hf shows the effects of Uf on Yf, and consists of the first f impulse weights on the lower diagonals (SISO) or block weights on the block lower diagonals (MIMO). The true Hf is a lower block triangular matrix. These features (or requirements) of Hf are very informative; however, most algorithms do not make full use of them. Different algorithms use different methods to estimate Hf from the input and output data sets. There are quite a few ways to do this; however, they all belong to the class of linear regression methods. Once Hf is estimated, say as Ĥf, Yf - ĤfUf is an estimate of the predictable subspace. This estimate includes the effects of the estimation errors in Ĥf and the effects of the future stochastic signals, which can be removed by projection onto the past data. This projection may induce some error; however, in most cases the error is smaller than the unpredictable future noise. Some subspace identification methods, such as N4SID, estimate Hf and project onto the past data in one step.

3.2 Methods Used to Estimate Hf

1. Regression of Yf against Uf (MOESP)
Since Hf is the coefficient matrix relating Uf to Yf, it is natural to try to obtain Hf by directly performing an LS regression of Yf against Uf as in (5). A basic assumption for an unbiased result is that the future inputs are uncorrelated with the noise terms in (7), which in this case also include the effect of the state variables ΓfXk. This method gives an unbiased result only when the inputs are white noise signals. Once Hf is estimated, the predictable subspace is estimated as Yf - ĤfUf. The original MOESP uses this method to estimate Hf implicitly, and the predictable subspace, via a QR decomposition on [Uf; Yf]. If the input sequences are auto-correlated, this method regresses part of the state effect away and gives a biased result for the predictable subspace. An SVD on this subspace will give an asymptotically unbiased estimate of Γf; however, the estimate of Xk will be biased.

2. Regression of Yf against [Yp; Up; Uf] (N4SID)
Based on (6), ΓfXk in (5) can be estimated by a linear combination of the past inputs Up and the past outputs Yp. It is therefore a natural choice to regress Yf against [Yp; Up; Uf]. Here the regression coefficient for Uf is an estimate of Hf (Ĥf), and the part corresponding to the past data is an estimate of the predictable subspace, which is equivalent to projecting Yf - ĤfUf onto the past data. This estimate will have a slight bias if the input signals are auto-correlated. The bias occurs because of the correlation between the past outputs and the past noise terms in (7). This is the method used in N4SID to estimate Hf and the predictable subspace. It is realized by a QR decomposition of [Uf; Pio; Yf] (Pio = [Yp; Up]). PO-MOESP (past output MOESP; Verhaegen, 1994) gives similar results.

3. Constructing Hf from impulse weights of an ARX model
The structure of Hf implies that it can be constructed from the first f impulse weight blocks. These impulse weight blocks can be estimated from a simple model, such as an ARX or FIR model, which can be obtained by regressing yk against uk (if D≠0), the past inputs (up) and the past outputs (yp). The predictable subspace is then estimated as Yf - ĤfUf. It includes all the future noise. This is the method some CVA algorithms use to estimate Hf and the predictable subspace.

4. Regressing-out method
Uf can be regressed out of both sides of (7) by projecting onto the orthogonal complement of Uf, i.e., by post-multiplying both sides by Pufo = $I - U_f^T (U_f U_f^T)^{-1} U_f$. This removes the Uf term from the equation, and the coefficient matrices for the past data in (7) can be obtained by regressing YfPufo against PioPufo. This result is equivalent to that from N4SID, and was implied in Van Overschee and De Moor (1995). The method has also been applied to the CVA method (refer to Van Overschee and De Moor, 1995; Carette, 2000). See the next section for more discussion. A similar approach is to regress the past data Pio out of both sides of (7), i.e., to project onto the orthogonal complement of Pio by post-multiplying by Ppo = $I - P_{io}^T (P_{io} P_{io}^T)^{-1} P_{io}$, for the estimation of Hf. This turns out to be equivalent to the approach of N4SID.

5. Instrumental variable method
If there is a variable that is correlated with Uf but has no correlation with Xk and the future noise, an unbiased Hf can be estimated by the instrumental variable (IV) method based on (5). For auto-correlated inputs, Uf correlates with Xk through its correlation with Up; therefore the part of Uf that has no correlation with the past data has no correlation with Xk. This part of Uf can be constructed by regressing Up out of Uf and taking the residual as the IV. Once Hf is estimated, the predictable subspace can easily be estimated.

All these estimation methods are variants of the linear regression method: they differ only in their choice of the independent and dependent variables, in the regression sequence, and in the degree to which they use knowledge about the features of Hf. The key problem comes from the correlation between Uf and Xk, which arises from the auto-correlation of the input sequence in the open-loop case. The estimation accuracy (bias and variance) of each method depends on the input signal, the true model structure, and the signal-to-noise ratio (SNR).
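A small numerical experiment can make methods 1, 2 and 4 concrete. The sketch below is illustrative only: the AR(1)-filtered input, the first-order process with equation-error noise, and the helper names `regress` and `sum_abs_err` are assumptions introduced here and differ from the simulation example of the paper. It compares the direct regression of Yf on Uf with the joint regression on [Pio; Uf], and checks numerically that regressing Uf out of both sides gives the same past-data coefficients as the joint regression.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, f = 1500, 4, 4

# Auto-correlated input: AR(1)-filtered white noise (an assumed test signal)
u = np.zeros(N)
eu = rng.standard_normal(N)
for k in range(1, N):
    u[k] = 0.9 * u[k - 1] + eu[k]

# Hypothetical first-order process with a small equation-error noise
y = np.zeros(N)
e = 0.1 * rng.standard_normal(N)
for k in range(1, N):
    y[k] = 0.8 * y[k - 1] + 0.2 * u[k - 1] + e[k]

ks = np.arange(p, N - f + 1)
Up = np.column_stack([u[k - p:k] for k in ks])
Yp = np.column_stack([y[k - p:k] for k in ks])
Uf = np.column_stack([u[k:k + f] for k in ks])
Yf = np.column_stack([y[k:k + f] for k in ks])
Pio = np.vstack([Yp, Up])

def regress(Y, Z):
    """Least-squares coefficients K in Y ~ K Z (variables in rows)."""
    return np.linalg.lstsq(Z.T, Y.T, rcond=None)[0].T

# True Hf: lower-triangular Toeplitz matrix of impulse weights 0, 0.2, 0.16, ...
h = np.concatenate([[0.0], 0.2 * 0.8 ** np.arange(f - 1)])
H_true = np.array([[h[i - j] if i >= j else 0.0 for j in range(f)]
                   for i in range(f)])

def sum_abs_err(H):
    """Total absolute deviation of an estimate from the true Hf."""
    return float(np.abs(H - H_true).sum())

# Method 1: regress Yf on Uf alone
H_direct = regress(Yf, Uf)

# Method 2: regress Yf on [Pio; Uf] jointly
L = regress(Yf, np.vstack([Pio, Uf]))
L_past, H_joint = L[:, :Pio.shape[0]], L[:, Pio.shape[0]:]

# Method 4: regress Uf out of both sides, then regress on the past data
P_perp = np.eye(len(ks)) - Uf.T @ np.linalg.solve(Uf @ Uf.T, Uf)
L_past_out = regress(Yf @ P_perp, Pio @ P_perp)

print("direct:", sum_abs_err(H_direct), "joint:", sum_abs_err(H_joint))
print("max coefficient difference:", np.abs(L_past - L_past_out).max())
```

With a strongly auto-correlated input, the direct regression attributes part of the state effect to Uf, so its error is expected to be much larger than that of the joint regression, while the coefficient difference between methods 2 and 4 should be at the level of rounding error.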

4. Estimation of State Variables

4.1 Latent Variable Methods for State Estimation

The predictable subspace estimated by the linear regression methods of the last section is a high-dimensional space (>> system order n) consisting of highly correlated variables. If there were no estimation error, this subspace would be of rank n only, and any n independent variables in the data set, or their linear combinations, could be taken as state variables. However, the estimation error generally makes the space full rank. A direct choice of any n variables will have a large estimation error and lose the useful information in all the other variables, which are highly correlated with the true states. Extracting only n linear combinations from this highly correlated, high-dimensional space while keeping as much information as possible is most desirable. This is exactly the general goal, and the situation, for which latent variable methods were developed. Latent variable methods are therefore employed in all SIMs as the methodology for estimating the state variables from the predictable subspace.

Latent variables (LVs) are linear combinations of the original (manifest) variables that optimize a specific objective. There are a variety of latent variable methods (LVMs) based on different optimization objectives. In general terms, Principal Component Analysis (PCA), Partial Least Squares (PLS), Canonical Correlation Analysis (CCA) and Reduced Rank Analysis (RRA) are latent variable methods that maximize variance, covariance, correlation and predictable variance respectively (for details refer to Burnham et al., 1996). Different SIMs employ different LVMs, or use them in different ways, to estimate the state variables.

4.2 Methods Used for State Estimation

1. PCA (MOESP and N4SID)
Both N4SID and MOESP extract Xk by performing PCA on the estimated predictable subspace, which is essentially an SVD procedure. This implies the assumptions that ΓfXk has a larger variation than the estimation error and that the two parts are uncorrelated. The first assumption is well satisfied if the signal-to-noise ratio is large, and this ensures that the first n PCs are the estimated state variables. The second assumption is essential for the unbiasedness of the estimate, and it is not satisfied in the case of auto-correlated inputs. The state-based MOESP (in the original algorithm) applies PCA (SVD) directly to the estimated predictable subspace Yf - ĤfUf, where Ĥf is obtained by directly regressing Yf onto Uf, and the PCs are taken as the estimated states. This estimated predictable subspace includes all the future noise and is not predicted by the past data; the PCA results therefore have large estimation errors and no guarantee of predictability. PO-MOESP applies PCA to the projection of the estimated predictable subspace onto part of the past data space, so the result is generally improved. This method gives an unbiased result for white noise inputs. If the inputs are auto-correlated, the result will be biased. N4SID applies PCA (SVD) to the part of the projection Yf/[Yp; Up; Uf] corresponding to the past data. As mentioned above, this result can be viewed as the projection of the predictable subspace Yf - ĤfUf onto the past data. Here Ĥf is the third block of the regression coefficient matrix. In fact, it can be shown that this method is equivalent to performing RRA on the past data and Yf - ĤfUf (for proof, see Shi, 2001). It is clear that the best predictability in N4SID is in the sense of the total predictable variance of Yf - ĤfUf based on the past data; this is assured by the projection onto the past data, which at the same time removes the future noise. Therefore the estimation error and bias of Xk from N4SID are generally very small.

2. CCA (CVA)
CVA applies CCA to Pio = [Yp; Up] and Yfe = Yf - ĤfUf, and the first n latent variables (canonical variates, CVs) from the past data set are the estimates of Xk. By selecting the canonical variates with the largest correlations as states, one maximizes the relative variation in Yfe in each dimension rather than the absolute variation. CCA can also be applied to the results of projecting the future inputs Uf out of both the past data and the future outputs, i.e. PioPufo and YfPufo; however, the canonical variates obtained directly, J1PioPufo, are obviously biased estimates of Xk. Here the coefficient matrix J1 should be applied to the original past data to obtain the state estimates J1Pio. These estimates are no longer orthogonal. The result is proven to give unbiased state variable estimates (for proof, see Shi, 2001). However, since part of the state signal is removed by regressing Uf out while the noise is kept intact, the data set YfPufo generally has a worse SNR than Yf - ĤfUf.

3. Other Possible Methods
Other LVMs, such as PLS and RRA, are possible choices for state extraction. For example, RRA should provide estimates of the states based on the same objective as N4SID, that is, states that maximize the variance explained in the predictable subspace. However, it should give numerically improved estimates, since it directly obtains the n LVs that explain the greatest variance in the predictable subspace, rather than using the two-step procedure of first performing an ill-conditioned least squares (oblique projection) and then PCA/SVD (see Shi, 2001). Since the objective of PLS is to model the variance in the past data set and in the predictable subspace as well as their correlation, it will generally not provide minimal-order state models. In effect, it will try to provide state vectors for the joint input/output space (refer to Shi and MacGregor, 2000).

Combinations of the methods used for the predictable subspace estimation and the methods used for the state variable estimation can lead to a whole set of different subspace identification methods.
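The two main latent variable routes, PCA and CCA, can be sketched side by side. This is a rough illustration under assumptions: the 2-state test system, the use of the true Hf to form Yf - HfUf, and the function names `pca_states`, `cca_states` and `canonical_correlations` are all introduced here. CCA is computed in the QR-plus-SVD form, and the estimated states are compared with the simulated true states through canonical correlations.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 2-state SISO system with small output noise (illustration only)
A = np.array([[0.8, 0.1], [0.0, 0.5]])
B = np.array([1.0, 0.5])
C = np.array([1.0, 1.0])
n, N, p, f = 2, 1000, 5, 5

u = rng.standard_normal(N)
x = np.zeros((N + 1, n))
y = np.zeros(N)
for k in range(N):
    y[k] = C @ x[k] + 0.02 * rng.standard_normal()
    x[k + 1] = A @ x[k] + B * u[k]

ks = np.arange(p, N - f + 1)
Up = np.column_stack([u[k - p:k] for k in ks])
Yp = np.column_stack([y[k - p:k] for k in ks])
Uf = np.column_stack([u[k:k + f] for k in ks])
Yf = np.column_stack([y[k:k + f] for k in ks])
Pio = np.vstack([Yp, Up])
X_true = x[ks].T

# Predictable subspace formed with the true Hf (D = 0, so the diagonal is zero)
h = np.array([0.0] + [C @ np.linalg.matrix_power(A, i) @ B for i in range(f - 1)])
Hf = np.array([[h[i - j] if i >= j else 0.0 for j in range(f)] for i in range(f)])
Yfe = Yf - Hf @ Uf

def orthobasis(Z):
    """Orthonormal basis (samples x variables) of the centered data Z."""
    Zc = Z - Z.mean(axis=1, keepdims=True)
    return np.linalg.qr(Zc.T)[0]

def pca_states(Yfe, Pio, n):
    """PCA (SVD) on the projection of Yfe onto the past data."""
    proj = np.linalg.lstsq(Pio.T, Yfe.T, rcond=None)[0].T @ Pio
    _, s, Vt = np.linalg.svd(proj, full_matrices=False)
    return np.sqrt(s[:n])[:, None] * Vt[:n]

def cca_states(Yfe, Pio, n):
    """First n canonical variates of the (centered) past data."""
    Qp, Qy = orthobasis(Pio), orthobasis(Yfe)
    U, _, _ = np.linalg.svd(Qp.T @ Qy, full_matrices=False)
    return (Qp @ U[:, :n]).T

def canonical_correlations(X1, X2):
    """Canonical correlations between two data sets (variables in rows)."""
    return np.linalg.svd(orthobasis(X1).T @ orthobasis(X2), compute_uv=False)

cc_pca = canonical_correlations(pca_states(Yfe, Pio, n), X_true)
cc_cca = canonical_correlations(cca_states(Yfe, Pio, n), X_true)
print("PCA:", cc_pca, "CCA:", cc_cca)
```

In this low-noise, white-input setting both routes are expected to recover essentially the same state subspace, so the canonical correlations with the true states should all be close to one.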

5. Simulation Example

In this section, a simulation example is used to illustrate the points discussed. The example is a simple first-order SISO process with AR(1) noise, modeled as:

$$y_k = \frac{0.2 z^{-1}}{1 - 0.8 z^{-1}} u_k + \frac{1}{1 - 0.95 z^{-1}} e_k$$

The input is a PRBS signal with switching period Ts = 5 and magnitude 4.0. 1000 data points are collected with var(ek) = 1.0, and the SNR is about 0.93 (in variance). Both the past and future lag steps are taken as 7 for every method.

Different methods for estimating the Hf matrix are applied to the simulation example and compared with the true result. A rough comparison is to take the mean of the elements on each lower diagonal as the estimated impulse weight. The results and total absolute errors are listed in Table 1. The results from regressing Yf directly onto Uf are clearly the farthest from the true values, because of the bias resulting from the strong auto-correlation in the PRBS input signal. The other methods give results close to the true values.

There are many indices for comparing the estimated states with the true states. One quick index is the set of canonical correlation coefficients between the estimated states and the true states (see Table 2). This gives a clear idea of how consistent the two spaces are with each other. MOESP using the direct regression Yf/Uf clearly gives poor results. State estimation by PLS gives relatively degraded results compared with CCA or RRA based on the same predictable subspace estimated by ARX. The states estimated by the other methods are much closer to the true states. Similar conclusions are indicated by the squared multiple correlation R2, which shows how much of the total sum of squares of the true states can be explained by the estimated states (the two states are scaled to unit variance).

CCA results based on PioPufo and YfPufo and those based on Pio and Yf - HfUf (using the true Hf) are also compared. The coefficient matrix J1 from the former is different from that of the latter (too large to show). The estimated states from the former method are not linear combinations of those estimated by the latter method, but they are very close.

If the estimated states are used for fitting the state space model, the methods can be compared by plotting their estimated impulse responses (Fig. 1) and the errors of these responses (Fig. 2). MOESP gives a poor result for this example. The result from SIM-ARX-PLS has a large error but can be improved to match the others by using more LVs (refer to Shi and MacGregor, 2000). The results from the other SIMs are very close to the true values. The somewhat irregular response from the ARX model is also shown for comparison. All SIMs give a smooth response by fitting the LVs to the state equation.

6. Conclusions

Although SIMs are quite different in their concepts and algorithms, they follow the same statistical framework set up here: (1) use of a linear regression method to estimate Hf and the predictable subspace; (2) use of a latent variable method to estimate a minimal set of state variables; and (3) fitting to the state space model. By discussing the SIMs in this framework, their similarities and differences can be clearly seen. The framework also reveals possible new methods and new combinations of existing approaches, such as use of the IV method for the estimation of Hf, and use of other latent variable methods, RRA and PLS, for state estimation.

References

Åström, K.J., Introduction to Stochastic Control Theory, Academic Press, 1970
Burnham, A.J., R. Viveros and J.F. MacGregor, Frameworks for Latent Variable Multivariate Regression, Journal of Chemometrics, V10, pp. 31-45, 1996
Carette, P., personal communication, notes for CVA, 2000
Larimore, W.E., Canonical Variate Analysis for System Identification, Filtering and Adaptive Control, Proc. 29th IEEE Conference on Decision and Control, Vol. 1, Honolulu, Hawaii, 1990
Larimore, W.E., Optimal Reduced Rank Modeling, Prediction, Monitoring and Control Using Canonical Variate Analysis, preprints of ADCHEM, Banff, 1997
Ljung, L. and McKelvey, T., Subspace Identification from Closed Loop Data, Signal Processing, V52, 1996
Shi, R. and J.F. MacGregor, Modeling of Dynamic Systems using Latent Variable and Subspace Methods, J. of Chemometrics, V14, pp. 423-439, 2000
Shi, R., Ph.D. thesis, McMaster University, ON, Canada, 2001
Van Overschee, P. and De Moor, B., N4SID: Subspace Algorithms for the Identification of Combined Deterministic-Stochastic Systems, Automatica, Vol. 30, No. 1, pp. 75-93, 1994
Van Overschee, P. and De Moor, B., A Unifying Theorem for Three Subspace System Identification Algorithms, Automatica, Vol. 31, No. 12, pp. 1853-1864, 1995
Verhaegen, M. and Dewilde, P., Subspace Model Identification, Part 1: The Output-Error State-Space Model Identification Class of Algorithms, International Journal of Control, V56, pp. 1187-1210, 1992
Verhaegen, M., Identification of the Deterministic Part of MIMO State Space Models Given in Innovations Form from Input-Output Data, Automatica, V30, pp. 61-74, 1994
Viberg, M., Subspace-based Methods for the Identification of Linear Time-invariant Systems, Automatica, V31, pp. 1835-1851, 1995

Fig. 1 Impulse response from SIMs (horizontal axis: sampling interval)

Table 1 The impulse weights in the estimated Hf

Method          w1      w2      w3      w4      w5      w6      w7      Sum Abs. Err.
True            0       0.2     0.16    0.128   0.1024  0.0819  0.0655  0
Yf/Uf           0.1003  0.2716  0.2203  0.1770  0.1626  0.1613  0.2135  0.5688
Yf/[Pio;Uf]     0.0159  0.1951  0.1585  0.1035  0.0728  0.0668  0.0510  0.1061
ARX             0       0.1997  0.1546  0.1086  0.0863  0.0602  0.0610  0.0675
IV              0.0191  0.1953  0.1617  0.1137  0.0811  0.0750  0.0559  0.0778
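As a quick arithmetic check, the true row and the error column of Table 1 can be recomputed from the process transfer function 0.2z^-1/(1 - 0.8z^-1). The sketch below only re-uses the numbers listed in the table; the variable names are illustrative.

```python
import numpy as np

# True impulse weights of 0.2 z^-1 / (1 - 0.8 z^-1): w1 = 0, then 0.2 * 0.8**j
true_w = np.concatenate([[0.0], 0.2 * 0.8 ** np.arange(6)])

# Estimated weights as listed in Table 1
table = {
    "Yf/Uf":       [0.1003, 0.2716, 0.2203, 0.1770, 0.1626, 0.1613, 0.2135],
    "Yf/[Pio;Uf]": [0.0159, 0.1951, 0.1585, 0.1035, 0.0728, 0.0668, 0.0510],
    "ARX":         [0.0, 0.1997, 0.1546, 0.1086, 0.0863, 0.0602, 0.0610],
    "IV":          [0.0191, 0.1953, 0.1617, 0.1137, 0.0811, 0.0750, 0.0559],
}

# Sum of absolute deviations from the true weights, per method
sum_abs_err = {m: float(np.abs(np.array(w) - true_w).sum())
               for m, w in table.items()}

print(np.round(true_w, 4))
for method, err in sum_abs_err.items():
    print(f"{method:12s} {err:.4f}")
```

The recomputed errors agree with the tabulated ones to within the rounding of the four-decimal entries.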



Table 2 Comparison between the estimated and the true states

Method                 MOESP    N4SID          CVA      SIM-IV-CCA   SIM-ARX-PLS   SIM-ARX-RRA
Predictable subspace   Yf/Uf    Yf/[Pio;Uf]    ARX      IV           ARX           ARX
Estimated states       PCA      PCA            CCA      CCA          PLS           RRA
1st CC                 0.8680   0.9993         0.9997   0.9995       0.9500        0.9993
2nd CC                 0.2599   0.9623         0.9613   0.9600       0.9122        0.9618
R2                     0.4031   0.9612         0.9605   0.9590       0.8667        0.9606
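The two indices reported in Table 2 can be computed as in the following sketch. The function names and the synthetic stand-in data are assumptions for illustration; the paper does not give its own implementation. The canonical correlations are obtained from orthonormal bases of the two centered data sets, and R2 is the fraction of the total sum of squares of the unit-variance true states explained by a least-squares fit on the estimated states.

```python
import numpy as np

def canonical_correlations(X1, X2):
    """Canonical correlations between two data sets (variables in rows)."""
    Q1 = np.linalg.qr((X1 - X1.mean(axis=1, keepdims=True)).T)[0]
    Q2 = np.linalg.qr((X2 - X2.mean(axis=1, keepdims=True)).T)[0]
    return np.linalg.svd(Q1.T @ Q2, compute_uv=False)

def squared_multiple_correlation(X_true, X_hat):
    """Share of the sum of squares of the scaled true states explained by X_hat."""
    Xt = X_true - X_true.mean(axis=1, keepdims=True)
    Xt = Xt / Xt.std(axis=1, keepdims=True)        # scale to unit variance
    Xh = X_hat - X_hat.mean(axis=1, keepdims=True)
    fit = np.linalg.lstsq(Xh.T, Xt.T, rcond=None)[0].T @ Xh
    return float(1.0 - ((Xt - fit) ** 2).sum() / (Xt ** 2).sum())

rng = np.random.default_rng(3)
X_true = rng.standard_normal((2, 5000))             # stand-in "true" states

# Case 1: same subspace in a different basis
T = np.array([[2.0, 1.0], [0.5, -1.0]])
X_rot = T @ X_true
cc_rot = canonical_correlations(X_rot, X_true)
r2_rot = squared_multiple_correlation(X_true, X_rot)

# Case 2: only the first state is recovered, the second is unrelated noise
X_half = np.vstack([X_true[0], rng.standard_normal(5000)])
cc_half = canonical_correlations(X_half, X_true)
r2_half = squared_multiple_correlation(X_true, X_hat=X_half)

print(cc_rot, r2_rot)
print(cc_half, r2_half)
```

Both indices are invariant to the basis of the estimated states, which is why a rotated copy of the true states scores perfectly, while recovering only one of two states gives an R2 of roughly one half.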










Fig. 2 Error on the impulse responses