
# EE263 Autumn 2008-09 Stephen Boyd

Lecture 1
Overview

course mechanics

some examples


Course mechanics

## all class info, lectures, homeworks, announcements on class web page:

www.stanford.edu/class/ee263

course requirements:

weekly homework

Prerequisites

control systems

dynamics

## basic quadratic control & estimation
Linear dynamical system

## a continuous-time linear dynamical system (CT LDS) has the form

$$\frac{dx}{dt} = A(t)x(t) + B(t)u(t), \qquad y(t) = C(t)x(t) + D(t)u(t)$$

where:

t ∈ R denotes time; x(t) ∈ Rⁿ is the state, u(t) ∈ Rᵐ is the input, and y(t) ∈ Rᵖ is the output

## for lighter appearance, equations are often written

$$\dot{x} = Ax + Bu, \qquad y = Cx + Du$$

## also called state equations, or m-input, n-state, p-output LDS
Some LDS terminology

## most linear systems encountered are time-invariant: A, B, C, D are constant, i.e., don't depend on t

## when there is no input u, the system is called autonomous

## when u(t) and y(t) are scalar, system is called single-input, single-output (SISO); when input & output signal dimensions are more than one, MIMO

## a discrete-time linear dynamical system (DT LDS) has the form

x(t + 1) = A(t)x(t) + B(t)u(t), y(t) = C(t)x(t) + D(t)u(t)

where t ∈ Z = {0, ±1, ±2, . . .}

## a DT LDS is a first-order vector recursion (a simulation is sketched below)
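A minimal sketch of the recursion, with a hypothetical constant system (the matrices below are illustrative, not from the course):

```python
# Simulate the DT LDS x(t+1) = A x(t) + B u(t), y(t) = C x(t) + D u(t)
import numpy as np

def simulate_dt_lds(A, B, C, D, x0, u):
    """Run the first-order vector recursion for len(u) steps."""
    x, ys = x0, []
    for ut in u:
        ys.append(C @ x + D @ ut)   # output equation
        x = A @ x + B @ ut          # state update
    return np.array(ys)

# hypothetical 2-state, 1-input, 1-output example
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.zeros((1, 1))
u = [np.array([1.0])] * 50          # constant (step) input
y = simulate_dt_lds(A, B, C, D, np.zeros(2), u)
```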
Why study linear systems?

## automatic control systems

signal processing
communications
economics, finance
circuit analysis, simulation, design
mechanical and civil engineering
aeronautics
navigation, guidance


Usefulness of LDS

## depends on availability of computing power, which is large & increasing exponentially

used for

analysis & design

implementation, embedded in real-time systems

## like DSP, was a specialized topic & technology 30 years ago
Origins and history

## builds on classical circuits & systems (1920s on) (transfer functions . . . ) but with more emphasis on linear algebra

## transitioned from specialized topic to ubiquitous in 1980s
(just like digital signal processing, information theory, . . . )

why study linear systems, when many real systems are nonlinear?

## most techniques for nonlinear systems are based on linear methods

methods for linear systems often work unreasonably well, in practice, for nonlinear systems

## if you don't understand linear dynamical systems you certainly can't understand nonlinear dynamical systems
Examples (ideas only, no details)

ẋ = Ax, y = Cx

## model of a lightly damped mechanical system, but it doesn't matter

typical output: (two plots of the output y versus t, over 0 ≤ t ≤ 350 and 0 ≤ t ≤ 1000; figure omitted)

## output waveform is very complicated; looks almost random and unpredictable

we'll see that such a solution can be decomposed into much simpler (modal) components

(plots of the individual modal components of the output versus t; figure omitted)

Input design

## where B ∈ R¹⁶ˣ², C ∈ R²ˣ¹⁶ (same A as before)

problem: find appropriate u : R₊ → R² so that y(t) → ydes = (1, −2)

simple approach: consider static conditions (u, x, y constant):

$$\dot{x} = 0 = Ax + Bu_{\rm static}, \qquad y = y_{\rm des} = Cx$$

## solve for u to get:

$$u_{\rm static} = -\left(CA^{-1}B\right)^{-1} y_{\rm des} = \begin{bmatrix} -0.63 \\ 0.36 \end{bmatrix}$$
let's apply u = ustatic and just wait for things to settle:

(plots of u1, u2, y1, y2 versus t, for −200 ≤ t ≤ 1800; figure omitted)

## . . . takes about 1500 sec for y(t) to converge to ydes


using very clever input waveforms (EE263) we can do much better, e.g.
(plots of the clever input u1, u2 and resulting outputs y1, y2 versus t, for 0 ≤ t ≤ 60; figure omitted)

## . . . here y converges exactly in 50 sec

in fact by using larger inputs we do still better, e.g.
(plots of the larger input u1, u2 and the outputs y1, y2 versus t, for −5 ≤ t ≤ 25; figure omitted)

## there is a tradeoff between the size of u and the convergence time
Estimation / filtering

u → H(s) → w → A/D → y

## A/D runs at 10 Hz, with 3-bit quantizer

(plots of u(t), s(t), w(t), y(t) for 0 ≤ t ≤ 10; figure omitted)

## problem: estimate original signal u, given quantized, filtered signal y
simple approach:

ignore quantization

## design equalizer G(s) for H(s) (i.e., GH ≈ 1)

approximate u as G(s)y

## formulate as estimation problem (EE263) . . .

(plot of u(t) (solid) and the estimate û(t) (dotted) for 0 ≤ t ≤ 10; figure omitted)

## RMS error 0.03, well below quantization error (!)
EE263 Autumn 2008-09 Stephen Boyd

Lecture 2
Linear functions and examples

## linear equations and functions

engineering examples

interpretations


Linear equations

## consider the system of linear equations

$$y_1 = a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n$$
$$y_2 = a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n$$
$$\vdots$$
$$y_m = a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n$$

## can be written in matrix form as y = Ax, where

$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}, \qquad A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}, \qquad x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

Linear functions

a function f : Rⁿ → Rᵐ is linear if

f(x + y) = f(x) + f(y), for all x, y ∈ Rⁿ

f(αx) = αf(x), for all x ∈ Rⁿ, α ∈ R

i.e., superposition holds (sketch: f maps the parallelogram generated by x and y to the parallelogram generated by f(x) and f(y); figure omitted)

## the matrix multiplication function f(x) = Ax is linear; the converse is true: any linear function f : Rⁿ → Rᵐ can be written as f(x) = Ax for some A ∈ Rᵐˣⁿ

## representation via matrix multiplication is unique: for any linear function f there is only one matrix A for which f(x) = Ax for all x

Interpretations of y = Ax

y = Ax defines a function (linear map) that maps x ∈ Rⁿ into y ∈ Rᵐ

Interpretation of aij

$$y_i = \sum_{j=1}^{n} a_{ij} x_j$$

aij is gain factor from jth input (xj) to ith output (yi)

thus, e.g.,

## jth column of A concerns jth input

a27 = 0 means 2nd output (y2) doesn't depend on 7th input (x7)

|a52| ≫ |ai2| for i ≠ 5 means x2 affects mainly y5


## more generally, sparsity pattern of A, i.e., list of zero/nonzero entries of A, shows which xj affect which yi

Linear elastic structure

## xj is external force applied at some node, in some fixed direction

yi is (small) deflection of some node, in some fixed direction

(diagram of a structure loaded by forces x1, x2, x3, x4; figure omitted)

## A is called the compliance matrix

aij gives deflection i per unit force at j (in m/N)

Total force/torque on rigid body

(diagram of a rigid body with forces/torques x1, . . . , x4 applied at various points; CG marks the center of gravity; figure omitted)

## xj is external force/torque applied at some point/direction/axis

y ∈ R⁶ is resulting total force & torque on body
(y1, y2, y3 are x-, y-, z- components of total force,
y4, y5, y6 are x-, y-, z- components of total torque)

we have y = Ax

A depends on geometry
(of applied forces and torques with respect to center of gravity CG)

jth column gives resulting force & torque for unit force/torque j

## Linear static circuit

interconnection of resistors, linear dependent (controlled) sources, and independent sources

(circuit diagram: independent sources x1, x2, a dependent source controlled by ib, and circuit variables y1, y2, y3; figure omitted)

## xj is value of independent source j

yi is some circuit variable (voltage, current)

we have y = Ax

if xj are currents and yi are voltages, A is called the impedance or resistance matrix

Final position/velocity of mass due to applied forces

## unit mass, zero position/velocity at t = 0, subject to force f(t) for 0 ≤ t ≤ n

f(t) = xj for j − 1 ≤ t < j, j = 1, . . . , n
(x is the sequence of applied forces, constant in each interval)

y1, y2 are final position and velocity (i.e., at t = n)

we have y = Ax

a1j gives influence of applied force during j − 1 ≤ t < j on final position

a2j gives influence of applied force during j − 1 ≤ t < j on final velocity (a small numerical version is sketched below)
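A minimal sketch, assuming unit mass and n = 10 intervals: integrating the dynamics over one interval gives a1j = n − j + 1/2 (the mass gains 1/2 unit of position during the interval, then drifts for the remaining n − j seconds) and a2j = 1.

```python
# Build the 2 x n matrix mapping the force sequence x to y = (p(n), v(n))
import numpy as np

n = 10
A = np.zeros((2, n))
for j in range(1, n + 1):
    A[0, j - 1] = n - j + 0.5   # effect of x_j on final position
    A[1, j - 1] = 1.0           # effect of x_j on final velocity

x = np.ones(n)                  # constant unit force
y = A @ x                       # final (position, velocity)
print(y)                        # -> [50. 10.] for n = 10
```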


Gravimeter prospecting

(diagram: gravimeter at location i measures gravity anomaly gi − gavg above voxels of earth; figure omitted)

## xj = ρj − ρavg is (excess) mass density of earth in voxel j

yi is measured gravity anomaly at location i, i.e., some component (typically vertical) of gi − gavg

y = Ax

## A comes from physics and geometry

jth column of A gives the pattern of sensor readings caused by unit excess density at voxel j

Thermal system

(diagram: heating elements with powers x1, . . . , x5 below a surface; temperature measured at several locations, e.g., location 4 above heating element 5; figure omitted)

y = Ax

aij gives influence of heater j at location i (in °C/W)

jth column of A gives the temperature pattern resulting from 1 W applied at heater j

Illumination with multiple lamps

(diagram: lamp j with power xj at distance rij and angle θij from patch i with illumination yi; figure omitted)

## n lamps illuminating m (small, flat) patches, no shadows

xj is power of jth lamp; yi is illumination level of patch i

y = Ax, where aij = r⁻²ij max{cos θij, 0}
(cos θij < 0 means patch i is shaded from lamp j)

jth column of A shows illumination pattern from lamp j

Signal and interference power in wireless system

n transmitter/receiver pairs

transmitter j transmits to receiver j (and, inadvertently, to the other receivers)

pj is power of jth transmitter

si is received signal power of ith receiver

zi is received interference power of ith receiver

Gij is path gain from transmitter j to receiver i

we have s = Ap, z = Bp, where

$$a_{ij} = \begin{cases} G_{ii} & i = j \\ 0 & i \neq j \end{cases} \qquad b_{ij} = \begin{cases} 0 & i = j \\ G_{ij} & i \neq j \end{cases}$$

Cost of production

## production inputs (materials, parts, labor, . . . ) are combined to make a number of products

xj is price per unit of production input j, aij is the amount of production input j needed to make one unit of product i, and yi is the production cost per unit of product i

we have y = Ax

if instead qi is the quantity of product i to be produced, and rj is the total amount of production input j needed,

we have r = Aᵀq

## total production cost is

rᵀx = (Aᵀq)ᵀx = qᵀAx

## Network traffic and flows

n flows with rates f1, . . . , fn pass from their source nodes to their destination nodes over fixed routes in a network

## flow routes given by flow-link incidence matrix

$$A_{ij} = \begin{cases} 1 & \text{flow } j \text{ goes over link } i \\ 0 & \text{otherwise} \end{cases}$$

link delays and flow latency: if d is the vector of link delays, the vector l of total latencies (transit times) of the flows is

l = Aᵀd

Linearization

## if f : Rⁿ → Rᵐ is differentiable, then for x near x0, f(x) is very near f(x0) + Df(x0)(x − x0), where

$$Df(x_0)_{ij} = \left.\frac{\partial f_i}{\partial x_j}\right|_{x_0}$$

is the derivative (Jacobian) matrix

## with y = f(x), y0 = f(x0), define input deviation δx := x − x0, output deviation δy := y − y0

then δy ≈ Df(x0) δx, i.e., small deviations are (approximately) related by a linear function

Navigation by range measurement

(diagram: unknown position (x, y) and four beacons at (p1, q1), . . . , (p4, q4), with range measurements ρ1, . . . , ρ4; figure omitted)

## the vector of ranges ρ ∈ R⁴ is a nonlinear function of (x, y) ∈ R²:

$$\rho_i(x, y) = \sqrt{(x - p_i)^2 + (y - q_i)^2}$$

linearize around (x0, y0): δρ ≈ A [δx; δy], where

$$a_{i1} = \frac{x_0 - p_i}{\sqrt{(x_0 - p_i)^2 + (y_0 - q_i)^2}}, \qquad a_{i2} = \frac{y_0 - q_i}{\sqrt{(x_0 - p_i)^2 + (y_0 - q_i)^2}}$$

## ith row of A shows (approximate) change in ith range measurement for (small) shift in (x, y) from (x0, y0)

## first column of A shows sensitivity of range measurements to (small) change in x from x0

## obvious application: (x0, y0) is last navigation fix; (x, y) is current position, a short time later (a numerical check is sketched below)
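A minimal sketch of the linearized range map, with hypothetical beacon positions (not from the notes), comparing Aδ against a finite difference of the exact ranges:

```python
import numpy as np

beacons = np.array([[10.0, 0.0], [0.0, 10.0], [-8.0, -6.0], [7.0, 7.0]])
x0 = np.array([1.0, 2.0])               # linearization point (x0, y0)

diff = x0 - beacons                     # rows: (x0 - p_i, y0 - q_i)
rho0 = np.linalg.norm(diff, axis=1)     # exact ranges at (x0, y0)
A = diff / rho0[:, None]                # Jacobian of the range map

dx = np.array([0.01, -0.02])            # small shift in position
rho1 = np.linalg.norm(x0 + dx - beacons, axis=1)
print(rho1 - rho0)                      # actual change in ranges
print(A @ dx)                           # linearized prediction, nearly equal
```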


Broad categories of applications

## some broad categories of applications:

estimation or inversion

control or design

mapping or transformation


Estimation or inversion

y = Ax

## yi is ith measurement or sensor reading (which we know)

xj is jth parameter to be estimated or determined
aij is sensitivity of ith sensor to jth parameter

sample problems:

find x, given y

find all x's that result in y (i.e., all x's consistent with measurements)

if there is no x such that y = Ax, find x s.t. y ≈ Ax (i.e., if the sensor readings are inconsistent, find x which is almost consistent)

Control or design

y = Ax

## x is vector of design parameters or inputs (which we can choose)

y is vector of results, or outcomes
A describes how input choices affect results

sample problems:

## find x so that y = ydes

find all x's that result in y = ydes (i.e., find all designs that meet specifications)

among x's that satisfy y = ydes, find a small one (i.e., find a small or efficient x that meets specifications)

Mapping or transformation

sample problems:

## determine if there is an x that maps to a given y

(if possible) find an x that maps to y

find all x's that map to a given y

if there is only one x that maps to y, find it (i.e., decode or undo the mapping)

Matrix multiplication as mixture of columns

write A in terms of its columns: A = [a1 a2 ⋯ an], where aj ∈ Rᵐ

then y = Ax = x1a1 + x2a2 + ⋯ + xnan is a mixture of the columns of A, with coefficients x1, . . . , xn

with the unit vectors

$$e_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots \quad e_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}$$

## we have Aej = aj , the jth column of A

(ej corresponds to a pure mixture, giving only column j)

Matrix multiplication as inner product with rows

## write A in terms of its rows:

$$A = \begin{bmatrix} \tilde a_1^T \\ \tilde a_2^T \\ \vdots \\ \tilde a_m^T \end{bmatrix}$$

where ãi ∈ Rⁿ

then y = Ax can be written as

$$y = \begin{bmatrix} \tilde a_1^T x \\ \tilde a_2^T x \\ \vdots \\ \tilde a_m^T x \end{bmatrix}$$

thus yi = ⟨ãi, x⟩, i.e., yi is inner product of ith row of A with x

geometric interpretation:

yi = ãiᵀx = α is a hyperplane in Rⁿ (normal to ãi)

(sketch: parallel hyperplanes ⟨ãi, x⟩ = 0, 1, 2, 3, with normal vector ãi; figure omitted)

Block diagram representation
y = Ax can be represented by a signal flow graph or block diagram
e.g. for m = n = 2, we represent

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

as a graph with a branch from each input to each output: x1 → y1 with gain a11, x1 → y2 with gain a21, x2 → y1 with gain a12, x2 → y2 with gain a22 (figure omitted)

aij is the gain along the path from jth input to ith output

(by not drawing paths with zero gain) the diagram shows sparsity structure of A
(e.g., diagonal, block upper triangular, arrow . . . )

example: block upper triangular

$$A = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}, \qquad x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$$

block diagram: x1 → A11 → y1, x2 → A12 → y1, x2 → A22 → y2; since the 2,1 block is zero, x1 does not affect y2 (figure omitted)

Matrix multiplication as composition

for A ∈ Rᵐˣⁿ and B ∈ Rⁿˣᵖ, C = AB ∈ Rᵐˣᵖ, where

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$$

composition interpretation: with x = Bz and y = Ax we have y = (AB)z, i.e., apply B to z ∈ Rᵖ first, then A; the cascade z → B → x → A → y is equivalent to the single block z → AB → y

Column and row interpretations

## can write product C = AB in terms of columns:

$$C = \begin{bmatrix} c_1 & \cdots & c_p \end{bmatrix} = AB = \begin{bmatrix} Ab_1 & \cdots & Ab_p \end{bmatrix}$$

i.e., jth column of C is A acting on jth column of B

similarly, in terms of rows:

$$C = \begin{bmatrix} \tilde c_1^T \\ \vdots \\ \tilde c_m^T \end{bmatrix} = AB = \begin{bmatrix} \tilde a_1^T B \\ \vdots \\ \tilde a_m^T B \end{bmatrix}$$

entries of the product are inner products of rows of A with columns of B: cij = ãiᵀbj = ⟨ãi, bj⟩

## Gram matrix of vectors f1, . . . , fn defined as Gij = fiᵀfj

(gives inner product of each vector with the others)

Matrix multiplication interpretation via paths

(signal flow graph of y = ABz for 2 × 2 A and B; e.g., a22b21 is the gain of the path from x1 to y2 through z2; figure omitted)

cij = Σk aik bkj is the sum of gains over all paths from input j to output i

EE263 Autumn 2008-09 Stephen Boyd

Lecture 3
Linear algebra review

## range, nullspace, rank

change of coordinates


Vector spaces

## a vector space or linear space (over the reals) consists of

a set V

a vector sum + : V × V → V

a scalar multiplication : R × V → V

a distinguished element 0 ∈ V

which satisfy a list of properties, including:

x + y = y + x, ∀x, y ∈ V (+ is commutative)

(x + y) + z = x + (y + z), ∀x, y, z ∈ V (+ is associative)

0 + x = x, ∀x ∈ V (0 is additive identity)

1x = x, ∀x ∈ V

Examples

Rⁿ, with standard (componentwise) vector sum and scalar multiplication

span(v1, v2, . . . , vk) = {α1v1 + ⋯ + αkvk | αi ∈ R}

where v1, . . . , vk ∈ Rⁿ

Subspaces

a subspace of a vector space is a subset of a vector space which is itself a vector space, i.e., it is closed under vector sum and scalar multiplication

examples:

## V4 = {x : R₊ → Rⁿ | x is differentiable}, where vector sum is sum of functions:

(x + z)(t) = x(t) + z(t)

and scalar multiplication is defined by

(αx)(t) = αx(t)

## (a point in V4 is a trajectory in Rⁿ)

V5 = {x ∈ V4 | ẋ = Ax}
(points in V5 are trajectories of the linear system ẋ = Ax)

V5 is a subspace of V4

Independent set of vectors

## a set of vectors {v1, v2, . . . , vk} is independent if

α1v1 + α2v2 + ⋯ + αkvk = 0 ⟹ α1 = α2 = ⋯ = αk = 0

equivalent: coefficients of any linear combination are unique:

α1v1 + α2v2 + ⋯ + αkvk = β1v1 + β2v2 + ⋯ + βkvk

implies α1 = β1, α2 = β2, . . . , αk = βk

## equivalent: no vector vi can be expressed as a linear combination of the other vectors v1, . . . , vi−1, vi+1, . . . , vk

Basis and dimension

{v1, . . . , vk} is a basis for a vector space V if v1, . . . , vk span V and are independent

## equivalent: every v ∈ V can be uniquely expressed as

v = α1v1 + ⋯ + αkvk

fact: for a given vector space V, the number of vectors in any basis is the same (this number is the dimension of V)

Nullspace of a matrix

## the nullspace of A ∈ Rᵐˣⁿ is defined as

N(A) = { x ∈ Rⁿ | Ax = 0 }

N(A) gives the ambiguity in x given y = Ax: if y = Ax and z ∈ N(A), then y = A(x + z)

Zero nullspace

A is called one-to-one if N(A) = {0}; equivalent conditions:

## x can always be uniquely determined from y = Ax

(i.e., the linear transformation y = Ax doesn't lose information)

columns of A are independent

det(AᵀA) ≠ 0

Interpretations of nullspace

suppose z ∈ N(A)

y = Ax represents measurement of x

## z is undetectable: the sensors give zero readings

x and x + z are indistinguishable from sensors: Ax = A(x + z)

## z is an input with no result

x and x + z have same result

Range of a matrix

the range of A ∈ Rᵐˣⁿ is defined as

R(A) = { Ax | x ∈ Rⁿ } ⊆ Rᵐ

Onto matrices

A is called onto if R(A) = Rᵐ; equivalent conditions:

## Ax = y can be solved in x for any y

columns of A span Rᵐ

N(Aᵀ) = {0}

det(AAᵀ) ≠ 0

Interpretations of range

## suppose v ∈ R(A), w ∉ R(A)

y = Ax represents measurement of x

y = v is a possible or consistent sensor signal; y = w is impossible or inconsistent: the sensors have failed, or the model is wrong

Inverse

A ∈ Rⁿˣⁿ is invertible or nonsingular if det A ≠ 0

equivalent conditions:

there exists B ∈ Rⁿˣⁿ (the inverse, B = A⁻¹) with AA⁻¹ = A⁻¹A = I

N(A) = {0}

R(A) = Rⁿ

Interpretations of inverse

write B = A⁻¹

## mapping associated with B undoes mapping associated with A (applied either before or after!)

## x = By is a perfect (pre- or post-) equalizer for the channel y = Ax

x = By is unique solution of Ax = y

Dual basis interpretation

let ai denote the columns of A, and b̃iᵀ the rows of B = A⁻¹

## from y = x1a1 + ⋯ + xnan and xi = b̃iᵀy, we get

$$y = \sum_{i=1}^{n} (\tilde b_i^T y)\, a_i$$

thus, inner product with rows of inverse matrix gives the coefficients in the expansion of a vector in the columns of the matrix
Rank of a matrix

## define rank(A) = dim R(A)

(nontrivial) facts:

rank(A) = rank(Aᵀ)

## rank(A) is maximum number of independent columns (or rows) of A

hence rank(A) ≤ min(m, n)

Conservation of dimension

for A ∈ Rᵐˣⁿ, rank(A) + dim N(A) = n

## interpretation (conservation of dimension): each dimension of input is either crushed to zero or ends up in output

roughly speaking:

n is number of degrees of freedom in input x

dim N(A) is number of degrees of freedom lost in the mapping from x to y = Ax

rank(A) is number of degrees of freedom in output y

## conversely: if rank(A) = r then A ∈ Rᵐˣⁿ can be factored as A = BC with B ∈ Rᵐˣʳ, C ∈ Rʳˣⁿ

(block diagram: the map x → A → y, with n inputs and m outputs, factors as x → C → B → y through r = rank(A) intermediate lines; r lines suffice to generate y from x; figure omitted)

Application: fast matrix-vector multiplication

## computing y = Ax directly costs mn operations; with the factorization A = BC, computing y = B(Cx) (compute z = Cx first, then y = Bz) costs rn + mr = (m + n)r operations, a big savings when r is small (see the sketch below)
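A minimal sketch with random data: a rank-r matrix applied via its factors.

```python
import numpy as np

m, n, r = 500, 400, 5
B = np.random.randn(m, r)
C = np.random.randn(r, n)
A = B @ C                      # rank-r matrix (don't form this in practice)
x = np.random.randn(n)

y_slow = A @ x                 # m*n = 200000 multiplies
y_fast = B @ (C @ x)           # (m+n)*r = 4500 multiplies
print(np.allclose(y_slow, y_fast))   # True
```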

Full rank matrices

A ∈ Rᵐˣⁿ is full rank if rank(A) = min(m, n)

## for square matrices, full rank means nonsingular

for skinny matrices (m ≥ n), full rank means columns are independent

for fat matrices (m ≤ n), full rank means rows are independent

Change of coordinates

## standard basis vectors in Rⁿ: (e1, e2, . . . , en), where ei has 1 in the ith component and 0 elsewhere

obviously we have x = x1e1 + x2e2 + ⋯ + xnen; the xi are the coordinates of x in the standard basis

## if (t1, t2, . . . , tn) is another basis for Rⁿ, we have

x = x̃1t1 + x̃2t2 + ⋯ + x̃ntn

where the x̃i are the coordinates of x in the basis (t1, t2, . . . , tn)

define T = [t1 t2 ⋯ tn], so x = Tx̃, hence

x̃ = T⁻¹x

consider linear transformation y = Ax, A ∈ Rⁿˣⁿ

express y and x in the basis t1, . . . , tn: x = Tx̃, y = Tỹ

so

ỹ = (T⁻¹AT)x̃

## similarity transformation by T expresses linear transformation y = Ax in coordinates t1, t2, . . . , tn
(Euclidean) norm

## for x ∈ Rⁿ we define the (Euclidean) norm as

$$\|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2} = \sqrt{x^T x}$$

## ‖x‖ measures length of vector (from origin)

important properties:

‖αx‖ = |α|‖x‖ (homogeneity)

‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality)

‖x‖ ≥ 0 (nonnegativity)

‖x‖ = 0 ⟺ x = 0 (definiteness)

RMS value and (Euclidean) distance

root-mean-square (RMS) value of vector x ∈ Rⁿ:

$$\mathrm{rms}(x) = \left(\frac{1}{n}\sum_{i=1}^{n} x_i^2\right)^{1/2} = \frac{\|x\|}{\sqrt{n}}$$

(Euclidean) distance between vectors x and y: ‖x − y‖

Inner product

$$\langle x, y\rangle := x_1y_1 + x_2y_2 + \cdots + x_ny_n = x^T y$$

important properties:

⟨αx, y⟩ = α⟨x, y⟩

⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩

⟨x, y⟩ = ⟨y, x⟩

⟨x, x⟩ ≥ 0

⟨x, x⟩ = 0 ⟺ x = 0

f(y) = ⟨x, y⟩ is a linear function of y, given by the row vector xᵀ

Cauchy-Schwarz inequality and angle between vectors

for any x, y ∈ Rⁿ, |xᵀy| ≤ ‖x‖‖y‖

## (unsigned) angle between vectors in Rⁿ defined as

$$\theta = \angle(x, y) = \cos^{-1}\frac{x^T y}{\|x\|\|y\|}$$

(the projection of x onto the line through y is (xᵀy/‖y‖²)y; figure omitted)

special cases:

## x and y are aligned: θ = 0; xᵀy = ‖x‖‖y‖; (if x ≠ 0) y = αx for some α ≥ 0

## x and y are opposed: θ = π; xᵀy = −‖x‖‖y‖; (if x ≠ 0) y = −αx for some α ≥ 0

x and y are orthogonal: θ = π/2; xᵀy = 0; denoted x ⊥ y

interpretation of xᵀy > 0 and xᵀy < 0:

## xᵀy > 0 means ∠(x, y) is acute

xᵀy < 0 means ∠(x, y) is obtuse

## {x | xᵀy ≤ 0} defines a halfspace with outward normal vector y, and boundary passing through 0 (figure omitted)

EE263 Autumn 2008-09 Stephen Boyd

Lecture 4
Orthonormal sets of vectors and QR
factorization


## a set of vectors u1, . . . , uk ∈ Rⁿ is

normalized if ‖ui‖ = 1, i = 1, . . . , k
(ui are called unit vectors or direction vectors)

orthogonal if ui ⊥ uj for i ≠ j

orthonormal if both

## slang: we say u1, . . . , uk are orthonormal vectors but orthonormality (like independence) is a property of a set of vectors, not vectors individually

in matrix notation: with U = [u1 ⋯ uk], orthonormality of the columns is equivalent to

UᵀU = Ik

orthonormal vectors are independent
(multiply α1u1 + α2u2 + ⋯ + αkuk = 0 by uiᵀ)

## hence u1, . . . , uk is an orthonormal basis for

span(u1, . . . , uk) = R(U)

## warning: if k < n then UUᵀ ≠ I (since its rank is at most k)

(more on this matrix later . . . )

Geometric properties

suppose the columns of U = [u1 ⋯ uk] are orthonormal; then multiplication by U preserves norm: ‖Uz‖ = ‖z‖

inner products are also preserved: ⟨Uz, Uz̃⟩ = ⟨z, z̃⟩

if w = Uz and w̃ = Uz̃ then

⟨w, w̃⟩ = ⟨Uz, Uz̃⟩ = (Uz)ᵀ(Uz̃) = zᵀUᵀUz̃ = ⟨z, z̃⟩

hence angles are preserved too: ∠(Uz, Uz̃) = ∠(z, z̃)

when U is square (k = n), U is called orthogonal; then UᵀU = I implies UUᵀ = I, i.e.,

$$\sum_{i=1}^{n} u_i u_i^T = I$$

Expansion in orthonormal basis

suppose u1, . . . , un is an orthonormal basis for Rⁿ; then for any x,

$$x = \sum_{i=1}^{n} (u_i^T x) u_i$$

a = Uᵀx resolves x into the vector of its ui components

## x = Ua reconstitutes x from its ui components

$$x = Ua = \sum_{i=1}^{n} a_i u_i \quad \text{is called the } (u_i\text{-}) \text{ expansion of } x$$

the identity I = UUᵀ = Σᵢ uᵢuᵢᵀ is sometimes written (in physics) as

$$I = \sum_{i=1}^{n} |u_i\rangle\langle u_i|$$

since

$$x = \sum_{i=1}^{n} |u_i\rangle\langle u_i|x\rangle$$

(but we won't use this notation)

Geometric interpretation

examples:

rotation by θ:

$$y = U_\theta x, \qquad U_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

reflection across the line through the origin at angle θ/2:

$$y = R_\theta x, \qquad R_\theta = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}$$

(sketch of the standard basis vectors e1, e2 under the rotation and the reflection; figure omitted)

Gram-Schmidt procedure

## given independent vectors a1, . . . , ak ∈ Rⁿ, G-S procedure finds orthonormal vectors q1, . . . , qk

## rough idea of method: first orthogonalize each vector w.r.t. previous ones; then normalize result to have norm one

the steps:

step 1a. q̃1 := a1

step 1b. q1 := q̃1/‖q̃1‖ (normalize)

step 2a. q̃2 := a2 − (q1ᵀa2)q1 (remove q1 component from a2)

step 2b. q2 := q̃2/‖q̃2‖ (normalize)

step 3a. q̃3 := a3 − (q1ᵀa3)q1 − (q2ᵀa3)q2 (remove q1, q2 components)

step 3b. q3 := q̃3/‖q̃3‖ (normalize)

etc.

(sketch of one step: q̃2 = a2 − (q1ᵀa2)q1 is orthogonal to q1 = a1/‖a1‖; figure omitted)

for i = 1, 2, . . . , k we have

$$a_i = (q_1^T a_i) q_1 + (q_2^T a_i) q_2 + \cdots + (q_{i-1}^T a_i) q_{i-1} + \|\tilde q_i\| q_i = r_{1i} q_1 + r_{2i} q_2 + \cdots + r_{ii} q_i$$

(note that the rij's come right out of the G-S procedure, and rii ≠ 0); a sketch of the procedure in code follows
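A minimal sketch of the (classical) Gram-Schmidt procedure above, for independent columns; it returns Q with orthonormal columns and upper-triangular R with A = QR:

```python
import numpy as np

def gram_schmidt(A):
    m, k = A.shape
    Q = np.zeros((m, k))
    R = np.zeros((k, k))
    for i in range(k):
        q = A[:, i].copy()
        for j in range(i):                  # orthogonalize w.r.t. previous q's
            R[j, i] = Q[:, j] @ A[:, i]
            q -= R[j, i] * Q[:, j]
        R[i, i] = np.linalg.norm(q)         # nonzero if columns independent
        Q[:, i] = q / R[i, i]               # normalize
    return Q, R

A = np.random.randn(6, 3)
Q, R = gram_schmidt(A)
print(np.allclose(Q.T @ Q, np.eye(3)), np.allclose(Q @ R, A))  # True True
```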

QR decomposition

## written in matrix form: A = QR, where A ∈ Rⁿˣᵏ, Q ∈ Rⁿˣᵏ, R ∈ Rᵏˣᵏ:

$$\underbrace{\begin{bmatrix} a_1 & a_2 & \cdots & a_k \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} q_1 & q_2 & \cdots & q_k \end{bmatrix}}_{Q} \underbrace{\begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1k} \\ 0 & r_{22} & \cdots & r_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & r_{kk} \end{bmatrix}}_{R}$$

with QᵀQ = Ik and R upper triangular & invertible

## usually computed using a variation on Gram-Schmidt procedure which is less sensitive to numerical (rounding) errors

General Gram-Schmidt procedure

## if a1, . . . , ak are dependent, we find q̃j = 0 for some j, which means aj is linearly dependent on a1, . . . , aj−1

## modified algorithm: when we encounter q̃j = 0, skip to next vector aj+1 and continue:

r = 0;
for i = 1, . . . , k
{
    ã = ai − Σⱼ₌₁ʳ qj qjᵀ ai;
    if ã ≠ 0 { r = r + 1; qr = ã/‖ã‖; }
}

on exit,

## q1, . . . , qr is an orthonormal basis for R(A) (hence r = rank(A))

each ai is linear combination of previously generated qj's

## in matrix notation we have A = QR with QᵀQ = Ir and R ∈ Rʳˣᵏ in upper staircase form: possibly nonzero entries on and above a staircase, zero entries below it

can permute the columns so the staircase becomes triangular:

A = Q[R̃ S]P

where:

QᵀQ = Ir

R̃ ∈ Rʳˣʳ is upper triangular and invertible

## P ∈ Rᵏˣᵏ is a permutation matrix

(which moves forward the columns of A which generated a new q)

Applications

the staircase pattern in R shows which columns of A are dependent on previous ones

## works incrementally: one G-S procedure yields QR factorizations of [a1 ⋯ ap] for p = 1, . . . , k

Full QR factorization

with A = Q1R1 the QR factorization as above, write

$$A = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_1 \\ 0 \end{bmatrix}$$

## where [Q1 Q2] is orthogonal, i.e., columns of Q2 ∈ Rⁿˣ⁽ⁿ⁻ʳ⁾ are orthonormal, orthogonal to Q1

to find Q2:

## find any matrix Ã s.t. [A Ã] is full rank (e.g., Ã = I)

apply general Gram-Schmidt to [A Ã]

Q1 are orthonormal vectors obtained from columns of A

Q2 are orthonormal vectors obtained from extra columns (Ã)

i.e., any set of orthonormal vectors can be extended to an orthonormal basis for Rⁿ

R(Q1) and R(Q2) are called complementary subspaces since

they are orthogonal (i.e., every vector in the first subspace is orthogonal to every vector in the second subspace)

## their sum is Rⁿ (i.e., every vector in Rⁿ can be expressed as a sum of two vectors, one from each subspace)

this is written

R(Q1) ⊕ R(Q2) = Rⁿ

R(Q2) = R(Q1)⊥ (and R(Q1) = R(Q2)⊥)

(each subspace is the orthogonal complement of the other)

## Orthogonal decomposition induced by A

from

$$A^T = \begin{bmatrix} R_1^T & 0 \end{bmatrix} \begin{bmatrix} Q_1^T \\ Q_2^T \end{bmatrix}$$

we see that

Aᵀz = 0 ⟺ Q1ᵀz = 0 ⟺ z ∈ R(Q2)

so R(Q2) = N(Aᵀ)
(in fact the columns of Q2 are an orthonormal basis for N(Aᵀ))

## we conclude: R(A) and N(Aᵀ) are complementary subspaces:

R(A) ⊕ N(Aᵀ) = Rⁿ (recall A ∈ Rⁿˣᵏ)

every y ∈ Rⁿ can be written uniquely as y = z + w, with z ∈ R(A), w ∈ N(Aᵀ) (we'll soon see what the vector z is . . . )

can now prove most of the assertions from the linear algebra review lecture

## switching A ∈ Rⁿˣᵏ to Aᵀ ∈ Rᵏˣⁿ gives decomposition of Rᵏ:

N(A) ⊕ R(Aᵀ) = Rᵏ

EE263 Autumn 2008-09 Stephen Boyd

Lecture 5
Least-squares

## projection and orthogonality principle

least-squares estimation

BLUE property


Overdetermined linear equations

consider y = Ax, where A ∈ Rᵐˣⁿ is skinny (m > n)

## called overdetermined set of linear equations
(more equations than unknowns)

for most y, cannot solve for x

## define residual or error r = Ax − y

find x = xls that minimizes ‖r‖

## xls called least-squares (approximate) solution of y = Ax
Geometric interpretation

(sketch: Axls is the point of R(A) closest to y; the residual r connects y to Axls, orthogonal to R(A); figure omitted)

## assume A is full rank, skinny

to find xls, we'll minimize the norm of the residual squared,

‖r‖² = xᵀAᵀAx − 2yᵀAx + yᵀy

setting the gradient to zero yields the normal equations: AᵀAx = Aᵀy

assumptions imply AᵀA invertible, so we have

xls = (AᵀA)⁻¹Aᵀy

## . . . a very famous formula (a numerical check is sketched below)

xls is a linear function of y

## A† = (AᵀA)⁻¹Aᵀ is a left inverse of (full rank, skinny) A:

A†A = (AᵀA)⁻¹AᵀA = I
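A minimal sketch with random data, comparing the normal-equations formula against numpy's QR-based least-squares solver:

```python
import numpy as np

m, n = 50, 4
A = np.random.randn(m, n)          # skinny, full rank with probability 1
y = np.random.randn(m)

x_normal = np.linalg.solve(A.T @ A, A.T @ y)      # normal equations
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)   # library solver
print(np.allclose(x_normal, x_lstsq))             # True
```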

Projection on R(A)

Axls is (by definition) the point in R(A) that is closest to y, i.e., it is the projection of y onto R(A):

Axls = P_R(A)(y) = A(AᵀA)⁻¹Aᵀy

## A(AᵀA)⁻¹Aᵀ is called the projection matrix (associated with R(A))
Orthogonality principle

optimal residual

r = Axls − y = (A(AᵀA)⁻¹Aᵀ − I)y

is orthogonal to R(A):

⟨r, Az⟩ = yᵀ(A(AᵀA)⁻¹Aᵀ − I)ᵀAz = 0

for all z ∈ Rⁿ (figure omitted)

Completion of squares

since r = Axls − y is orthogonal to A(x − xls) for any x, we have

‖Ax − y‖² = ‖(Axls − y) + A(x − xls)‖² = ‖Axls − y‖² + ‖A(x − xls)‖²

## this shows that for x ≠ xls, ‖Ax − y‖ > ‖Axls − y‖
Least-squares via QR factorization

## A ∈ Rᵐˣⁿ skinny, full rank; factor as A = QR with QᵀQ = In, R ∈ Rⁿˣⁿ upper triangular, invertible

pseudo-inverse is

(AᵀA)⁻¹Aᵀ = (RᵀQᵀQR)⁻¹RᵀQᵀ = R⁻¹Qᵀ

so xls = R⁻¹Qᵀy

## Least-squares via full QR factorization

full QR factorization:

$$A = [Q_1\ Q_2] \begin{bmatrix} R_1 \\ 0 \end{bmatrix}$$

with [Q1 Q2] ∈ Rᵐˣᵐ orthogonal, R1 ∈ Rⁿˣⁿ upper triangular, invertible

## multiplication by orthogonal matrix doesn't change norm, so

$$\|Ax - y\|^2 = \left\| [Q_1\ Q_2] \begin{bmatrix} R_1 \\ 0 \end{bmatrix} x - y \right\|^2 = \left\| [Q_1\ Q_2]^T \left( [Q_1\ Q_2] \begin{bmatrix} R_1 \\ 0 \end{bmatrix} x - y \right) \right\|^2 = \left\| \begin{bmatrix} R_1 x - Q_1^T y \\ -Q_2^T y \end{bmatrix} \right\|^2 = \|R_1 x - Q_1^T y\|^2 + \|Q_2^T y\|^2$$

## this is evidently minimized by choice xls = R1⁻¹Q1ᵀy

(which makes the first term zero)

residual with optimal x is

Axls − y = −Q2Q2ᵀy

## Q2Q2ᵀ gives projection onto R(A)⊥

Least-squares estimation

many estimation problems have the form

y = Ax + v

where x is what we want to estimate, y is the sensor measurement, and v is an unknown noise or measurement error (assumed small)

## ith row of A characterizes ith sensor

least-squares estimation: choose as estimate x̂ the value that minimizes

‖Ax̂ − y‖

i.e., the deviation between what we would observe if x = x̂ and there were no noise (v = 0), and what we actually observed

least-squares estimate is just x̂ = (AᵀA)⁻¹Aᵀy

BLUE property

linear measurement with noise:

y = Ax + v, with A full rank and skinny

## consider a linear estimator of form x̂ = By

estimator is called unbiased if x̂ = x whenever v = 0
(i.e., no estimation error when there is no noise)

same as BA = I, i.e., B is a left inverse of A

estimation error of unbiased linear estimator is

x̂ − x = B(Ax + v) − x = Bv

fact: A† = (AᵀA)⁻¹Aᵀ is the smallest left inverse of A, in the following sense: for any B with BA = I,

$$\sum_{i,j} B_{ij}^2 \ge \sum_{i,j} (A^\dagger)_{ij}^2$$

## i.e., least-squares provides the best linear unbiased estimator (BLUE)

Navigation from range measurements

(diagram: unknown position x ∈ R² and four beacons, with direction vectors k1, . . . , k4; figure omitted)

## beacons far from unknown position x ∈ R², so linearization around x = 0 (say) is nearly exact

ranges y ∈ R⁴ measured, with measurement noise v:

$$y = \begin{bmatrix} k_1^T \\ k_2^T \\ k_3^T \\ k_4^T \end{bmatrix} x + v$$

## measurement errors are independent, Gaussian, with standard deviation 2 (details not important)

## actual position is x = (5.59, 10.58); measurement is y = (11.95, 2.84, 9.81, 2.81)

Just enough measurements method

## y1 and y2 suffice to find x (when v = 0)

compute estimate x̂ by inverting the top (2 × 2) half of A:

$$\hat x = B_{\rm je}\, y = \begin{bmatrix} 0 & 1.0 & 0 & 0 \\ 1.12 & 0.5 & 0 & 0 \end{bmatrix} y = \begin{bmatrix} 2.84 \\ 11.9 \end{bmatrix}$$

## (norm of error: 3.07)
Least-squares method

compute estimate x̂ by least-squares:

$$\hat x = A^\dagger y = \begin{bmatrix} 0.23 & 0.48 & 0.04 & 0.44 \\ 0.47 & 0.02 & 0.51 & 0.18 \end{bmatrix} y = \begin{bmatrix} 4.95 \\ 10.26 \end{bmatrix}$$

(norm of error: 0.72)

## A† has smaller entries than Bje; larger entries in B lead to larger estimation error

Example from the overview lecture

u → H(s) → w → A/D → y

## signal u is piecewise constant, period 1 sec, 0 ≤ t ≤ 10:

u(t) = xj , j − 1 ≤ t < j, j = 1, . . . , 10

## filtered by system with impulse response h(t):

$$w(t) = \int_0^t h(t - \tau)\, u(\tau)\, d\tau$$

## sample at 10 Hz: ȳi = w(0.1i), i = 1, . . . , 100

3-bit quantization: yi = Q(ȳi), i = 1, . . . , 100, where Q is the 3-bit quantizer characteristic

## problem: estimate x ∈ R¹⁰ given y ∈ R¹⁰⁰

example:

(plots of u(t), s(t), w(t), y(t) for 0 ≤ t ≤ 10; figure omitted)

we have y = Ax + v, where

A ∈ R¹⁰⁰ˣ¹⁰ is given by

$$A_{ij} = \int_{j-1}^{j} h(0.1i - \tau)\, d\tau$$

v ∈ R¹⁰⁰ is quantization error: vi = Q(ȳi) − ȳi (so |vi| ≤ 0.125)

## least-squares estimate: xls = (AᵀA)⁻¹Aᵀy

(plot of u(t) (solid) and the estimate û(t) (dotted) for 0 ≤ t ≤ 10; figure omitted)

RMS error is ‖x − xls‖/√10 = 0.03

## more on this later . . .

## some rows of Bls = (AᵀA)⁻¹Aᵀ:

(plots of rows 2, 5, and 8 of Bls versus t, for 0 ≤ t ≤ 10; figure omitted)

## rows show how sampled measurements of y are used to form estimate of xi for i = 2, 5, 8

to estimate x5, which is the original input signal for 4 ≤ t < 5, we mostly use y(t) for 3 ≤ t ≤ 7
EE263 Autumn 2008-09 Stephen Boyd

Lecture 6
Least-squares applications

## growing sets of regressors

system identification


Least-squares data fitting

we are given:

## functions f1, . . . , fn : S → R, called regressors or basis functions

data or measurements (si, gi), i = 1, . . . , m, where si ∈ S and (usually) m ≫ n

problem: find coefficients x1, . . . , xn ∈ R so that x1f1(si) + ⋯ + xnfn(si) ≈ gi, i = 1, . . . , m

## i.e., find linear combination of functions that fits data

least-squares fit: choose x to minimize total square fitting error:

$$\sum_{i=1}^{m} \left( x_1 f_1(s_i) + \cdots + x_n f_n(s_i) - g_i \right)^2$$

using matrix notation, total square fitting error is ‖Ax − g‖², where Aij = fj(si)

hence, the least-squares fit is

x = (AᵀA)⁻¹Aᵀg

## (assuming A is skinny, full rank)

corresponding function is

## flsfit(s) = x1f1(s) + ⋯ + xnfn(s)

applications:

interpolation, extrapolation, smoothing of data

developing simple, approximate model of data

Least-squares polynomial fitting

problem: fit a polynomial of degree < n, p(t) = a0 + a1t + ⋯ + a_{n−1}tⁿ⁻¹, to data (ti, yi), i = 1, . . . , m

basis functions are fj(t) = tʲ⁻¹, j = 1, . . . , n, so the matrix A has the form

$$A = \begin{bmatrix} 1 & t_1 & t_1^2 & \cdots & t_1^{n-1} \\ 1 & t_2 & t_2^2 & \cdots & t_2^{n-1} \\ \vdots & & & & \vdots \\ 1 & t_m & t_m^2 & \cdots & t_m^{n-1} \end{bmatrix}$$

## (called a Vandermonde matrix)
assuming tk ≠ tl for k ≠ l and m ≥ n, A is full rank:

suppose Aa = 0

## then the corresponding polynomial p(t) = a0 + ⋯ + a_{n−1}tⁿ⁻¹ vanishes at the m points t1, . . . , tm

## by the fundamental theorem of algebra p can have no more than n − 1 zeros, so p is identically zero, and a = 0

## hence the columns of A are independent, i.e., A is full rank

Example

## m = 100 points between t = 0 & t = 1

least-squares fits for degrees 1, 2, 3, 4 have RMS errors .135, .076, .025, .005, respectively (a sketch of the computation follows)
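A minimal sketch of polynomial least-squares fitting via a Vandermonde matrix; the data function here is a hypothetical stand-in (the course's data isn't reproduced in the text):

```python
import numpy as np

m = 100
t = np.linspace(0, 1, m)
g = np.sin(4 * t) + 0.05 * np.random.randn(m)   # hypothetical data

for deg in (1, 2, 3, 4):
    A = np.vander(t, deg + 1, increasing=True)  # columns 1, t, ..., t^deg
    x, *_ = np.linalg.lstsq(A, g, rcond=None)
    rms = np.sqrt(np.mean((A @ x - g) ** 2))
    print(deg, rms)                             # RMS error decreases with degree
```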

(plots of the degree-1 through degree-4 fits p1(t), . . . , p4(t) against the data, for 0 ≤ t ≤ 1; figure omitted)

Growing sets of regressors

## consider family of least-squares problems

$$\text{minimize } \left\| \sum_{i=1}^{p} x_i a_i - y \right\|$$

for p = 1, . . . , n

(a1, . . . , ap are the regressors); interpretations:

## project y onto span{a1, . . . , ap}

regress y on a1, . . . , ap

## as p increases, get better fit, so optimal residual decreases

solution for each p ≤ n is given by

$$x_{\rm ls}^{(p)} = (A_p^T A_p)^{-1} A_p^T y = R_p^{-1} Q_p^T y$$

where

Ap = [a1 ⋯ ap] is the first p columns of A

## Qp = [q1 ⋯ qp] is the first p columns of Q, and Rp is the corresponding leading p × p submatrix of R

## plot of optimal residual versus p shows how well y can be matched by linear combination of a1, . . . , ap, as function of p

(sketch: ‖residual‖ versus p, decreasing from ‖y‖ at p = 0 through minₓ₁ ‖x1a1 − y‖ at p = 1 down to the p = 7 optimum; figure omitted)

Least-squares system identification

## system identification problem: find reasonable model for system based on measured I/O data u, y

example with scalar u, y (vector u, y readily handled): fit I/O data with moving-average (MA) model with n delays

## ŷ(t) = h0u(t) + h1u(t − 1) + ⋯ + hnu(t − n)

where h0, . . . , hn ∈ R

## we can write the model or predicted output as

$$\begin{bmatrix} \hat y(n) \\ \hat y(n+1) \\ \vdots \\ \hat y(N) \end{bmatrix} = \begin{bmatrix} u(n) & u(n-1) & \cdots & u(0) \\ u(n+1) & u(n) & \cdots & u(1) \\ \vdots & & & \vdots \\ u(N) & u(N-1) & \cdots & u(N-n) \end{bmatrix} \begin{bmatrix} h_0 \\ h_1 \\ \vdots \\ h_n \end{bmatrix}$$

## least-squares identification: choose model (i.e., h) that minimizes norm of model prediction error ‖e‖, where e is the difference between measured and predicted output . . . a least-squares problem (sketched below)
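A minimal sketch with synthetic data (the true coefficients and noise level are hypothetical): stack the rows [u(t), u(t−1), . . . , u(t−n)] and solve for h by least squares.

```python
import numpy as np

N, n = 70, 7
u = np.random.randn(N + 1)
h_true = np.random.randn(n + 1)
y = np.convolve(u, h_true)[: N + 1] + 0.1 * np.random.randn(N + 1)

# rows of the data matrix: [u(t), u(t-1), ..., u(t-n)] for t = n..N
rows = np.array([u[t - n : t + 1][::-1] for t in range(n, N + 1)])
h_ls, *_ = np.linalg.lstsq(rows, y[n:], rcond=None)
print(np.round(h_ls - h_true, 2))   # small when the noise is small
```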

Example

(plots of I/O data u(t) and y(t) for 0 ≤ t ≤ 70; figure omitted)

for n = 7 we obtain MA model with

(h0, . . . , h7) = (.024, .282, .418, .354, .243, .487, .208, .441)

(plot: actual output y(t) (solid) and model-predicted ŷ(t) (dashed); figure omitted)

Model order selection

## question: how large should n be?

obviously the larger n, the smaller the prediction error on the data used to form the model

suggests using largest possible model order for smallest prediction error

(plot: relative prediction error ‖e‖/‖y‖ versus n, decreasing as n grows to 50; figure omitted)

difficulty: for n too large the predictive ability of the model on other I/O data (from the same system) becomes worse

Cross-validation

evaluate model predictive performance on another I/O data set not used to develop model

## model validation data set:

(plots of validation data ū(t), ȳ(t) for 0 ≤ t ≤ 70; figure omitted)

## now check prediction error of models (developed using modeling data) on validation data:

(plot: relative prediction error versus n, for modeling data and for validation data; figure omitted)

for n = 50 the actual and predicted outputs on system identification and model validation data are:

(plots: y(t) (solid) and predicted ŷ(t) (dashed), on the modeling data and on the validation data; figure omitted)

loss of predictive ability when n is too large is called overmodeling (overfitting)

Growing sets of measurements

## least-squares problem in row form:

$$\text{minimize } \|Ax - y\|^2 = \sum_{i=1}^{m} (\tilde a_i^T x - y_i)^2$$

where ãiᵀ are the rows of A (ãi ∈ Rⁿ)

## x ∈ Rⁿ is some vector to be estimated

each pair ãi, yi corresponds to one measurement

solution is

$$x_{\rm ls} = \left( \sum_{i=1}^{m} \tilde a_i \tilde a_i^T \right)^{-1} \sum_{i=1}^{m} y_i \tilde a_i$$

suppose the ãi and yi become available sequentially, i.e., m increases with time

Recursive least-squares

$$\text{we can compute } x_{\rm ls}(m) = \left( \sum_{i=1}^{m} \tilde a_i \tilde a_i^T \right)^{-1} \sum_{i=1}^{m} y_i \tilde a_i \text{ recursively}$$

## initialize P(0) = 0 ∈ Rⁿˣⁿ, q(0) = 0 ∈ Rⁿ

for m = 0, 1, . . . , update

P(m + 1) = P(m) + ã_{m+1}ã_{m+1}ᵀ, q(m + 1) = q(m) + y_{m+1}ã_{m+1}

then xls(m) = P(m)⁻¹q(m)

## P(m) is invertible ⟺ ã1, . . . , ãm span Rⁿ

(so, once P(m) becomes invertible, it stays invertible)

## Fast update for recursive least-squares

we can calculate

$$P(m+1)^{-1} = \left( P(m) + \tilde a_{m+1} \tilde a_{m+1}^T \right)^{-1}$$

## efficiently from P(m)⁻¹ using the rank one update formula

$$\left( P + aa^T \right)^{-1} = P^{-1} - \frac{1}{1 + a^T P^{-1} a}\, (P^{-1}a)(P^{-1}a)^T$$

## gives an O(n²) method for computing P(m + 1)⁻¹ from P(m)⁻¹

standard methods for computing P(m + 1)⁻¹ from P(m + 1) are O(n³)

Verification of rank one update formula

$$\begin{aligned}
(P + aa^T)\left( P^{-1} - \frac{1}{1 + a^T P^{-1}a}(P^{-1}a)(P^{-1}a)^T \right)
&= I + aa^T P^{-1} - \frac{1}{1 + a^T P^{-1}a}\, P(P^{-1}a)(P^{-1}a)^T - \frac{1}{1 + a^T P^{-1}a}\, aa^T (P^{-1}a)(P^{-1}a)^T \\
&= I + aa^T P^{-1} - \frac{1}{1 + a^T P^{-1}a}\, aa^T P^{-1} - \frac{a^T P^{-1}a}{1 + a^T P^{-1}a}\, aa^T P^{-1} \\
&= I
\end{aligned}$$

a numerical check is sketched below
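A minimal sketch with random data, checking the O(n²) rank-one update against the direct O(n³) inverse:

```python
import numpy as np

n = 5
M = np.random.randn(n, n)
P = M @ M.T + np.eye(n)          # a positive definite (hence invertible) P
a = np.random.randn(n)

Pinv = np.linalg.inv(P)
Pa = Pinv @ a
update = Pinv - np.outer(Pa, Pa) / (1.0 + a @ Pa)   # rank-one update
direct = np.linalg.inv(P + np.outer(a, a))          # direct inverse
print(np.allclose(update, direct))                  # True
```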

EE263 Autumn 2008-09 Stephen Boyd

Lecture 7
Regularized least-squares and Gauss-Newton
method

multi-objective least-squares

regularized least-squares

nonlinear least-squares

Gauss-Newton method


Multi-objective least-squares

in many problems we have two (or more) objectives: we want J1 = ‖Ax − y‖² small

## and also J2 = ‖Fx − g‖² small

(x ∈ Rⁿ is the variable)

## usually the objectives are competing

we can make one smaller, at the expense of making the other larger

Plot of achievable objective pairs

## plot (J2, J1) for every x ∈ Rⁿ:

(sketch: region of achievable objective pairs, with boundary curve and points labeled x(1), x(2), x(3); figure omitted)

note that x ∈ Rⁿ, but this plot is in R²; point labeled x(1) is really (J2(x(1)), J1(x(1)))

the x's on the boundary (the optimal trade-off curve) are the efficient ones; the

## corresponding x are called Pareto optimal

(for the two objectives ‖Ax − y‖², ‖Fx − g‖²)

Weighted-sum objective

## to find Pareto optimal points, i.e., x's on optimal trade-off curve, we minimize weighted-sum objective

J1 + μJ2 = ‖Ax − y‖² + μ‖Fx − g‖²

where the parameter μ ≥ 0 gives relative weight between J1 and J2

## points where the weighted sum is constant, J1 + μJ2 = α, correspond to a line with slope −μ on the (J2, J1) plot (figure omitted)

Minimizing weighted-sum objective

can express weighted-sum objective as ordinary least-squares objective:

$$\|Ax - y\|^2 + \mu\|Fx - g\|^2 = \left\| \begin{bmatrix} A \\ \sqrt{\mu}F \end{bmatrix} x - \begin{bmatrix} y \\ \sqrt{\mu}g \end{bmatrix} \right\|^2 = \|\tilde A x - \tilde y\|^2$$

where

$$\tilde A = \begin{bmatrix} A \\ \sqrt{\mu}F \end{bmatrix}, \qquad \tilde y = \begin{bmatrix} y \\ \sqrt{\mu}g \end{bmatrix}$$

## hence solution is (assuming Ã full rank)

$$x = \left( \tilde A^T \tilde A \right)^{-1} \tilde A^T \tilde y = \left( A^T A + \mu F^T F \right)^{-1} \left( A^T y + \mu F^T g \right)$$

Example

with a single-row A = aᵀ, y = 1, F = I, g = 0, the objectives are J1 = (aᵀx − 1)² and J2 = ‖x‖²; the optimal x is

$$x = \left( aa^T + \mu I \right)^{-1} a$$

optimal trade-off curve:

(plot of J1 = (ỹ − 1)² versus J2 = ‖x‖², traced out as μ varies; figure omitted)

Regularized least-squares

when F = I, g = 0, the objectives are J1 = ‖Ax − y‖² and J2 = ‖x‖²

## the minimizer of the weighted-sum objective,

$$x = \left( A^T A + \mu I \right)^{-1} A^T y,$$

is called the regularized least-squares (approximate) solution of Ax ≈ y (a sketch follows)

estimation/inversion application:

Ax − y is sensor residual; regularization expresses prior knowledge or preference that x be small
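A minimal sketch with random data, showing the trade-off: larger μ gives smaller ‖x‖ and larger residual ‖Ax − y‖.

```python
import numpy as np

m, n = 30, 10
A = np.random.randn(m, n)
y = np.random.randn(m)

for mu in (0.01, 1.0, 100.0):
    x = np.linalg.solve(A.T @ A + mu * np.eye(n), A.T @ y)
    print(mu, np.linalg.norm(A @ x - y), np.linalg.norm(x))
```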

Nonlinear least-squares

nonlinear least-squares (NLLS) problem: find x ∈ Rⁿ that minimizes

$$\|r(x)\|^2 = \sum_{i=1}^{m} r_i(x)^2,$$

where r : Rⁿ → Rᵐ is the vector of residuals

Position estimation from ranges

## estimate position x ∈ R² from approximate distances to beacons at locations b1, . . . , bm ∈ R² without linearizing

we measure ρi = ‖x − bi‖ + vi
(vi is range error, unknown but assumed small)

## NLLS estimate: choose x̂ to minimize

$$\sum_{i=1}^{m} r_i(x)^2 = \sum_{i=1}^{m} \left( \rho_i - \|x - b_i\| \right)^2$$

## Gauss-Newton method for NLLS

NLLS: find x ∈ Rⁿ that minimizes ‖r(x)‖² = Σᵢ ri(x)², where r : Rⁿ → Rᵐ

## in general, very hard to solve exactly

many good heuristics to compute locally optimal solution

Gauss-Newton method:

## given starting guess for x

repeat

linearize r near current guess

new guess is linear LS solution, using linearized r

until convergence

(a sketch of the loop, for the range problem, follows)
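A minimal sketch of Gauss-Newton for the range problem; the beacon layout, noise level, and random seed are hypothetical stand-ins, not the course's data:

```python
import numpy as np

rng = np.random.default_rng(0)
beacons = rng.uniform(-5, 5, size=(10, 2))
x_true = np.array([3.6, 3.2])
rho = np.linalg.norm(x_true - beacons, axis=1) + rng.uniform(-0.5, 0.5, 10)

x = np.array([1.2, 1.2])                 # starting guess
for _ in range(10):
    d = x - beacons
    dist = np.linalg.norm(d, axis=1)
    r = rho - dist                       # residual r_i(x) = rho_i - ||x - b_i||
    A = -d / dist[:, None]               # Jacobian Dr(x)
    # linearized LS step: minimize ||r + A (x_new - x)|| over x_new
    step, *_ = np.linalg.lstsq(A, -r, rcond=None)
    x = x + step
print(x)                                 # close to x_true
```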

Gauss-Newton method (more detail):

linearize r near the current iterate x(k):

r(x) ≈ r(x(k)) + Dr(x(k))(x − x(k)) = A(k)x − b(k)

so at the kth iteration we approximate the NLLS problem by the linear LS problem

‖r(x)‖² ≈ ‖A(k)x − b(k)‖²

## next iterate solves this linearized LS problem:

$$x^{(k+1)} = \left( A^{(k)T} A^{(k)} \right)^{-1} A^{(k)T} b^{(k)}$$

Gauss-Newton example

10 beacons; + marks true position (3.6, 3.2); initial guess (1.2, 1.2); range estimates accurate to ±0.5

(plot of beacon locations, and surface plot of the NLLS objective ‖r(x)‖²; figure omitted)

## for a linear LS problem, objective would be nice quadratic bowl

bumps in objective due to strong nonlinearity of r

objective of Gauss-Newton iterates:

(plot of ‖r(x)‖² versus iteration, decreasing rapidly over about 10 iterations; figure omitted)

final estimate is x̂ = (3.3, 3.3)

estimation error is ‖x̂ − x‖ = 0.31
(substantially smaller than range accuracy!)

convergence of Gauss-Newton iterates:

(plot of the iterates converging to the final estimate in a few steps; figure omitted)

useful modification: at each step minimize the regularized linearized objective

‖A(k)x − b(k)‖² + μ‖x − x(k)‖²

so that next iterate is not too far from previous one (hence, linearized model still pretty accurate)

EE263 Autumn 2008-09 Stephen Boyd

Lecture 8
Least-norm solutions of underdetermined
equations

## Underdetermined linear equations

we consider

y = Ax

where A ∈ Rᵐˣⁿ is fat (m < n), i.e., there are more variables than equations

## x is underspecified, i.e., many choices of x lead to the same y

we'll assume that A is full rank (m), so for each y ∈ Rᵐ, there is a solution

## set of all solutions has form

{ x | Ax = y } = { xp + z | z ∈ N(A) }

where xp is any ("particular") solution

z characterizes available choices in solution

Least-norm solution

one particular solution is

xln = Aᵀ(AAᵀ)⁻¹y

(AAᵀ is invertible since A is full rank)

in fact, xln is the solution of y = Ax that minimizes ‖x‖

## i.e., xln is solution of optimization problem

minimize ‖x‖
subject to Ax = y

to show this, suppose Ax = y, so A(x − xln) = 0 and

(x − xln)ᵀxln = (x − xln)ᵀAᵀ(AAᵀ)⁻¹y = (A(x − xln))ᵀ(AAᵀ)⁻¹y = 0

i.e., (x − xln) ⊥ xln, so ‖x‖² = ‖xln‖² + ‖x − xln‖² ≥ ‖xln‖²

(sketch: solution set { x | Ax = y }, parallel to N(A) = { x | Ax = 0 }, with xln the point of the solution set closest to the origin; figure omitted)

A† = Aᵀ(AAᵀ)⁻¹ is called the pseudo-inverse of full rank, fat A

(for full rank, skinny A the pseudo-inverse is A† = (AᵀA)⁻¹Aᵀ)

via QR factorization: with Aᵀ = QR, Q ∈ Rⁿˣᵐ, QᵀQ = Im,

xln = QR⁻ᵀy, and

‖xln‖ = ‖R⁻ᵀy‖

Derivation via Lagrange multipliers

## least-norm solution solves optimization problem

minimize xᵀx
subject to Ax = y

introduce Lagrangian: L(x, λ) = xᵀx + λᵀ(Ax − y)

## optimality conditions are

∇ₓL = 2x + Aᵀλ = 0, ∇_λL = Ax − y = 0

from the first condition, x = −Aᵀλ/2; substituting into the second gives λ = −2(AAᵀ)⁻¹y, hence x = Aᵀ(AAᵀ)⁻¹y
Example: transferring a mass

## y = Ax where A ∈ R²ˣ¹⁰ (A is fat), with A the force-to-final-position/velocity map of the mass example

find the least-norm force that transfers the mass unit distance with zero final velocity, i.e., y = (1, 0)

(plots of the least-norm force xln, and the resulting position and velocity, versus t; figure omitted)

Relation to regularized least-squares

## suppose A ∈ Rᵐˣⁿ is fat, full rank

define J1 = ‖Ax − y‖², J2 = ‖x‖²

least-norm solution minimizes J2 with J1 = 0

minimizer of weighted-sum objective J1 + μJ2 = ‖Ax − y‖² + μ‖x‖² is

$$x_\mu = \left( A^T A + \mu I \right)^{-1} A^T y$$

## fact: xμ → xln as μ → 0, i.e., the regularized solution converges to the least-norm solution as μ → 0

in matrix terms: as μ → 0,

$$\left( A^T A + \mu I \right)^{-1} A^T \to A^T \left( A A^T \right)^{-1}$$

(a numerical check is sketched below)
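A minimal sketch with a random fat A: the least-norm solution, and the regularized solution approaching it as μ → 0.

```python
import numpy as np

m, n = 3, 8
A = np.random.randn(m, n)        # fat, full rank with probability 1
y = np.random.randn(m)

x_ln = A.T @ np.linalg.solve(A @ A.T, y)     # x_ln = A^T (A A^T)^{-1} y
for mu in (1.0, 1e-3, 1e-6):
    x_mu = np.linalg.solve(A.T @ A + mu * np.eye(n), A.T @ y)
    print(mu, np.linalg.norm(x_mu - x_ln))   # shrinks as mu -> 0
print(np.allclose(A @ x_ln, y))              # True: x_ln solves the equations
```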

General norm minimization with equality constraints

consider problem

minimize ‖Ax − b‖
subject to Cx = d

with variable x

## includes least-squares and least-norm problems as special cases

equivalent to

minimize (1/2)‖Ax − b‖²
subject to Cx = d

Lagrangian is

L(x, λ) = (1/2)‖Ax − b‖² + λᵀ(Cx − d) = (1/2)xᵀAᵀAx − bᵀAx + (1/2)bᵀb + λᵀCx − λᵀd

## optimality conditions are

∇ₓL = AᵀAx − Aᵀb + Cᵀλ = 0, ∇_λL = Cx − d = 0

write in block matrix form as

$$\begin{bmatrix} A^T A & C^T \\ C & 0 \end{bmatrix} \begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} A^T b \\ d \end{bmatrix}$$

if the block matrix is invertible, we have

$$\begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} A^T A & C^T \\ C & 0 \end{bmatrix}^{-1} \begin{bmatrix} A^T b \\ d \end{bmatrix}$$

if AᵀA is invertible, we can derive a more explicit (and complicated) formula for x

## from first block equation we get

x = (AᵀA)⁻¹(Aᵀb − Cᵀλ)

## substitute into Cx = d to get

C(AᵀA)⁻¹(Aᵀb − Cᵀλ) = d

so

$$\lambda = \left( C (A^T A)^{-1} C^T \right)^{-1} \left( C (A^T A)^{-1} A^T b - d \right)$$

## recover x from equation above (not pretty):

$$x = (A^T A)^{-1} \left( A^T b - C^T \left( C (A^T A)^{-1} C^T \right)^{-1} \left( C (A^T A)^{-1} A^T b - d \right) \right)$$

EE263 Autumn 2008-09 Stephen Boyd

Lecture 9
Autonomous linear dynamical systems

examples


a continuous-time, time-invariant, autonomous LDS has the form

ẋ = Ax

## A is the dynamics matrix

(system is time-invariant if A doesn't depend on t)

picture (phase plane): at each point x(t) of a trajectory, the velocity vector is ẋ(t) = Ax(t) (figure omitted)

 
example 1:

$$\dot x = \begin{bmatrix} -1 & 0 \\ 2 & 1 \end{bmatrix} x$$

(phase-plane plot of trajectories; figure omitted)

example 2:

$$\dot x = \begin{bmatrix} -0.5 & 1 \\ -1 & 0.5 \end{bmatrix} x$$

(phase-plane plot of trajectories; figure omitted)

Block diagram

block diagram representation of ẋ = Ax: an n-dimensional integrator 1/s whose output is x(t), with the input ẋ(t) fed back through the gain A (figure omitted)

useful when A has structure, e.g., block upper triangular:

$$\dot x = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix} x$$

(block diagram: the x2 subsystem, with feedback A22, drives the x1 subsystem through A12; x1 does not affect x2; figure omitted)

Linear circuit

(circuit diagram: capacitors C1, . . . , Cp with voltages vc and currents ic, inductors L1, . . . , Lr with currents il and voltages vl, interconnected by a static linear circuit; figure omitted)

## circuit equations are

$$C\frac{dv_c}{dt} = i_c, \qquad L\frac{di_l}{dt} = v_l, \qquad \begin{bmatrix} i_c \\ v_l \end{bmatrix} = F \begin{bmatrix} v_c \\ i_l \end{bmatrix}$$

with state x = [vc; il], we have

$$\dot x = \begin{bmatrix} C^{-1} & 0 \\ 0 & L^{-1} \end{bmatrix} F\, x$$

Chemical reactions

xi is the concentration of chemical i

## linear model of reaction kinetics:

$$\frac{dx_i}{dt} = a_{i1}x_1 + \cdots + a_{in}x_n$$

Example: series reaction A →(k1) B →(k2) C with linear dynamics

$$\dot x = \begin{bmatrix} -k_1 & 0 & 0 \\ k_1 & -k_2 & 0 \\ 0 & k_2 & 0 \end{bmatrix} x$$

## plot for k1 = k2 = 1, initial x(0) = (1, 0, 0)

(plot of x1(t) decaying, x2(t) rising then decaying, and x3(t) rising toward 1, for 0 ≤ t ≤ 10; figure omitted)

Finite-state discrete-time Markov chain

z(t) ∈ {1, . . . , n} is a random sequence; represent the probability distribution of z(t) as an n-vector

$$p(t) = \begin{bmatrix} \mathrm{Prob}(z(t) = 1) \\ \vdots \\ \mathrm{Prob}(z(t) = n) \end{bmatrix}$$

then p(t + 1) = Pp(t), where P is the matrix of transition probabilities: a discrete-time LDS

P is often sparse; Markov chain is depicted graphically

example:

state 1 is system OK

state 2 is system down

state 3 is system being repaired

(state-transition graph with the probabilities below as edge labels; figure omitted)

$$p(t + 1) = \begin{bmatrix} 0.9 & 0.7 & 1.0 \\ 0.1 & 0.1 & 0 \\ 0 & 0.2 & 0 \end{bmatrix} p(t)$$

(a simulation of this recursion is sketched below)
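A minimal sketch: propagating the distribution p(t + 1) = Pp(t) for the three-state chain above, starting from "system OK" with certainty.

```python
import numpy as np

P = np.array([[0.9, 0.7, 1.0],
              [0.1, 0.1, 0.0],
              [0.0, 0.2, 0.0]])
p = np.array([1.0, 0.0, 0.0])        # Prob(OK) = 1 at t = 0
for t in range(50):
    p = P @ p                        # distribution recursion
print(p)                             # settles to an equilibrium distribution
```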

Numerical integration of continuous system

compute approximate solution of ẋ = Ax, x(0) = x0: with a small time step h, the forward Euler approximation is

x(t + h) ≈ x(t) + hẋ(t) = (I + hA)x(t)

## by carrying out this recursion (a discrete-time LDS), starting at x(0) = x0, we get the approximation x(kh) ≈ (I + hA)ᵏx(0)

High order linear dynamical systems

kth order, time-invariant LDS:

x(k) + A_{k−1}x^{(k−1)} + ⋯ + A1x^{(1)} + A0x = 0

## where x^{(m)} denotes the mth derivative of x

define the new variable z = [x; x^{(1)}; . . . ; x^{(k−1)}] ∈ Rⁿᵏ, so

$$\dot z = \begin{bmatrix} x^{(1)} \\ \vdots \\ x^{(k)} \end{bmatrix} = \begin{bmatrix} 0 & I & 0 & \cdots & 0 \\ 0 & 0 & I & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & I \\ -A_0 & -A_1 & -A_2 & \cdots & -A_{k-1} \end{bmatrix} z$$

a (first order) LDS, with bigger state

block diagram: a chain of integrators x^{(k)} → 1/s → x^{(k−1)} → 1/s → ⋯ → 1/s → x, with feedback through the gains A_{k−1}, A_{k−2}, . . . , A0 (figure omitted)

Mechanical systems

## mechanical system with k degrees of freedom undergoing small motions:

Mq̈ + Dq̇ + Kq = 0

## q(t) ∈ Rᵏ is the vector of generalized displacements

M is the mass matrix

K is the stiffness matrix

D is the damping matrix

with state x = [q; q̇] we have

$$\dot x = \begin{bmatrix} \dot q \\ \ddot q \end{bmatrix} = \begin{bmatrix} 0 & I \\ -M^{-1}K & -M^{-1}D \end{bmatrix} x$$

Linearization near equilibrium point

nonlinear, time-invariant differential equation:

ẋ = f(x)

where f : Rⁿ → Rⁿ

suppose xe is an equilibrium point, i.e., f(xe) = 0

## now suppose x(t) is near xe, so

ẋ(t) = f(x(t)) ≈ f(xe) + Df(xe)(x(t) − xe)

with δx(t) := x(t) − xe, this reads

δẋ(t) ≈ Df(xe)δx(t)

## we hope the solution of δẋ = Df(xe)δx is a good approximation of x − xe

(more later)

example: pendulum

(diagram: pendulum of length l, mass m, angle θ from vertical, gravity g; figure omitted)

## 2nd order nonlinear DE: ml²θ̈ = −lmg sin θ

rewrite as a first order DE with state x = [θ; θ̇]:

$$\dot x = \begin{bmatrix} x_2 \\ -(g/l)\sin x_1 \end{bmatrix}$$

linearize near the equilibrium point xe = 0 (pendulum hanging straight down), using sin x1 ≈ x1:

$$\delta\dot x = \begin{bmatrix} 0 & 1 \\ -g/l & 0 \end{bmatrix} \delta x$$

Does linearization work?

the linearized system usually, but not always, gives a good idea of the system behavior near xe

example 1: ẋ = −x³ near xe = 0

for x(0) > 0 solutions have the form x(t) = (x(0)⁻² + 2t)⁻¹ᐟ²

linearized system is δẋ = 0; solutions are constant

example 2: ż = z³ near ze = 0

for z(0) > 0 solutions have the form z(t) = (z(0)⁻² − 2t)⁻¹ᐟ²

## (finite escape time at t = z(0)⁻²/2)

linearized system is δż = 0; solutions are constant

(plot comparing x(t), z(t), and the constant linearized solution; figure omitted)

so the two systems behave very differently near 0, even though their linearizations are the same

Linearization along trajectory

suppose xtraj : R₊ → Rⁿ satisfies ẋtraj(t) = f(xtraj(t), t)

## suppose x(t) is another trajectory, i.e., ẋ(t) = f(x(t), t), and is near xtraj(t)

then

$$\frac{d}{dt}(x - x_{\rm traj}) = f(x, t) - f(x_{\rm traj}, t) \approx D_x f(x_{\rm traj}, t)(x - x_{\rm traj})$$

(time-varying) LDS

δẋ = Dₓf(xtraj, t)δx

is called linearized or variational system along trajectory xtraj

example: if xtraj is a T-periodic solution of a time-invariant system, i.e.,

## ẋtraj(t) = f(xtraj(t)), xtraj(t + T) = xtraj(t),

the linearized system is

δẋ = A(t)δx

where A(t) = Df(xtraj(t)) is T-periodic

used to study, e.g., how small perturbations of the trajectory evolve

EE263 Autumn 2008-09 Stephen Boyd

Lecture 10
Solution via Laplace transform and matrix
exponential

Laplace transform

## state transition matrix

matrix exponential


## Laplace transform of matrix valued function

suppose z : R₊ → R^{p×q}

Laplace transform: Z = L(z), where Z : D ⊆ C → C^{p×q} is defined by

$$Z(s) = \int_0^\infty e^{-st} z(t)\, dt$$

## D is the domain or region of convergence of Z

D includes at least {s | ℜs > a}, where a satisfies |zij(t)| ≤ αe^{at} for t ≥ 0, i = 1, . . . , p, j = 1, . . . , q

Derivative property

L(ż) = sZ(s) − z(0)

## to derive, integrate by parts:

$$L(\dot z)(s) = \int_0^\infty e^{-st} \dot z(t)\, dt = \left. e^{-st} z(t) \right|_{t=0}^{\infty} + s \int_0^\infty e^{-st} z(t)\, dt = sZ(s) - z(0)$$

(for ℜs large enough, so the boundary term at t = ∞ vanishes)

Solution of ẋ = Ax via Laplace transform

take the Laplace transform of ẋ = Ax: sX(s) − x(0) = AX(s), so X(s) = (sI − A)⁻¹x(0)

## take inverse transform

$$x(t) = L^{-1}\left( (sI - A)^{-1} \right) x(0)$$

Resolvent and state transition matrix

(sI − A)⁻¹ is called the resolvent of A

## resolvent defined for s ∈ C except eigenvalues of A, i.e., s such that det(sI − A) = 0

Φ(t) = L⁻¹((sI − A)⁻¹) is called the state-transition matrix; it maps the initial state to the state at time t:

x(t) = Φ(t)x(0)

## Example 1: Harmonic oscillator

$$\dot x = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} x$$

(phase-plane plot: circular trajectories; figure omitted)

$$sI - A = \begin{bmatrix} s & -1 \\ 1 & s \end{bmatrix}, \quad \text{so resolvent is} \quad (sI - A)^{-1} = \begin{bmatrix} \frac{s}{s^2+1} & \frac{1}{s^2+1} \\ \frac{-1}{s^2+1} & \frac{s}{s^2+1} \end{bmatrix}$$

(eigenvalues are ±j)

## state transition matrix is

$$\Phi(t) = L^{-1}\begin{bmatrix} \frac{s}{s^2+1} & \frac{1}{s^2+1} \\ \frac{-1}{s^2+1} & \frac{s}{s^2+1} \end{bmatrix} = \begin{bmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{bmatrix}$$

## a rotation matrix (−t radians)

$$\text{so we have } x(t) = \begin{bmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{bmatrix} x(0)$$

## Example 2: Double integrator

$$\dot x = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} x$$

(phase-plane plot: horizontal trajectories; figure omitted)

$$sI - A = \begin{bmatrix} s & -1 \\ 0 & s \end{bmatrix}, \quad \text{so resolvent is} \quad (sI - A)^{-1} = \begin{bmatrix} \frac{1}{s} & \frac{1}{s^2} \\ 0 & \frac{1}{s} \end{bmatrix}$$

(eigenvalues are 0, 0)

## state transition matrix is

$$\Phi(t) = L^{-1}\begin{bmatrix} \frac{1}{s} & \frac{1}{s^2} \\ 0 & \frac{1}{s} \end{bmatrix} = \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}$$

$$\text{so we have } x(t) = \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix} x(0)$$

Characteristic polynomial

X(s) = det(sI − A) is called the characteristic polynomial of A; its roots are the eigenvalues of A

X has real coefficients, so the eigenvalues are either real or occur in conjugate pairs

Eigenvalues of A and poles of resolvent

the i, j entry of the resolvent can be expressed via cofactors:

$$\left( (sI - A)^{-1} \right)_{ij} = \frac{(-1)^{i+j} \det \Delta_{ji}}{\det(sI - A)}$$

where Δij is sI − A with the ith row and jth column deleted

## det Δij is a polynomial of degree less than n, so the i, j entry of the resolvent has form fij(s)/X(s) where fij is a polynomial with degree less than n

poles of the entries of the resolvent must be eigenvalues of A

## but not all eigenvalues of A show up as poles of each entry

(when there are cancellations between det Δij and X(s))

Matrix exponential

## series expansion of resolvent:

$$(sI - A)^{-1} = \frac{1}{s}(I - A/s)^{-1} = \frac{I}{s} + \frac{A}{s^2} + \frac{A^2}{s^3} + \cdots$$

## (valid for |s| large enough) so

$$\Phi(t) = L^{-1}\left( (sI - A)^{-1} \right) = I + tA + \frac{(tA)^2}{2!} + \cdots$$

looks like the ordinary power series

$$e^{at} = 1 + ta + \frac{(ta)^2}{2!} + \cdots$$

with matrices instead of scalars, so we define the matrix exponential

$$e^M = I + M + \frac{M^2}{2!} + \cdots$$

hence Φ(t) = e^{tA}, and the solution of ẋ = Ax is

x(t) = e^{tA}x(0)

which generalizes the scalar solution x(t) = e^{ta}x(0) (a numerical check is sketched below)
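A minimal sketch: propagating ẋ = Ax with scipy's matrix exponential and checking it against the truncated power series, for the harmonic oscillator above.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # harmonic oscillator
t = 0.7
Phi = expm(t * A)                          # state-transition matrix e^{tA}

# truncated series I + tA + (tA)^2/2! + ...
S, term = np.eye(2), np.eye(2)
for k in range(1, 20):
    term = term @ (t * A) / k
    S += term
print(np.allclose(Phi, S))                 # True
print(Phi)                                 # [[cos t, sin t], [-sin t, cos t]]
```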

the matrix exponential is meant to look like the scalar exponential

some things you'd guess hold for the matrix exponential (by analogy with the scalar exponential) do in fact hold

## but many things you'd guess are wrong

example: you might guess that e^{A+B} = e^A e^B, but it's false (in general)

$$A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$$

$$e^A = \begin{bmatrix} 0.54 & 0.84 \\ -0.84 & 0.54 \end{bmatrix}, \qquad e^B = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$$

$$e^{A+B} = \begin{bmatrix} 0.16 & 1.40 \\ -0.70 & 0.16 \end{bmatrix} \neq e^A e^B = \begin{bmatrix} 0.54 & 1.38 \\ -0.84 & -0.30 \end{bmatrix}$$

however we do have e^{tA+sA} = e^{tA}e^{sA} for t, s ∈ R (exponents are multiples of the same matrix)

with s = −t we get

e^{tA}e^{−tA} = e^{tA−tA} = e⁰ = I

hence e^{tA} is nonsingular, with inverse

$$\left( e^{tA} \right)^{-1} = e^{-tA}$$

 
example: let's find e^A, where A = [0 1; 0 0]

we already found

$$e^{tA} = L^{-1}(sI - A)^{-1} = \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}$$

so, plugging in t = 1, we get e^A = [1 1; 0 1]

## let's check the power series:

$$e^A = I + A + \frac{A^2}{2!} + \cdots = I + A$$

since A² = A³ = ⋯ = 0
## Time transfer property

for ẋ = Ax we know

x(t) = Φ(t)x(0) = e^{tA}x(0)

## more generally we have, for any t and τ,

x(τ + t) = e^{tA}x(τ)

## interpretation: the matrix e^{tA} propagates the state t seconds forward in time

(backward if t < 0)

recall the first order (forward Euler) approximate state update, for small t:

x(τ + t) ≈ x(τ) + tẋ(τ) = (I + tA)x(τ)

exact solution is x(τ + t) = e^{tA}x(τ); forward Euler is just the first two terms of the series

Sampling a continuous-time system

suppose ẋ = Ax; sample x at times t1 ≤ t2 ≤ ⋯ and define z(k) = x(tk)

## for uniform sampling t_{k+1} − t_k = h, we have

z(k + 1) = e^{hA}z(k),

a discrete-time LDS

Piecewise constant system

## consider a time-varying LDS ẋ = A(t)x, with piecewise constant

$$A(t) = \begin{cases} A_0 & 0 \le t < t_1 \\ A_1 & t_1 \le t < t_2 \\ \vdots & \end{cases}$$

for t in the interval starting at ti,

$$x(t) = e^{(t - t_i)A_i} \cdots e^{(t_3 - t_2)A_2}\, e^{(t_2 - t_1)A_1}\, e^{t_1 A_0}\, x(0)$$

(matrix on righthand side is called state transition matrix for system, and denoted Φ(t))

Qualitative behavior of x(t)

from X(s) = (sI − A)⁻¹x(0), each component has the form

$$X_i(s) = \frac{a_i(s)}{\mathcal{X}(s)}$$

## where ai is a polynomial of degree < n

thus the poles of Xi are all eigenvalues of A (but not necessarily the other way around)

first assume eigenvalues λi are distinct, so Xi(s) cannot have repeated poles

## then xi(t) has the form

$$x_i(t) = \sum_{j=1}^{n} \beta_{ij} e^{\lambda_j t}$$

where the βij depend on x(0) (linearly)

## eigenvalues give exponents that can occur in exponentials

a real eigenvalue λ corresponds to an exponentially decaying or growing term e^{λt} in the solution

a complex eigenvalue λ = σ + jω corresponds to a decaying or growing sinusoidal term e^{σt} cos(ωt + φ) in the solution

## ℜλj gives exponential growth rate (if ℜλj > 0), or exponential decay rate (if ℜλj < 0), of the term; ℑλj gives the frequency of the oscillatory term

(sketch: eigenvalue locations in the complex s-plane and the corresponding solution terms; figure omitted)

now suppose A has repeated eigenvalues, so Xi can have repeated poles

## express eigenvalues as λ1, . . . , λr (distinct) with multiplicities n1, . . . , nr, respectively (n1 + ⋯ + nr = n)

## then xi(t) has the form

$$x_i(t) = \sum_{j=1}^{r} p_{ij}(t) e^{\lambda_j t}$$

where pij(t) is a polynomial of degree < nj (that depends linearly on x(0))

Stability

we say the system ẋ = Ax is stable if e^{tA} → 0 as t → ∞

meaning: the state x(t) converges to 0 as t → ∞, no matter what x(0) is

## fact: ẋ = Ax is stable if and only if all eigenvalues of A have negative real part:

ℜλi < 0, i = 1, . . . , n

the "if" part is clear, since for any polynomial p and any λ with ℜλ < 0,

$$\lim_{t\to\infty} p(t)e^{\lambda t} = 0$$

## more generally, maxᵢ ℜλᵢ determines the maximum asymptotic logarithmic growth rate of x(t) (or decay, if < 0)

EE263 Autumn 2008-09 Stephen Boyd

Lecture 11
Eigenvectors and diagonalization
eigenvectors

## complex eigenvectors & invariant planes

left eigenvectors

diagonalization

modal form

discrete-time stability


## Eigenvectors and eigenvalues

λ ∈ C is an eigenvalue of A ∈ Cⁿˣⁿ if

X(λ) = det(λI − A) = 0

equivalent to:

there exists nonzero v ∈ Cⁿ with Av = λv (v is an eigenvector of A, associated with λ)

there exists nonzero w ∈ Cⁿ with wᵀA = λwᵀ (w is a left eigenvector of A)

if v is an eigenvector of A with eigenvalue λ, then so is αv, for any α ∈ C, α ≠ 0

## when A and λ are real, we can always find a real eigenvector v associated with λ: if Av = λv, with A ∈ Rⁿˣⁿ, λ ∈ R, and v ∈ Cⁿ, then

Aℜv = λℜv, Aℑv = λℑv

so ℜv and ℑv are real eigenvectors, if they are nonzero
(and at least one is)

## conjugate symmetry: if A is real and v ∈ Cⁿ is an eigenvector associated with λ ∈ C, then v̄ is an eigenvector associated with λ̄: taking the conjugate of Av = λv we get

Av̄ = λ̄v̄

Scaling interpretation

(sketch: for an eigenvector v, Av = λv stays on the line through v, while a generic x is rotated as well as scaled by A; figure omitted)

λ ∈ R, λ > 0: v and Av point in same direction

λ ∈ R, λ < 0: v and Av point in opposite directions

## λ ∈ R, |λ| > 1: Av larger than v; |λ| < 1: Av smaller than v

(we'll see later how this relates to stability of continuous- and discrete-time systems. . . )

Dynamic interpretation

suppose Av = λv, v ≠ 0

if ẋ = Ax and x(0) = v, then x(t) = e^{λt}v

## several ways to see this, e.g.,

$$x(t) = e^{tA}v = \left( I + tA + \frac{(tA)^2}{2!} + \cdots \right) v = v + t\lambda v + \frac{(t\lambda)^2}{2!}v + \cdots = e^{\lambda t}v$$

for λ ∈ C, the solution is complex (we'll interpret later); for now, assume λ ∈ R

## if the initial state is an eigenvector v, the resulting motion is very simple: it stays on the line spanned by v

(the solution e^{λt}v is called a mode of the system, associated with eigenvalue λ)
Invariant sets

a set S ⊆ Rⁿ is invariant under ẋ = Ax if whenever x(t) ∈ S, then x(τ) ∈ S for all τ ≥ t

i.e., once a trajectory enters S, it stays in S (figure omitted)

suppose Av = λv, v ≠ 0, λ ∈ R

line { tv | t ∈ R } is invariant
(in fact, ray { tv | t > 0 } is invariant)

Complex eigenvectors

suppose Av = λv, v ≠ 0, λ is complex

for a ∈ C, the (complex) trajectory ae^{λt}v satisfies ẋ = Ax

hence so does the (real) trajectory

$$x(t) = \Re\left( a e^{\lambda t} v \right) = e^{\sigma t}\begin{bmatrix} v_{\rm re} & v_{\rm im} \end{bmatrix} \begin{bmatrix} \cos\omega t & \sin\omega t \\ -\sin\omega t & \cos\omega t \end{bmatrix} \begin{bmatrix} \alpha \\ -\beta \end{bmatrix}$$

where

v = vre + jvim, λ = σ + jω, a = α + jβ

## trajectory stays in invariant plane span{vre, vim}

σ gives logarithmic growth/decay factor

ω gives angular velocity of rotation in plane

Dynamic interpretation: left eigenvectors

suppose wᵀA = λwᵀ, w ≠ 0

then

$$\frac{d}{dt}(w^T x) = w^T \dot x = w^T A x = \lambda(w^T x)$$

i.e., wᵀx satisfies the DE d(wᵀx)/dt = λ(wᵀx)

## even if trajectory x is complicated, wᵀx is simple

if, e.g., λ ∈ R, λ < 0, the halfspace { z | wᵀz ≤ a } is invariant (for a ≥ 0)

for λ = σ + jω ∈ C, (ℜw)ᵀx and (ℑw)ᵀx both have the form

e^{σt}(α cos(ωt) + β sin(ωt))

Summary

## right eigenvectors are initial conditions from which the resulting motion is simple (i.e., remains on a line or in a plane)

left eigenvectors give linear functions of state that are simple, for any initial condition

example 1:

$$\dot x = \begin{bmatrix} -1 & -10 & -10 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} x$$

block diagram: a chain of three integrators with states x1, x2, x3 and feedback gains −1, −10, −10 into the first integrator (figure omitted)

## X(s) = s³ + s² + 10s + 10 = (s + 1)(s² + 10)

eigenvalues are −1, ±j√10
## trajectory with x(0) = (0, 1, 1):

(plots of x1(t), x2(t), x3(t) for 0 ≤ t ≤ 5, showing decaying oscillations; figure omitted)

left eigenvector associated with eigenvalue −1 is

$$g = \begin{bmatrix} 0.1 \\ 0 \\ 1 \end{bmatrix}$$

## let's check gᵀx(t) when x(0) = (0, 1, 1) (as above):

(plot of gᵀx versus t: a clean exponential decay from 1 toward 0; figure omitted)

eigenvector associated with eigenvalue j√10 is

v = ( 0.554 + j0.771,  0.244 + j0.175,  0.055 − j0.077 )

so an invariant plane is spanned by

v_re = ( 0.554, 0.244, 0.055 ),    v_im = ( 0.771, 0.175, −0.077 )

## Eigenvectors and diagonalization 1116

for example, with x(0) = v_re we have

(plots: x1(t), x2(t), x3(t) versus t over [0, 5])

## Example 2: Markov chain

probability distribution satisfies p(t + 1) = P p(t)

p_i(t) = Prob( z(t) = i ), so Σ_{i=1}^n p_i(t) = 1

P_ij = Prob( z(t + 1) = i | z(t) = j ), so Σ_{i=1}^n P_ij = 1
(such matrices are called stochastic)

rewrite as:
[1 1 ··· 1] P = [1 1 ··· 1]

i.e., [1 1 ··· 1] is a left eigenvector of P with e.v. 1

hence det(I − P) = 0, so there is a right eigenvector v ≠ 0 with P v = v

it can be shown that v can be chosen so that v_i ≥ 0, hence we can normalize v so that Σ_{i=1}^n v_i = 1

interpretation: v is an equilibrium distribution; i.e., if p(0) = v then p(t) = v for all t ≥ 0

(if v is unique it is called the steady-state distribution of the Markov chain)
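As a concrete illustration (a sketch of mine, not part of the original notes), the equilibrium distribution can be computed numerically as the eigenvector of P for eigenvalue 1; the 3-state stochastic matrix below is made up for the example:

```python
import numpy as np

# hypothetical 3-state stochastic matrix: columns sum to 1,
# P[i, j] = Prob(z(t+1) = i | z(t) = j)
P = np.array([[0.9, 0.2, 0.0],
              [0.1, 0.7, 0.3],
              [0.0, 0.1, 0.7]])

# right eigenvector of P with eigenvalue 1 is an equilibrium distribution
evals, evecs = np.linalg.eig(P)
k = np.argmin(np.abs(evals - 1.0))   # index of eigenvalue closest to 1
v = np.real(evecs[:, k])
v = v / v.sum()                      # normalize so entries sum to 1

print(v)                             # equilibrium distribution
print(np.allclose(P @ v, v))         # True: Pv = v
```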

Diagonalization

suppose v1, . . . , vn is a linearly independent set of eigenvectors of A ∈ R^{n×n}:

A vi = λi vi,    i = 1, . . . , n

express as

A [ v1 ··· vn ] = [ v1 ··· vn ] diag(λ1, . . . , λn)

 
define T = [ v1 ··· vn ] and Λ = diag(λ1, . . . , λn), so

AT = TΛ

and finally

T^{-1} A T = Λ

T invertible since v1, . . . , vn linearly independent

similarity transformation by T diagonalizes A

conversely if there is a T = [ v1 ··· vn ] s.t.

T^{-1} A T = Λ = diag(λ1, . . . , λn)

then AT = TΛ, i.e.,

A vi = λi vi,    i = 1, . . . , n

so v1, . . . , vn is a linearly independent set of eigenvectors of A

we say A is diagonalizable if

there exists T s.t. T^{-1} A T = Λ is diagonal

equivalently: A has a set of n linearly independent eigenvectors

## Eigenvectors and diagonalization 1120
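A quick numerical check of this equivalence (a sketch, not from the original notes): numpy's eig returns T with the eigenvectors as columns, and we can verify T^{-1} A T = Λ directly:

```python
import numpy as np

# example matrix with distinct eigenvalues (hence diagonalizable)
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

lam, T = np.linalg.eig(A)    # columns of T are eigenvectors v_i
Lam = np.diag(lam)

# T^{-1} A T should equal Lam (up to numerical error)
print(np.allclose(np.linalg.inv(T) @ A @ T, Lam))     # True

# diagonalizable iff the eigenvectors are independent, i.e., T full rank
print(np.linalg.matrix_rank(T) == A.shape[0])         # True
```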

Not all matrices are diagonalizable

example: A = [ 0  1 ;  0  0 ]

sole eigenvalue is λ = 0

eigenvectors satisfy Av = 0v = 0, i.e.

[ 0  1 ;  0  0 ] [ v1 ; v2 ] = 0

so all eigenvectors have form v = [ v1 ; 0 ] where v1 ≠ 0

thus, A cannot have two independent eigenvectors, i.e., A is not diagonalizable

## Eigenvectors and diagonalization 1121

Distinct eigenvalues

fact: if A has distinct eigenvalues, i.e., λi ≠ λj for i ≠ j, then A is diagonalizable

(the converse is false: A can have repeated eigenvalues but still be diagonalizable)

## Eigenvectors and diagonalization 1122

Diagonalization and left eigenvectors

rewrite T^{-1} A T = Λ as T^{-1} A = Λ T^{-1}, or

[ w1^T ; .. ; wn^T ] A = Λ [ w1^T ; .. ; wn^T ]

where w1^T , . . . , wn^T are the rows of T^{-1}

thus

wi^T A = λi wi^T

i.e., the rows of T^{-1} are (lin. indep.) left eigenvectors, normalized so that

wi^T vj = δij

(i.e., left & right eigenvectors chosen this way are dual bases)

## Eigenvectors and diagonalization 1123

Modal form

suppose A is diagonalizable by T

define new coordinates by x = T x̃, so

T (d/dt) x̃ = A T x̃   ⇔   (d/dt) x̃ = T^{-1} A T x̃ = Λ x̃

in new coordinate system, system is diagonal (decoupled):

(block diagram: n parallel scalar systems, each an integrator 1/s with feedback gain λi, producing x̃1, . . . , x̃n)

trajectories consist of n independent modal components:

x̃i(t) = e^{λi t} x̃i(0)

Real modal form

when eigenvalues (hence T) are complex, system can be put in real modal form:

S^{-1} A S = diag( Λr, M_{r+1}, M_{r+3}, . . . , M_{n−1} )

where Λr = diag(λ1, . . . , λr) are the real eigenvalues, and

Mi = [ σi  ωi ;  −ωi  σi ],    λi = σi + jωi,    i = r + 1, r + 3, . . . , n

where λi are the complex eigenvalues (one from each conjugate pair)

## Eigenvectors and diagonalization 1126

block diagram of complex mode:

(figure: two coupled integrators 1/s with self-feedback σ and cross-feedback gains ω and −ω)

diagonalization simplifies many matrix expressions

e.g., resolvent:

(sI − A)^{-1} = ( sTT^{-1} − TΛT^{-1} )^{-1}
             = ( T (sI − Λ) T^{-1} )^{-1}
             = T (sI − Λ)^{-1} T^{-1}
             = T diag( 1/(s − λ1), . . . , 1/(s − λn) ) T^{-1}

powers (i.e., discrete-time solution):

A^k = ( TΛT^{-1} )^k
    = ( TΛT^{-1} ) ··· ( TΛT^{-1} )
    = T Λ^k T^{-1}
    = T diag( λ1^k, . . . , λn^k ) T^{-1}

## Eigenvectors and diagonalization 1128

exponential (i.e., continuous-time solution):

e^A = I + A + A^2/2! + ···
    = I + TΛT^{-1} + ( TΛT^{-1} )^2/2! + ···
    = T ( I + Λ + Λ^2/2! + ··· ) T^{-1}
    = T e^Λ T^{-1}
    = T diag( e^{λ1}, . . . , e^{λn} ) T^{-1}
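A small numerical sanity check (a sketch, not from the original notes): for a diagonalizable A, T diag(e^{t λi}) T^{-1} should agree with a direct matrix exponential, e.g., scipy's expm:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-1.0, -10.0, -10.0],
              [ 1.0,   0.0,   0.0],
              [ 0.0,   1.0,   0.0]])   # example 1 from these notes
t = 0.7

lam, T = np.linalg.eig(A)
# e^{tA} = T diag(e^{t lambda_i}) T^{-1}
etA_diag = T @ np.diag(np.exp(t * lam)) @ np.linalg.inv(T)

# compare against a direct computation of the matrix exponential
print(np.allclose(np.real_if_close(etA_diag), expm(t * A)))   # True
```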

analytic function of a matrix: for f given by a power series f(a) = β0 + β1 a + β2 a^2 + β3 a^3 + ···, we can define (overload)

f(A) = β0 I + β1 A + β2 A^2 + β3 A^3 + ···

substituting A = TΛT^{-1}, we have

f(A) = β0 I + β1 A + β2 A^2 + β3 A^3 + ···
     = β0 TT^{-1} + β1 TΛT^{-1} + β2 ( TΛT^{-1} )^2 + ···
     = T ( β0 I + β1 Λ + β2 Λ^2 + ··· ) T^{-1}
     = T diag( f(λ1), . . . , f(λn) ) T^{-1}
## Eigenvectors and diagonalization 1130

Solution via diagonalization

assume A is diagonalizable

consider LDS ẋ = Ax, with T^{-1} A T = Λ

then

x(t) = e^{tA} x(0)
     = T e^{tΛ} T^{-1} x(0)
     = Σ_{i=1}^n e^{λi t} ( wi^T x(0) ) vi

interpretation:

decompose initial state x(0) into modal components wi^T x(0)

e^{λi t} term propagates ith mode forward t seconds

reconstitute state as linear combination of (right) eigenvectors
## Eigenvectors and diagonalization 1132
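The modal expansion above translates directly into code; a minimal sketch of mine, under the assumption that A is diagonalizable:

```python
import numpy as np

A = np.array([[-1.0, -10.0, -10.0],
              [ 1.0,   0.0,   0.0],
              [ 0.0,   1.0,   0.0]])
x0 = np.array([0.0, 1.0, 1.0])
t = 2.0

lam, T = np.linalg.eig(A)   # columns of T: right eigenvectors v_i
W = np.linalg.inv(T)        # rows of W: left eigenvectors w_i^T

# x(t) = sum_i e^{lambda_i t} (w_i^T x(0)) v_i
x_t = sum(np.exp(lam[i] * t) * (W[i, :] @ x0) * T[:, i]
          for i in range(len(lam)))

print(np.real_if_close(x_t))   # state at time t (imaginary parts cancel)
```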

application: for what x(0) do we have x(t) → 0 as t → ∞?

divide eigenvalues into those with negative real parts

Re λ1 < 0, . . . , Re λs < 0,

and the others,

Re λ_{s+1} ≥ 0, . . . , Re λn ≥ 0

from

x(t) = Σ_{i=1}^n e^{λi t} ( wi^T x(0) ) vi

condition for x(t) → 0 is:

x(0) ∈ span{ v1, . . . , vs },

or equivalently,

wi^T x(0) = 0,    i = s + 1, . . . , n

(can you prove this?)

Stability of discrete-time systems

suppose A diagonalizable

consider discrete-time LDS x(t + 1) = Ax(t)

if A = TΛT^{-1}, then A^k = TΛ^k T^{-1}

then

x(t) = A^t x(0) = Σ_{i=1}^n λi^t ( wi^T x(0) ) vi  →  0  as t → ∞

for all x(0) if and only if

|λi| < 1,    i = 1, . . . , n.

we will see later that this is true even when A is not diagonalizable, so we
have
fact: x(t + 1) = Ax(t) is stable if and only if all eigenvalues of A have
magnitude less than one

## Eigenvectors and diagonalization 1134

EE263 Autumn 2008-09 Stephen Boyd

Lecture 12
Jordan canonical form

## Jordan canonical form

generalized modes

Cayley-Hamilton theorem

121

any matrix A ∈ R^{n×n} can be put in Jordan canonical form by a similarity transformation, i.e.

T^{-1} A T = J = diag( J1, . . . , Jq )

where

Ji = [ λi  1 ; λi  ⋱ ; ⋱  1 ; λi ]  ∈ C^{ni×ni}

(λi on the diagonal, ones on the first superdiagonal, zeros elsewhere)

is called a Jordan block of size ni with eigenvalue λi (so n = Σ_{i=1}^q ni)

## Jordan canonical form 122

J is upper bidiagonal

dim N(λI − A) is the number of Jordan blocks with eigenvalue λ

more generally,

dim N( (λI − A)^k ) = Σ_{λi = λ} min{ k, ni }

so from dim N( (λI − A)^k ) for k = 1, 2, . . . we can determine the sizes of the Jordan blocks associated with λ

## Jordan canonical form 124

factor out T and T^{-1}:  λI − A = T (λI − J) T^{-1}

for, say, a block of size 3:

λi I − Ji = [ 0 −1 0 ; 0 0 −1 ; 0 0 0 ],   (λi I − Ji)^2 = [ 0 0 1 ; 0 0 0 ; 0 0 0 ],   (λi I − Ji)^3 = 0

for other blocks (say, size 3), for k ≥ 2:

(λi I − Jj)^k = [ (λi − λj)^k   −k(λi − λj)^{k−1}   (k(k−1)/2)(λi − λj)^{k−2} ;
                  0             (λi − λj)^k          −k(λi − λj)^{k−1} ;
                  0             0                    (λi − λj)^k ]

## Jordan canonical form 125

Generalized eigenvectors

suppose T^{-1} A T = J = diag( J1, . . . , Jq )

express T as

T = [ T1  T2  ···  Tq ]

where Ti ∈ C^{n×ni} are the columns of T associated with ith Jordan block Ji

we have A Ti = Ti Ji

let Ti = [ vi1  vi2  ···  vi,ni ]

then we have:

A vi1 = λi vi1,

i.e., the first column of each Ti is an eigenvector associated with e.v. λi

for j = 2, . . . , ni,

A vij = vi,j−1 + λi vij

the vectors vi1, . . . , vi,ni are sometimes called generalized eigenvectors

## Jordan canonical form 126

Jordan form LDS

consider LDS ẋ = Ax

by change of coordinates x = T x̃, can put into form (d/dt) x̃ = J x̃

system is decomposed into independent Jordan block systems (d/dt) x̃i = Ji x̃i

(block diagram of one Jordan block: a chain of ni integrators 1/s, each with feedback gain λ, producing x̃_{ni}, . . . , x̃1)

resolvent of k×k Jordan block with eigenvalue λ:

(sI − J)^{-1} = [ s−λ  −1 ; s−λ  ⋱ ; ⋱  −1 ; s−λ ]^{-1}

              = [ (s−λ)^{-1}  (s−λ)^{-2}  ···  (s−λ)^{-k} ;
                              (s−λ)^{-1}  ···  (s−λ)^{-k+1} ;
                                           ⋱   .. ;
                                               (s−λ)^{-1} ]

## Jordan canonical form 128

by inverse Laplace transform, exponential is:

e^{tJ} = e^{tλ} ( I + tF1 + ··· + ( t^{k−1}/(k−1)! ) F_{k−1} )

       = e^{tλ} [ 1  t  ···  t^{k−1}/(k−1)! ;
                     1  ···  t^{k−2}/(k−2)! ;
                          ⋱   .. ;
                              1 ]

where Fi is the matrix with ones on the ith upper diagonal

## Jordan canonical form 129

Generalized modes

consider ẋ = Ax, with x(0) = a1 vi1 + ··· + a_{ni} vi,ni = Ti a

then x(t) = e^{tA} x(0) = Ti e^{tJi} a

trajectory stays in span of the generalized eigenvectors; such solutions are called generalized modes of the system

## Jordan canonical form 1210

with general x(0) we can write

x(t) = e^{tA} x(0) = T e^{tJ} T^{-1} x(0) = Σ_{i=1}^q Ti e^{tJi} ( Si^T x(0) )

where

T^{-1} = [ S1^T ; .. ; Sq^T ]

hence: all solutions of ẋ = Ax are linear combinations of (generalized) modes

## Jordan canonical form 1211

Cayley-Hamilton theorem

Cayley-Hamilton theorem: for any A ∈ R^{n×n} we have X(A) = 0, where X(s) = det(sI − A)

example: with A = [ 1  2 ;  3  4 ] we have X(s) = s^2 − 5s − 2, so

X(A) = A^2 − 5A − 2I

     = [ 7  10 ;  15  22 ] − 5 [ 1  2 ;  3  4 ] − 2I

     = 0

## Jordan canonical form 1212
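A quick numerical check of the theorem (a sketch of mine, not from the notes), using numpy's characteristic-polynomial coefficients:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
n = A.shape[0]

# coefficients of X(s) = det(sI - A), highest power first: [1, -5, -2]
coeffs = np.poly(A)

# evaluate X(A) = A^2 - 5A - 2I; should be the zero matrix
X_of_A = sum(c * np.linalg.matrix_power(A, n - k)
             for k, c in enumerate(coeffs))

print(np.allclose(X_of_A, np.zeros((n, n))))   # True
```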

corollary: for every p ∈ Z_+, we have

A^p ∈ span{ I, A, A^2, . . . , A^{n−1} }

(and if A is invertible, also for p ∈ Z)

i.e., every power of A can be expressed as linear combination of I, A, . . . , A^{n−1}

for p = −1: rewrite C-H theorem

X(A) = A^n + a_{n−1} A^{n−1} + ··· + a0 I = 0

as

I = A ( −(a1/a0) I − (a2/a0) A − ··· − (1/a0) A^{n−1} )

(A is invertible ⇔ a0 ≠ 0) so

A^{-1} = −(a1/a0) I − (a2/a0) A − ··· − (1/a0) A^{n−1}

## Jordan canonical form 1214

Proof of C-H theorem

first assume A is diagonalizable: T^{-1} A T = Λ

X(s) = (s − λ1) ··· (s − λn)

since

X(A) = X( TΛT^{-1} ) = T X(Λ) T^{-1}

it suffices to show X(Λ) = 0

X(Λ) = (Λ − λ1 I) ··· (Λ − λn I)
     = diag(0, λ2 − λ1, . . . , λn − λ1) ··· diag(λ1 − λn, . . . , λ_{n−1} − λn, 0)
     = 0

now the general case: T^{-1} A T = J, with X(s) = (s − λ1)^{n1} ··· (s − λq)^{nq}

suffices to show X(Ji) = 0

X(Ji) = (Ji − λ1 I)^{n1} ··· (Ji − λi I)^{ni} ··· (Ji − λq I)^{nq} = 0

since (Ji − λi I)^{ni} = 0

## Jordan canonical form 1216

EE263 Autumn 2008-09 Stephen Boyd

Lecture 13
Linear dynamical systems with inputs &
outputs

transfer matrix

examples

131

## Inputs & outputs

recall continuous-time time-invariant LDS has form

ẋ = Ax + Bu,    y = Cx + Du

Ax is called the drift term (of ẋ)

Bu is called the input term (of ẋ)

picture, with B ∈ R^{2×1}:

(figure: vector field of ẋ(t) = Ax(t) + Bu(t) in the plane, showing the drift Ax(t), the input direction B, and ẋ(t) for u(t) = 1 and u(t) = −1.5)

Interpretations

write ẋ = Ax + b1 u1 + ··· + bm um, where B = [ b1 ··· bm ]

state derivative is sum of autonomous term (Ax) and one term per input (bi ui)

each input ui gives another degree of freedom for ẋ (assuming columns of B independent)

write ẋ = Ax + Bu as ẋi = ãi^T x + b̃i^T u, where ãi^T, b̃i^T are the rows of A, B

ith state derivative is a linear function of state x and input u

## Linear dynamical systems with inputs & outputs 133

Block diagram

(figure: u(t) → B → 1/s → x(t) → C → y(t), with A feeding x(t) back to the integrator input and D feeding u(t) forward to the output)

## Linear dynamical systems with inputs & outputs 134

interesting when there is structure, e.g., with x1 ∈ R^{n1}, x2 ∈ R^{n2}:

d/dt [ x1 ; x2 ] = [ A11  A12 ;  0  A22 ] [ x1 ; x2 ] + [ B1 ; 0 ] u,    y = [ C1  C2 ] [ x1 ; x2 ]

(block diagram: u drives x1 through B1 and 1/s with A11 feedback; x2 evolves through its own 1/s with A22 feedback and couples into x1 through A12; y = C1 x1 + C2 x2)

## x2 is not affected by input u, i.e., x2 propagates autonomously

x2 affects y directly and through x1

## Linear dynamical systems with inputs & outputs 135

Transfer matrix

take Laplace transform of ẋ = Ax + Bu:

sX(s) − x(0) = A X(s) + B U(s)

hence

X(s) = (sI − A)^{-1} x(0) + (sI − A)^{-1} B U(s)

so

x(t) = e^{tA} x(0) + ∫_0^t e^{(t−τ)A} B u(τ) dτ

e^{tA} x(0) is the unforced or autonomous response; (sI − A)^{-1} B is called the input-to-state transfer matrix or transfer function

## Linear dynamical systems with inputs & outputs 136

with y = Cx + Du we have:

Y(s) = C (sI − A)^{-1} x(0) + ( C (sI − A)^{-1} B + D ) U(s)

so

y(t) = C e^{tA} x(0) + ∫_0^t C e^{(t−τ)A} B u(τ) dτ + D u(t)

C e^{tA} x(0) is the response to the initial state; H(s) = C (sI − A)^{-1} B + D is called the transfer matrix of the system

h(t) = C e^{tA} B + D δ(t) is called the impulse matrix or impulse response

(δ is the Dirac delta function)

interpretation: with x(0) = 0, the output is the convolution of the impulse matrix with the input

## Linear dynamical systems with inputs & outputs 138

Impulse matrix

impulse matrix h(t) = C e^{tA} B + D δ(t)

with x(0) = 0, y = h ∗ u, i.e.,

yi(t) = Σ_{j=1}^m ∫_0^t hij(t − τ) uj(τ) dτ

interpretations:

hij(t) gives yi, t seconds after a unit impulse is applied at uj (with zero initial state)

hij(t − τ) gives the effect on yi now of input uj applied τ seconds ago
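For a state-space model, impulse-response samples can be computed numerically; a sketch of mine (not from the notes) using scipy for a single-input, single-output system:

```python
import numpy as np
from scipy.signal import StateSpace, impulse

# hypothetical 2-state, 1-input, 1-output system
A = np.array([[0.0, 1.0],
              [-10.0, -1.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])

sys = StateSpace(A, B, C, D)
t, h = impulse(sys, T=np.linspace(0, 10, 500))  # h(t) = C e^{tA} B (D = 0)

print(h[:5])   # first few samples of the impulse response
```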

Step matrix

the step matrix or step response matrix is

s(t) = ∫_0^t h(τ) dτ

interpretations:

sij(t) gives yi, t seconds after a unit step is applied at uj (with zero initial state)



Example 1

(figure: three unit masses connected by unit springs and dampers; u1 is the tension applied between the first two masses, u2 between the last two)

state is x = [ y ; ẏ ]

## Linear dynamical systems with inputs & outputs 1311

system is:

ẋ = [  0  0  0  1  0  0 ;
       0  0  0  0  1  0 ;
       0  0  0  0  0  1 ;
      −2  1  0 −2  1  0 ;
       1 −2  1  1 −2  1 ;
       0  1 −2  0  1 −2 ] x + [ 0 0 ; 0 0 ; 0 0 ; 1 0 ; −1 1 ; 0 −1 ] [ u1 ; u2 ]

eigenvalues of A are complex, with negative real parts (lightly damped oscillatory modes)

## Linear dynamical systems with inputs & outputs 1312

impulse matrix:

(plots: impulse response from u1, showing h11, h21, h31; and impulse response from u2, showing h12, h22, h32; over t ∈ [0, 15])

roughly speaking:

## impulse at u1 affects third mass less than other two

impulse at u2 affects first mass later than other two

## Linear dynamical systems with inputs & outputs 1313

Example 2

interconnect circuit:

(figure: RC ladder driven by voltage u, with capacitors C1, C2, C3, C4 to ground at the internal nodes and resistors between nodes)

u(t) ∈ R is input (drive) voltage

xi is voltage across Ci
output is state: y = x
unit resistors, unit capacitors
step response matrix shows delay to each node

## Linear dynamical systems with inputs & outputs 1314

system is

ẋ = [ −3  1  1  0 ;
       1 −1  0  0 ;
       1  0 −2  1 ;
       0  0  1 −1 ] x + [ 1 ; 0 ; 0 ; 0 ] u,    y = x

eigenvalues of A are real and negative

(plot: step responses s1, . . . , s4 versus t ∈ [0, 15], each rising toward 1; the later nodes respond more slowly)

## Linear dynamical systems with inputs & outputs 1316

DC or static gain matrix

DC transfer matrix describes system under static conditions, i.e., x, u, y constant:

0 = ẋ = Ax + Bu,    y = Cx + Du

eliminate x to get y = H(0)u, where H(0) = −C A^{-1} B + D is the DC or static gain matrix

if system is stable,

H(0) = ∫_0^∞ h(t) dt = lim_{t→∞} s(t)

(recall: H(s) = ∫_0^∞ e^{−st} h(t) dt,  s(t) = ∫_0^t h(τ) dτ)

if u(t) → u∞ ∈ R^m, then y(t) → y∞ ∈ R^p where y∞ = H(0) u∞

DC gain matrix for the mass/spring example:

H(0) = [ 1/4  1/4 ;  1/2  1/2 ;  1/4  1/4 ]

DC gain matrix for the RC circuit example:

H(0) = [ 1 ; 1 ; 1 ; 1 ]

(at DC, every node charges to the drive voltage)

## Linear dynamical systems with inputs & outputs 1318
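The DC gain is easy to compute numerically when A is invertible; a sketch of mine for the RC circuit example above:

```python
import numpy as np

A = np.array([[-3.0,  1.0,  1.0,  0.0],
              [ 1.0, -1.0,  0.0,  0.0],
              [ 1.0,  0.0, -2.0,  1.0],
              [ 0.0,  0.0,  1.0, -1.0]])
B = np.array([[1.0], [0.0], [0.0], [0.0]])
C = np.eye(4)
D = np.zeros((4, 1))

# H(0) = -C A^{-1} B + D (from static conditions 0 = Ax + Bu, y = Cx + Du)
H0 = -C @ np.linalg.solve(A, B) + D
print(H0)   # all ones, if the A, B above are right
```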

Discretization with piecewise constant inputs

linear system ẋ = Ax + Bu, y = Cx + Du

suppose ud : Z_+ → R^m is a sequence, and u(t) = ud(k) for kh ≤ t < (k + 1)h, k = 0, 1, . . .

define sequences

## xd(k) = x(kh), yd(k) = y(kh), k = 0, 1, . . .

h > 0 is called the sample interval (for x and y) or update interval (for
u)

xd(k + 1) = x((k + 1)h)

          = e^{hA} x(kh) + ∫_0^h e^{τA} B u( (k + 1)h − τ ) dτ

          = e^{hA} xd(k) + ( ∫_0^h e^{τA} dτ ) B ud(k)

so we have the discrete-time LDS

xd(k + 1) = Ad xd(k) + Bd ud(k),    yd(k) = Cd xd(k) + Dd ud(k)

where

Ad = e^{hA},    Bd = ( ∫_0^h e^{τA} dτ ) B,    Cd = C,    Dd = D

## Linear dynamical systems with inputs & outputs 1320
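A numerical sketch of mine for this discretization; scipy provides it directly via cont2discrete with the zero-order-hold method, which matches the formulas above:

```python
import numpy as np
from scipy.signal import cont2discrete
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
h = 0.1   # sample / update interval

Ad, Bd, Cd, Dd, _ = cont2discrete((A, B, C, D), dt=h, method='zoh')

# check Ad = e^{hA}
print(np.allclose(Ad, expm(h * A)))   # True
```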

called discretized system

if A is invertible, we can express the integral as

∫_0^h e^{τA} dτ = A^{-1} ( e^{hA} − I )

stability: if eigenvalues of A are λ1, . . . , λn, then eigenvalues of Ad are e^{hλ1}, . . . , e^{hλn}

Re λi < 0  ⇔  | e^{hλi} | < 1

for h > 0, so discretization preserves stability

## Linear dynamical systems with inputs & outputs 1321

extensions/variations:

## multirate: ui updated, yi sampled at different intervals

(usually integer multiples of a common interval h)

Dual system

the dual system associated with system

ẋ = Ax + Bu,    y = Cx + Du

is given by

ż = A^T z + C^T v,    w = B^T z + D^T v

all matrices are transposed

roles of B and C are swapped

## Linear dynamical systems with inputs & outputs 1324

Dual via block diagram

original system:

(figure: u(t) → B → 1/s → x(t) → C → y(t), with A feedback and D feedthrough)

dual system:

(figure: v(t) → C^T → 1/s → z(t) → B^T → w(t), with A^T feedback and D^T feedthrough)
## Linear dynamical systems with inputs & outputs 1326

Causality

interpretation of

x(t) = e^{tA} x(0) + ∫_0^t e^{(t−τ)A} B u(τ) dτ

y(t) = C e^{tA} x(0) + ∫_0^t C e^{(t−τ)A} B u(τ) dτ + D u(t)

for t ≥ 0:

current state (x(t)) and output (y(t)) depend on past input (u(τ) for 0 ≤ τ ≤ t)

i.e., mapping from input to state and output is causal (with fixed initial state)

now consider fixed final state x(T): for t ≤ T,

x(t) = e^{(t−T)A} x(T) − ∫_t^T e^{(t−τ)A} B u(τ) dτ,

i.e., current state (and output) depend on future input; so the mapping from input to state (with fixed final state) is anti-causal

Idea of state

x(t) is called the state of the system at time t since:

future output depends only on current state and future input

future output depends on past input only through current state

state summarizes effect of past inputs on future output

state is bridge between past inputs and future outputs
## Linear dynamical systems with inputs & outputs 1329

Change of coordinates

start with LDS ẋ = Ax + Bu, y = Cx + Du

change coordinates in R^n to x̃, with x = T x̃

then

(d/dt) x̃ = T^{-1} ẋ = T^{-1} ( Ax + Bu ) = T^{-1} A T x̃ + T^{-1} B u

hence LDS can be expressed as

(d/dt) x̃ = Ã x̃ + B̃ u,    y = C̃ x̃ + D̃ u

where

Ã = T^{-1} A T,    B̃ = T^{-1} B,    C̃ = C T,    D̃ = D

transfer matrix is the same:

C̃ (sI − Ã)^{-1} B̃ + D̃ = C (sI − A)^{-1} B + D
## Linear dynamical systems with inputs & outputs 1330

Standard forms for LDS

change of coordinates can be used to put LDS in a convenient form (diagonal, real modal, Jordan, . . . )

e.g., to put LDS in diagonal form, find T s.t.

T^{-1} A T = diag(λ1, . . . , λn)

write

T^{-1} B = [ b̃1^T ; .. ; b̃n^T ],    C T = [ c̃1  ···  c̃n ]

so

(d/dt) x̃i = λi x̃i + b̃i^T u,    y = Σ_{i=1}^n c̃i x̃i

## Linear dynamical systems with inputs & outputs 1331

(block diagram: u drives each of n decoupled scalar systems through b̃i^T; each is an integrator 1/s with feedback λi producing x̃i; the outputs are combined as y = Σ c̃i x̃i)

(here we assume D = 0)

## Linear dynamical systems with inputs & outputs 1332

Discrete-time systems

discrete-time LDS:

x(t + 1) = Ax(t) + Bu(t),    y(t) = Cx(t) + Du(t)

(block diagram: u(t) → B → 1/z → x(t) → C → y(t), with A feedback and D feedthrough; the 1/z block maps x(t + 1) to x(t))

only difference with cts-time: z instead of s

interpretation of z^{-1} block:

unit delayor (shifts sequence back in time one epoch)

latch (plus small delay to avoid race condition)

## Linear dynamical systems with inputs & outputs 1333

we have:

x(1) = Ax(0) + Bu(0),

x(2) = Ax(1) + Bu(1) = A^2 x(0) + ABu(0) + Bu(1),

and in general, for t ∈ Z_+,

x(t) = A^t x(0) + Σ_{τ=0}^{t−1} A^{t−1−τ} B u(τ)

hence

y(t) = C A^t x(0) + ( h ∗ u )(t)

where ∗ is discrete-time convolution and

h(t) = { D,             t = 0
       { C A^{t−1} B,   t > 0

is the impulse response
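The recursion is one line per step in code; a minimal sketch of mine that simulates the discrete-time LDS and checks the closed-form expression for x(t):

```python
import numpy as np

A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
x = np.array([1.0, 0.0])        # x(0)
us = [np.array([1.0]), np.array([0.5]), np.array([-1.0])]

# simulate x(t+1) = A x(t) + B u(t)
for u in us:
    x = A @ x + B @ u

# closed form: x(t) = A^t x(0) + sum_tau A^{t-1-tau} B u(tau)
t = len(us)
x_cf = np.linalg.matrix_power(A, t) @ np.array([1.0, 0.0])
x_cf += sum(np.linalg.matrix_power(A, t - 1 - k) @ (B @ us[k])
            for k in range(t))

print(np.allclose(x, x_cf))     # True
```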

Z-transform

suppose w : Z_+ → R^{p×q} is a sequence; its Z-transform is

W(z) = Σ_{t=0}^∞ z^{−t} w(t)

Z-transform of time-advanced signal:

V(z) = Σ_{t=0}^∞ z^{−t} w(t + 1)
     = z Σ_{t=1}^∞ z^{−t} w(t)
     = z W(z) − z w(0)

taking the Z-transform of the system equations

x(t + 1) = Ax(t) + Bu(t),    y(t) = Cx(t) + Du(t)

yields

z X(z) − z x(0) = A X(z) + B U(z),    Y(z) = C X(z) + D U(z)

## Linear dynamical systems with inputs & outputs 1338

hence

Y(z) = H(z) U(z) + C (zI − A)^{-1} z x(0)

where H(z) = C (zI − A)^{-1} B + D is the discrete-time transfer function

## Linear dynamical systems with inputs & outputs 1339

EE263 Autumn 2008-09 Stephen Boyd

Lecture 14
Example: Aircraft dynamics

## wind gust & control inputs

linearized dynamics

steady-state analysis

impulse matrices

141

(figure: aircraft with body axis at an angle θ above horizontal)

variables are (small) deviations from operating point or trim conditions

state (components):

u: velocity of aircraft along body axis

v: velocity of aircraft perpendicular to body axis (down is positive)

θ: angle between body axis and horizontal (up is positive)

q = θ̇: angular velocity of aircraft (pitch rate)

## Example: Aircraft dynamics 142

Inputs

disturbance inputs:

uw: velocity of wind along body axis

vw: velocity of wind perpendicular to body axis

control inputs:

δe: elevator angle (δe > 0 is down elevator)

δt: thrust
## Example: Aircraft dynamics 143

Linearized dynamics

for 747, level flight, 40000 ft, 774 ft/sec:

[ u̇ ; v̇ ; q̇ ; θ̇ ] = [ −.003   .039    0    −.322 ;
                        −.065  −.319   7.74    0   ;
                         .020  −.101  −.429    0   ;
                          0      0      1      0   ] [ u − uw ; v − vw ; q ; θ ]

                    + [  .01    1   ;  −.18  −.04  ;  −1.16  .598 ;  0  0 ] [ δe ; δt ]

## Example: Aircraft dynamics 144

outputs of interest:

aircraft speed u (deviation from trim)

climb rate ḣ = −v + 774 θ
## Example: Aircraft dynamics 145

Steady-state analysis

DC gain from (uw, vw, δe, δt) to (u, ḣ):

H(0) = −C A^{-1} B + D = [ 1  0  27.2  15.0 ;  0  1  1.34  24.9 ]

gives steady-state change in speed & climb rate due to wind, elevator & thrust changes

solve for control variables in terms of wind velocities, desired speed & climb rate:

[ δe ; δt ] = [ .0379  .0229 ;  .0020  .0413 ] [ u − uw ; ḣ + vw ]

## Example: Aircraft dynamics 146

at level flight, an increase in speed is obtained mostly by increasing elevator (i.e., downwards)

at constant speed, an increase in climb rate is obtained by increasing thrust and increasing elevator (i.e., downwards)

eigenvalues are two lightly damped complex pairs: a fast one (the short-period mode) and a very slow one (the phugoid mode)

eigenvectors are

xshort = ( 0.0005, 0.5433, 0.0899, 0.0283 ) + j ( 0.0135, 0.8235, 0.0677, 0.1140 )

xphug = ( 0.7510, 0.0962, 0.0111, 0.1225 ) + j ( 0.6130, 0.0941, 0.0082, 0.1637 )

## Example: Aircraft dynamics 149

Short-period mode

y(t) = C e^{tA} Re( xshort )   (pure short-period mode motion)

(plots: u(t) and ḣ(t) versus t ∈ [0, 20])

only small effect on speed u

period ≈ 7 sec, decays in ≈ 10 sec

Phugoid mode

y(t) = C e^{tA} Re( xphug )   (pure phugoid mode motion)

(plots: u(t) and ḣ(t) versus t ∈ [0, 2000])

affects both speed and climb rate

period ≈ 100 sec; decays in ≈ 5000 sec

Dynamic response to wind gusts

impulse response matrix from (uw, vw) to (u, ḣ) (gives response to short wind bursts)

over time period [0, 20]:

(plots: h11, h12, h21, h22 versus t ∈ [0, 20])

## Example: Aircraft dynamics 1412

over time period [0, 600]:

(plots: h11, h12, h21, h22 versus t ∈ [0, 600])

Dynamic response to actuators

impulse response matrix from (δe, δt) to (u, ḣ)

over time period [0, 20]:

(plots: h11, h12, h21, h22 versus t ∈ [0, 20])

## Example: Aircraft dynamics 1414

over time period [0, 600]:

(plots: h11, h12, h21, h22 versus t ∈ [0, 600])

## Example: Aircraft dynamics 1415

EE263 Autumn 2008-09 Stephen Boyd

Lecture 15
Symmetric matrices, quadratic forms, matrix
norm, and SVD
eigenvectors of symmetric matrices

quadratic forms

norm of a matrix

151

Eigenvalues of symmetric matrices

suppose A = A^T ∈ R^{n×n}; fact: the eigenvalues of A are real

to see this, suppose Av = λv, v ≠ 0, v ∈ C^n

then

v̄^T A v = v̄^T (Av) = λ v̄^T v = λ Σ_{i=1}^n |vi|^2

but also

v̄^T A v = (A v̄)^T v = (λ̄ v̄)^T v = λ̄ Σ_{i=1}^n |vi|^2

so λ = λ̄, i.e., λ ∈ R (hence, we can assume v ∈ R^n)

## Symmetric matrices, quadratic forms, matrix norm, and SVD 152

Eigenvectors of symmetric matrices

fact: there is a set of orthonormal eigenvectors of A, i.e., q1, . . . , qn s.t.

A qi = λi qi,    qi^T qj = δij

in matrix form: there is an orthogonal Q s.t.

Q^{-1} A Q = Q^T A Q = Λ

hence we can express A as

A = Q Λ Q^T = Σ_{i=1}^n λi qi qi^T
i=1

Interpretations

A = Q Λ Q^T

(diagram: x → Q^T x → Λ Q^T x → Ax; i.e., multiply by Q^T, then Λ, then Q)

resolve x into qi coordinates

scale coordinates by λi

reconstitute with basis qi
## Symmetric matrices, quadratic forms, matrix norm, and SVD 154

or, geometrically,

rotate by Q^T

diagonal real scale ("dilation") by Λ

rotate back by Q

decomposition

A = Σ_{i=1}^n λi qi qi^T

expresses A as linear combination of 1-dimensional projections

## Symmetric matrices, quadratic forms, matrix norm, and SVD 155

example:

A = [ −1/2   3/2 ;  3/2  −1/2 ]

  = ( 1/√2 [ 1  1 ;  1  −1 ] ) [ 1  0 ;  0  −2 ] ( 1/√2 [ 1  1 ;  1  −1 ] )^T

(figure: x resolved into components q1 q1^T x and q2 q2^T x; the first is scaled by λ1 = 1, the second by λ2 = −2, and the results sum to Ax)

## Symmetric matrices, quadratic forms, matrix norm, and SVD 156

proof (case of λi distinct)

since λi distinct, can find v1, . . . , vn, a set of linearly independent eigenvectors of A:

A vi = λi vi,    ‖vi‖ = 1

then we have

vi^T (A vj) = λj vi^T vj = (A vi)^T vj = λi vi^T vj

so (λi − λj) vi^T vj = 0

for i ≠ j, λi ≠ λj, hence vi^T vj = 0

in this case we can say: eigenvectors are orthogonal

in general case (λi not distinct) we must say: eigenvectors can be chosen to be orthogonal

## Symmetric matrices, quadratic forms, matrix norm, and SVD 157

Example: RC circuit

(figure: resistive circuit with capacitors c1, . . . , cn at its ports; vk is the voltage across ck, ik the current flowing into the resistive circuit)

ck v̇k = −ik,    i = G v

G = G^T ∈ R^{n×n} is the conductance matrix of the resistive circuit

use state xi = √ci vi, so

ẋ = C^{1/2} v̇ = −C^{−1/2} G C^{−1/2} x

where C^{1/2} = diag(√c1, . . . , √cn)

we conclude: the system matrix −C^{−1/2} G C^{−1/2} is symmetric, so its eigenvalues are real and its eigenvectors can be chosen orthonormal

Quadratic forms

a function f : R^n → R of the form

f(x) = x^T A x = Σ_{i,j=1}^n Aij xi xj

is called a quadratic form

in a quadratic form we may as well assume A = A^T since

x^T A x = x^T ( (A + A^T)/2 ) x

( (A + A^T)/2 is called the symmetric part of A )

uniqueness: if x^T A x = x^T B x for all x ∈ R^n, with A = A^T, B = B^T, then A = B

## Symmetric matrices, quadratic forms, matrix norm, and SVD 1510

Examples

‖Bx‖^2 = x^T B^T B x

Σ_{i=1}^{n−1} ( x_{i+1} − xi )^2

‖Fx‖^2 − ‖Gx‖^2

sort the eigenvalues of A = A^T as λ1 ≥ λ2 ≥ ··· ≥ λn; then

x^T A x = x^T Q Λ Q^T x
        = ( Q^T x )^T Λ ( Q^T x )
        = Σ_{i=1}^n λi ( qi^T x )^2
        ≤ λ1 Σ_{i=1}^n ( qi^T x )^2
        = λ1 ‖x‖^2

similar argument shows x^T A x ≥ λn ‖x‖^2, so we have

λn x^T x ≤ x^T A x ≤ λ1 x^T x

Positive semidefinite and positive definite matrices

suppose A = A^T ∈ R^{n×n}

we say A is positive semidefinite if x^T A x ≥ 0 for all x

denoted A ≥ 0 (and sometimes A ⪰ 0)

A ≥ 0 if and only if λmin(A) ≥ 0, i.e., all eigenvalues are nonnegative

not the same as Aij ≥ 0 for all i, j (such a matrix is called elementwise nonnegative)

we say A is positive definite if x^T A x > 0 for all x ≠ 0

denoted A > 0

A > 0 if and only if λmin(A) > 0, i.e., all eigenvalues are positive
## Symmetric matrices, quadratic forms, matrix norm, and SVD 1514

Matrix inequalities

we say A is negative semidefinite if −A ≥ 0, and negative definite if −A > 0; otherwise we say A is indefinite

matrix inequality: if B = B^T ∈ R^{n×n}, we write A ≥ B if A − B ≥ 0, A > B if A − B > 0, etc.

for example: A ≥ 0 means A is positive semidefinite; A > B means x^T A x > x^T B x for all x ≠ 0

many properties that you'd guess hold actually do, e.g.,

if A ≥ B and C ≥ D, then A + C ≥ B + D

if B ≤ 0 then A + B ≤ A

if A ≥ 0 and α ≥ 0, then αA ≥ 0

A^2 ≥ 0

if A > 0, then A^{-1} > 0

matrix inequality is only a partial order: we can have

A ≱ B,  B ≱ A

(such matrices are called incomparable)

Ellipsoids

if A = A^T > 0, the set

E = { x | x^T A x ≤ 1 }

is an ellipsoid in R^n, centered at 0

(figure: ellipse with semi-axes s1 and s2)

semi-axes are given by si = λi^{−1/2} qi, i.e.:

eigenvectors determine directions of semiaxes

eigenvalues determine lengths of semiaxes

note:

in direction q1, x^T A x is large, hence ellipsoid is thin in direction q1

in direction qn, x^T A x is small, hence ellipsoid is fat in direction qn

√(λmax/λmin) gives maximum eccentricity
## Symmetric matrices, quadratic forms, matrix norm, and SVD 1518

Gain of a matrix in a direction

suppose A ∈ R^{m×n} (not necessarily square or symmetric)

for x ∈ R^n, ‖Ax‖/‖x‖ gives the amplification factor or gain of A in the direction x

obviously, gain varies with direction of input x

questions:

what is maximum gain of A
(and corresponding maximum gain direction)?

what is minimum gain of A
(and corresponding minimum gain direction)?

how does gain of A vary with direction?

Matrix norm

the maximum gain

max_{x≠0} ‖Ax‖/‖x‖

is called the matrix norm or spectral norm of A and is denoted ‖A‖

max_{x≠0} ‖Ax‖^2/‖x‖^2 = max_{x≠0} x^T A^T A x / ‖x‖^2 = λmax( A^T A )

so we have ‖A‖ = √λmax( A^T A )

similarly the minimum gain is given by

min_{x≠0} ‖Ax‖/‖x‖ = √λmin( A^T A )

note that

A^T A ∈ R^{n×n} is symmetric and ≥ 0, so λmin, λmax ≥ 0

max gain input direction is x = q1, eigenvector of A^T A associated with λmax

min gain input direction is x = qn, eigenvector of A^T A associated with λmin

## Symmetric matrices, quadratic forms, matrix norm, and SVD 1521
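A numerical sketch of mine checking that the spectral norm equals √λmax(A^T A), using numpy:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# spectral norm via eigenvalues of A^T A
lam = np.linalg.eigvalsh(A.T @ A)   # sorted ascending, all >= 0
norm_from_eig = np.sqrt(lam[-1])
min_gain = np.sqrt(lam[0])

print(norm_from_eig, np.linalg.norm(A, 2))   # both approx 9.53
print(min_gain)                               # approx 0.514
```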

example: A = [ 1  2 ;  3  4 ;  5  6 ]

A^T A = [ 35  44 ;  44  56 ]

      = [ 0.620  −0.785 ;  0.785  0.620 ] [ 90.7  0 ;  0  0.265 ] [ 0.620  −0.785 ;  0.785  0.620 ]^T

then ‖A‖ = √λmax( A^T A ) = 9.53:

‖ (0.620, 0.785) ‖ = 1,    ‖ A (0.620, 0.785) ‖ = ‖ (2.18, 4.99, 7.78) ‖ = 9.53

## Symmetric matrices, quadratic forms, matrix norm, and SVD 1522

min gain is √λmin( A^T A ) = 0.514:

‖ (−0.785, 0.620) ‖ = 1,    ‖ A (−0.785, 0.620) ‖ = ‖ (0.46, 0.14, −0.18) ‖ = 0.514

so, for all x ≠ 0, we have

0.514 ≤ ‖Ax‖/‖x‖ ≤ 9.53
kxk

Properties of matrix norm

consistent with vector norm: matrix norm of a ∈ R^{n×1} is √λmax( a^T a ) = √( a^T a )

for any x, ‖Ax‖ ≤ ‖A‖ ‖x‖

scaling: ‖aA‖ = |a| ‖A‖

triangle inequality: ‖A + B‖ ≤ ‖A‖ + ‖B‖

definiteness: ‖A‖ = 0 ⇔ A = 0

norm of product: ‖AB‖ ≤ ‖A‖ ‖B‖

## Symmetric matrices, quadratic forms, matrix norm, and SVD 1524

Singular value decomposition

more complete picture of gain properties of A given by singular value decomposition (SVD) of A:

A = U Σ V^T

where

A ∈ R^{m×n}, Rank(A) = r

U ∈ R^{m×r}, U^T U = I

V ∈ R^{n×r}, V^T V = I

Σ = diag(σ1, . . . , σr), with σ1 ≥ ··· ≥ σr > 0

with U = [ u1 ··· ur ] and V = [ v1 ··· vr ],

A = U Σ V^T = Σ_{i=1}^r σi ui vi^T

σi are the (nonzero) singular values of A; vi are the right or input singular vectors; ui are the left or output singular vectors

## Symmetric matrices, quadratic forms, matrix norm, and SVD 1526

A^T A = ( U Σ V^T )^T ( U Σ V^T ) = V Σ^2 V^T

hence:

vi are eigenvectors of A^T A (corresponding to nonzero eigenvalues)

σi = √λi( A^T A )  (and λi( A^T A ) = 0 for i > r)

‖A‖ = σ1
## Symmetric matrices, quadratic forms, matrix norm, and SVD 1527

similarly,

A A^T = ( U Σ V^T ) ( U Σ V^T )^T = U Σ^2 U^T

hence:

ui are eigenvectors of A A^T (corresponding to nonzero eigenvalues)

σi = √λi( A A^T )  (and λi( A A^T ) = 0 for i > r)
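numpy's SVD returns exactly these factors; a small sketch of mine verifying the decomposition and the eigenvector characterization:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # compact SVD

print(np.allclose(A, U @ np.diag(s) @ Vt))          # A = U Sigma V^T
print(np.allclose(s**2,
                  np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]))  # sigma_i^2
print(s[0], np.linalg.norm(A, 2))                   # sigma_1 = ||A||
```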

Interpretations

A = U Σ V^T = Σ_{i=1}^r σi ui vi^T

(diagram: x → V^T x → Σ V^T x → Ax; i.e., multiply by V^T, then Σ, then U)

compute coefficients of x along input directions v1, . . . , vr

scale coefficients by σi

reconstitute along output directions u1, . . . , ur

difference with eigenvalue decomposition for symmetric A: input and output directions are different

v1 is the most sensitive (highest gain) input direction; u1 is the highest gain output direction; Av1 = σ1 u1

## Symmetric matrices, quadratic forms, matrix norm, and SVD 1530

SVD gives clearer picture of gain as function of input/output directions

example: consider A ∈ R^{4×4} with Σ = diag(10, 7, 0.1, 0.05)

input components along directions v1 and v2 are amplified (by about 10) and come out mostly along plane spanned by u1, u2

input components along directions v3 and v4 are attenuated (by about 10)

‖Ax‖/‖x‖ can range between 10 and 0.05

A is nonsingular

for some applications you might say A is effectively rank 2

example: A ∈ R^{2×2}, with σ1 = 1, σ2 = 0.5

resolve x along v1, v2: v1^T x = 0.5, v2^T x = 0.6, i.e., x = 0.5 v1 + 0.6 v2

then Ax = σ1 (0.5) u1 + σ2 (0.6) u2 = 0.5 u1 + 0.3 u2

(figure: x with its components along v1, v2, and Ax with its components along u1, u2)
## Symmetric matrices, quadratic forms, matrix norm, and SVD 1532

EE263 Autumn 2008-09 Stephen Boyd

Lecture 16
SVD Applications

general pseudo-inverse

full SVD

## image of unit ball under linear transformation

SVD in estimation/inversion

## low rank approximation via SVD

161

General pseudo-inverse

if A ≠ 0 has SVD A = U Σ V^T,

A† = V Σ^{-1} U^T

is the pseudo-inverse or Moore-Penrose inverse of A

if A is skinny and full rank,

A† = ( A^T A )^{-1} A^T

gives the least-squares approximate solution xls = A† y

if A is fat and full rank,

A† = A^T ( A A^T )^{-1}

gives the least-norm solution xln = A† y

in general case:

Xls = { z | ‖Az − y‖ = min_w ‖Aw − y‖ }

is the set of least-squares approximate solutions

xpinv = A† y ∈ Xls has minimum norm on Xls, i.e., xpinv is the minimum-norm, least-squares approximate solution
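A quick numerical illustration of mine: the SVD formula for A† agrees with numpy's built-in pinv, and gives the least-squares solution for a skinny full-rank A:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 2.0, 2.0])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T   # A† = V Sigma^{-1} U^T

print(np.allclose(A_pinv, np.linalg.pinv(A)))   # True
print(np.allclose(A_pinv @ y,
                  np.linalg.lstsq(A, y, rcond=None)[0]))   # True
```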

Pseudo-inverse via regularization: for μ > 0, let xμ be the (unique) minimizer of

‖Ax − y‖^2 + μ ‖x‖^2

i.e.,

xμ = ( A^T A + μI )^{-1} A^T y

here, A^T A + μI > 0 and so is invertible

then we have lim_{μ→0} xμ = A† y

in fact, we have

lim_{μ→0} ( A^T A + μI )^{-1} A^T = A†

(check this!)

Full SVD

SVD of A ∈ R^{m×n} with Rank(A) = r:

A = U1 Σ1 V1^T = [ u1 ··· ur ] diag(σ1, . . . , σr) [ v1 ··· vr ]^T

find U2 ∈ R^{m×(m−r)}, V2 ∈ R^{n×(n−r)} s.t. U = [ U1 U2 ] ∈ R^{m×m} and V = [ V1 V2 ] ∈ R^{n×n} are orthogonal

add zero rows/cols to Σ1 to form Σ ∈ R^{m×n}:

Σ = [ Σ1             0_{r×(n−r)} ;
      0_{(m−r)×r}    0_{(m−r)×(n−r)} ]

## SVD Applications 165

then we have

A = U1 Σ1 V1^T = [ U1 U2 ] Σ [ V1 V2 ]^T

i.e.:

A = U Σ V^T

called the full SVD of A

## SVD Applications 166

Image of unit ball under linear transformation

full SVD:

A = U Σ V^T

gives interpretation for full rank, square A:

rotate (by V^T)

stretch along axes by σi

rotate (by U)

Image of unit ball under A

(figure: unit ball → rotate by V^T → stretch by Σ = diag(2, 0.5) → rotate by U → ellipsoid with semiaxes 2u1 and 0.5u2)

{ Av | ‖v‖ ≤ 1 } is an ellipsoid with principal axes σi ui
## SVD Applications 168

SVD in estimation/inversion

suppose y = Ax + v, where

y Rm is measurement
x Rn is vector to be estimated
v is a measurement noise or error

norm-bound model of noise: we assume ‖v‖ ≤ α but otherwise know nothing about v (α gives max norm of noise)

## SVD Applications 169

consider estimator x̂ = By, with BA = I (i.e., unbiased: x̂ = x when v = 0)

estimation error is x̃ = x̂ − x = Bv

set of possible estimation errors is ellipsoid

x̃ ∈ Eunc = { Bv | ‖v‖ ≤ α }

x = x̂ − x̃ ∈ x̂ − Eunc = x̂ + Eunc, i.e.:

true x lies in uncertainty ellipsoid Eunc, centered at estimate x̂
## SVD Applications 1610

semiaxes of Eunc are ασi ui (singular values & vectors of B)

in particular, maximum norm of error is ‖x̂ − x‖ ≤ α ‖B‖

optimality of least-squares: suppose BA = I is any estimator, and Bls = A† is the least-squares estimator

then:

Bls Bls^T ≤ B B^T

Els ⊆ E

in particular ‖Bls‖ ≤ ‖B‖, i.e., the least-squares estimator gives the smallest uncertainty ellipsoid

example (navigation using range measurements): we have

y = [ k1^T ; k2^T ; k3^T ; k4^T ] x

where ki ∈ R^2

using first two measurements and inverting:

x̂ = [ [ k1^T ; k2^T ]^{-1}   0_{2×2} ] y

using all four measurements and least-squares:

x̂ = A† y
## SVD Applications 1612

uncertainty regions (with α = 1):

(plots: uncertainty region for x using inversion, and uncertainty region for x using least-squares, in the (x1, x2) plane; the least-squares region is much smaller)

Proof of optimality property

suppose A ∈ R^{m×n}, m > n, is full rank

SVD: A = U Σ V^T, with V orthogonal

Bls = A† = V Σ^{-1} U^T, and B satisfies BA = I

define Z = B − Bls, so B = Bls + Z

then ZA = Z U Σ V^T = 0, so ZU = 0 (multiply by V Σ^{-1} on right)

therefore

B B^T = ( Bls + Z )( Bls + Z )^T
      = Bls Bls^T + Bls Z^T + Z Bls^T + Z Z^T
      = Bls Bls^T + Z Z^T
      ≥ Bls Bls^T

(using Z Bls^T = Z U Σ^{-1} V^T = 0)

## SVD Applications 1614

Sensitivity of linear equations to data error

consider y = Ax, A ∈ R^{n×n} invertible; suppose y changes to y + δy, so the solution changes to x + δx, with δx = A^{-1} δy

hence we have ‖δx‖ = ‖A^{-1} δy‖ ≤ ‖A^{-1}‖ ‖δy‖

if ‖A^{-1}‖ is large,

small errors in y can lead to large errors in x

can't solve for x given y (with small errors)

hence, A can be considered singular in practice

a more refined analysis uses relative instead of absolute errors: since y = Ax, we also have ‖y‖ ≤ ‖A‖ ‖x‖, hence

‖δx‖/‖x‖ ≤ ‖A‖ ‖A^{-1}‖ ‖δy‖/‖y‖

κ(A) = ‖A‖ ‖A^{-1}‖ = σmax(A)/σmin(A) is called the condition number of A

we have: relative error in solution ≤ condition number × relative error in data

we say A is well conditioned if κ is small, poorly conditioned if κ is large ("small" and "large" depend on the application)

same analysis holds for least-squares approximate solutions with A nonsquare, κ = σmax(A)/σmin(A)
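A small numerical sketch of mine of this sensitivity, perturbing y for a poorly conditioned A:

```python
import numpy as np

# poorly conditioned 2x2 matrix (nearly singular)
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
print(np.linalg.cond(A))        # kappa = sigma_max / sigma_min, large

y = np.array([2.0, 2.0001])
x = np.linalg.solve(A, y)

dy = np.array([1e-4, -1e-4])    # tiny data error
dx = np.linalg.solve(A, dy)

# relative error in x vs. relative error in y
print(np.linalg.norm(dx) / np.linalg.norm(x),
      np.linalg.norm(dy) / np.linalg.norm(y))
```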

Low rank approximations

suppose A ∈ R^{m×n}, Rank(A) = r, with SVD

A = U Σ V^T = Σ_{i=1}^r σi ui vi^T

we seek a matrix Â, Rank(Â) ≤ p < r, s.t. Â ≈ A in the sense that ‖A − Â‖ is minimized

solution: optimal rank p approximator is

Â = Σ_{i=1}^p σi ui vi^T

hence ‖A − Â‖ = ‖ Σ_{i=p+1}^r σi ui vi^T ‖ = σ_{p+1}

interpretation: SVD dyads ui vi^T are ranked in order of "importance"; take p to get rank p approximant
## SVD Applications 1618
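A sketch of mine of the optimal rank-p approximation via truncated SVD, checking that the error is σ_{p+1}:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))
p = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_hat = U[:, :p] @ np.diag(s[:p]) @ Vt[:p, :]   # keep p largest dyads

# approximation error in spectral norm equals sigma_{p+1}
print(np.linalg.norm(A - A_hat, 2), s[p])       # equal (up to round-off)
```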

proof: suppose Rank(B) ≤ p

then dim N(B) ≥ n − p

also, dim span{ v1, . . . , v_{p+1} } = p + 1

hence, the two subspaces intersect, i.e., there is a unit vector z ∈ R^n s.t.

Bz = 0,    z ∈ span{ v1, . . . , v_{p+1} }

then

(A − B) z = A z = Σ_{i=1}^{p+1} σi ui vi^T z

and

‖(A − B) z‖^2 = Σ_{i=1}^{p+1} σi^2 ( vi^T z )^2 ≥ σ_{p+1}^2 ‖z‖^2

hence ‖A − B‖ ≥ σ_{p+1} = ‖A − Â‖

## SVD Applications 1619

Distance to singularity

another interpretation of σi:

σi = min{ ‖A − B‖ | Rank(B) ≤ i − 1 }

i.e., the distance (measured by matrix norm) to the nearest rank i − 1 matrix

for example, if A ∈ R^{n×n}, σn = σmin is the distance to the nearest singular matrix
## SVD Applications 1620

application: model simplification

suppose y = Ax + v, where

A ∈ R^{100×30} has singular values 10, 7, 2, 0.5, 0.01, . . . , 0.0001

‖x‖ is on the order of 1

unknown error or noise v has norm on the order of 0.1

then the terms σi ui vi^T x, for i = 5, . . . , 30, are substantially smaller than the noise term v

simplified model:

y = Σ_{i=1}^4 σi ui vi^T x + v

## SVD Applications 1621

EE263 Autumn 2008-09 Stephen Boyd

Lecture 17
Example: Quantum mechanics

## wave function and Schrodinger equation

discretization

preservation of probability

## eigenvalues & eigenstates

example

171

Quantum mechanics

single particle in interval [0, 1], mass m

potential V : [0, 1] → R

Ψ : [0, 1] × R_+ → C is (complex-valued) wave function

interpretation: |Ψ(x, t)|^2 is probability density of particle at position x, time t

(so ∫_0^1 |Ψ(x, t)|^2 dx = 1 for all t)

evolution of Ψ governed by Schrodinger equation:

iℏ Ψ̇ = ( V − (ℏ^2/2m) ∂^2/∂x^2 ) Ψ = H Ψ

where H is the Hamiltonian operator, i = √−1

## Example: Quantum mechanics 172

Discretization

let's discretize position x into N discrete points, k/N, k = 1, . . . , N

wave function is approximated as vector Ψ(t) ∈ C^N

∂^2/∂x^2 operator is approximated as matrix

∇^2 = N^2 [ −2  1 ;  1  −2  1 ;  1  −2  1 ;  ⋱  ⋱  ⋱ ;  1  −2 ]

so w = ∇^2 v means

wk = ( ( v_{k+1} − vk )/(1/N) − ( vk − v_{k−1} )/(1/N) ) / (1/N)

discretized Schrodinger equation is (complex) linear dynamical system

Ψ̇ = (−i/ℏ) ( V − (ℏ^2/2m) ∇^2 ) Ψ = (−i/ℏ) H Ψ

where V is a diagonal matrix with Vkk = V(k/N)

hence we analyze using linear dynamical system theory (with complex vectors & matrices):

Ψ̇ = (−i/ℏ) H Ψ

matrix e^{(−i/ℏ)tH} propagates the wave function forward in time t seconds

(backward if t < 0)

## Example: Quantum mechanics 174
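A small numerical sketch of mine (with ℏ = m = 1, as in the example below): build the discretized Hamiltonian and propagate Ψ one time step with the matrix exponential:

```python
import numpy as np
from scipy.linalg import expm

N = 100
hbar, m = 1.0, 1.0

# discretized second-derivative operator: N^2 * tridiag(1, -2, 1)
lap = N**2 * (np.diag(-2.0 * np.ones(N))
              + np.diag(np.ones(N - 1), 1)
              + np.diag(np.ones(N - 1), -1))

Vdiag = np.zeros(N)                  # free particle in a box, V = 0
H = np.diag(Vdiag) - (hbar**2 / (2 * m)) * lap

psi = np.ones(N, dtype=complex)
psi /= np.linalg.norm(psi)           # normalize so sum |psi_k|^2 = 1

t = 0.01
U = expm(-1j * t * H / hbar)         # propagator e^{-(i/hbar) t H}
psi_t = U @ psi

print(np.linalg.norm(psi_t))         # still 1: probability preserved
```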

Preservation of probability

d/dt ‖Ψ‖^2 = d/dt Ψ*Ψ
           = Ψ̇*Ψ + Ψ*Ψ̇
           = ( (−i/ℏ)HΨ )* Ψ + Ψ* ( (−i/ℏ)HΨ )
           = (i/ℏ) Ψ*HΨ + (−i/ℏ) Ψ*HΨ
           = 0

(using H = H^T ∈ R^{N×N})

hence ‖Ψ(t)‖^2 is constant; our discretization preserves probability exactly

U = e^{(−i/ℏ)tH} is unitary, meaning U*U = I

unitary is the extension of orthogonal for complex matrices: if U ∈ C^{N×N} is unitary and z ∈ C^N, then

‖Uz‖^2 = (Uz)*(Uz) = z*U*Uz = z*z = ‖z‖^2
## Example: Quantum mechanics 176

Eigenvalues & eigenstates

H is symmetric, so

its eigenvalues λ1, . . . , λN are real

its eigenvectors v1, . . . , vN can be chosen to be orthogonal (and real)

from Hv = λv ⇔ (−i/ℏ)Hv = (−i/ℏ)λv we see:

eigenvectors of (−i/ℏ)H are same as eigenvectors of H, i.e., v1, . . . , vN

eigenvalues of (−i/ℏ)H are (−i/ℏ)λ1, . . . , (−i/ℏ)λN (which are pure imaginary)

eigenvectors vk are called eigenstates of the system; eigenvalue λk is the energy of eigenstate vk

for mode Ψ(t) = e^{(−i/ℏ)λk t} vk, probability density

|Ψm(t)|^2 = | e^{(−i/ℏ)λk t} vmk |^2 = |vmk|^2

doesn't change with time (vmk is the mth entry of vk)

## Example: Quantum mechanics 178

Example

(plot: potential function V(x) versus x ∈ [0, 1], ranging from 0 to 1000)

potential bump in middle of infinite potential well

(for this example, we set ℏ = 1, m = 1 . . . )

lowest energy eigenfunctions

(plots: the four eigenstates with lowest energy, v1, v2, v3, v4, versus x ∈ [0, 1])

potential V shown as dotted line (scaled to fit plot)

four eigenstates with lowest energy shown (i.e., v1, v2, v3, v4)

## Example: Quantum mechanics 1710

now let's look at a trajectory of Ψ, with initial wave function Ψ(0)

(top plot: initial probability density |Ψ(0)|^2 versus x; bottom plot: |vk*Ψ(0)|^2 versus eigenstate index k)

top plot shows initial probability density |Ψ(0)|^2

bottom plot shows |vk*Ψ(0)|^2, i.e., resolution of Ψ(0) into eigenstates
## Example: Quantum mechanics 1712

time evolution, for t = 0, 40, 80, . . . , 320:

(plots: |Ψ(t)|^2 versus x at the successive times)

## cf. classical solution:

particle rolls half way up potential bump, stops, then rolls back down

## reverses velocity when it hits the wall at left

(perfectly elastic collision)

then repeats

## Example: Quantum mechanics 1714

(plot: probability that the particle is in the left half of the well versus t ∈ [0, 2 × 10^4])

plot shows probability that particle is in left half of well, i.e.,

Σ_{k=1}^{N/2} |Ψk(t)|^2,

versus time t
## Example: Quantum mechanics 1715

EE263 Autumn 2008-09 Stephen Boyd

Lecture 18
Controllability and state transfer

state transfer

181

State transfer

consider ẋ = Ax + Bu (or x(t + 1) = Ax(t) + Bu(t)) over time interval [ti, tf]

we say input u : [ti, tf] → R^m steers or transfers state from x(ti) to x(tf) (over time interval [ti, tf])

(subscripts stand for initial and final)

questions:

where can x(ti) be transferred to at t = tf?

how quickly can x(ti) be transferred to some xtarget?

how do we find a u that transfers x(ti) to x(tf)?

how do we find a small or efficient u that transfers x(ti) to x(tf)?

Reachability

consider state transfer from x(0) = 0 to x(t); Rt denotes the set of points reachable at time t (in t seconds or epochs):

for CT system ẋ = Ax + Bu,

Rt = { ∫_0^t e^{(t−τ)A} B u(τ) dτ | u : [0, t] → R^m }

and for DT system x(t + 1) = Ax(t) + Bu(t),

Rt = { Σ_{τ=0}^{t−1} A^{t−1−τ} B u(τ) | u(τ) ∈ R^m }

## Controllability and state transfer 183

Rt is a subspace of R^n

Rt ⊆ Rs if t ≤ s

(i.e., can reach more points given more time)

we define the reachable set R as the set of points reachable for some t:

R = ∪_{t≥0} Rt

## Controllability and state transfer 184

Reachability for discrete-time LDS

DT system x(t + 1) = Ax(t) + Bu(t), x(t) ∈ R^n

x(t) = Ct [ u(t − 1) ; .. ; u(0) ]

where Ct = [ B  AB  ···  A^{t−1}B ]

so Rt = range(Ct)

by C-H theorem, we can express each A^k for k ≥ n as linear combination of A^0, . . . , A^{n−1}

## Controllability and state transfer 185

thus we have

Rt = { range(Ct),   t < n
     { range(C),    t ≥ n

where C = Cn is called the controllability matrix

## Controllability and state transfer 186

Controllable system

system is called reachable or controllable if all states are reachable (i.e., R = R^n)

system is reachable if and only if Rank(C) = n

example: x(t + 1) = [ 0  1 ;  1  0 ] x(t) + [ 1 ; 1 ] u(t)

controllability matrix is C = [ 1  1 ;  1  1 ]

hence system is not controllable; reachable set is

R = range(C) = { x | x1 = x2 }

General state transfer

with tf > ti,

x(tf) = A^{tf − ti} x(ti) + C_{tf − ti} [ u(tf − 1) ; .. ; u(ti) ]

hence can transfer x(ti) to x(tf) = xdes  ⇔  xdes − A^{tf − ti} x(ti) ∈ R_{tf − ti}

general state transfer reduces to reachability problem

if system is controllable any state transfer can be achieved in ≤ n steps

important special case: driving state to zero (sometimes called regulating or controlling state)

## Controllability and state transfer 188

Least-norm input for reachability

to steer x(0) = 0 to x(t) = xdes, inputs u(0), . . . , u(t − 1) must satisfy

xdes = Ct [ u(t − 1) ; .. ; u(0) ]

among all u that steer x(0) = 0 to x(t) = xdes, the one that minimizes

Σ_{τ=0}^{t−1} ‖u(τ)‖^2

## Controllability and state transfer 189

is given by

[ uln(t − 1) ; .. ; uln(0) ] = Ct^T ( Ct Ct^T )^{-1} xdes

uln is called the least-norm or minimum energy input that effects the state transfer

can express as

uln(τ) = B^T (A^T)^{t−1−τ} ( Σ_{s=0}^{t−1} A^s B B^T (A^T)^s )^{-1} xdes,

for τ = 0, . . . , t − 1

## Controllability and state transfer 1810
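A numerical sketch of mine of the least-norm steering input for a small DT system, using the pseudo-inverse of the matrix Ct:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
t = 4
xdes = np.array([1.0, 0.0])

# Ct = [B, AB, ..., A^{t-1}B]; its columns multiply u(t-1), ..., u(0)
Ct = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(t)])

u_stack = np.linalg.pinv(Ct) @ xdes   # least-norm solution of Ct u = xdes
u_seq = u_stack[::-1]                  # reorder as u(0), ..., u(t-1)

# simulate to confirm x(t) = xdes
x = np.zeros(2)
for u in u_seq:
    x = A @ x + B.flatten() * u
print(np.allclose(x, xdes))            # True
```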

Emin, the minimum value of Σ_{τ=0}^{t−1} ‖u(τ)‖^2 required to reach x(t) = xdes, is sometimes called the minimum energy required to reach x(t) = xdes

Emin = Σ_{τ=0}^{t−1} ‖uln(τ)‖^2

     = ( Ct^T (Ct Ct^T)^{-1} xdes )^T ( Ct^T (Ct Ct^T)^{-1} xdes )

     = xdes^T ( Ct Ct^T )^{-1} xdes

     = xdes^T ( Σ_{τ=0}^{t−1} A^τ B B^T (A^T)^τ )^{-1} xdes

## Emin(xdes, t) gives measure of how hard it is to reach x(t) = xdes from

x(0) = 0 (i.e., how large a u is required)

## Emin(xdes, t) gives practical measure of controllability/reachability (as

function of xdes, t)

ellipsoid { z | Emin(z, t) ≤ 1 } shows points in state space reachable at t with one unit of energy

(shows directions that can be reached with small inputs, and directions that can be reached only with large inputs)

## Controllability and state transfer 1812

Emin as function of t:

if t ≥ s then

Σ_{τ=0}^{t−1} A^τ B B^T (A^T)^τ  ≥  Σ_{τ=0}^{s−1} A^τ B B^T (A^T)^τ

hence

( Σ_{τ=0}^{t−1} A^τ B B^T (A^T)^τ )^{-1}  ≤  ( Σ_{τ=0}^{s−1} A^τ B B^T (A^T)^τ )^{-1}

so Emin(xdes, t) ≤ Emin(xdes, s) for t ≥ s: longer time horizons require less energy

## Controllability and state transfer 1813

   
example: x(t + 1) = [ 1.75  0.8 ;  −0.95  0 ] x(t) + [ 1 ; 0 ] u(t)

Emin(z, t) for z = [1 1]^T:

(plot: Emin versus t ∈ [0, 35], decreasing toward a limit)

## Controllability and state transfer 1814

ellipsoids Emin ≤ 1 for t = 3 and t = 10:

(plots: { x | Emin(x, 3) ≤ 1 } and { x | Emin(x, 10) ≤ 1 } in the (x1, x2) plane; the t = 10 ellipsoid is larger)

Minimum energy over infinite horizon

the matrix

P = lim_{t→∞} ( Σ_{τ=0}^{t−1} A^τ B B^T (A^T)^τ )^{-1}

always exists, and gives the minimum energy required to reach a point xdes (with no limit on t):

min { Σ_{τ=0}^{t−1} ‖u(τ)‖^2 | x(0) = 0, x(t) = xdes } = xdes^T P xdes

if A is stable, P > 0 (i.e., can't get anywhere for free)

if A is not stable, then P can have nonzero nullspace

P z = 0, z ≠ 0 means can get to z using u's with energy as small as you like

(u just gives a little kick to the state; the instability carries it out to z efficiently)

## Controllability and state transfer 1817

Continuous-time reachability

consider now ẋ = Ax + Bu with x(t) ∈ R^n

reachable set at time t is

Rt = { ∫_0^t e^{(t−τ)A} B u(τ) dτ | u : [0, t] → R^m }

fact: for t > 0, Rt = R = range(C), where

C = [ B  AB  ···  A^{n−1}B ]

is the controllability matrix of (A, B)

same R as discrete-time system

for continuous-time system, any reachable point can be reached as fast as you like (with large enough u)
## Controllability and state transfer 1818

first let's show for any u (and x(0) = 0) we have x(t) ∈ range(C)

write e^{tA} as power series:

e^{tA} = I + (t/1!) A + (t^2/2!) A^2 + ···

by C-H, express A^n, A^{n+1}, . . . in terms of A^0, . . . , A^{n−1} and collect powers of A:

e^{tA} = α0(t) I + α1(t) A + ··· + α_{n−1}(t) A^{n−1}

therefore

x(t) = ∫_0^t e^{τA} B u(t − τ) dτ

     = ∫_0^t ( Σ_{i=0}^{n−1} αi(τ) A^i ) B u(t − τ) dτ

## Controllability and state transfer 1819

     = Σ_{i=0}^{n−1} A^i B ∫_0^t αi(τ) u(t − τ) dτ

     = C z

where zi = ∫_0^t αi(τ) u(t − τ) dτ

hence x(t) is always in range(C); it remains to show that every point in range(C) is reachable

## Controllability and state transfer 1820

Impulsive inputs

suppose x(0−) = 0 and we apply input u(t) = δ^{(k)}(t) f, where δ^{(k)} denotes the kth derivative of δ and f ∈ R^m

then U(s) = s^k f, so

X(s) = (sI − A)^{-1} B s^k f

     = ( s^{-1} I + s^{-2} A + ··· ) B s^k f

     = ( s^{k−1} + ··· + s A^{k−2} + A^{k−1} + s^{-1} A^k + ··· ) B f

(the positive powers of s are impulsive terms)

hence

x(t) = impulsive terms + A^k B f + A^{k+1} B f (t/1!) + A^{k+2} B f (t^2/2!) + ···

in particular, x(0+) = A^k B f

thus, with input

u(t) = δ(t) f0 + δ^{(1)}(t) f1 + ··· + δ^{(n−1)}(t) f_{n−1}

where fi ∈ R^m

by linearity we have

x(0+) = B f0 + A B f1 + ··· + A^{n−1} B f_{n−1} = C [ f0 ; .. ; f_{n−1} ]

so we can reach any point in range(C) (at least, using impulsive inputs)

## Controllability and state transfer 1822

can also be shown that any point in range(C) can be reached for any t > 0 using nonimpulsive inputs

fact: if x(0) ∈ R, then x(t) ∈ R for all t (no matter what u is)

Example

unit masses at y1, y2, connected by unit springs, dampers

input is tension between masses

state is x = [ y^T  ẏ^T ]^T

(figure: two unit masses connected by a spring and damper, with tension u(t) applied between them)

system is

ẋ = [  0  0  1  0 ;
       0  0  0  1 ;
      −1  1 −1  1 ;
       1 −1  1 −1 ] x + [ 0 ; 0 ; −1 ; 1 ] u

can we maneuver state anywhere, starting from x(0) = 0?

if not, where can we maneuver state?

## Controllability and state transfer 1824

controllability matrix is

C = [ B  AB  A^2B  A^3B ] = [  0  −1   2  −2 ;
                               0   1  −2   2 ;
                              −1   2  −2   0 ;
                               1  −2   2   0 ]

hence reachable set is

R = span{ (1, −1, 0, 0), (0, 0, 1, −1) }

we can reach states with y1 = −y2, ẏ1 = −ẏ2, i.e., precisely the differential motions

it's obvious: an internal force does not affect the center of mass position or the total momentum!

Least-norm input for reachability

(CT case: we seek u that steers x(0) = 0 to x(t) = xdes and minimizes ∫_0^t ‖u(τ)‖^2 dτ)

discretize system with interval h = t/N; the discretized input sequence ud satisfies

x(t) = [ Bd  Ad Bd  ···  Ad^{N−1} Bd ] [ ud(N − 1) ; .. ; ud(0) ]

where

Ad = e^{hA},    Bd = ∫_0^h e^{τA} dτ B

least-norm ud that yields x(t) = xdes is

udln(k) = Bd^T (Ad^T)^{N−1−k} ( Σ_{i=0}^{N−1} Ad^i Bd Bd^T (Ad^T)^i )^{-1} xdes

let's express in terms of A:

Bd^T (Ad^T)^{N−1−k} = Bd^T e^{(t−τ)A^T}

where τ = kh (approximately, for N large)

for N large, Bd ≈ (t/N) B, so this is approximately

(t/N) B^T e^{(t−τ)A^T}

similarly

Σ_{i=0}^{N−1} Ad^i Bd Bd^T (Ad^T)^i = Σ_{i=0}^{N−1} e^{(ti/N)A} Bd Bd^T e^{(ti/N)A^T}

                                    ≈ (t/N) ∫_0^t e^{t̄A} B B^T e^{t̄A^T} dt̄

for large N

## Controllability and state transfer 1828

hence least-norm discretized input is approximately

uln(τ) = B^T e^{(t−τ)A^T} ( ∫_0^t e^{t̄A} B B^T e^{t̄A^T} dt̄ )^{-1} xdes,    0 ≤ τ ≤ t

for large N; this is the least-norm continuous input that effects the state transfer

## Controllability and state transfer 1829

min energy is

∫_0^t ‖uln(τ)‖^2 dτ = xdes^T Q(t)^{-1} xdes

where

Q(t) = ∫_0^t e^{τA} B B^T e^{τA^T} dτ

can show

(A, B) controllable  ⇔  Q(t) > 0 for all t > 0  ⇔  Q(s) > 0 for some s > 0

## Controllability and state transfer 1830

Minimum energy over infinite horizon

the matrix

P = lim_{t→∞} ( ∫_0^t e^{τA} B B^T e^{τA^T} dτ )^{-1}

always exists, and gives minimum energy required to reach a point xdes (with no limit on t):

min { ∫_0^t ‖u(τ)‖^2 dτ | x(0) = 0, x(t) = xdes } = xdes^T P xdes

if A is stable, P > 0 (i.e., can't get anywhere for free)

if A is not stable, then P can have nonzero nullspace

P z = 0, z ≠ 0 means can get to z using u's with energy as small as you like (u just gives a little kick to the state; the instability carries it out to z efficiently)

General state transfer

consider state transfer from x(ti) to x(tf) = xdes, tf > ti

since

x(tf) = e^{(tf − ti)A} x(ti) + ∫_{ti}^{tf} e^{(tf − τ)A} B u(τ) dτ

u steers x(ti) to x(tf) = xdes  ⇔  u (shifted in time) steers x(0) = 0 to x(tf − ti) = xdes − e^{(tf − ti)A} x(ti)

if system is controllable, any state transfer can be effected

in zero time with impulsive inputs

in any positive time with non-impulsive inputs

Example

(figure: the three-mass, two-tension system from lecture 13, with tensions u1 and u2 applied between adjacent masses)

state is x = [ y ; ẏ ]

## Controllability and state transfer 1833

system is:

ẋ = [  0  0  0  1  0  0 ;
       0  0  0  0  1  0 ;
       0  0  0  0  0  1 ;
      −2  1  0 −2  1  0 ;
       1 −2  1  1 −2  1 ;
       0  1 −2  0  1 −2 ] x + [ 0 0 ; 0 0 ; 0 0 ; 1 0 ; −1 1 ; 0 −1 ] [ u1 ; u2 ]

## Controllability and state transfer 1834

steer state from x(0) = e1 to x(tf) = 0

i.e., control initial state e1 to zero at t = tf

Emin = ∫_0^{tf} ‖uln(τ)‖^2 dτ  vs.  tf:

(plot: Emin versus tf ∈ [0, 12], decreasing from about 50)

for tf = 3, u = uln is:

(plots: u1(t) and u2(t) versus t ∈ [0, 4.5])

## Controllability and state transfer 1836

and for tf = 4:

(plots: u1(t) and u2(t) versus t ∈ [0, 4.5])

## Controllability and state transfer 1837

output y1 for u = 0:

(plot: y1(t) versus t ∈ [0, 14], a slowly decaying oscillation)

## Controllability and state transfer 1838

output y1 for u = uln with tf = 3:

(plot: y1(t) versus t ∈ [0, 4.5], driven to zero at t = 3)

output y1 for u = uln with tf = 4:

(plot: y1(t) versus t ∈ [0, 4.5], driven to zero at t = 4)

## Controllability and state transfer 1840

EE263 Autumn 2008-09 Stephen Boyd

Lecture 19
Observability and state estimation

state estimation
discrete-time observability
observability controllability duality
observers for noiseless case
continuous-time observability
least-squares observers
example

191

State estimation set up

we consider the discrete-time system

x(t + 1) = Ax(t) + Bu(t) + w(t),    y(t) = Cx(t) + Du(t) + v(t)

w is state disturbance or noise

v is sensor noise or error

A, B, C, and D are known

u and y are observed over time interval [0, t − 1]

w and v are not known, but can be described statistically, or assumed small (e.g., in RMS value)
## Observability and state estimation 192

State estimation problem

estimate x(s) from u(0), . . . , u(t − 1), y(0), . . . , y(t − 1)

s = 0: estimate initial state

s = t − 1: estimate current state

s = t: estimate (i.e., predict) next state

an algorithm or system that yields an estimate x̂(s) is called an observer or state estimator

x̂(s) is denoted x̂(s | t − 1) if computed from u(0), . . . , u(t − 1), y(0), . . . , y(t − 1)

(read, "x̂(s) given t − 1")

Noiseless case

let's look at finding x(0), with no state or measurement noise:

x(t + 1) = Ax(t) + Bu(t),    y(t) = Cx(t) + Du(t)

with x(t) ∈ R^n, u(t) ∈ R^m, y(t) ∈ R^p

then we have

[ y(0) ; .. ; y(t − 1) ] = Ot x(0) + Tt [ u(0) ; .. ; u(t − 1) ]

## Observability and state estimation 194

where

Ot = [ C ; CA ; .. ; CA^{t−1} ],

Tt = [ D  0  ··· ;  CB  D  0  ··· ;  .. ;  CA^{t−2}B  CA^{t−3}B  ···  CB  D ]

Ot maps the initial state into the resulting output over [0, t − 1]

Tt maps the input to the output over [0, t − 1]

hence we have

Ot x(0) = [ y(0) ; .. ; y(t − 1) ] − Tt [ u(0) ; .. ; u(t − 1) ]

RHS is known, x(0) is to be determined

hence:

x(0) can be deduced from u and y over [0, t − 1] if and only if N(Ot) = {0}

input u does not affect ability to determine x(0);

its effect can be subtracted out

## Observability and state estimation 196

Observability matrix

by C-H theorem, each A^k is a linear combination of A^0, . . . , A^{n−1}

hence for t ≥ n, N(Ot) = N(O) where

O = On = [ C ; CA ; .. ; CA^{n−1} ]

is called the observability matrix

if x(0) can be deduced from u and y over [0, t − 1] for any t, then x(0) can be deduced from u and y over [0, n − 1]

N(O) is called the unobservable subspace; describes ambiguity in determining state from input and output

system is called observable if N(O) = {0}, i.e., Rank(O) = n
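A sketch of mine of the observability test: stack C, CA, ..., CA^{n−1} and check the rank:

```python
import numpy as np

def observability_matrix(A, C):
    """Stack C, CA, ..., CA^{n-1} into the observability matrix O."""
    n = A.shape[0]
    return np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
C = np.array([[1.0, 0.0]])

O = observability_matrix(A, C)
print(np.linalg.matrix_rank(O) == A.shape[0])   # True: observable
```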

Observability controllability duality

let (Ã, B̃, C̃, D̃) be the dual of system (A, B, C, D), i.e.,

Ã = A^T,    B̃ = C^T,    C̃ = B^T,    D̃ = D^T

controllability matrix of dual system is

C̃ = [ B̃  ÃB̃  ···  Ã^{n−1}B̃ ]

  = [ C^T  A^T C^T  ···  (A^T)^{n−1} C^T ]

  = O^T,

the transpose of the observability matrix

similarly we have Õ = C^T

## Observability and state estimation 198

thus, system is observable (controllable) if and only if dual system is
controllable (observable)

in fact,

N(O) = range(O^T)^⊥ = range(C̃)^⊥

i.e., unobservable subspace is orthogonal complement of controllable subspace of dual

Observers for noiseless case

suppose Rank(Ot) = n (i.e., system is observable) and let F be any left inverse of Ot, i.e., F Ot = I

then we have the observer

x(0) = F ( [ y(0) ; .. ; y(t − 1) ] − Tt [ u(0) ; .. ; u(t − 1) ] )

which deduces x(0) (exactly) from u, y over [0, t − 1]

in fact we have

x(τ − t + 1) = F ( [ y(τ − t + 1) ; .. ; y(τ) ] − Tt [ u(τ − t + 1) ; .. ; u(τ) ] )

## Observability and state estimation 1910

i.e., our observer estimates what the state was t − 1 epochs ago, given the past t − 1 inputs & outputs

observer is a (multi-input, multi-output) finite impulse response (FIR) filter, with inputs u and y, and output x̂

fact: the unobservable subspace N(O) is invariant, i.e., if z ∈ N(O), then Az ∈ N(O)

proof: suppose z ∈ N(O), i.e., CA^k z = 0 for k = 0, . . . , n − 1

evidently CA^k (Az) = 0 for k = 0, . . . , n − 2;

CA^{n−1} (Az) = CA^n z = − Σ_{i=0}^{n−1} αi CA^i z = 0

(by C-H) where

det(sI − A) = s^n + α_{n−1} s^{n−1} + ··· + α0

## Observability and state estimation 1912

Continuous-time observability

continuous-time system with no sensor or state noise:

ẋ = Ax + Bu,    y = Cx + Du

let's look at derivatives of y:

y = Cx + Du

ẏ = Cẋ + Du̇ = CAx + CBu + Du̇

ÿ = CA^2 x + CABu + CBu̇ + Dü

and so on

## Observability and state estimation 1913

hence we have

[ y ; ẏ ; .. ; y^{(n−1)} ] = O x + T [ u ; u̇ ; .. ; u^{(n−1)} ]

where O is the observability matrix and

T = [ D  0  ··· ;  CB  D  0  ··· ;  .. ;  CA^{n−2}B  CA^{n−3}B  ···  CB  D ]

rewrite as

O x = [ y ; ẏ ; .. ; y^{(n−1)} ] − T [ u ; u̇ ; .. ; u^{(n−1)} ]

RHS is known; x is to be determined

hence if N(O) = {0} we can deduce x(t) from derivatives of u(t), y(t) up to order n − 1

in this case we say the system is observable

can construct an observer using any left inverse F of O:

x = F ( [ y ; ẏ ; .. ; y^{(n−1)} ] − T [ u ; u̇ ; .. ; u^{(n−1)} ] )

reconstructs x(t) (exactly and instantaneously) from u(t), y(t) and their derivatives up to order n − 1

derivative-based state reconstruction is dual of state transfer using impulsive inputs

A converse

suppose z ∈ N(O) (the unobservable subspace), and u is any input, with x, y the corresponding state and output, i.e.,

ẋ = Ax + Bu,    y = Cx + Du

then the state trajectory x̃ = x + e^{tA} z satisfies

(d/dt) x̃ = A x̃ + Bu,    y = C x̃ + Du

i.e., input/output signals u, y are consistent with both state trajectories x, x̃

hence if system is unobservable, no signal processing of any kind applied to u and y can deduce x

unobservable subspace N(O) gives fundamental ambiguity in deducing x from u, y

## Observability and state estimation 1917

Least-squares observers

least-squares observer uses the pseudo-inverse:

x̂ls(0) = Ot† ( [ y(0) ; .. ; y(t − 1) ] − Tt [ u(0) ; .. ; u(t − 1) ] )

where Ot† = ( Ot^T Ot )^{-1} Ot^T
## Observability and state estimation 1918

interpretation: x̂ls(0) minimizes discrepancy between

output ŷ that would be observed, with input u and initial state x(0) (and no sensor noise), and

output y that was observed,

measured as

Σ_{τ=0}^{t−1} ‖ ŷ(τ) − y(τ) ‖^2

can express least-squares initial state estimate as

x̂ls(0) = ( Σ_{τ=0}^{t−1} (A^T)^τ C^T C A^τ )^{-1} Σ_{τ=0}^{t−1} (A^T)^τ C^T ỹ(τ)

where ỹ is the observed output with the portion due to input subtracted:

ỹ = y − h ∗ u, where h is the impulse response

since Ot† Ot = I, we have

x̃(0) = x̂ls(0) − x(0) = Ot† [ v(0) ; .. ; v(t − 1) ]

where x̃(0) is the estimation error of the initial state

in particular, x̂ls(0) = x(0) if sensor noise is zero

(i.e., observer recovers exact state in noiseless case)

now assume the sensor noise is unknown, but its RMS value is bounded:

(1/t) Σ_{τ=0}^{t−1} ‖v(τ)‖^2 ≤ α^2

## Observability and state estimation 1920

set of possible estimation errors is ellipsoid

x̃(0) ∈ Eunc = { Ot† [ v(0) ; .. ; v(t − 1) ] | (1/t) Σ_{τ=0}^{t−1} ‖v(τ)‖^2 ≤ α^2 }

shape of uncertainty ellipsoid determined by matrix

( Ot^T Ot )^{-1} = ( Σ_{τ=0}^{t−1} (A^T)^τ C^T C A^τ )^{-1}

maximum norm of error is

‖ x̂ls(0) − x(0) ‖ ≤ α √t ‖Ot†‖

Infinite horizon uncertainty ellipsoid

the matrix

P = lim_{t→∞} ( Σ_{τ=0}^{t−1} (A^T)^τ C^T C A^τ )^{-1}

always exists, and gives the limiting uncertainty in estimating x(0) from u, y over longer and longer periods:

if A is stable, P > 0

i.e., can't estimate initial state perfectly even with infinite number of measurements u(t), y(t), t = 0, . . . (since memory of x(0) fades . . . )

if A is not stable, then P can have nonzero nullspace

i.e., initial state estimation error gets arbitrarily small (at least in some directions) as more and more of signals u and y are observed

Example

particle in R^2 moves with uniform velocity

(linear, noisy) range measurements from directions −15°, 0°, 20°, 30°, once per second

range noises IID N(0, 1); can assume RMS value of v is not much more than 2

no assumptions about initial position & velocity

(figure: particle trajectory and the four range sensors)

express as linear system

x(t + 1) = [ 1  0  1  0 ;
             0  1  0  1 ;
             0  0  1  0 ;
             0  0  0  1 ] x(t),    y(t) = [ k1^T ; .. ; k4^T ] x(t) + v(t)

## Observability and state estimation 1924

range measurements (& noiseless versions):

(plots: measurements from sensors 1-4 versus t ∈ [0, 120])

estimate based on (y(0), . . . , y(t)) is x̂(0 | t)

actual RMS position error is

√( ( x̂1(0|t) − x1(0) )^2 + ( x̂2(0|t) − x2(0) )^2 )

(similarly for velocity)
## Observability and state estimation 1926

(plots: RMS position error and RMS velocity error of x̂(0|t) versus t ∈ [10, 120]; both decrease as more measurements are used)

Continuous-time least-squares state estimation

assume ẋ = Ax + Bu, y = Cx + Du + v is observable

least-squares estimate of initial state x(0), given u(τ), y(τ), 0 ≤ τ ≤ t: choose x̂ls(0) to minimize the integral square residual

J = ∫_0^t ‖ ỹ(τ) − C e^{τA} x(0) ‖^2 dτ

where ỹ = y − h ∗ u is the observed output minus the part due to input

let's expand as J = x(0)^T Q x(0) + 2 r^T x(0) + s,

Q = ∫_0^t e^{τA^T} C^T C e^{τA} dτ,    r = − ∫_0^t e^{τA^T} C^T ỹ(τ) dτ,

s = ∫_0^t ỹ(τ)^T ỹ(τ) dτ

## Observability and state estimation 1928

setting ∇_{x(0)} J to zero, we obtain the least-squares observer

x̂ls(0) = −Q^{-1} r = ( ∫_0^t e^{τA^T} C^T C e^{τA} dτ )^{-1} ∫_0^t e^{τA^T} C^T ỹ(τ) dτ

estimation error is

x̃(0) = x̂ls(0) − x(0) = ( ∫_0^t e^{τA^T} C^T C e^{τA} dτ )^{-1} ∫_0^t e^{τA^T} C^T v(τ) dτ

therefore if v = 0 then x̂ls(0) = x(0)

## Observability and state estimation 1929

EE263 Autumn 2008-09 Stephen Boyd

Lecture 20
Some parting thoughts . . .

linear algebra

levels of understanding

whats next?

201

Linear algebra

comes up in many practical contexts (EE, ME, CE, AA, OR, Econ, . . . )

nowadays is readily done

cf. 10 yrs ago (when it was mostly talked about)

current level of linear algebra technology:

500 - 1000 variables: easy with general purpose codes

much more possible with special structure, special codes (e.g., sparse, convolution, banded, . . . )
## Some parting thoughts . . . 202

Levels of understanding

Platonic view:

## Some parting thoughts . . . 203

Quantitative view:

## Some parting thoughts . . . 204

must have understanding at one level before moving to next

Whats next?

## EE364a convex optimization I (Spr 08-09)

(plus lots of other EE, CS, ICME, MS&E, Stat, ME, AA courses on signal
processing, control, graphics & vision, adaptive systems, machine learning,
computational geometry, numerical linear algebra, . . . )