Original title: 263lectures Slides (uploaded by Daniel Baker)



Lecture 1

Overview

course mechanics

some examples


Course mechanics

www.stanford.edu/class/ee263

course requirements:

weekly homework


Prerequisites

control systems

dynamics


Linear dynamical system

dx/dt = A(t)x(t) + B(t)u(t),    y(t) = C(t)x(t) + D(t)u(t)

where:

t ∈ R denotes time

for the time-invariant case (A, B, C, D constant):

ẋ = Ax + Bu,    y = Cx + Du
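The time-invariant equations above can be simulated numerically. A minimal sketch using a forward-Euler update (numpy assumed; the A, B, C, D values below are made-up examples, not from the course):

```python
import numpy as np

# Simulate x' = Ax + Bu, y = Cx + Du with the forward Euler step
# x(t+h) ~ x(t) + h*(A x(t) + B u(t)).  Example: lightly damped oscillator.
A = np.array([[0.0, 1.0], [-1.0, -0.1]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])

h, T = 0.01, 10.0
x = np.array([[1.0], [0.0]])          # initial state
ys = []
for _ in range(int(T / h)):
    u = np.array([[0.0]])             # zero input: autonomous response
    ys.append(float(C @ x + D @ u))
    x = x + h * (A @ x + B @ u)

# damping makes the oscillation amplitude shrink over time
print(abs(ys[0]), abs(ys[-1]))
```

The same loop works for any state dimension; only the matrix shapes change.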


Some LDS terminology

- time-invariant: A, B, C, D are constant, i.e., don't depend on t
- autonomous: no input u
- when input & output signals are scalar, system is called single-input, single-output (SISO); when input & output signal dimensions are more than one, MIMO

discrete-time linear dynamical system has the same form, where

t ∈ Z₊ = {0, 1, 2, . . .}

Why study linear systems?

signal processing

communications

economics, finance

circuit analysis, simulation, design

mechanical and civil engineering

aeronautics

navigation, guidance


Usefulness of LDS

- applications depend on availability of computing power, which is increasing exponentially
- used for
  - analysis & design
  - implementation, embedded in real-time systems

Origins and history

. . . ) but with more emphasis on linear algebra

(just like digital signal processing, information theory, . . . )


why study linear systems?

- methods for linear systems often work unreasonably well, in practice, for nonlinear systems
- if you don't understand linear dynamical systems, you certainly can't understand nonlinear dynamical systems

Examples (ideas only, no details)

ẋ = Ax,    y = Cx

typical output:

[figure: two traces of the output y versus t, over 0 ≤ t ≤ 350 and 0 ≤ t ≤ 1000]

output is unpredictable

we'll see that such a solution can be decomposed into much simpler (modal) components

[figure: eight simpler (modal) components of the output, each plotted for 0 ≤ t ≤ 350]

Input design

problem: find appropriate u : R₊ → R² so that y(t) → ydes = (1, −2)

simple approach: consider static conditions (u, x, y constant):

ẋ = 0 = Ax + Bustatic,    y = ydes = Cx

solve for ustatic:

ustatic = −(CA⁻¹B)⁻¹ ydes = (−0.63, 0.36)

let's apply u = ustatic and just wait for things to settle:

[figure: u1, u2, y1, y2 plotted versus t for −200 ≤ t ≤ 1800; the outputs converge slowly to ydes]

using very clever input waveforms (EE263) we can do much better, e.g.:

[figure: u1, u2, y1, y2 versus t for 0 ≤ t ≤ 60; convergence is much faster]

in fact, by using larger inputs we do still better, e.g.:

[figure: u1, u2, y1, y2 versus t for −5 ≤ t ≤ 25; the inputs are larger, and convergence is faster still]

Estimation / filtering

[block diagram: signal u drives H(s), producing w; w is sampled and quantized by an A/D converter, producing y]

[figure: example signals u(t), s(t), w(t), y(t) for 0 ≤ t ≤ 10]

simple approach:

ignore quantization

approximate u as G(s)y


[figure: u(t) (solid) and û(t) (dotted) for 0 ≤ t ≤ 10]

EE263 Autumn 2008-09 Stephen Boyd

Lecture 2

Linear functions and examples

engineering examples

interpretations


Linear equations

consider a system of linear equations

y1 = a11x1 + a12x2 + · · · + a1nxn
y2 = a21x1 + a22x2 + · · · + a2nxn
  ...
ym = am1x1 + am2x2 + · · · + amnxn

can be written in matrix form as y = Ax, where y ∈ R^m collects the yi, A ∈ R^(m×n) has entries aij, and x ∈ R^n collects the xj

Linear functions

a function f : R^n → R^m is linear if

- f(x + y) = f(x) + f(y), for all x, y ∈ R^n
- f(αx) = αf(x), for all x ∈ R^n, α ∈ R

i.e., superposition holds

[figure: parallelogram showing x, y, x + y mapping to f(x), f(y), f(x + y)]

matrix multiplication function f(x) = Ax for some A ∈ R^(m×n) is linear; conversely, for a given linear function f there is only one matrix A for which f(x) = Ax for all x

Interpretations of y = Ax

y = Ax maps x ∈ R^n into y ∈ R^m

Interpretation of aij

yi = Σ_{j=1}^{n} aij xj

aij is the gain factor from the jth input (xj) to the ith output (yi)

thus, e.g.,

- a27 = 0 means the 2nd output (y2) doesn't depend on the 7th input (x7)
- |a52| ≫ |ai2| for i ≠ 5 means x2 affects mainly y5
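The gain interpretation can be checked directly: exciting one input at a time exposes one column of A. A small sketch with a made-up matrix (numpy assumed):

```python
import numpy as np

# aij is the gain from input x_j to output y_i.
# Hypothetical example matrix (not from the slides):
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0]])

x = np.zeros(3)
x[1] = 1.0          # excite only the 2nd input
y = A @ x           # response is the 2nd column of A

# a_12 = 0, so y_1 does not depend on x_2
print(y)
```

Here `y[0]` is zero precisely because the (1, 2) gain is zero.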

- the ith row of A concerns the ith output
- the jth column of A concerns the jth input
- the sparsity pattern of A shows which xj affect which yi

example (elastic structure):

- xj is external force applied at some node, in some fixed direction
- yi is (small) deflection of some node, in some fixed direction

[figure: structure with applied forces x1, x2, x3, x4]

aij gives deflection i per unit force at j (in m/N)

Total force/torque on rigid body

[figure: rigid body with applied forces x1, x2, x3, x4; CG marks the center of gravity]

- xj is magnitude of jth applied force
- y ∈ R^6 is resulting total force & torque on body

(y1, y2, y3 are x-, y-, z- components of total force;
y4, y5, y6 are x-, y-, z- components of total torque)

we have y = Ax

A depends on geometry

(of applied forces and torques with respect to center of gravity CG)

jth column gives resulting force & torque for unit force/torque j

interconnection of resistors, linear dependent (controlled) sources, and

independent sources

[circuit diagram: independent sources x1, x2; circuit variables y1, y2, y3; ib labels a controlled-source current]

xj is the value of independent source j; yi is some circuit variable (voltage, current)

we have y = Ax

if xj are currents and yi are voltages, A is called the impedance or

resistance matrix

Final position/velocity of mass due to applied forces

unit mass, zero position/velocity at t = 0, subject to applied force f(t) for 0 ≤ t ≤ n

f(t) = xj for j − 1 ≤ t < j, j = 1, . . . , n

(x is the sequence of applied forces, constant in each interval)

y1, y2 are final position and velocity (i.e., at t = n)

we have y = Ax

a1j gives influence of applied force during j − 1 ≤ t < j on final position

a2j gives influence of applied force during j − 1 ≤ t < j on final velocity

Gravimeter prospecting

[figure: gravimeter measurements over a region divided into voxels]

- gi is the gravitational field at location i; gavg is the nominal field
- yi is measured gravity anomaly at location i, i.e., some component (typically vertical) of gi − gavg
- xj is (excess) mass density of earth in voxel j

y = Ax

A comes from physics and geometry

Thermal system

[figure: region with heating elements and temperature-measurement locations; "location 4" and "heating element 5" are labeled]

xj is power of jth heating element; yi is temperature change at location i

y = Ax

- aij gives influence of heater j at location i (in °C/W)
- jth column of A gives the temperature pattern resulting from 1W at heater j

Illumination with lamps

[figure: lamp j with power xj illuminating patch i; rij is the distance and θij the angle of incidence]

xj is power of jth lamp; yi is illumination level of patch i

y = Ax, where aij = rij⁻² max{cos θij, 0}

(cos ij < 0 means patch i is shaded from lamp j)

jth column of A shows illumination pattern from lamp j

Signal and interference power in wireless system

n transmitter/receiver pairs

transmitter j transmits to receiver j (and, inadvertently, to the other receivers)

pj is power of jth transmitter

si is received signal power of ith receiver

zi is received interference power of ith receiver

Gij is path gain from transmitter j to receiver i

we have s = Ap, z = Bp, where

Gii i = j 0 i=j

aij = bij =

0 i=6 j Gij i 6= j
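The matrices A and B are just the diagonal and off-diagonal parts of the path-gain matrix G. A minimal sketch (numpy assumed; G and p below are made-up values):

```python
import numpy as np

# Signal and interference powers: s = Ap, z = Bp, where A keeps the
# diagonal of the path-gain matrix G and B keeps the off-diagonal part.
G = np.array([[1.0, 0.2, 0.1],
              [0.1, 2.0, 0.1],
              [0.3, 0.3, 3.0]])   # hypothetical path gains
A = np.diag(np.diag(G))           # a_ij = G_ii if i == j, else 0
B = G - A                         # b_ij = G_ij if i != j, else 0

p = np.array([1.0, 0.5, 0.25])    # transmitter powers
s = A @ p                         # received signal powers
z = B @ p                         # received interference powers
print(s, z)
```

From s and z one can then form, e.g., signal-to-interference ratios `s / z`.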

Cost of production

- production inputs (materials, parts, labor, . . . ) are combined to make a number of products
- xj is the price per unit of production input j
- aij is the number of units of production input j required to manufacture one unit of product i
- yi is the production cost per unit of product i

we have y = Ax

- qi is the quantity of product i to be produced; rj is the total quantity of production input j needed

we have r = Aᵀq

total production cost is rᵀx = (Aᵀq)ᵀx = qᵀAx

n flows with rates f1, . . . , fn pass from their source nodes to their

destination nodes over fixed routes in a network

Aij = 1 if flow j goes over link i, and 0 otherwise

link delays and flow latency: with d the vector of link delays, the vector l of flow latencies (total travel time) of the flows is

l = Aᵀd

Linearization

if f : R^n → R^m is differentiable at x0 ∈ R^n, then for x near x0,

f(x) ≈ f(x0) + Df(x0)(x − x0)

where

Df(x0)ij = ∂fi/∂xj evaluated at x0

is the derivative (Jacobian) matrix of f at x0

with deviation variables δx := x − x0, δy := y − y0 (where y0 = f(x0)), we have δy ≈ Df(x0)δx, a linear function of δx

Navigation by range measurement

(x, y) is an unknown position; ranges to beacons at known positions (pi, qi) are measured

[figure: unknown position (x, y) and beacons 1–4 at (p1, q1), . . . , (p4, q4)]

range to beacon i:

ρi(x, y) = √((x − pi)² + (y − qi)²)

linearize around (x0, y0): δρ ≈ A (δx, δy), where

ai1 = (x0 − pi)/√((x0 − pi)² + (y0 − qi)²),    ai2 = (y0 − qi)/√((x0 − pi)² + (y0 − qi)²)

- ith row of A is the unit vector pointing from beacon i toward (x0, y0)
- δx, δy is the (small) shift in (x, y) from (x0, y0); δρ is the resulting change in range measurements
- (x0, y0) could be, e.g., the last estimated position, a short time later

Broad categories of applications

estimation or inversion

control or design

mapping or transformation

Estimation or inversion

y = Ax

xj is jth parameter to be estimated or determined

aij is sensitivity of ith sensor to jth parameter

sample problems:

- find x, given y
- find all x's that result in y (i.e., all x's consistent with the measurements)
- if there is no x such that y = Ax, find x such that y ≈ Ax (i.e., if the sensor readings are inconsistent, find x which is almost consistent)

Control or design

y = Ax

y is vector of results, or outcomes

A describes how input choices affect results

sample problems:

- find all x's that result in y = ydes (i.e., find all designs that meet specifications)
- among x's that satisfy y = ydes, find a small one (i.e., find a small or efficient x that meets specifications)

Mapping or transformation

sample problems:

- (if possible) find an x that maps to a given y
- find all x's that map to a given y
- if there is only one x that maps to y, find it (i.e., decode or undo the mapping)

Matrix multiplication as mixture of columns

write A ∈ R^(m×n) in terms of its columns:

A = [ a1  a2  · · ·  an ]

where aj ∈ R^m

then y = Ax can be written as

y = x1a1 + x2a2 + · · · + xnan

so y is a linear combination, or mixture, of the columns of A, with coefficients given by x

an important example: x = ej, the jth unit vector:

e1 = (1, 0, . . . , 0),   e2 = (0, 1, 0, . . . , 0),   . . . ,   en = (0, . . . , 0, 1)

then Aej = aj, the jth column of A

(ej corresponds to a pure mixture, giving only column j)

Matrix multiplication as inner product with rows

write A in terms of its rows ã1ᵀ, ã2ᵀ, . . . , ãmᵀ, where ãi ∈ R^n

then y = Ax can be written as

y = ( ã1ᵀx, ã2ᵀx, . . . , ãmᵀx )

thus yi = ⟨ãi, x⟩, i.e., yi is the inner product of the ith row of A with x

geometric interpretation: yi = ãiᵀx = α defines a hyperplane in R^n (normal to ãi)

[figure: parallel hyperplanes yi = ⟨ãi, x⟩ = 0, 1, 2, 3, with normal vector ãi]

Block diagram representation

y = Ax can be represented by a signal flow graph or block diagram

e.g. for m = n = 2, we represent

[ y1 ]   [ a11  a12 ] [ x1 ]
[ y2 ] = [ a21  a22 ] [ x2 ]

as

[block diagram: x1 and x2 feed y1 and y2 through the four gains a11, a21, a12, a22]

- aij is the gain along the path from the jth input to the ith output
- the diagram (by not drawing paths with zero gain) shows the sparsity structure of A
  (e.g., diagonal, block upper triangular, arrow . . . )

example: block upper triangular, i.e.,

A = [ A11  A12 ]
    [  0   A22 ]

with x and y partitioned conformably:

x = ( x1, x2 ),   y = ( y1, y2 )

block diagram:

[block diagram: x1 → A11 → y1; x2 → A22 → y2; x2 also feeds y1 through A12; y1 gets no contribution from x2's loop into y2, i.e., y2 doesn't depend on x1]

matrix multiplication: for A ∈ R^(m×n), B ∈ R^(n×p), the product C = AB ∈ R^(m×p) has entries

cij = Σ_{k=1}^{n} aik bkj

composition interpretation: if y = Ax and x = Bz, then y = (AB)z

[block diagram: z ∈ R^p → B → x ∈ R^n → A → y ∈ R^m is equivalent to z → AB → y]
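The composition interpretation can be verified numerically: mapping z through B and then A gives the same result as mapping through the single matrix AB. A sketch (numpy assumed, random example data):

```python
import numpy as np

# y = A(Bz) equals y = (AB)z: matrix product is composition of linear maps.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))   # A in R^(2x3)
B = rng.standard_normal((3, 4))   # B in R^(3x4)
z = rng.standard_normal(4)

y1 = A @ (B @ z)                  # map z through B, then through A
y2 = (A @ B) @ z                  # single map through the product AB
print(np.allclose(y1, y2))

# entry formula: c_ij = sum_k a_ik b_kj
C = np.array([[sum(A[i, k] * B[k, j] for k in range(3)) for j in range(4)]
              for i in range(2)])
print(np.allclose(C, A @ B))
```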

Column and row interpretations

can write the product C = AB in terms of the columns of B:

C = [ c1  · · ·  cp ] = AB = [ Ab1  · · ·  Abp ]

i.e., the jth column of C is A acting on the jth column of B

similarly, the rows of C are the rows of A times B: ciᵀ = ãiᵀB

entry interpretation: cij = ãiᵀbj = ⟨ãi, bj⟩, the inner product of the ith row of A with the jth column of B

example: the entries of AᵀA are the inner products of the columns of A (gives inner product of each vector with the others)

Matrix multiplication interpretation via paths

[block diagram: x1, x2 feed z1, z2 through gains bij; z1, z2 feed y1, y2 through gains aij]

- aik bkj is the gain of the path from input j to output i that passes through k
- cij sums the gains over all paths from input j to output i
- e.g., the path x1 → z2 → y2 has gain a22b21


Lecture 3

Linear algebra review

change of coordinates


Vector spaces

a vector space (or linear space) consists of:

- a set V
- a vector sum + : V × V → V
- a scalar multiplication : R × V → V
- a distinguished element 0 ∈ V

which satisfy a list of properties, including:

- x + y = y + x, ∀x, y ∈ V (+ is commutative)
- (x + y) + z = x + (y + z), ∀x, y, z ∈ V (+ is associative)
- 0 + x = x, ∀x ∈ V (0 is additive identity)
- 1x = x, ∀x ∈ V

Examples

- V1 = R^n, with standard (componentwise) vector addition and scalar multiplication
- V2 = {0} (where 0 ∈ R^n)
- V3 = span(v1, v2, . . . , vk) = {α1v1 + · · · + αkvk | αi ∈ R}, where v1, . . . , vk ∈ R^n

Subspaces

- a subspace of a vector space is a subset of a vector space which is itself a vector space
- i.e., it is closed under vector addition and scalar multiplication

example (vector spaces of functions): V4 = set of differentiable functions x : R₊ → R^n, where vector sum is sum of functions:

(x + z)(t) = x(t) + z(t)

and scalar multiplication is defined by

(αx)(t) = αx(t)

V5 = {x ∈ V4 | ẋ = Ax}

(points in V5 are trajectories of the linear system ẋ = Ax)

V5 is a subspace of V4

Independent set of vectors

a set of vectors {v1, v2, . . . , vk} is independent if

α1v1 + α2v2 + · · · + αkvk = 0  ⟹  α1 = α2 = · · · = αk = 0

some equivalent conditions:

- the coefficients are uniquely determined: α1v1 + · · · + αkvk = β1v1 + · · · + βkvk implies α1 = β1, α2 = β2, . . . , αk = βk
- no vi can be expressed as a linear combination of the other vectors v1, . . . , vi−1, vi+1, . . . , vk

a set of independent vectors {v1, . . . , vk} is a basis for a vector space V if every v ∈ V can be expressed as v = α1v1 + · · · + αkvk

fact: for a given vector space V, the number of vectors in any basis is the same; this number is called the dimension of V

Nullspace of a matrix

the nullspace of A ∈ R^(m×n) is defined as

N(A) = { x ∈ R^n | Ax = 0 }

N(A) gives the ambiguity in x given y = Ax: if y = Ax and z ∈ N(A), then y = A(x + z) as well

Zero nullspace

A is called one-to-one if N(A) = {0}; equivalent conditions:

- x can always be uniquely determined from y = Ax (i.e., the linear transformation y = Ax doesn't lose information)
- the columns of A are independent
- det(AᵀA) ≠ 0
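The nullspace can be computed numerically from the SVD: right singular vectors corresponding to (near-)zero singular values span N(A). A sketch (numpy assumed; this is one standard technique, not the only one):

```python
import numpy as np

# Orthonormal basis for N(A) from the right singular vectors whose
# singular values are (numerically) zero.
def nullspace(A, tol=1e-10):
    U, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return Vt[rank:].T            # columns span N(A)

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])   # rank 1, so dim N(A) = 2
Z = nullspace(A)
print(Z.shape)                    # one column per nullspace dimension
print(np.allclose(A @ Z, 0))      # every column is mapped to zero
```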

Interpretations of nullspace

suppose z ∈ N(A)

- if y = Ax represents a measurement of x, then x and x + z are indistinguishable from the sensors: Ax = A(x + z)
- if y = Ax represents the output resulting from design x, then x and x + z have the same result

Range of a matrix

the range of A ∈ R^(m×n) is defined as

R(A) = { Ax | x ∈ R^n } ⊆ R^m

i.e., the set of vectors that can be "hit" by the mapping y = Ax

Onto matrices

A is called onto if R(A) = R^m; equivalent conditions:

- the columns of A span R^m
- N(Aᵀ) = {0}
- det(AAᵀ) ≠ 0

Interpretations of range

if y = Ax represents a measurement of x, then a measurement y ∈ R(A) is consistent, while y ∉ R(A) is impossible: the sensors have failed, or the model is wrong

Inverse

A ∈ R^(n×n) is invertible or nonsingular if det A ≠ 0

equivalent conditions:

- there is a matrix A⁻¹ with AA⁻¹ = A⁻¹A = I
- N(A) = {0}
- R(A) = R^n

Interpretations of inverse

- the mapping associated with A⁻¹ undoes the mapping associated with A (when applied either before or after!)
- x = By, with B = A⁻¹, is the unique solution of Ax = y

Dual basis interpretation

let ai denote the columns of A ∈ R^(n×n), and b̃iᵀ the rows of B = A⁻¹; from y = ABy we get

y = Σ_{i=1}^{n} (b̃iᵀy)ai

thus, inner product with the rows of the inverse matrix gives the coefficients in the expansion of a vector in the columns of the matrix

Rank of a matrix

the rank of A ∈ R^(m×n) is defined as rank(A) = dim R(A)

(nontrivial) facts:

- rank(A) = rank(Aᵀ)
- rank(A) is the maximum number of independent columns (or rows) of A; hence rank(A) ≤ min(m, n)
- rank(A) + dim N(A) = n

Conservation of dimension

interpretation of rank(A) + dim N(A) = n: each dimension of the input is either crushed to zero or ends up in the output

roughly speaking:

- n is the number of degrees of freedom in the input x
- dim N(A) is the number of degrees of freedom lost in the mapping from x to y = Ax
- rank(A) is the number of degrees of freedom in the output y

"coding" interpretation of rank: rank(A) = r means A can be factored as A = BC with B ∈ R^(m×r), C ∈ R^(r×n):

[block diagram: x ∈ R^n → A → y ∈ R^m is equivalent to x → C → r "lines" → B → y]

rank(A) = r is the minimum number of scalar intermediate signals ("lines") needed to reconstruct y from x

Application: fast matrix-vector multiplication

- need to compute y = Ax, where A has known factorization A = BC with small r
- computing y directly costs mn operations; computing z = Cx and then y = Bz costs rn + mr = (m + n)r operations, which can be much smaller

Full-rank matrices: A ∈ R^(m×n) is full rank if rank(A) = min(m, n)

- for skinny matrices (m ≥ n), full rank means columns are independent
- for fat matrices (m ≤ n), full rank means rows are independent

Change of coordinates

the "standard" basis vectors in R^n are (e1, e2, . . . , en), where

ei = (0, . . . , 0, 1, 0, . . . , 0)

(1 in ith component)

obviously we have

x = x1e1 + x2e2 + · · · + xnen

i.e., xi are the coordinates of x in the standard basis

if (t1, t2, . . . , tn) is another basis for R^n, we have

x = x̃1t1 + x̃2t2 + · · · + x̃ntn

where x̃i are the coordinates of x in the basis (t1, t2, . . . , tn)

define T = [ t1  t2  · · ·  tn ], so x = Tx̃, hence

x̃ = T⁻¹x

consider the linear transformation y = Ax, A ∈ R^(n×n); express y and x in terms of the new basis:

x = Tx̃,    y = Tỹ

so

ỹ = (T⁻¹AT)x̃

i.e., T⁻¹AT is the matrix of the transformation in the coordinates t1, t2, . . . , tn
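The coordinate-change formulas can be checked numerically: expanding x in the new basis reconstitutes x, and T⁻¹AT acts on new-basis coordinates exactly as A acts on standard coordinates. A sketch (numpy assumed; T and A below are made-up example values):

```python
import numpy as np

# x = T xt, xt = T^{-1} x, and the matrix of y = Ax in the new basis is T^{-1} A T.
T = np.array([[1.0, 1.0],
              [0.0, 1.0]])        # columns t1, t2 form the new basis
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])

x = np.array([2.0, 5.0])
xt = np.linalg.solve(T, x)        # coordinates of x in the basis (t1, t2)
print(np.allclose(T @ xt, x))     # expansion reconstitutes x

At = np.linalg.solve(T, A @ T)    # T^{-1} A T
yt = At @ xt                      # transformation in new coordinates
print(np.allclose(T @ yt, A @ x)) # same transformation, expressed two ways
```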

(Euclidean) norm

for x ∈ R^n we define the (Euclidean) norm as

‖x‖ = √(x1² + x2² + · · · + xn²) = √(xᵀx)

‖x‖ measures the length of the vector (from the origin)

important properties:

- ‖αx‖ = |α|‖x‖ (homogeneity)
- ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality)
- ‖x‖ ≥ 0 (nonnegativity)
- ‖x‖ = 0 ⟺ x = 0 (definiteness)

RMS value and (Euclidean) distance

the RMS (root-mean-square) value of x ∈ R^n is

rms(x) = ( (1/n) Σ_{i=1}^{n} xi² )^(1/2) = ‖x‖/√n

the (Euclidean) distance between vectors x and y is ‖x − y‖

Inner product

the inner product of x, y ∈ R^n is

⟨x, y⟩ := x1y1 + x2y2 + · · · + xnyn = xᵀy

important properties:

- ⟨αx, y⟩ = α⟨x, y⟩
- ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩
- ⟨x, y⟩ = ⟨y, x⟩
- ⟨x, x⟩ ≥ 0
- ⟨x, x⟩ = 0 ⟺ x = 0

f(y) = ⟨x, y⟩ is a linear function of y, with linear map given by the row vector xᵀ

Cauchy-Schwarz inequality and angle between vectors

- for any x, y ∈ R^n, |xᵀy| ≤ ‖x‖‖y‖ (Cauchy-Schwarz inequality)
- the (unsigned) angle between x and y is defined as

θ = ∠(x, y) = cos⁻¹( xᵀy / (‖x‖‖y‖) )

- the projection of x onto the line through y is ( xᵀy / ‖y‖² ) y

special cases:

- x and y aligned: θ = 0, xᵀy = ‖x‖‖y‖; (if x ≠ 0) y = αx for some α ≥ 0
- x and y opposed: θ = π, xᵀy = −‖x‖‖y‖; (if x ≠ 0) y = −αx for some α ≥ 0
- x and y orthogonal: θ = ±π/2, xᵀy = 0; denoted x ⊥ y

interpretation of xᵀy > 0 and xᵀy < 0:

- xᵀy > 0 means ∠(x, y) is acute
- xᵀy < 0 means ∠(x, y) is obtuse

the halfspace {x | xᵀy ≤ 0} has boundary passing through 0, normal to y


Lecture 4

Orthonormal sets of vectors and QR

factorization


a set of vectors {u1, . . . , uk} ⊆ R^n is

- normalized if ‖ui‖ = 1, i = 1, . . . , k (the ui are called unit vectors or direction vectors)
- orthogonal if ui ⊥ uj for i ≠ j
- orthonormal if both

orthonormality (like independence) is a property of a set of vectors, not of vectors individually

in terms of U = [u1 · · · uk], orthonormality means

UᵀU = Ik

- orthonormal vectors are independent (multiply α1u1 + α2u2 + · · · + αkuk = 0 by uiᵀ)
- hence they are an orthonormal basis for span(u1, . . . , uk) = R(U)
- warning: if k < n then UUᵀ ≠ I (more on this matrix later . . . )

Geometric properties

suppose the columns of U = [u1 · · · uk] are orthonormal; if w = Uz, then ‖w‖ = ‖z‖, i.e., multiplication by U does not change norm (the mapping is isometric)

inner products are also preserved: ⟨Uz, Uz̃⟩ = ⟨z, z̃⟩

if w = Uz and w̃ = Uz̃, then

⟨w, w̃⟩ = ⟨Uz, Uz̃⟩ = (Uz)ᵀ(Uz̃) = zᵀUᵀUz̃ = ⟨z, z̃⟩

since norms and inner products are preserved, so are angles: ∠(Uz, Uz̃) = ∠(z, z̃)

Orthonormal basis for R^n

suppose u1, . . . , un is an orthonormal basis for R^n; then U = [u1 · · · un] is called orthogonal, and satisfies

UᵀU = I

hence U⁻¹ = Uᵀ, so also UUᵀ = I, i.e.,

Σ_{i=1}^{n} uiuiᵀ = I

Expansion in orthonormal basis

suppose U is orthogonal, so x = UUᵀx, i.e.,

x = Σ_{i=1}^{n} (uiᵀx)ui

- uiᵀx is called the component of x in the direction ui
- a = Uᵀx resolves x into its ui components
- x = Ua = Σ_{i=1}^{n} aiui reconstitutes x from its components; this is called the (ui-) expansion of x

the identity I = UUᵀ = Σ_{i=1}^{n} uiuiᵀ is sometimes written (in physics) as

I = Σ_{i=1}^{n} |ui⟩⟨ui|

since

x = Σ_{i=1}^{n} |ui⟩⟨ui|x⟩

(but we won't use this notation)

Geometric interpretation

if U is orthogonal, the transformation w = Uz preserves norms and angles, so we interpret it as a rotation or reflection of R^n

examples: rotation by θ,

Uθ = [ cos θ  −sin θ ]
     [ sin θ   cos θ ]

and reflection,

Rθ = [ cos θ   sin θ ]
     [ sin θ  −cos θ ]

[figure: action of the rotation (left) and the reflection (right) on the standard basis vectors e1, e2 in the (x1, x2) plane]

Gram-Schmidt procedure

given independent vectors a1, . . . , ak ∈ R^n, the G-S procedure finds orthonormal vectors q1, . . . , qk s.t.

span(a1, . . . , ar) = span(q1, . . . , qr) for r ≤ k

rough idea of method: first orthogonalize each vector w.r.t. the previous ones; then normalize the result to have norm one

- step 1a. q̃1 := a1
- step 1b. q1 := q̃1/‖q̃1‖ (normalize)
- step 2a. q̃2 := a2 − (q1ᵀa2)q1 (remove q1 component from a2)
- step 2b. q2 := q̃2/‖q̃2‖ (normalize)
- step 3a. q̃3 := a3 − (q1ᵀa3)q1 − (q2ᵀa3)q2 (remove q1, q2 components)
- step 3b. q3 := q̃3/‖q̃3‖ (normalize)
- etc.

[figure: geometric view: q̃1 = a1 defines q1; q̃2 = a2 − (q1ᵀa2)q1 is a2 with its q1 component removed]

for i = 1, 2, . . . , k we have

ai = (q1ᵀai)q1 + (q2ᵀai)q2 + · · · + (qi−1ᵀai)qi−1 + ‖q̃i‖qi
   = r1iq1 + r2iq2 + · · · + riiqi

(note that the rij's come right out of the G-S procedure, and rii ≠ 0)
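The steps above translate directly into code; the rij collected along the way form the upper-triangular factor R. A minimal sketch (numpy assumed; classical G-S as described, not the numerically preferred modified variant):

```python
import numpy as np

# Gram-Schmidt on the columns of A: orthogonalize against previous q's,
# then normalize; record r_ji = q_j^T a_i and r_ii = ||qtilde_i||.
def gram_schmidt(A):
    n, k = A.shape
    Q = np.zeros((n, k))
    R = np.zeros((k, k))
    for i in range(k):
        qt = A[:, i].copy()
        for j in range(i):
            R[j, i] = Q[:, j] @ A[:, i]   # component along q_j
            qt -= R[j, i] * Q[:, j]       # remove it
        R[i, i] = np.linalg.norm(qt)      # nonzero since columns independent
        Q[:, i] = qt / R[i, i]            # normalize
    return Q, R

A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
Q, R = gram_schmidt(A)
print(np.allclose(Q.T @ Q, np.eye(2)))    # orthonormal columns
print(np.allclose(Q @ R, A))              # A = QR
```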

QR decomposition

written in matrix form: A = QR, where A ∈ R^(n×k), Q ∈ R^(n×k), R ∈ R^(k×k):

[ a1  a2  · · ·  ak ] = [ q1  q2  · · ·  qk ] [ r11 r12 · · · r1k ]
                                              [  0  r22 · · · r2k ]
                                              [  :   :   ..   :   ]
                                              [  0   0  · · · rkk ]

- QᵀQ = Ik, and R is upper triangular & invertible
- called the QR decomposition (or factorization) of A
- usually computed using a variation on the Gram-Schmidt procedure which is less sensitive to numerical (rounding) errors

General Gram-Schmidt procedure

- in basic G-S we assume a1, . . . , ak are independent
- if aj is linearly dependent on a1, . . . , aj−1, then q̃j = 0
- modified algorithm: when we encounter q̃j = 0, skip to the next vector and continue:

r = 0;
for i = 1, . . . , k
{
    ã = ai − Σ_{j=1}^{r} qj qjᵀ ai;
    if ã ≠ 0 { r = r + 1; qr = ã/‖ã‖; }
}

on exit,

- q1, . . . , qr is an orthonormal basis for R(A) (hence r = rank(A))
- each ai is a linear combination of the previously generated qj's

in matrix notation we have A = QR with QᵀQ = Ir and R ∈ R^(r×k) in upper staircase form:

[figure: staircase pattern, possibly nonzero entries on and above the staircase, zero entries below]

can permute columns within R to get A = Q[R̃ S]P, where R̃ is upper triangular and invertible, and P is a permutation matrix (which moves forward the columns of A which generated a new q)

Applications

- directly yields an orthonormal basis for R(A)
- can check whether each ai is linearly dependent on the previous ones
- running G-S on a1, . . . , ak yields, along the way, the QR factorizations of [a1 · · · ap] for p = 1, . . . , k

Full QR factorization

with A = Q1R1 the QR factorization as above, write

A = [ Q1  Q2 ] [ R1 ]
               [  0 ]

where [Q1 Q2] is orthogonal, i.e., the columns of Q2 are orthonormal, and orthogonal to Q1

to find Q2:

- find any matrix Ã such that [A Ã] is full rank (e.g., Ã = I)
- apply general Gram-Schmidt to [A Ã]
- Q1 are the orthonormal vectors obtained from the columns of A
- Q2 are the orthonormal vectors obtained from the extra columns (Ã)

i.e., any set of orthonormal vectors can be extended to an orthonormal

basis for Rn

R(Q1) and R(Q2) are called complementary subspaces since

- they are orthogonal (i.e., every vector in the first subspace is orthogonal to every vector in the second subspace)
- their sum is R^n (i.e., every vector in R^n can be expressed as a sum of two vectors, one from each subspace)

this is written

R(Q1) ⊕ R(Q2) = R^n

(each subspace is the orthogonal complement of the other)

from Aᵀ = [R1ᵀ 0][Q1 Q2]ᵀ we see that

Aᵀz = 0  ⟺  Q1ᵀz = 0  ⟺  z ∈ R(Q2)

so R(Q2) = N(Aᵀ)

(in fact the columns of Q2 are an orthonormal basis for N(Aᵀ))

we conclude: R(A) and N(Aᵀ) are complementary subspaces:

- R(A) ⊕ N(Aᵀ) = R^n (recall A ∈ R^(n×k))
- every y ∈ R^n can be written uniquely as y = z + w, with z ∈ R(A), w ∈ N(Aᵀ) (we'll soon see what the vector z is . . . )
- can now prove most of the assertions from the linear algebra review lecture
- applying the same argument to Aᵀ gives the decomposition N(A) ⊕ R(Aᵀ) = R^k


Lecture 5

Least-squares

least-squares estimation

BLUE property


Overdetermined linear equations

consider y = Ax where A ∈ R^(m×n) is skinny, i.e., m > n (more equations than unknowns)

- for most y, cannot solve for x
- one approach: define the residual or error r = Ax − y, and find x = xls that minimizes ‖r‖

xls is called the least-squares (approximate) solution of y = Ax


Geometric interpretation

[figure: y, its projection Axls onto the plane R(A), and the residual r, orthogonal to R(A)]

to find xls, we'll minimize the norm of the residual squared:

‖r‖² = xᵀAᵀAx − 2yᵀAx + yᵀy

setting the gradient with respect to x to zero yields the normal equations AᵀAx = Aᵀy; our assumptions imply AᵀA invertible, so we have

xls = (AᵀA)⁻¹Aᵀy

- xls is a linear function of y
- A† := (AᵀA)⁻¹Aᵀ is a left inverse of (full rank, skinny) A:

A†A = (AᵀA)⁻¹AᵀA = I

Projection on R(A)

Axls is (by definition) the point in R(A) that is closest to y, i.e., it is the

projection of y onto R(A)

Axls = PR(A)(y)


Orthogonality principle

the optimal residual r = Axls − y is orthogonal to R(A):

⟨r, Az⟩ = 0 for all z ∈ R^n

[figure: y, Axls, and the residual r, orthogonal to the plane R(A)]

Completion of squares

since r = Axls − y is orthogonal to A(x − xls) for any x, we have

‖Ax − y‖² = ‖Axls − y‖² + ‖A(x − xls)‖²

which shows that x ≠ xls gives a strictly larger residual

Least-squares via QR factorization

factor A = QR with QᵀQ = In and R upper triangular, invertible

pseudo-inverse is

(AᵀA)⁻¹Aᵀ = (RᵀQᵀQR)⁻¹RᵀQᵀ = R⁻¹Qᵀ

so xls = R⁻¹Qᵀy
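Both routes to xls can be compared numerically; with well-conditioned data they agree, and the QR route avoids forming AᵀA. A sketch (numpy assumed, random example data):

```python
import numpy as np

# Least-squares two ways: normal equations and QR; both give x_ls.
rng = np.random.default_rng(1)
A = rng.standard_normal((10, 3))            # skinny, full rank (generically)
y = rng.standard_normal(10)

x_ne = np.linalg.solve(A.T @ A, A.T @ y)    # (A^T A)^{-1} A^T y
Q, R = np.linalg.qr(A)                      # thin QR: Q^T Q = I, R triangular
x_qr = np.linalg.solve(R, Q.T @ y)          # R^{-1} Q^T y
print(np.allclose(x_ne, x_qr))

# orthogonality principle: the optimal residual is orthogonal to R(A)
r = A @ x_qr - y
print(np.allclose(A.T @ r, 0))
```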

full QR factorization:

A = [Q1 Q2] [ R1 ]
            [  0 ]

with [Q1 Q2] ∈ R^(m×m) orthogonal, R1 ∈ R^(n×n) upper triangular, invertible

multiplication by an orthogonal matrix doesn't change the norm, so

‖Ax − y‖² = ‖ [Q1 Q2][R1; 0]x − y ‖²
          = ‖ [Q1 Q2]ᵀ[Q1 Q2][R1; 0]x − [Q1 Q2]ᵀy ‖²
          = ‖ [ R1x − Q1ᵀy ; −Q2ᵀy ] ‖²
          = ‖R1x − Q1ᵀy‖² + ‖Q2ᵀy‖²

- this is evidently minimized by the choice xls = R1⁻¹Q1ᵀy (which makes the first term zero)
- the residual with optimal x is Axls − y = −Q2Q2ᵀy

Least-squares estimation

many estimation problems have the form

y = Ax + v

- x is what we want to estimate or reconstruct
- y is our sensor measurement(s)
- v is an unknown noise or measurement error (assumed small)

least-squares estimation: choose as estimate the x̂ that minimizes

‖Ax̂ − y‖

i.e., the deviation between what we actually observed (y) and what we would observe if x = x̂, and there were no noise (v = 0)

the least-squares estimate is just x̂ = (AᵀA)⁻¹Aᵀy

BLUE property

linear measurement with noise:

y = Ax + v

with A full rank, skinny; consider a linear estimator of the form x̂ = By

- the estimator is called unbiased if x̂ = x whenever v = 0 (i.e., no estimation error when there is no noise)
- same as BA = I, i.e., B is a left inverse of A

the estimation error of an unbiased linear estimator is

x̂ − x = B(Ax + v) − x = Bv

fact: A† = (AᵀA)⁻¹Aᵀ is the smallest left inverse of A, in the following sense: for any B with BA = I,

Σ_{i,j} Bij²  ≥  Σ_{i,j} A†ij²

i.e., least-squares gives the best linear unbiased estimator (BLUE)

Example: navigation from range measurements

[figure: unknown position x and directions k1, . . . , k4 to four beacons]

the unknown position x ∈ R² is estimated from range measurements to the beacons; the measurements are (say) nearly exact

ranges y ∈ R⁴ measured, with measurement noise v:

y = [ k1ᵀ ]
    [ k2ᵀ ] x + v
    [ k3ᵀ ]
    [ k4ᵀ ]

(details not important)

measurement is y = (11.95, 2.84, 9.81, 2.81)

Just enough measurements method

compute estimate x̂ by inverting the top (2 × 2) half of A:

x̂ = Bje y = [ 0     1.0  0  0 ] y = [ 2.84 ]
            [ 1.12  0.5  0  0 ]     [ 11.9 ]

(i.e., only the first two measurements are used)

Least-squares method

compute estimate x̂ by least-squares:

x̂ = A†y = [ 0.23  0.48  0.04  0.44 ] y = [ 4.95  ]
          [ 0.47  0.02  0.51  0.18 ]     [ 10.26 ]

(i.e., all four measurements are used)

Example: signal reconstruction

[block diagram: u → H(s) → w → A/D → y]

- signal u is piecewise constant, with unit intervals:
  u(t) = xj,  j − 1 ≤ t < j,  j = 1, . . . , 10
- u is filtered, with impulse response h:
  w(t) = ∫₀ᵗ h(t − τ)u(τ) dτ

- w is sampled: ỹi = w(0.1i), i = 1, . . . , 100
- 3-bit quantization: yi = Q(ỹi), i = 1, . . . , 100, where Q is the 3-bit quantizer characteristic

example:

[figure: u(t), impulse response s(t), filtered signal w(t), and quantized samples y(t) overlaid on w(t), for 0 ≤ t ≤ 10]

we have y = Ax + v, where

- A ∈ R^(100×10) is given by Aij = ∫_{j−1}^{j} h(0.1i − τ) dτ
- v is the quantization error: vi = Q(ỹi) − ỹi (so |vi| ≤ 0.125)

least-squares estimate: xls = (AᵀA)⁻¹Aᵀy

[figure: u(t) (solid) and û(t) (dotted) for 0 ≤ t ≤ 10]

RMS error is ‖x − xls‖/√10 = 0.03

[figure: rows 2, 5, and 8 of the estimator matrix, plotted versus t for 0 ≤ t ≤ 10]

the ith row of the estimator shows how the measurements are combined to form the estimate of xi, here for i = 2, 5, 8

to estimate x5, which is the original input signal for 4 ≤ t < 5, we mostly use y(t) for 3 ≤ t ≤ 7


Lecture 6

Least-squares applications

system identification


Least-squares data fitting

we are given:

- functions f1, . . . , fn : S → R, called regressors or basis functions
- data or measurements (si, gi), i = 1, . . . , m, where si ∈ S and (usually) m ≫ n

problem: find coefficients x1, . . . , xn so that x1f1(si) + · · · + xnfn(si) ≈ gi

least-squares fit: choose x to minimize the total square fitting error:

Σ_{i=1}^{m} (x1f1(si) + · · · + xnfn(si) − gi)²

- using matrix notation, the total square fitting error is ‖Ax − g‖², where Aij = fj(si)
- hence, the least-squares fit is x = (AᵀA)⁻¹Aᵀg (assuming A skinny, full rank)
- the corresponding fitted function is f(s) = x1f1(s) + · · · + xnfn(s)

applications:

- interpolation, extrapolation, smoothing of data
- developing simple, approximate model of data

Least-squares polynomial fitting

problem: fit a polynomial of degree < n,

p(t) = a0 + a1t + · · · + an−1tⁿ⁻¹,

to data (ti, yi), i = 1, . . . , m

basis functions are fj(t) = tʲ⁻¹, so the matrix A has entries Aij = tiʲ⁻¹ (a Vandermonde matrix):

A = [ 1  t1  t1²  · · ·  t1ⁿ⁻¹ ]
    [ 1  t2  t2²  · · ·  t2ⁿ⁻¹ ]
    [ :   :               :   ]
    [ 1  tm  tm²  · · ·  tmⁿ⁻¹ ]

assuming tk ≠ tl for k ≠ l and m ≥ n, A is full rank:

- suppose Aa = 0
- then the polynomial p(t) = a0 + a1t + · · · + an−1tⁿ⁻¹ vanishes at the m points t1, . . . , tm
- but a polynomial of degree less than n cannot have n or more distinct zeros, so p is identically zero, and a = 0
- hence the columns of A are independent

Example

least-squares fit for degrees 1, 2, 3, 4 have RMS errors .135, .076, .025,

.005, respectively


[figure: least-squares polynomial fits p1(t), p2(t), p3(t), p4(t) to the data, plotted for 0 ≤ t ≤ 1]
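The polynomial fits above come from exactly the Vandermonde least-squares setup; a sketch with synthetic data (numpy assumed; the data here is made up, not the data from the slides):

```python
import numpy as np

# Least-squares polynomial fit via a Vandermonde matrix A_ij = t_i^(j-1).
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 30)
g = np.sin(4 * t) + 0.05 * rng.standard_normal(30)   # synthetic data

def poly_fit_values(t, g, n):
    A = np.vander(t, n, increasing=True)   # columns 1, t, ..., t^(n-1)
    x, *_ = np.linalg.lstsq(A, g, rcond=None)
    return A @ x                           # fitted values at the data points

# RMS fit error is non-increasing as the degree grows (nested bases)
rms = [np.sqrt(np.mean((poly_fit_values(t, g, n) - g) ** 2))
       for n in range(1, 6)]
print(rms)
```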

Growing sets of regressors

consider the family of least-squares problems

minimize ‖ Σ_{i=1}^{p} xi ai − y ‖

for p = 1, . . . , n (the ai are called regressors)

- approximate y by a linear combination of a1, . . . , ap
- equivalently: regress y on a1, . . . , ap
- as p increases, the fit improves, so the optimal residual decreases

the solution for each p ≤ n is given by

xls⁽ᵖ⁾ = (ApᵀAp)⁻¹Apᵀy = Rp⁻¹Qpᵀy

where

- Ap = [a1 · · · ap] is the matrix of the first p regressors
- Ap = QpRp is its QR factorization, which comes for free from that of A

plot of the optimal residual of fitting y with a linear combination of a1, . . . , ap, as a function of p:

[figure: ‖residual‖ versus p, for p = 0, 1, . . . , 7; at p = 0 the residual is ‖y‖, at p = 1 it is min over x1 of ‖x1a1 − y‖, and at p = 7 it is min over x1, . . . , x7 of ‖Σ_{i=1}^{7} xi ai − y‖]

Least-squares system identification

we fit a model of an unknown system, based on measured I/O data u, y

example with scalar u, y (vector u, y readily handled): fit I/O data with a moving-average (MA) model with n delays

ŷ(t) = h0u(t) + h1u(t − 1) + · · · + hnu(t − n)

where h0, . . . , hn ∈ R

in matrix form, the predicted outputs are

[ ŷ(n)   ]   [ u(n)    u(n − 1)  · · ·  u(0)     ] [ h0 ]
[ ŷ(n+1) ] = [ u(n+1)  u(n)      · · ·  u(1)     ] [ h1 ]
[  :     ]   [  :                         :      ] [ :  ]
[ ŷ(N)   ]   [ u(N)    u(N − 1)  · · ·  u(N − n) ] [ hn ]

least-squares identification: choose the model (i.e., h) that minimizes the norm of the model prediction error ‖e‖, where e = y − ŷ
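Fitting the MA taps is then an ordinary least-squares problem over the stacked data matrix. A sketch with synthetic data (numpy assumed; the taps and signals below are made up):

```python
import numpy as np

# Fit yhat(t) = h0 u(t) + ... + hn u(t-n) by least-squares.
rng = np.random.default_rng(3)
N, n = 200, 3
u = rng.standard_normal(N + 1)
h_true = np.array([0.5, -0.3, 0.2, 0.1])      # hypothetical "true" taps

# rows [u(t), u(t-1), ..., u(t-n)] for t = n, ..., N
U = np.array([u[t - n:t + 1][::-1] for t in range(n, N + 1)])
y = U @ h_true + 0.01 * rng.standard_normal(U.shape[0])   # noisy output

h, *_ = np.linalg.lstsq(U, y, rcond=None)
print(np.round(h, 2))
```

With low noise the recovered taps land close to the generating ones.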

Example

[figure: measured input u(t) and output y(t) for 0 ≤ t ≤ 70]

for n = 7 we obtain MA model with

(h0, . . . , h7) = (.024, .282, .418, .354, .243, .487, .208, .441)

[figure: actual output y(t) (solid) and model prediction ŷ(t) (dashed), for 0 ≤ t ≤ 70]

Model order selection

obviously the larger n, the smaller the prediction error on the data used

to form the model

suggests using largest possible model order for smallest prediction error

[figure: relative prediction error ‖e‖/‖y‖ versus model order n, for n = 0, . . . , 50]

difficulty: for n too large the predictive ability of the model on other I/O

data (from the same system) becomes worse

Cross-validation

evaluate model predictive performance on another I/O data set not used to

develop model

[figure: validation I/O data ū(t) and ȳ(t) for 0 ≤ t ≤ 70]

now evaluate the models (developed from the modeling data) on the validation data:

[figure: relative prediction error versus n, on modeling data and on validation data; the validation error stops improving, then degrades, for large n]

for n = 50 the actual and predicted outputs on system identification and

model validation data are:

[figure: for n = 50, actual y(t) (solid) and predicted ŷ(t) (dashed), on identification data (top) and on validation data (bottom); the fit is excellent on identification data but poor on validation data]

loss of predictive ability when n is too large is called overmodeling

Growing sets of measurements

least-squares problem in "row" form:

minimize ‖Ax − y‖² = Σ_{i=1}^{m} (aiᵀx − yi)²

where the aiᵀ are the rows of A (ai ∈ R^n)

- each pair (ai, yi) corresponds to one measurement
- the solution is

xls = ( Σ_{i=1}^{m} aiaiᵀ )⁻¹ Σ_{i=1}^{m} yiai

- suppose the measurements (ai, yi) become available sequentially, i.e., m increases with time

Recursive least-squares

we can compute xls(m) = ( Σ_{i=1}^{m} aiaiᵀ )⁻¹ Σ_{i=1}^{m} yiai recursively:

- initialize P(0) = 0 ∈ R^(n×n), q(0) = 0 ∈ R^n
- for m = 0, 1, . . . ,
  P(m + 1) = P(m) + am+1am+1ᵀ,    q(m + 1) = q(m) + ym+1am+1
- if P(m) is invertible, xls(m) = P(m)⁻¹q(m)
- P(m) is invertible once a1, . . . , am span R^n (so, once P(m) becomes invertible, it stays invertible)

fast update: we can compute P(m + 1)⁻¹ from P(m)⁻¹ with the rank one update formula

( P + aaᵀ )⁻¹ = P⁻¹ − (1 / (1 + aᵀP⁻¹a)) (P⁻¹a)(P⁻¹a)ᵀ

- this gives an O(n²) method for updating the inverse
- standard methods for computing P(m + 1)⁻¹ from P(m + 1) are O(n³)
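The recursion and rank-one update can be exercised directly; here the inverse is maintained and updated in O(n²) per measurement. A sketch (numpy assumed; the large initial P⁻¹ acts as a tiny regularizer so the inverse exists from the start, an implementation convenience not in the slides):

```python
import numpy as np

# Recursive least-squares with the rank-one inverse update.
rng = np.random.default_rng(4)
n = 3
x_true = rng.standard_normal(n)               # parameter to recover

Pinv = 1e6 * np.eye(n)    # P^{-1} for P = 1e-6 I (tiny regularization)
q = np.zeros(n)
for _ in range(100):
    a = rng.standard_normal(n)                # measurement vector
    y = a @ x_true                            # noiseless measurement
    Pa = Pinv @ a
    Pinv = Pinv - np.outer(Pa, Pa) / (1.0 + a @ Pa)   # (P + a a^T)^{-1}
    q = q + y * a

x_hat = Pinv @ q
print(np.round(x_hat - x_true, 6))            # estimation error
```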

Verification of rank one update formula

(P + aaᵀ) ( P⁻¹ − (1 / (1 + aᵀP⁻¹a)) (P⁻¹a)(P⁻¹a)ᵀ )

  = I + aaᵀP⁻¹ − (1 / (1 + aᵀP⁻¹a)) P(P⁻¹a)(P⁻¹a)ᵀ − (1 / (1 + aᵀP⁻¹a)) aaᵀ(P⁻¹a)(P⁻¹a)ᵀ

  = I + aaᵀP⁻¹ − (1 / (1 + aᵀP⁻¹a)) aaᵀP⁻¹ − (aᵀP⁻¹a / (1 + aᵀP⁻¹a)) aaᵀP⁻¹

  = I


Lecture 7

Regularized least-squares and Gauss-Newton

method

multi-objective least-squares

regularized least-squares

nonlinear least-squares

Gauss-Newton method


Multi-objective least-squares

in many problems we have two (or more) objectives, e.g.,

- we want J1 = ‖Ax − y‖² small
- and also J2 = ‖Fx − g‖² small

(x ∈ R^n is the variable)

usually the objectives are competing: we can make one smaller, at the expense of making the other larger

Plot of achievable objective pairs

[figure: shaded region of achievable objective pairs in the (J2, J1) plane, with points labeled x(1), x(2), x(3); the lower-left boundary is the optimal trade-off curve]

note that x ∈ R^n, but this plot is in R²; the point labeled x(1) is really the pair (J2(x(1)), J1(x(1))) (for the two objectives ‖Ax − y‖², ‖Fx − g‖²)

Weighted-sum objective

to find points on the optimal trade-off curve, minimize the weighted-sum objective

J1 + μJ2 = ‖Ax − y‖² + μ‖Fx − g‖²

where the parameter μ ≥ 0 sets the relative weight between J1 and J2

sets of points with constant weighted sum, J1 + μJ2 = α, correspond to a line with slope −μ on the (J2, J1) plot

[figure: objective-pair region with the level line J1 + μJ2 = α tangent to the trade-off curve at the minimizing point]

Minimizing weighted-sum objective

can express the weighted-sum objective as an ordinary least-squares objective:

‖Ax − y‖² + μ‖Fx − g‖² = ‖ Ãx − ỹ ‖²

where

Ã = [  A  ],   ỹ = [  y  ]
    [ √μF ]        [ √μg ]

hence the solution is (assuming Ã full rank)

x = (ÃᵀÃ)⁻¹Ãᵀỹ = (AᵀA + μFᵀF)⁻¹(Aᵀy + μFᵀg)

Example

- the two objectives are J1 = (aᵀx − 1)² and J2 = ‖x‖² (see the trade-off curve below)
- weighted-sum objective: (aᵀx − 1)² + μ‖x‖²

optimal x:

x = (aaᵀ + μI)⁻¹ a

optimal trade-off curve:

[figure: J1 = (y − 1)² versus J2 = ‖x‖² (horizontal axis scaled by 10⁻³)]

Regularized least-squares

when F = I, g = 0, the minimizer of the weighted-sum objective,

x = (AᵀA + μI)⁻¹Aᵀy,

is called the regularized least-squares (approximate) solution of Ax ≈ y (also known as Tykhonov regularization)

estimation/inversion application:

- Ax − y is the sensor residual
- regularization expresses prior knowledge that x is small, or that the model is only accurate for x small
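The regularized solution can be computed either from its closed form or from the stacked least-squares formulation, and larger μ shrinks x. A sketch (numpy assumed, random example data):

```python
import numpy as np

# Regularized LS: x = (A^T A + mu I)^{-1} A^T y, via stacking [A; sqrt(mu) I].
rng = np.random.default_rng(5)
A = rng.standard_normal((8, 4))
y = rng.standard_normal(8)

def reg_ls(A, y, mu):
    m, n = A.shape
    At = np.vstack([A, np.sqrt(mu) * np.eye(n)])   # stacked matrix
    yt = np.concatenate([y, np.zeros(n)])
    x, *_ = np.linalg.lstsq(At, yt, rcond=None)
    return x

# stacked solve matches the closed form
x_cf = np.linalg.solve(A.T @ A + 2.0 * np.eye(4), A.T @ y)
print(np.allclose(reg_ls(A, y, 2.0), x_cf))

# larger mu trades fit for a smaller x
x_small_mu, x_big_mu = reg_ls(A, y, 1e-8), reg_ls(A, y, 1e3)
print(np.linalg.norm(x_small_mu), np.linalg.norm(x_big_mu))
```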

Nonlinear least-squares

nonlinear least-squares (NLLS) problem: find x ∈ R^n that minimizes

‖r(x)‖² = Σ_{i=1}^{m} ri(x)²,

where r : R^n → R^m

- r(x) is a vector of residuals
- reduces to (linear) least-squares if r(x) = Ax − y

Position estimation from ranges

estimate position x ∈ R² from approximate distances to beacons at locations b1, . . . , bm ∈ R², without linearizing

- we measure ρi = ‖x − bi‖ + vi (vi is range error, unknown but assumed small)
- NLLS estimate: choose x̂ to minimize

Σ_{i=1}^{m} ri(x)² = Σ_{i=1}^{m} (ρi − ‖x − bi‖)²

Gauss-Newton method for NLLS

NLLS: find x ∈ R^n that minimizes ‖r(x)‖² = Σ_{i=1}^{m} ri(x)², where r : R^n → R^m

- in general, very hard to solve exactly

many good heuristics to compute locally optimal solution

Gauss-Newton method:

repeat

linearize r near current guess

new guess is linear LS solution, using linearized r

until convergence

Gauss-Newton method (more detail):

- linearize r near the current iterate x(k):
  r(x) ≈ r(x(k)) + Dr(x(k))(x − x(k))
- write the linearized approximation as A(k)x − b(k), where
  A(k) = Dr(x(k)),    b(k) = Dr(x(k))x(k) − r(x(k))
- at the kth iteration, we approximate the NLLS problem by the linear LS problem:
  ‖r(x)‖² ≈ ‖A(k)x − b(k)‖²
- the next iterate solves this linearized LS problem:
  x(k+1) = (A(k)ᵀA(k))⁻¹A(k)ᵀb(k)
- repeat until convergence (which isn't guaranteed)
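For the range-measurement problem the Jacobian rows are unit vectors, so each Gauss-Newton step is a tiny linear least-squares solve. A sketch (numpy assumed; beacon layout, true position, initial guess, and noise level are made-up example values):

```python
import numpy as np

# Gauss-Newton for r_i(x) = rho_i - ||x - b_i|| (range-based localization).
rng = np.random.default_rng(6)
beacons = rng.uniform(-5, 5, size=(10, 2))     # hypothetical beacon layout
x_true = np.array([3.6, 3.2])
rho = np.linalg.norm(x_true - beacons, axis=1) + 0.05 * rng.standard_normal(10)

def residual(x):
    return rho - np.linalg.norm(x - beacons, axis=1)

x = np.array([1.2, 1.2])                       # initial guess
obj0 = np.linalg.norm(residual(x))
for _ in range(10):
    d = x - beacons                            # rows x - b_i
    dist = np.linalg.norm(d, axis=1)
    J = -d / dist[:, None]                     # dr_i/dx = -(x - b_i)/||x - b_i||
    step, *_ = np.linalg.lstsq(J, -residual(x), rcond=None)
    x = x + step                               # linearized LS step

obj = np.linalg.norm(residual(x))
print(np.round(x, 2), obj0, obj)
```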

Gauss-Newton example

10 beacons

+ true position (3.6, 3.2); initial guess (1.2, 1.2)

range estimates accurate to 0.5

[figure: the 10 beacon locations and the true position, in the square −5 ≤ x1, x2 ≤ 5]

[figure: surface plot of the NLLS objective ‖r(x)‖² over the square −5 ≤ x1, x2 ≤ 5]

bumps in objective due to strong nonlinearity of r

objective of Gauss-Newton iterates:

[figure: ‖r(x)‖² versus iteration, for iterations 1 through 10]

- final estimate is x̂ = (3.3, 3.3)
- estimation error is ‖x̂ − x‖ = 0.31

(substantially smaller than range accuracy!)

convergence of Gauss-Newton iterates:

[figure: iterates 1 through 6 plotted in the plane, converging to the final estimate]

useful variation on Gauss-Newton: add a regularization term,

‖A(k)x − b(k)‖² + μ‖x − x(k)‖²

so that the next iterate is not too far from the previous one (hence, the linearized model is still pretty accurate)


Lecture 8

Least-norm solutions of underdetermined
equations


we consider

y = Ax

where A ∈ R^(m×n) is fat (m < n), i.e., there are more variables than equations, so x is underspecified

we'll assume that A is full rank (m), so for each y ∈ R^m, there is a solution; the set of all solutions has the form

{ x | Ax = y } = { xp + z | z ∈ N(A) }

where xp is any (particular) solution

- z characterizes the available choices in the solution
- the solution set has dim N(A) = n − m "degrees of freedom"

Least-norm solution

one particular solution is

xln = Aᵀ(AAᵀ)⁻¹y

(AAᵀ is invertible since A is full rank); in fact, xln solves the problem

minimize ‖x‖
subject to Ax = y

to see this, suppose Ax = y, so A(x − xln) = 0, and

(x − xln)ᵀxln = (A(x − xln))ᵀ(AAᵀ)⁻¹y = 0

i.e., (x − xln) ⊥ xln, so ‖x‖² = ‖xln‖² + ‖x − xln‖² ≥ ‖xln‖²

[figure: solution set { x | Ax = y }, parallel to N(A) = { x | Ax = 0 }; xln is the point of the solution set closest to 0, orthogonal to N(A)]

- A† = Aᵀ(AAᵀ)⁻¹ is called the pseudo-inverse of full rank, fat A, and is a right inverse of A
- (for full rank, skinny A, the pseudo-inverse is A† = (AᵀA)⁻¹Aᵀ)

via QR factorization: Aᵀ = QR with Q ∈ R^(n×m), QᵀQ = Im; then

xln = QR⁻ᵀy,    ‖xln‖ = ‖R⁻ᵀy‖

Derivation via Lagrange multipliers

the least-norm solution solves

minimize xᵀx
subject to Ax = y

introduce the Lagrangian L(x, λ) = xᵀx + λᵀ(Ax − y); optimality conditions are

∇xL = 2x + Aᵀλ = 0,    ∇λL = Ax − y = 0

from the first condition, x = −Aᵀλ/2; substitute into the second to get λ = −2(AAᵀ)⁻¹y, hence x = Aᵀ(AAᵀ)⁻¹y

example: find the least norm force that transfers a mass a unit distance with zero final velocity, i.e., y = (1, 0)

[figure: least-norm force xln(t), the resulting position, and the resulting velocity, for 0 ≤ t ≤ 12; position moves from 0 to 1, and velocity returns to 0]

relation to regularized least-squares: define J1 = ‖Ax − y‖², J2 = ‖x‖²

- the least-norm solution minimizes J2 with J1 = 0
- the minimizer of the weighted-sum objective J1 + μJ2 = ‖Ax − y‖² + μ‖x‖² is

xμ = (AᵀA + μI)⁻¹Aᵀy

- fact: xμ converges to the least-norm solution as μ → 0

in matrix terms: as μ → 0,

(AᵀA + μI)⁻¹Aᵀ → Aᵀ(AAᵀ)⁻¹

(for full rank, fat A)

General norm minimization with equality constraints

consider the problem

minimize ‖Ax − b‖
subject to Cx = d

with variable x; equivalent to

minimize (1/2)‖Ax − b‖²
subject to Cx = d

the Lagrangian is

L(x, λ) = (1/2)‖Ax − b‖² + λᵀ(Cx − d)
        = (1/2)xᵀAᵀAx − bᵀAx + (1/2)bᵀb + λᵀCx − λᵀd

optimality conditions are

∇xL = AᵀAx − Aᵀb + Cᵀλ = 0,    ∇λL = Cx − d = 0

write in block matrix form as

[ AᵀA  Cᵀ ] [ x ]   [ Aᵀb ]
[  C    0 ] [ λ ] = [  d  ]

if the block matrix is invertible, we have

[ x ]   [ AᵀA  Cᵀ ]⁻¹ [ Aᵀb ]
[ λ ] = [  C    0 ]   [  d  ]

if AᵀA is invertible, we can derive a more explicit (and complicated) formula for x:

- from the first block equation, x = (AᵀA)⁻¹(Aᵀb − Cᵀλ)
- substitute into Cx = d: C(AᵀA)⁻¹(Aᵀb − Cᵀλ) = d
- so λ = ( C(AᵀA)⁻¹Cᵀ )⁻¹ ( C(AᵀA)⁻¹Aᵀb − d )
- and hence

x = (AᵀA)⁻¹Aᵀb − (AᵀA)⁻¹Cᵀ ( C(AᵀA)⁻¹Cᵀ )⁻¹ ( C(AᵀA)⁻¹Aᵀb − d )


Lecture 9

Autonomous linear dynamical systems

examples


continuous-time autonomous LDS has the form

ẋ = Ax,    x(t) ∈ R^n

(the system is time-invariant if A doesn't depend on t)

picture (phase plane):

[figure: trajectory x(t) in the (x1, x2) plane; the velocity ẋ(t) = Ax(t) is tangent to the trajectory]

example 1: ẋ = [ 1 0 ; 2 1 ] x

[figure: phase portrait of trajectories in the square −2 ≤ x1, x2 ≤ 2]

example 2: ẋ = [ 0.5 1 ; 1 0.5 ] x

[figure: phase portrait of trajectories in the square −2 ≤ x1, x2 ≤ 2]

Block diagram

block diagram representation of ẋ = Ax:

[block diagram: an n-channel integrator 1/s with output x(t), fed back through A to give ẋ(t)]

useful when A has structure, e.g., block upper triangular:

ẋ = [ A11  A12 ] x
    [  0   A22 ]

[block diagram: x2 has its own A22 loop and feeds the x1 loop through A12; x1 does not affect x2 at all]

Linear circuit

[circuit diagram: capacitors C1, . . . , Cp with voltages vc and currents ic, and inductors L1, . . . , Lr with currents il and voltages vl, interconnected by a linear static circuit]

circuit equations are

C dvc/dt = ic,    L dil/dt = vl,    ( ic, vl ) = F ( vc, il )

where C = diag(C1, . . . , Cp), L = diag(L1, . . . , Lr)

with state x = ( vc, il ), we have

ẋ = [ C⁻¹  0  ] F x
    [ 0   L⁻¹ ]

Chemical reactions

reaction kinetics: xi is the concentration of chemical species i; linear (first-order) kinetics give

dxi/dt = ai1x1 + · · · + ainxn

example: series reaction A →(k1) B →(k2) C with linear dynamics

ẋ = [ −k1   0   0 ] x
    [  k1  −k2  0 ]
    [  0    k2  0 ]

[figure: concentrations x1(t), x2(t), x3(t) versus t for 0 ≤ t ≤ 10; x1 decays from 1, x2 rises and then decays, and x3 accumulates toward 1]

Finite-state discrete-time Markov chain

z(t) ∈ {1, . . . , n} is a random sequence with transition probabilities Pij = Prob(z(t + 1) = i | z(t) = j)

represent the distribution of z(t) as an n-vector

p(t) = ( Prob(z(t) = 1), . . . , Prob(z(t) = n) )

then the distribution propagates as p(t + 1) = P p(t)

P is often sparse; the Markov chain is depicted graphically

example:

[state-transition graph on states 1, 2, 3, with transition probabilities 0.9, 0.1, 0.7, 0.1, 0.2, 1.0 labeling the edges]

state 1 is system OK

state 2 is system down

state 3 is system being repaired

p(t + 1) = [ 0.9  0.7  1.0 ] p(t)
           [ 0.1  0.1   0  ]
           [  0   0.2   0  ]
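Iterating p(t + 1) = P p(t) propagates the distribution; since the columns of P sum to one, the entries of p keep summing to one. A sketch using the transition matrix above (numpy assumed):

```python
import numpy as np

# Propagate the Markov-chain state distribution: p(t+1) = P p(t).
P = np.array([[0.9, 0.7, 1.0],
              [0.1, 0.1, 0.0],
              [0.0, 0.2, 0.0]])   # columns sum to one

p = np.array([1.0, 0.0, 0.0])     # start in state 1 (system OK)
for _ in range(50):
    p = P @ p

print(np.round(p, 4))             # distribution after 50 steps
```

After many steps p settles to a steady-state distribution satisfying Pp = p.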

Numerical integration of continuous system

compute an approximate solution of ẋ = Ax using the simple forward Euler method: for small h,

x(t + h) ≈ x(t) + hẋ(t) = (I + hA)x(t)

(more sophisticated methods work better)

High order linear dynamical systems

kth order linear dynamical system:

x(k) = Ak−1x(k−1) + · · · + A1x(1) + A0x

where x(m) denotes the mth derivative of x ∈ R^n

define the new variable z = ( x, x(1), . . . , x(k−1) ) ∈ R^(nk), so

ż = [ 0   I   0  · · ·  0   ] z
    [ 0   0   I  · · ·  0   ]
    [ :                 :   ]
    [ 0   0   0  · · ·  I   ]
    [ A0  A1  A2 · · · Ak−1 ]

a (first order) LDS, with a bigger state

block diagram:

[block diagram: a chain of integrators 1/s producing x(k−1), x(k−2), . . . , x, fed back through Ak−1, Ak−2, . . . , A0 and summed to form x(k)]

Mechanical systems

mechanical system undergoing small motions:

Mq̈ + Dq̇ + Kq = 0

- q(t) is the vector of generalized displacements
- M is the mass matrix
- K is the stiffness matrix
- D is the damping matrix

with state x = ( q, q̇ ) we have

ẋ = [ q̇ ] = [   0       I   ] x
    [ q̈ ]   [ −M⁻¹K  −M⁻¹D ]

Linearization near equilibrium point

nonlinear, time-invariant differential equation:

ẋ = f(x)

where f : R^n → R^n

suppose xe is an equilibrium point, i.e., f(xe) = 0; for x(t) near xe, with deviation δx(t) = x(t) − xe,

(d/dt)δx(t) = f(x(t)) ≈ f(xe) + Df(xe)(x(t) − xe) = Df(xe)δx(t)

i.e., the deviation approximately satisfies the linearized system δẋ = Df(xe)δx

we hope the solution of the linearized system is a good approximation of x(t) − xe (more later)

example: pendulum

[figure: pendulum of length l and mass m, angle θ from vertical, under gravity force mg]

rewrite as a first order DE with state x = ( θ, θ̇ ):

ẋ = ( x2, −(g/l) sin x1 )

linearizing near the equilibrium point xe = 0 gives

δẋ = [  0    1 ] δx
     [ −g/l  0 ]

Does linearization work?

the linearized system usually, but not always, gives a good idea of the system behavior near xe

example 1: ẋ = −x³ near xe = 0

for x(0) > 0, solutions have the form x(t) = ( x(0)⁻² + 2t )^(−1/2), which converges (slowly) to 0

the linearized system is δẋ = 0; solutions are constant

example 2: ż = z³ near ze = 0

for z(0) > 0, solutions have the form z(t) = ( z(0)⁻² − 2t )^(−1/2), which blows up in finite time

the linearized system is δż = 0; solutions are constant

[figure: z(t) (upper curve) and x(t) (lower curve) versus t, for 0 ≤ t ≤ 100]

Linearization along trajectory

- suppose xtraj satisfies ẋtraj(t) = f(xtraj(t), t)
- suppose x(t) is another trajectory, i.e., ẋ(t) = f(x(t), t), and is near xtraj(t)
- then

(d/dt)(x − xtraj) = f(x, t) − f(xtraj, t) ≈ Dxf(xtraj, t)(x − xtraj)

the (time-varying) LDS

δẋ = A(t)δx,    A(t) = Dxf(xtraj(t), t)

is called the linearized or variational system along the trajectory xtraj

used to study, e.g., how small deviations from a nominal trajectory propagate


Lecture 10

Solution via Laplace transform and matrix

exponential

Laplace transform

matrix exponential


Laplace transform of a matrix valued function

suppose z : R₊ → R^(p×q); its Laplace transform Z is defined by

Z(s) = ∫₀^∞ e^(−st) z(t) dt

(the integral is done entrywise)

the domain D of Z includes at least {s | ℜs > a}, where a satisfies |zij(t)| ≤ αe^(at) for t ≥ 0, i = 1, . . . , p, j = 1, . . . , q

Derivative property

L(ż) = sZ(s) − z(0)

to verify, integrate by parts:

L(ż)(s) = ∫₀^∞ e^(−st) ż(t) dt
        = e^(−st) z(t) evaluated from t = 0 to ∞, plus s ∫₀^∞ e^(−st) z(t) dt
        = sZ(s) − z(0)

applying this to ẋ = Ax gives sX(s) − x(0) = AX(s), so X(s) = (sI − A)⁻¹x(0), and

x(t) = L⁻¹( (sI − A)⁻¹ ) x(0)

Resolvent and state transition matrix

(sI − A)⁻¹ is called the resolvent of A; it is defined for all s ∈ C except
the eigenvalues of A, i.e., the s satisfying det(sI − A) = 0

Φ(t) = L⁻¹( (sI − A)⁻¹ ) is called the state-transition matrix; it maps
the initial state to the state at time t:

x(t) = Φ(t)x(0)

example 1: harmonic oscillator

ẋ = [  0  1 ] x
    [ −1  0 ]

(phase plane plot: trajectories are circles)

sI − A = [ s  −1 ],  so the resolvent is
         [ 1   s ]

(sI − A)⁻¹ = [  s/(s²+1)   1/(s²+1) ]
             [ −1/(s²+1)   s/(s²+1) ]

(eigenvalues are ±j)

Φ(t) = L⁻¹( (sI − A)⁻¹ ) = [  cos t   sin t ]
                           [ −sin t   cos t ]

so we have x(t) = [  cos t   sin t ] x(0)
                  [ −sin t   cos t ]

example 2: double integrator

ẋ = [ 0  1 ] x
    [ 0  0 ]

(phase plane plot: trajectories move parallel to the x1 axis)

sI − A = [ s  −1 ],  so the resolvent is
         [ 0   s ]

(sI − A)⁻¹ = [ 1/s   1/s² ]
             [ 0     1/s  ]

(eigenvalues are 0, 0)

Φ(t) = L⁻¹( (sI − A)⁻¹ ) = [ 1  t ]
                           [ 0  1 ]

so we have x(t) = [ 1  t ] x(0)
                  [ 0  1 ]

Characteristic polynomial

X (s) = det(sI − A) is called the characteristic polynomial of A;
its roots are the eigenvalues of A, and since A is real, complex eigenvalues
occur in conjugate pairs

Eigenvalues of A and poles of resolvent

by the cofactor formula, the i, j entry of the resolvent is

( (sI − A)⁻¹ )ij = (−1)^{i+j} det Δji / det(sI − A)

where Δji is sI − A with row j and column i deleted; so each entry has
form fij (s)/X (s), where fij is a polynomial with degree less than n

hence the poles of the entries of the resolvent are eigenvalues of A, but an
eigenvalue need not appear as a pole of every entry
(when there are cancellations between det Δji and X (s))

Matrix exponential

expanding the resolvent in powers of 1/s,

(sI − A)⁻¹ = (1/s)( I − A/s )⁻¹ = I/s + A/s² + A²/s³ + · · ·

(valid for |s| large enough), and taking the inverse Laplace transform term
by term,

Φ(t) = L⁻¹( (sI − A)⁻¹ ) = I + tA + (tA)²/2! + · · ·

looks like the ordinary power series

e^{at} = 1 + ta + (ta)²/2! + · · ·

with square matrices instead of scalars; this motivates the definition of the
matrix exponential: for M ∈ Rn×n,

e^M = I + M + M²/2! + · · ·

with this notation, the solution of ẋ = Ax is

x(t) = e^{tA} x(0)

in analogy with the scalar solution x(t) = e^{ta} x(0); the
matrix exponential is meant to look like the scalar exponential

some things you'd guess hold for the matrix exponential (by analogy
with the scalar exponential) do in fact hold, but others don't

example: you might guess that e^{A+B} = e^A e^B, but it's false (in general):

A = [  0  1 ],   B = [ 0  1 ]
    [ −1  0 ]        [ 0  0 ]

e^A = [  0.54  0.84 ],   e^B = [ 1  1 ]
      [ −0.84  0.54 ]          [ 0  1 ]

e^{A+B} = [  0.16  1.40 ]  ≠  e^A e^B = [  0.54   1.38 ]
          [ −0.70  0.16 ]               [ −0.84  −0.30 ]
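The non-commuting example above is easy to check numerically. A minimal sketch using only numpy; `expm_taylor` is a helper defined here (not a library routine), adequate for small-norm matrices:

```python
import numpy as np

def expm_taylor(M, terms=30):
    """Matrix exponential via truncated power series I + M + M^2/2! + ..."""
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        E = E + term
    return E

A = np.array([[0.0, 1.0], [-1.0, 0.0]])
B = np.array([[0.0, 1.0], [0.0, 0.0]])

eA, eB, eAB = expm_taylor(A), expm_taylor(B), expm_taylor(A + B)

# e^A is a rotation by 1 radian, as the lecture's Phi(t) formula predicts.
assert np.allclose(eA, [[np.cos(1), np.sin(1)], [-np.sin(1), np.cos(1)]])

# e^{A+B} != e^A e^B here, since AB != BA ...
assert not np.allclose(eAB, eA @ eB)
# ... but the identity does hold for commuting matrices, e.g. A with itself:
assert np.allclose(expm_taylor(2 * A), eA @ eA)
```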

however, e^{(τ+t)A} = e^{τA} e^{tA} does hold; with τ = −t we get

e^{tA} e^{−tA} = e^{tA−tA} = e^0 = I

so e^{tA} is always invertible, with

( e^{tA} )⁻¹ = e^{−tA}

example: let's find e^A, where A = [ 0  1 ]
                                   [ 0  0 ]

we already found

e^{tA} = L⁻¹( (sI − A)⁻¹ ) = [ 1  t ]
                             [ 0  1 ]

so, plugging in t = 1, we get e^A = [ 1  1 ]
                                    [ 0  1 ]

the power series gives the same answer:

e^A = I + A + A²/2! + · · · = I + A

since A² = A³ = · · · = 0

Time transfer property

for ẋ = Ax we know

x(τ + t) = e^{tA} x(τ)

interpretation: e^{tA} propagates the state forward in time by t seconds
(backward if t < 0)

recall the first order (forward Euler) approximate state update, for small t:

x(τ + t) ≈ x(τ) + t ẋ(τ) = (I + tA)x(τ)

i.e., forward Euler keeps just the first two terms of the exact propagator
e^{tA} = I + tA + (tA)²/2! + · · ·

Sampling: suppose ẋ = Ax, and sample with period h, defining
z(k) = x(kh); then

z(k + 1) = e^{hA} z(k),

a discrete-time LDS

Piecewise constant system

consider the time-varying LDS ẋ = A(t)x, with piecewise constant A(t):

A(t) = A0,  0 ≤ t < t1
       A1,  t1 ≤ t < t2
        ..

for t1 ≤ t < t2, for example, the solution is x(t) = e^{(t−t1)A1} e^{t1A0} x(0)

(the matrix on the righthand side is called the state transition matrix for the
system, and denoted Φ(t))

Qualitative behavior of x(t)

each entry of X(s) = (sI − A)⁻¹x(0) is a rational function of the form

Xi(s) = ai(s)/X (s)

thus the poles of Xi are all eigenvalues of A (but not necessarily the other
way around)

first assume the eigenvalues λi are distinct, so Xi(s) cannot have repeated
poles; then by partial fractions

xi(t) = Σ_{j=1}^{n} βij e^{λj t}

where the βij depend on x(0) (linearly)

hence the eigenvalues determine the possible qualitative behavior:

a real eigenvalue λ corresponds to an exponentially decaying or growing
term e^{λt} in the solution

a complex eigenvalue λ = σ + jω corresponds to a decaying or growing
sinusoidal term e^{σt} cos(ωt + φ) in the solution

ℜλj gives the exponential growth rate (if > 0), or exponential decay rate (if
< 0), of the term; ℑλj gives its oscillation frequency

now suppose A has repeated eigenvalues, so Xi can have repeated poles;
write the distinct eigenvalues as λ1, . . . , λr with multiplicities n1, . . . , nr
respectively (n1 + · · · + nr = n); then

xi(t) = Σ_{j=1}^{r} pij (t) e^{λj t}

where pij (t) is a polynomial of degree < nj (that depends linearly on x(0))

Stability

we say the system ẋ = Ax is stable if e^{tA} → 0 as t → ∞

meaning: x(t) → 0 as t → ∞, no matter what x(0) is

fact: the system is stable if and only if all eigenvalues have negative real
part:

ℜλi < 0,  i = 1, . . . , n

the "if" part is clear, since every term in the solution has the form
p(t)e^{λt} and

lim_{t→∞} p(t)e^{λt} = 0

whenever ℜλ < 0 (the "only if" part comes later)

more generally, max_i ℜλi determines the maximum asymptotic exponential
growth rate of x(t) (or decay, if < 0)


Lecture 11

Eigenvectors and diagonalization

eigenvectors

left eigenvectors

diagonalization

modal form

discrete-time stability


λ ∈ C is an eigenvalue of A ∈ Cn×n if

X (λ) = det(λI − A) = 0

equivalent to:

there exists nonzero v ∈ Cn with Av = λv; any such v is called an
eigenvector of A (associated with λ)

there exists nonzero w ∈ Cn with wT A = λwT ; any such w is called a
left eigenvector of A

if v is an eigenvector of A with eigenvalue λ, then so is αv, for any
α ∈ C, α ≠ 0

when A ∈ Rn×n and λ ∈ R, we can always find a real eigenvector
associated with λ: if Av = λv, with A ∈ Rn×n, λ ∈ R, and v ∈ Cn, then

A ℜv = λ ℜv,   A ℑv = λ ℑv

so ℜv and ℑv are real eigenvectors, if they are nonzero
(and at least one is)

conjugate symmetry: if A is real and v is an eigenvector associated
with λ ∈ C, then v̄ is an eigenvector associated with λ̄: taking the
conjugate of Av = λv we get Āv̄ = λ̄v̄, so

Av̄ = λ̄v̄

Scaling interpretation

(assume λ ∈ R for now)

if v is an eigenvector, the effect of A on v is especially simple: Av = λv is
just v scaled by λ (a generic x is rotated as well as scaled by A)

λ ∈ R, λ > 0: v and Av point in the same direction (λ < 0: opposite
directions; |λ| < 1: Av smaller than v; |λ| > 1: Av larger than v)

(we'll see later how this relates to stability of continuous- and discrete-time
systems. . . )

Dynamic interpretation

suppose Av = λv, v ≠ 0

if ẋ = Ax and x(0) = v, then

x(t) = e^{tA} v = ( I + tA + (tA)²/2! + · · · ) v
     = v + tλv + (tλ)²/2! v + · · ·
     = e^{λt} v

for λ ∈ C, the solution is complex (we'll interpret later); for now, assume
λ ∈ R

the resulting motion is very simple: the trajectory stays
always on the line spanned by v, and grows or decays as e^{λt}
(rate determined by the eigenvalue λ)

Invariant sets

a set S ⊆ Rn is invariant under ẋ = Ax if x(t) ∈ S implies
x(τ) ∈ S for all τ ≥ t

i.e.: once the trajectory enters S, it stays in S

suppose Av = λv, v ≠ 0, λ ∈ R

the line { tv | t ∈ R } is invariant
(in fact, the ray { tv | t > 0 } is invariant)

Complex eigenvectors

suppose Av = λv, v ≠ 0, λ is complex

for a ∈ C, the (complex) trajectory a e^{λt} v satisfies ẋ = Ax

hence so does the (real) trajectory

x(t) = ℜ( a e^{λt} v )

     = e^{σt} [ vre  vim ] [  cos ωt   sin ωt ] [  α ]
                           [ −sin ωt   cos ωt ] [ −β ]

where

v = vre + j vim,   λ = σ + jω,   a = α + jβ

the trajectory stays in the invariant plane span{ vre, vim };

σ gives the logarithmic growth/decay factor

ω gives the angular velocity of rotation in the plane

Dynamic interpretation: left eigenvectors

suppose wT A = λwT , w ≠ 0

then

d/dt ( wT x ) = wT ẋ = wT Ax = λ ( wT x )

i.e., wT x satisfies the (scalar) DE d(wT x)/dt = λ(wT x), so
wT x(t) = e^{λt} wT x(0)

if, e.g., λ ∈ R, λ < 0, the halfspace { z | wT z ≤ a } is invariant (for a ≥ 0)

for λ = σ + jω ∈ C, (ℜw)T x and (ℑw)T x both have the form

e^{σt} ( α cos(ωt) + β sin(ωt) )

Summary

right eigenvectors give initial conditions from which the resulting motion is
simple (i.e., remains on a line or in a plane)

left eigenvectors give linear functions of the state that are simple, for any
initial condition

example 1:  ẋ = [ −1  −10  −10 ] x
                [  1    0    0 ]
                [  0    1    0 ]

block diagram: a chain of three integrators 1/s with states x1, x2, x3, with
feedback gains −1, −10, −10 from the states into the first integrator

X (s) = s³ + s² + 10s + 10 = (s + 1)(s² + 10)

eigenvalues are −1, ± j√10

(plots of x1(t), x2(t), x3(t) over 0 ≤ t ≤ 5, from x(0) = (1, 0, 0): each
component shows a decaying oscillation)

left eigenvector associated with eigenvalue −1 is

g = [ 0.1 ]
    [ 0   ]
    [ 1   ]

(plot of gT x(t) over 0 ≤ t ≤ 5: a pure, non-oscillatory exponential decay
e^{−t}, even though the state components themselves oscillate)


eigenvector associated with eigenvalue j√10 is

v = [ 0.554 + j0.771 ]
    [ 0.244 + j0.175 ]
    [ 0.055 − j0.077 ]

giving the invariant plane spanned by

vre = [ 0.554 ],   vim = [  0.771 ]
      [ 0.244 ]          [  0.175 ]
      [ 0.055 ]          [ −0.077 ]

for example, with x(0) = vre, the plots of x1(t), x2(t), x3(t) are pure
sinusoids (frequency √10, no decay): the trajectory stays in the invariant
plane span{ vre, vim }

example: Markov chain

the probability distribution satisfies p(t + 1) = P p(t), where

pi(t) = Prob( z(t) = i ),  so  Σ_{i=1}^{n} pi(t) = 1

Pij = Prob( z(t + 1) = i | z(t) = j ),  so  Σ_{i=1}^{n} Pij = 1
(such matrices are called stochastic)

rewrite the column-sum property as:

[1 1 · · · 1] P = [1 1 · · · 1]

i.e., [1 1 · · · 1] is a left eigenvector of P with e.v. 1

hence det(I − P ) = 0, so there is a right eigenvector v ≠ 0 with P v = v

it can be shown that v can be chosen so that vi ≥ 0, hence we can
normalize v so that Σ_{i=1}^{n} vi = 1

interpretation: v is an equilibrium distribution; i.e., if p(0) = v then
p(t) = v for all t ≥ 0

(if v is unique it is called the steady-state distribution of the Markov chain)
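Finding the equilibrium distribution amounts to extracting the eigenvector for eigenvalue 1. A minimal numpy sketch, reusing the 3-state reliability matrix P from lecture 1 (the normalization step is the one described above):

```python
import numpy as np

# Transition matrix from the reliability example (columns sum to one).
P = np.array([[0.9, 0.7, 1.0],
              [0.1, 0.1, 0.0],
              [0.0, 0.2, 0.0]])

# [1 1 1] is a left eigenvector with eigenvalue 1:
assert np.allclose(np.ones(3) @ P, np.ones(3))

# Right eigenvector for eigenvalue 1 gives the equilibrium distribution.
w, V = np.linalg.eig(P)
i = np.argmin(np.abs(w - 1.0))
v = np.real(V[:, i])
v = v / v.sum()               # normalize so the entries sum to 1

assert np.all(v >= -1e-12)    # entries are nonnegative
assert np.allclose(P @ v, v)  # P v = v, so p(t) = v for all t
```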

Diagonalization

suppose v1, . . . , vn is a linearly independent set of eigenvectors of
A ∈ Rn×n:

Avi = λivi,  i = 1, . . . , n

express as

A [ v1 · · · vn ] = [ v1 · · · vn ] diag(λ1, . . . , λn)

define T = [ v1 · · · vn ] and Λ = diag(λ1, . . . , λn), so

AT = T Λ

and finally

T⁻¹AT = Λ

(T is invertible since its columns are independent)

similarity transformation by T diagonalizes A
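A quick numerical check of T⁻¹AT = Λ with numpy, using the 3×3 matrix from example 1 earlier in this lecture (its eigenvalues −1, ±j√10 are distinct, so it is diagonalizable):

```python
import numpy as np

A = np.array([[-1.0, -10.0, -10.0],
              [ 1.0,   0.0,   0.0],
              [ 0.0,   1.0,   0.0]])

lam, T = np.linalg.eig(A)          # columns of T are eigenvectors vi
Lam = np.linalg.inv(T) @ A @ T     # similarity transformation

# T^{-1} A T is diagonal, with the eigenvalues on the diagonal.
assert np.allclose(Lam, np.diag(lam), atol=1e-8)
# and A is recovered as T Lambda T^{-1}:
assert np.allclose(T @ np.diag(lam) @ np.linalg.inv(T), A)
```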

conversely, if there is a T = [ v1 · · · vn ] with

T⁻¹AT = Λ = diag(λ1, . . . , λn)

then AT = T Λ, i.e.,

Avi = λivi,  i = 1, . . . , n

we say A is diagonalizable if such a T exists, i.e., if

A has a set of n linearly independent eigenvectors

Not all matrices are diagonalizable

example: A = [ 0  1 ]
             [ 0  0 ]

the only eigenvalue is λ = 0, and the eigenvector equation Av = 0 reads

[ 0  1 ] [ v1 ] = 0
[ 0  0 ] [ v2 ]

so all eigenvectors have form v = [ v1 ] where v1 ≠ 0;
                                  [ 0  ]

thus A cannot have two independent eigenvectors, i.e., A is not
diagonalizable

Distinct eigenvalues

fact: if A has distinct eigenvalues, i.e., λi ≠ λj for i ≠ j, then A is
diagonalizable

(the converse is false: A can have repeated eigenvalues but still be
diagonalizable)

Diagonalization and left eigenvectors

rewrite T⁻¹AT = Λ as T⁻¹A = ΛT⁻¹; writing the rows of T⁻¹ as
w1T , . . . , wnT , this reads

[ w1T ]        [ w1T ]
[  ..  ] A = Λ [  ..  ]
[ wnT ]        [ wnT ]

thus

wiT A = λi wiT

i.e., the rows of T⁻¹ are (lin. indep.) left eigenvectors, normalized so that

wiT vj = δij

(i.e., left & right eigenvectors chosen this way are dual bases)

Modal form

suppose A is diagonalizable by T

define new coordinates by x = T x̃; then

T x̃˙ = AT x̃   ⟹   x̃˙ = T⁻¹AT x̃   ⟹   x̃˙ = Λ x̃

in the new coordinate system the system is diagonal (decoupled): a bank of
n independent integrators 1/s, the ith with feedback gain λi

trajectories consist of n independent modes, i.e.,

x̃i(t) = e^{λit} x̃i(0)

hence the name modal form

when eigenvalues (hence T ) are complex, the system can be put in real
modal form:

S⁻¹AS = diag( Λr, Mr+1, Mr+3, . . . , Mn−1 )

where Λr = diag(λ1, . . . , λr) collects the real eigenvalues, and

Mi = [  σi   ωi ],   λi = σi + jωi,   i = r + 1, r + 3, . . . , n
     [ −ωi   σi ]

where λi are the complex eigenvalues (one from each conjugate pair)

block diagram of a complex mode: two integrators 1/s, each with feedback
σi, cross-coupled through gains ωi and −ωi

diagonalization simplifies many matrix expressions; e.g., the resolvent:

(sI − A)⁻¹ = ( sT T⁻¹ − T ΛT⁻¹ )⁻¹
           = ( T (sI − Λ) T⁻¹ )⁻¹
           = T (sI − Λ)⁻¹ T⁻¹
           = T diag( 1/(s − λ1), . . . , 1/(s − λn) ) T⁻¹

powers (i.e., discrete-time solution):

A^k = ( T ΛT⁻¹ )^k
    = ( T ΛT⁻¹ ) · · · ( T ΛT⁻¹ )
    = T Λ^k T⁻¹
    = T diag( λ1^k, . . . , λn^k ) T⁻¹

exponential (i.e., continuous-time solution):

e^A = I + A + A²/2! + · · ·
    = I + T ΛT⁻¹ + ( T ΛT⁻¹ )²/2! + · · ·
    = T ( I + Λ + Λ²/2! + · · · ) T⁻¹
    = T e^Λ T⁻¹
    = T diag( e^{λ1}, . . . , e^{λn} ) T⁻¹

more generally, for an analytic function f (z) = β0 + β1z + β2z² + · · ·,
substituting A = T ΛT⁻¹, we have

f (A) = β0 I + β1 T ΛT⁻¹ + β2 ( T ΛT⁻¹ )² + · · ·
      = T ( β0 I + β1 Λ + β2 Λ² + · · · ) T⁻¹
      = T diag( f (λ1), . . . , f (λn) ) T⁻¹

Solution via diagonalization

assume A is diagonalizable, with T⁻¹AT = Λ and rows of T⁻¹ denoted
w1T , . . . , wnT

then

x(t) = e^{tA} x(0)
     = T e^{tΛ} T⁻¹ x(0)
     = Σ_{i=1}^{n} e^{λit} ( wiT x(0) ) vi

interpretation:

resolve x(0) into modal components wiT x(0)

propagate the ith mode forward t seconds via e^{λit}

reconstitute the state as a combination of the eigenvectors vi

application: for what x(0) do we have x(t) → 0 as t → ∞?

divide the eigenvalues into those with negative real parts,

ℜλ1 < 0, . . . , ℜλs < 0,

and the others,

ℜλs+1 ≥ 0, . . . , ℜλn ≥ 0

from

x(t) = Σ_{i=1}^{n} e^{λit} ( wiT x(0) ) vi

the condition for x(t) → 0 is:

x(0) ∈ span{ v1, . . . , vs },

or equivalently,

wiT x(0) = 0,  i = s + 1, . . . , n

suppose A is diagonalizable

consider the discrete-time LDS x(t + 1) = Ax(t)

if A = T ΛT⁻¹, then A^k = T Λ^k T⁻¹

then

x(t) = A^t x(0) = Σ_{i=1}^{n} λi^t ( wiT x(0) ) vi → 0  as t → ∞

for all x(0) if and only if

|λi| < 1,  i = 1, . . . , n.

we will see later that this is true even when A is not diagonalizable, so we

have

fact: x(t + 1) = Ax(t) is stable if and only if all eigenvalues of A have

magnitude less than one
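The discrete-time stability test is a one-liner on the spectral radius. A minimal numpy sketch (the two test matrices are illustrative, not from the lecture):

```python
import numpy as np

def dt_stable(A):
    """Discrete-time LDS x(t+1) = A x(t) is stable iff all |lambda_i| < 1."""
    return np.max(np.abs(np.linalg.eigvals(A))) < 1

# upper-triangular: eigenvalues 0.5 and 0.8, both inside the unit circle
assert dt_stable(np.array([[0.5, 1.0], [0.0, 0.8]]))

# rotation-like matrix: eigenvalues +/- j have magnitude exactly 1 -> not stable
assert not dt_stable(np.array([[0.0, 1.0], [-1.0, 0.0]]))
```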


Lecture 12

Jordan canonical form

generalized modes

Cayley-Hamilton theorem


any matrix A ∈ Cn×n can be put in Jordan canonical form by a similarity
transformation, i.e.

T⁻¹AT = J = [ J1        ]
            [    ...    ]
            [        Jq ]

where

Ji = [ λi  1           ]
     [     λi  ...     ]  ∈ Cni×ni
     [          ...  1 ]
     [              λi ]

is called a Jordan block of size ni with eigenvalue λi (so n = Σ_{i=1}^{q} ni)

J is upper bidiagonal; J diagonal is the special case of n Jordan blocks of
size ni = 1

more generally,

dim N( (λI − A)^k ) = Σ_{λi = λ} min{ k, ni }

which determines the sizes of the Jordan blocks associated with λ

factoring out T and T⁻¹, λI − A = T (λI − J)T⁻¹; for a single block with
eigenvalue λi (shown here for ni = 3),

λiI − Ji = [ 0 −1  0 ],   (λiI − Ji)² = [ 0  0  1 ],   (λiI − Ji)³ = 0
           [ 0  0 −1 ]                  [ 0  0  0 ]
           [ 0  0  0 ]                  [ 0  0  0 ]

while for a block with a different eigenvalue λj ≠ λi (again size 3),

(λiI − Jj )^k = [ (λi−λj )^k  −k(λi−λj )^{k−1}  (k(k−1)/2)(λi−λj )^{k−2} ]
                [ 0            (λi−λj )^k        −k(λi−λj )^{k−1}        ]
                [ 0            0                 (λi−λj )^k              ]

which is nonsingular

Generalized eigenvectors

express T as

T = [ T1  T2  · · ·  Tq ]

where Ti ∈ Cn×ni are the columns of T associated with the ith Jordan
block Ji

we have ATi = TiJi

let Ti = [ vi1  vi2  · · ·  vi,ni ]

then we have:

Avi1 = λivi1,

i.e., the first column of each Ti is an eigenvector associated with e.v. λi

for j = 2, . . . , ni,

Avij = vi,j−1 + λivij

Jordan form LDS

consider the LDS ẋ = Ax

by the change of coordinates x = T x̃, it can be put into the form x̃˙ = J x̃

the system is decomposed into independent Jordan block systems
x̃˙i = Ji x̃i

block diagram of a Jordan block system: a chain of ni integrators 1/s with
states x̃ni, x̃ni−1, . . . , x̃1, each integrator having local feedback λ and
each state feeding the next integrator in the chain

the resolvent of a k × k Jordan block with eigenvalue λ:

(sI − Jλ)⁻¹ = [ s−λ  −1            ]⁻¹
              [      s−λ  ...      ]
              [            ...  −1 ]
              [                s−λ ]

            = [ (s−λ)⁻¹  (s−λ)⁻²  · · ·  (s−λ)⁻ᵏ     ]
              [          (s−λ)⁻¹  · · ·  (s−λ)⁻ᵏ⁺¹   ]
              [                    ...    ..          ]
              [                          (s−λ)⁻¹      ]

by inverse Laplace transform, the exponential is:

e^{tJ} = e^{λt} ( I + tF1 + · · · + ( t^{k−1}/(k − 1)! ) Fk−1 )

       = e^{λt} [ 1   t   · · ·  t^{k−1}/(k − 1)! ]
                [     1   · · ·  t^{k−2}/(k − 2)! ]
                [           ...   ..              ]
                [                 1               ]

where Fi is the matrix with ones on the ith upper diagonal; Jordan blocks
thus yield repeated poles in the resolvent and terms t^p e^{λt} in e^{tA}

Generalized modes

with x(0) = Tia (i.e., an initial condition in the span of the generalized
eigenvectors of the ith block), the solution is

x(t) = Ti e^{tJi} a

such solutions are called generalized modes; with general x(0) we can write

x(t) = e^{tA} x(0) = T e^{tJ} T⁻¹ x(0) = Σ_{i=1}^{q} Ti e^{tJi} ( SiT x(0) )

where

T⁻¹ = [ S1T ]
      [  ..  ]
      [ SqT ]

hence: all solutions of ẋ = Ax are linear combinations of (generalized)
modes

Cayley-Hamilton theorem

Cayley-Hamilton theorem: for any A ∈ Rn×n we have X (A) = 0, where
X (s) = det(sI − A) (and where, for a polynomial p(s) = a0 + a1s + · · · ,
p(A) means a0I + a1A + · · · )

example: with A = [ 1  2 ] we have X (s) = s² − 5s − 2, so
                  [ 3  4 ]

X (A) = A² − 5A − 2I

      = [  7  10 ] − 5 [ 1  2 ] − 2I
        [ 15  22 ]     [ 3  4 ]

      = 0
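The worked example above is easy to verify numerically (numpy's `poly` returns the characteristic-polynomial coefficients when given a square matrix):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])

# characteristic polynomial X(s) = s^2 - 5s - 2  (trace 5, determinant -2)
coeffs = np.poly(A)
assert np.allclose(coeffs, [1.0, -5.0, -2.0])

# Cayley-Hamilton: X(A) = A^2 - 5A - 2I = 0
X_of_A = A @ A - 5.0 * A - 2.0 * np.eye(2)
assert np.allclose(X_of_A, np.zeros((2, 2)))
```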

corollary: for every p ∈ Z+, we have

A^p ∈ span{ I, A, A², . . . , A^{n−1} }

i.e., every power of A can be expressed as a linear combination of
I, A, . . . , A^{n−1}

for p = −1: write X (s) = s^n + a_{n−1}s^{n−1} + · · · + a0; then, since
X (A) = 0,

I = −(a1/a0)A − (a2/a0)A² − · · · − (1/a0)A^n

(A is invertible ⟺ a0 ≠ 0) so

A⁻¹ = −(a1/a0)I − (a2/a0)A − · · · − (1/a0)A^{n−1}

Proof of C-H theorem

first assume A is diagonalizable: T⁻¹AT = Λ, and

X (s) = (s − λ1) · · · (s − λn)

since

X (A) = X (T ΛT⁻¹) = T X (Λ) T⁻¹

it suffices to show X (Λ) = 0:

X (Λ) = (Λ − λ1I) · · · (Λ − λnI)
      = diag(0, λ2 − λ1, . . . , λn − λ1) · · · diag(λ1 − λn, . . . , λn−1 − λn, 0)
      = 0

(each diagonal factor has its zero in a different position, so the product
vanishes)

for the general case, use the Jordan form: it suffices to show X (Ji) = 0 for
each block; in

X (Ji) = (Ji − λ1I)^{n1} · · · (Ji − λiI)^{ni} · · · (Ji − λqI)^{nq}

the factor (Ji − λiI)^{ni} is the ni-th power of a nilpotent shift matrix,
hence zero


Lecture 13

Linear dynamical systems with inputs &

outputs

transfer matrix

examples


recall: the continuous-time time-invariant LDS has form

ẋ = Ax + Bu,    y = Cx + Du

Bu is called the input term (of ẋ)

(figure: the state derivative ẋ(t) = Ax(t) + Bu(t) shown as the
autonomous vector Ax(t) offset by Bu(t), for u(t) = 1 and u(t) = 1.5)

Interpretations

state derivative is sum of autonomous term (Ax) and one term per

input (biui)

each input ui gives another degree of freedom for x (assuming columns

of B independent)

write ẋ = Ax + Bu row by row as ẋi = ãiT x + b̃iT u, where ãiT , b̃iT are
the rows of A, B: the ith state derivative is a linear function of the state x
and the input u

Block diagram

u(t) → B → (sum) → 1/s → x(t) → C → (sum) → y(t), with feedback A
around the integrator and feedthrough D from u to y

interesting when there is structure, e.g., with x1 ∈ Rn1 , x2 ∈ Rn2 :

d/dt [ x1 ] = [ A11  A12 ] [ x1 ] + [ B1 ] u,    y = [ C1  C2 ] [ x1 ]
     [ x2 ]   [  0   A22 ] [ x2 ]   [  0 ]                      [ x2 ]

block diagram: u drives x1 through B1; x2 evolves autonomously via A22
and couples into x1 through A12; x1 and x2 feed the output through
C1 and C2

x2 affects y directly and through x1

Transfer matrix

take the Laplace transform of ẋ = Ax + Bu:

sX(s) − x(0) = AX(s) + BU (s)

hence

X(s) = (sI − A)⁻¹x(0) + (sI − A)⁻¹BU (s)

so, taking the inverse transform,

x(t) = e^{tA} x(0) + ∫₀^t e^{(t−τ)A} Bu(τ) dτ

e^{tA}x(0) is the familiar autonomous response; the second term, a
convolution, gives the effect of the input

with y = Cx + Du we have:

Y (s) = C(sI − A)⁻¹x(0) + ( C(sI − A)⁻¹B + D ) U (s)

so

y(t) = Ce^{tA} x(0) + ∫₀^t Ce^{(t−τ)A} Bu(τ) dτ + Du(t)

H(s) = C(sI − A)⁻¹B + D is called the transfer function or transfer
matrix; with x(0) = 0, it maps input transforms to output transforms:
Y (s) = H(s)U (s)

Impulse matrix

impulse matrix h(t) = Ce^{tA}B + Dδ(t)
(δ is the Dirac delta function)

with x(0) = 0, y = h ∗ u, i.e.,

yi(t) = Σ_{j=1}^{m} ∫₀^t hij (t − τ) uj (τ) dτ

interpretations:

hij (t) is the response of output i to a unit impulse at input j

hij (t − τ) gives the effect on output i now of input j applied t − τ
seconds ago

Step matrix

the step matrix or step response matrix is

s(t) = ∫₀^t h(τ) dτ

interpretations:

sij (t) is the response of output i to a unit step at input j (with the other
inputs and x(0) zero)

Example 1

example 1: three unit masses in a line, coupled by unit springs and unit
dampers; the inputs are the tensions u1 (between masses 1 and 2) and u2
(between masses 2 and 3); the state is x = [ yT  ẏT ]T, with y the vector
of displacements

system is:

ẋ = [  0   0   0   1   0   0 ]     [  0   0 ]
    [  0   0   0   0   1   0 ]     [  0   0 ]
    [  0   0   0   0   0   1 ] x + [  0   0 ] [ u1 ]
    [ −2   1   0  −2   1   0 ]     [  1   0 ] [ u2 ]
    [  1  −2   1   1  −2   1 ]     [ −1   1 ]
    [  0   1  −2   0   1  −2 ]     [  0  −1 ]

eigenvalues of A are

(plots of the impulse matrix entries over 0 ≤ t ≤ 15: h11, h21, h31 — the
response from u1 — and h12, h22, h32 — the response from u2)

roughly speaking: an impulse at u2 affects the first mass later than it
affects the other two

Example 2

interconnect circuit: an RC tree driven by a voltage source u, with node
capacitors C1, C2 near the source and C3, C4 farther away

xi is voltage across Ci

output is state: y = x

unit resistors, unit capacitors

step response matrix shows delay to each node

system is

ẋ = [ −3   1   1   0 ]     [ 1 ]
    [  1  −1   0   0 ] x + [ 0 ] u,    y = x
    [  1   0  −2   1 ]     [ 0 ]
    [  0   0   1  −1 ]     [ 0 ]

the eigenvalues of A are real and negative (A is symmetric here)

(plot of the step responses s1, s2, s3, s4 — the voltage at each node — over
0 ≤ t ≤ 15: each rises from 0 toward 1, with more delay for nodes farther
from the source)

DC or static gain matrix

if the system is stable and the input converges to a constant, the state and
output converge to constants satisfying

0 = ẋ = Ax + Bu,    y = Cx + Du

eliminate x to get y = H(0)u, where H(0) = −CA⁻¹B + D is the DC or
static gain matrix

if the system is stable,

H(0) = ∫₀^∞ h(t) dt = lim_{t→∞} s(t)

(recall: H(s) = ∫₀^∞ e⁻ˢᵗ h(t) dt,   s(t) = ∫₀^t h(τ) dτ )

if u(t) → u∞ ∈ Rm, then y(t) → y∞ ∈ Rp where y∞ = H(0)u∞

DC gain matrix for example 1 (mass-spring system, outputs = displacements):

H(0) = [  1/4   1/4 ]
       [ −1/2   1/2 ]
       [ −1/4  −1/4 ]

DC gain matrix for example 2 (RC circuit):

H(0) = [ 1 ]
       [ 1 ]
       [ 1 ]
       [ 1 ]

(i.e., every node voltage converges to the source voltage)

Discretization with piecewise constant inputs

linear system ẋ = Ax + Bu, y = Cx + Du

suppose ud : Z+ → Rm is a sequence, and u(t) = ud(k) for
kh ≤ t < (k + 1)h, k = 0, 1, . . .

define sequences xd(k) = x(kh), yd(k) = y(kh), k = 0, 1, . . .

h > 0 is called the sample interval (for x and y) or update interval (for u)

then

xd(k + 1) = x( (k + 1)h )
          = e^{hA} x(kh) + ∫₀^h e^{τA} B u( (k + 1)h − τ ) dτ
          = e^{hA} xd(k) + ( ∫₀^h e^{τA} dτ ) B ud(k)

so

xd(k + 1) = Ad xd(k) + Bd ud(k),    yd(k) = Cd xd(k) + Dd ud(k)

where

Ad = e^{hA},   Bd = ( ∫₀^h e^{τA} dτ ) B,   Cd = C,   Dd = D

called the discretized system

if A is invertible, we can express Bd using

∫₀^h e^{τA} dτ = A⁻¹ ( e^{hA} − I )
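The Ad, Bd formulas above can be computed in one matrix exponential via the standard augmented-matrix trick, exp(h [A B; 0 0]) = [Ad Bd; 0 I]. A numpy-only sketch (the series-based `expm_taylor` helper is defined here, not a library call; the scalar system used for the check is illustrative):

```python
import numpy as np

def expm_taylor(M, terms=40):
    """Matrix exponential via truncated power series (ok for small norms)."""
    E, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        E = E + term
    return E

def discretize(A, B, h):
    """Zero-order-hold discretization: Ad = e^{hA}, Bd = (int_0^h e^{tA} dt) B."""
    n, m = B.shape
    M = np.zeros((n + m, n + m))
    M[:n, :n], M[:n, n:] = A, B
    E = expm_taylor(h * M)
    return E[:n, :n], E[:n, n:]

# scalar check: xdot = -x + u  ->  Ad = e^{-h},  Bd = (1 - e^{-h})
A, B, h = np.array([[-1.0]]), np.array([[1.0]]), 0.1
Ad, Bd = discretize(A, B, h)
assert np.allclose(Ad, np.exp(-h))
assert np.allclose(Bd, 1.0 - np.exp(-h))
```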

stability: the eigenvalues of Ad = e^{hA} are e^{hλ1}, . . . , e^{hλn}, where
λi are the eigenvalues of A; since

ℜλi < 0   ⟺   | e^{hλi} | < 1   for h > 0

the discretized system is stable exactly when the continuous-time system is

extensions/variations: offset sampling times; multirate sampling
(usually integer multiples of a common interval h)

Dual system

the dual system associated with

ẋ = Ax + Bu,    y = Cx + Du

is given by

ż = AT z + C T v,    w = B T z + DT v

A is replaced by AT ; the roles of B and C are swapped

Dual via block diagram

original system: u(t) → B → 1/s → x(t) → C → y(t), feedback A,
feedthrough D

dual system: v(t) → C T → 1/s → z(t) → B T → w(t), feedback AT ,
feedthrough DT

i.e., the dual block diagram is the original with all arrows reversed and
each matrix transposed

Causality

interpretation of

x(t) = e^{tA} x(0) + ∫₀^t e^{(t−τ)A} Bu(τ) dτ

y(t) = Ce^{tA} x(0) + ∫₀^t Ce^{(t−τ)A} Bu(τ) dτ + Du(t)

for t ≥ 0:

current state (x(t)) and output (y(t)) depend on past input (u(τ) for
0 ≤ τ ≤ t)

i.e., the mapping from input to state and output is causal (with fixed
initial state)

more generally, for T ≤ t,

x(t) = e^{(t−T )A} x(T ) + ∫_T^t e^{(t−τ)A} Bu(τ) dτ,

so the state x(T ) summarizes the effect of all inputs before time T on the
future evolution

Idea of state

Change of coordinates

start with LDS ẋ = Ax + Bu, y = Cx + Du

change coordinates in Rn to x̃, with x = T x̃

then

x̃˙ = T⁻¹ẋ = T⁻¹( Ax + Bu ) = T⁻¹AT x̃ + T⁻¹Bu

so the LDS can be expressed as

x̃˙ = Ã x̃ + B̃ u,    y = C̃ x̃ + D̃ u

where

Ã = T⁻¹AT,   B̃ = T⁻¹B,   C̃ = CT,   D̃ = D

the transfer function is unchanged:

C̃ (sI − Ã)⁻¹ B̃ + D̃ = C (sI − A)⁻¹ B + D

Standard forms for LDS

change coordinates so that Ã is in a standard form (diagonal, real modal,
Jordan . . . )

for a diagonalizable A,

T⁻¹AT = diag(λ1, . . . , λn)

write

T⁻¹B = [ b̃1T ],    CT = [ c̃1  · · ·  c̃n ]
       [  ..  ]
       [ b̃nT ]

so

x̃˙i = λi x̃i + b̃iT u,    y = Σ_{i=1}^{n} c̃i x̃i

block diagram (here we assume D = 0): n parallel branches, the ith being
u → b̃iT → 1/s (with feedback λi, state x̃i) → c̃i, all summed to form y

Discrete-time systems

discrete-time LDS: x(t + 1) = Ax(t) + Bu(t), y(t) = Cx(t) + Du(t)

block diagram: u(t) → B → 1/z → x(t) → C → y(t), with feedback A
around the 1/z block and feedthrough D

interpretation of the 1/z block:

unit delay (shifts the sequence back in time one epoch)

latch (plus small delay to avoid race condition)

we have:

x(1) = Ax(0) + Bu(0),
x(2) = A²x(0) + ABu(0) + Bu(1),
. . .

x(t) = A^t x(0) + Σ_{τ=0}^{t−1} A^{t−1−τ} Bu(τ)

hence

y(t) = CA^t x(0) + (h ∗ u)(t)

where ∗ is discrete-time convolution and

h(0) = D,    h(t) = CA^{t−1}B for t > 0

is the discrete-time impulse matrix

Z-transform

for a sequence w : Z+ → Rp×q, the Z-transform is

W (z) = Σ_{t=0}^{∞} z^{−t} w(t)

Z-transform of the time-advanced signal v(t) = w(t + 1):

V (z) = Σ_{t=0}^{∞} z^{−t} w(t + 1)
      = z Σ_{t=1}^{∞} z^{−t} w(t)
      = zW (z) − zw(0)

applying this to x(t + 1) = Ax(t) + Bu(t) yields

zX(z) − zx(0) = AX(z) + BU (z)

hence X(z) = (zI − A)⁻¹ ( zx(0) + BU (z) ), and

Y (z) = H(z)U (z) + C(zI − A)⁻¹ z x(0)

where H(z) = C(zI − A)⁻¹B + D is the discrete-time transfer function


Lecture 14

Example: Aircraft dynamics

linearized dynamics

steady-state analysis

impulse matrices


(aircraft in level flight; the body axis makes a small angle with the
horizontal)

state (components):

u: velocity of aircraft along body axis

v: velocity of aircraft perpendicular to body axis (down is positive)

θ: angle between body axis and horizontal (up is positive)

q = θ̇: pitch rate

Inputs

disturbance inputs:

uw : velocity of wind along body axis

vw : velocity of wind perpendicular to body axis

control inputs:

δe: elevator angle

δt: thrust

Linearized dynamics

d/dt [ u ]   [ −.003   .039    0    −.322 ] [ u − uw ]   [  .01    1    ]
     [ v ] = [ −.065  −.319   7.74    0   ] [ v − vw ] + [ −.18   −.04  ] [ δe ]
     [ q ]   [  .020  −.101  −.429    0   ] [   q    ]   [ −1.16   .598 ] [ δt ]
     [ θ ]   [   0      0      1      0   ] [   θ    ]   [   0     0    ]

outputs of interest: the speed u and the climb rate ḣ

Steady-state analysis

DC gain from (uw , vw , δe, δt) to (u, ḣ):

H(0) = −CA⁻¹B + D = [ 1  0  −27.2   15.0 ]
                    [ 0  1  −1.34   24.9 ]

gives the steady-state change in speed & climb rate due to wind, elevator &
thrust changes

solve for the control variables in terms of wind velocities and desired speed
& climb rate:

[ δe ] = [ −.0379   .0229 ] [ u − uw ]
[ δt ]   [  .0020   .0413 ] [ ḣ + vw ]

in level flight, an increase in speed is obtained mostly by increasing
elevator (i.e., downwards)

the eigenvalues of A form two conjugate pairs: a fast, well-damped pair
with (complex) eigenvector xshort, and a slow, lightly damped pair with
eigenvector xphug — the short-period and phugoid modes examined below

Short-period mode

(plots of u(t) and ḣ(t) over 0 ≤ t ≤ 20 s, starting in the short-period
eigenplane)

period ≈ 7 sec, decays in ≈ 10 sec

Phugoid mode

(plots of u(t) and ḣ(t) over 0 ≤ t ≤ 2000 s, starting in the phugoid
eigenplane)

period ≈ 100 sec; decays in ≈ 5000 sec

impulse response matrix from (uw , vw ) to (u, ḣ)
(shows response to wind bursts)

(plots of h11, h12, h21, h22 over [0, 20]: dominated by the fast
short-period mode; over [0, 600]: the slow phugoid oscillation dominates)

impulse response matrix from (δe, δt) to (u, ḣ)
(shows response to elevator and thrust changes)

(plots of h11, h12, h21, h22 over [0, 20] and over [0, 600]: again a fast
short-period transient followed by the long phugoid oscillation)


Lecture 15

Symmetric matrices, quadratic forms, matrix

norm, and SVD

eigenvectors of symmetric matrices

quadratic forms

norm of a matrix


Eigenvalues of symmetric matrices

suppose A = AT ∈ Rn×n and Av = λv, with v ∈ Cn, v ≠ 0

then

v̄T Av = v̄T (Av) = λ v̄T v = λ Σ_{i=1}^{n} |vi|²

but also

v̄T Av = (A v̄)T v = ( λ̄ v̄ )T v = λ̄ Σ_{i=1}^{n} |vi|²

so λ = λ̄, i.e., λ ∈ R: the eigenvalues of a symmetric matrix are real

Eigenvectors of symmetric matrices

fact: there is a set of orthonormal eigenvectors of A = AT , i.e.,
q1, . . . , qn s.t.

Aqi = λiqi,    qiT qj = δij

in matrix form: there is an orthogonal Q s.t.

Q⁻¹AQ = QT AQ = Λ

hence we can express A as

A = QΛQT = Σ_{i=1}^{n} λi qi qiT

Interpretations

A = QΛQT , so computing Ax = QΛQT x amounts to:

resolve x into the qi coordinates (compute QT x)

scale the coordinates by λi

reconstitute with basis Q

or, geometrically,

rotate by QT

diagonal real scaling by Λ

rotate back by Q

decomposition

A = Σ_{i=1}^{n} λi qi qiT

expresses A as a linear combination of 1-dimensional projections

example:

A = [ −1/2   3/2 ]
    [  3/2  −1/2 ]

  = ( (1/√2) [ 1   1 ] ) [ 1   0 ] ( (1/√2) [ 1   1 ] )T
             [ 1  −1 ]   [ 0  −2 ]          [ 1  −1 ]

(figure: x is resolved into components q1q1T x and q2q2T x, each is scaled
by the corresponding λi, and the results are summed to form Ax)

proof (case of λi distinct): suppose we have normalized eigenvectors of A:

Avi = λivi,   ‖vi‖ = 1

evaluating viT Avj two ways (with A = AT ) gives λj viT vj = λi viT vj ,

so (λi − λj ) viT vj = 0; for λi ≠ λj this means viT vj = 0

in the general case (λi not distinct) we must say: eigenvectors can be
chosen to be orthogonal

Example: RC circuit

(figure: capacitors c1, . . . , cn with voltages v1, . . . , vn and currents
i1, . . . , in, terminating a resistive circuit)

ck v̇k = −ik ,    i = Gv

where G = GT is the conductance matrix of the resistive circuit, so
v̇ = −C⁻¹Gv with C = diag(c1, . . . , cn)

use state xi = √ci vi, so

ẋ = C^{1/2} v̇ = −C^{−1/2} G C^{−1/2} x

where C^{1/2} = diag(√c1, . . . , √cn)

we conclude: −C^{−1/2} G C^{−1/2} is symmetric, so its eigenvalues are
real and its eigenvectors can be chosen orthonormal

Quadratic forms

a function f : Rn → R of the form

f (x) = xT Ax = Σ_{i,j=1}^{n} Aij xi xj

is called a quadratic form; in a quadratic form we may as well assume
A = AT , since

xT Ax = xT ( (A + AT )/2 ) x

((A + AT )/2 is called the symmetric part of A)

uniqueness: if xT Ax = xT Bx for all x, with A = AT and B = B T , then
A = B

Examples

‖Bx‖² = xT B T Bx

Σ_{i=1}^{n−1} ( xi+1 − xi )²

‖F x‖² − ‖Gx‖²

(each is a quadratic form in x)

with A = AT = QΛQT and eigenvalues sorted λ1 ≥ · · · ≥ λn,

xT Ax = xT QΛQT x
      = ( QT x )T Λ ( QT x )
      = Σ_{i=1}^{n} λi ( qiT x )²
      ≤ λ1 Σ_{i=1}^{n} ( qiT x )²
      = λ1 ‖x‖²

a similar argument shows xT Ax ≥ λn ‖x‖², so we have

λn xT x ≤ xT Ax ≤ λ1 xT x

suppose A = AT ∈ Rn×n

we say A is positive semidefinite, denoted A ≥ 0, if xT Ax ≥ 0 for all x

A ≥ 0 if and only if λmin(A) ≥ 0, i.e., all eigenvalues are nonnegative

not the same as Aij ≥ 0 for all i, j

we say A is positive definite if xT Ax > 0 for all x ≠ 0;
denoted A > 0

A > 0 if and only if λmin(A) > 0, i.e., all eigenvalues are positive

Matrix inequalities

we say A ≤ B if B − A ≥ 0, A < B if B − A > 0, etc.

for example:

if A ≤ B and C ≤ D, then A + C ≤ B + D

if B ≤ 0 then A + B ≤ A

if A ≥ 0 and α ≥ 0, then αA ≥ 0

A² ≥ 0

if A > 0, then A⁻¹ > 0

but matrix inequality is only a partial order: we can have

A ≰ B,  B ≰ A

(such matrices are called incomparable)

Ellipsoids

if A = AT > 0, the set

E = { x | xT Ax ≤ 1 }

is an ellipsoid in Rn, centered at 0

the semi-axes are given by si = λi^{−1/2} qi, i.e.:

eigenvectors determine the directions of the semiaxes

eigenvalues determine their lengths (large λi ⟹ short axis)

note:

√( λmax/λmin ) gives the maximum eccentricity

Gain of a matrix in a direction

for A ∈ Rm×n and x ∈ Rn, ‖Ax‖/‖x‖ gives the amplification factor, or
gain, of A in the direction x; it varies with the direction

questions:

what is the maximum gain of A
(and corresponding maximum gain direction)?

what is the minimum gain of A
(and corresponding minimum gain direction)?

Matrix norm

the maximum gain

max_{x≠0} ‖Ax‖ / ‖x‖

is called the matrix norm or spectral norm of A, denoted ‖A‖; since

max_{x≠0} ‖Ax‖² / ‖x‖² = max_{x≠0} xT AT Ax / ‖x‖² = λmax(AT A)

we have ‖A‖ = √( λmax(AT A) )

similarly the minimum gain is

min_{x≠0} ‖Ax‖ / ‖x‖ = √( λmin(AT A) )

note that

AT A ∈ Rn×n is symmetric with AT A ≥ 0, so λmin, λmax ≥ 0

the max/min gain directions are the eigenvectors of AT A associated
with λmax and λmin

example: A = [ 1  2 ]
             [ 3  4 ]
             [ 5  6 ]

AT A = [ 35  44 ]
       [ 44  56 ]

     = Q diag(90.7, 0.265) QT ,   Q = [ 0.620   0.785 ]
                                      [ 0.785  −0.620 ]

then ‖A‖ = √( λmax(AT A) ) = 9.53:

A [ 0.620 ] = [ 2.18 ],   which has norm 9.53 (with unit input)
  [ 0.785 ]   [ 4.99 ]
              [ 7.78 ]

min gain is √( λmin(AT A) ) = 0.514:

A [  0.785 ] = [ −0.46 ],   which has norm 0.514 (with unit input)
  [ −0.620 ]   [ −0.14 ]
               [  0.18 ]

so for all x ≠ 0 we have

0.514 ≤ ‖Ax‖ / ‖x‖ ≤ 9.53

Properties of matrix norm

consistent with the vector norm: the matrix norm of a ∈ Rn×1 is

√( λmax(aT a) ) = √( aT a ) = ‖a‖

for any x, ‖Ax‖ ≤ ‖A‖ ‖x‖

scaling: ‖αA‖ = |α| ‖A‖

triangle inequality: ‖A + B‖ ≤ ‖A‖ + ‖B‖

definiteness: ‖A‖ = 0 ⟺ A = 0

norm of product: ‖AB‖ ≤ ‖A‖ ‖B‖
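The worked 3×2 example above is easy to reproduce: the spectral norm and the minimum gain are the extreme singular values. A minimal numpy sketch:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# spectral norm = sqrt(lambda_max(A^T A)) = largest singular value
sig = np.linalg.svd(A, compute_uv=False)
assert np.allclose(sig[0], np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A))))
assert np.allclose(sig[0], np.linalg.norm(A, 2))

# the lecture's numbers: ||A|| = 9.53, min gain = 0.514
assert abs(sig[0] - 9.53) < 0.01 and abs(sig[-1] - 0.514) < 0.001

# the gain in any direction lies between sigma_min and sigma_max
rng = np.random.default_rng(0)
for _ in range(100):
    x = rng.standard_normal(2)
    g = np.linalg.norm(A @ x) / np.linalg.norm(x)
    assert sig[-1] - 1e-9 <= g <= sig[0] + 1e-9
```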

Singular value decomposition

a more complete picture of the gain properties of A is given by the singular
value decomposition (SVD) of A:

A = U ΣV T

where

A ∈ Rm×n, Rank(A) = r

U ∈ Rm×r , U T U = I

V ∈ Rn×r , V T V = I

Σ = diag(σ1, . . . , σr), with σ1 ≥ · · · ≥ σr > 0

with U = [ u1 · · · ur ] and V = [ v1 · · · vr ],

A = U ΣV T = Σ_{i=1}^{r} σi ui viT

the σi are the (nonzero) singular values of A; the vi are the right (input)
singular vectors; the ui are the left (output) singular vectors

AT A = ( U ΣV T )T ( U ΣV T ) = V Σ²V T

hence:

the vi are eigenvectors of AT A; σi = √( λi(AT A) )
(and λi(AT A) = 0 for i > r)

‖A‖ = σ1

similarly,

AAT = ( U ΣV T )( U ΣV T )T = U Σ²U T

hence:

the ui are eigenvectors of AAT ; σi = √( λi(AAT ) )
(and λi(AAT ) = 0 for i > r)

Interpretations

A = U ΣV T = Σ_{i=1}^{r} σi ui viT

the linear mapping y = Ax can be computed as:

resolve x along the input directions v1, . . . , vr (compute V T x)

scale the coefficients by σi

reconstitute along the output directions u1, . . . , ur

difference with the eigenvector decomposition of a symmetric matrix: the
input and output directions are different

v1 is the most sensitive (highest gain) input direction; u1 is the
corresponding highest gain output direction; Av1 = σ1u1

SVD gives a clearer picture of the gain as a function of input/output
directions

example: if σ1, σ2 ≈ 10 and the remaining σi are small, inputs along v1, v2
are amplified (by ≈ 10) and come out mostly along the plane spanned by
u1, u2, while the remaining input directions are strongly attenuated

(figure: A nonsingular; resolve x along v1, v2: v1T x = 0.5, v2T x = 0.6,
i.e., x = 0.5v1 + 0.6v2; then Ax = (0.5σ1)u1 + (0.6σ2)u2)


Lecture 16

SVD Applications

general pseudo-inverse

full SVD

SVD in estimation/inversion


General pseudo-inverse

if A ≠ 0 has SVD A = U ΣV T , the pseudo-inverse or Moore-Penrose
inverse of A is

A† = V Σ⁻¹U T

if A is skinny and full rank,

A† = ( AT A )⁻¹ AT

(which gives the least-squares approximate solution xls = A†y)

if A is fat and full rank,

A† = AT ( AAT )⁻¹

(which gives the least-norm solution xln = A†y)

in the general case, xpinv = A†y is the minimum-norm, least-squares
approximate solution

Pseudo-inverse via regularization

for μ > 0, let xμ be the (unique) minimizer of ‖Ax − y‖² + μ‖x‖², i.e.,

xμ = ( AT A + μI )⁻¹ AT y

here, AT A + μI > 0 and so is invertible; as μ → 0, xμ → A†y

in fact, we have lim_{μ→0} ( AT A + μI )⁻¹ AT = A†

(check this!)
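The regularization limit can indeed be checked numerically. A minimal numpy sketch, using the skinny full-rank matrix from the matrix-norm example (so A† = (AT A)⁻¹AT):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # skinny, full rank
Apinv = np.linalg.pinv(A)

# skinny full-rank case: pinv equals (A^T A)^{-1} A^T
assert np.allclose(Apinv, np.linalg.inv(A.T @ A) @ A.T)
assert np.allclose(Apinv @ A, np.eye(2))             # left inverse

# regularization: (A^T A + mu I)^{-1} A^T -> pinv(A) as mu -> 0
for mu in [1e-2, 1e-4, 1e-6]:
    Areg = np.linalg.inv(A.T @ A + mu * np.eye(2)) @ A.T
    assert np.linalg.norm(Areg - Apinv) < 20 * mu    # error shrinks with mu
```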

Full SVD

the SVD of A ∈ Rm×n with Rank(A) = r:

A = U1Σ1V1T = [ u1 · · · ur ] diag(σ1, . . . , σr) [ v1 · · · vr ]T

find U2 ∈ Rm×(m−r) and V2 ∈ Rn×(n−r) s.t. U = [U1 U2] ∈ Rm×m and
V = [V1 V2] ∈ Rn×n are orthogonal

add zero rows/cols to Σ1 to form Σ ∈ Rm×n:

Σ = [ Σ1           0r×(n−r)      ]
    [ 0(m−r)×r     0(m−r)×(n−r)  ]

then we have

A = U1Σ1V1T = [ U1  U2 ] Σ [ V1T ]
                           [ V2T ]

i.e.:

A = U ΣV T

called the full SVD of A

Image of unit ball under linear transformation

the full SVD

A = U ΣV T

gives an interpretation of y = Ax:

rotate (by V T )

stretch along the axes by σi (σi = 0 for i > r)

zero-pad (if m > n) or truncate (if m < n) to get an m-vector

rotate (by U )

(figure: the unit ball is rotated by V T , stretched along the axes by the
σi, and rotated by U , yielding an ellipsoid with semiaxes σiui)

SVD in estimation/inversion

suppose y = Ax + v, where

y ∈ Rm is measurement

x ∈ Rn is vector to be estimated

v is a measurement noise or error

"norm-bound" model of noise: we assume ‖v‖ ≤ α but know nothing else
about v (α gives the max norm of the noise)

consider an estimator x̂ = By, with BA = I (i.e., unbiased: if v = 0 then
x̂ = x)

the estimation error is x̂ − x = Bv, so it lies in the set

Eunc = { Bv | ‖v‖ ≤ α }

and x ∈ x̂ + Eunc, i.e.:

the true x lies in the uncertainty ellipsoid Eunc, centered at the estimate x̂

the semiaxes of Eunc are ασiui (singular values & vectors of B), so the
worst-case error is ‖x̂ − x‖ ≤ α‖B‖

Bls = A† is the least-squares estimator

fact: among all B with BA = I, Bls is smallest, in the sense

BlsBlsT ≤ BB T

hence the uncertainty ellipsoids satisfy Els ⊆ E

in particular ‖Bls‖ ≤ ‖B‖: the least-squares estimator gives the smallest
worst-case error

example: four linear measurements of x ∈ R2,

y = [ k1T ]
    [ k2T ] x + v
    [ k3T ]
    [ k4T ]

where ki ∈ R2

estimator 1 uses only the first two measurements:

x̂ = [ [ k1T; k2T ]⁻¹   02×2 ] y

estimator 2 is least-squares, using all four:

x̂ = A† y

uncertainty regions (with α = 1):

(plots: the uncertainty ellipsoid for the two-measurement estimator is much
larger than the one for the least-squares estimator)

proof of fact: suppose A ∈ Rm×n, m > n, is full rank

SVD: A = U ΣV T , with V orthogonal

Bls = A† = V Σ⁻¹U T , and B satisfies BA = I

define Z = B − Bls, so B = Bls + Z

then ZA = ZU ΣV T = 0, so ZU = 0 (multiply by V Σ⁻¹ on the right)

therefore

BB T = ( Bls + Z )( Bls + Z )T
     = BlsBlsT + BlsZ T + ZBlsT + ZZ T
     = BlsBlsT + ZZ T
     ≥ BlsBlsT

(using BlsZ T = V Σ⁻¹U T Z T = V Σ⁻¹ (ZU )T = 0)

Sensitivity of linear equations to data error

consider y = Ax, with A ∈ Rn×n invertible; if y has error δy, the resulting
error in x is δx = A⁻¹ δy, so ‖δx‖ ≤ ‖A⁻¹‖ ‖δy‖

if ‖A⁻¹‖ is large,

small errors in y can lead to large errors in x

can't solve for x given y (with small errors)

hence, A can be considered singular in practice

for relative errors: since ‖y‖ ≤ ‖A‖ ‖x‖,

‖δx‖ / ‖x‖ ≤ κ(A) · ‖δy‖ / ‖y‖

where κ(A) = ‖A‖ ‖A⁻¹‖ = σmax(A)/σmin(A) is the condition number of A

we have: relative error in solution ≤ condition number × relative error in
data; we say A is well conditioned if κ is small, poorly conditioned if κ is
large (for a full-rank nonsquare A, κ = σmax(A)/σmin(A))

Low rank approximations

suppose A ∈ Rm×n, Rank(A) = r, with SVD

A = U ΣV T = Σ_{i=1}^{r} σi ui viT

we seek a matrix Â with Rank(Â) = p < r, s.t. Â ≈ A in the sense that
‖A − Â‖ is minimized

solution: the optimal rank-p approximant is

Â = Σ_{i=1}^{p} σi ui viT

hence ‖A − Â‖ = ‖ Σ_{i=p+1}^{r} σi ui viT ‖ = σp+1

interpretation: the SVD dyads uiviT are ranked in order of "importance";
take the top p to get the rank p approximant
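The truncated-SVD recipe above, with the approximation error ‖A − Â‖ = σ_{p+1}, can be checked directly (the random test matrix below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 6))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

p = 2
Ahat = U[:, :p] @ np.diag(s[:p]) @ Vt[:p, :]   # rank-p approximant

assert np.linalg.matrix_rank(Ahat) == p
# optimal approximation error in spectral norm is exactly sigma_{p+1}
assert np.allclose(np.linalg.norm(A - Ahat, 2), s[p])
```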

proof: suppose Rank(B) ≤ p

then dim N(B) ≥ n − p, while dim span{ v1, . . . , vp+1 } = p + 1

hence, the two subspaces intersect, i.e., there is a unit vector z ∈ Rn s.t.

Bz = 0,    z ∈ span{ v1, . . . , vp+1 }

then

( A − B )z = Az = Σ_{i=1}^{p+1} σi ui viT z

and

‖( A − B )z‖² = Σ_{i=1}^{p+1} σi² ( viT z )² ≥ σ²p+1 ‖z‖²

hence ‖A − B‖ ≥ σp+1 = ‖A − Â‖

Distance to singularity

another interpretation of σi:

σi = min{ ‖A − B‖ | Rank(B) ≤ i − 1 }

i.e., the distance (measured by matrix norm) to the nearest rank i − 1
matrix; in particular, for square A, σmin is the distance to the nearest
singular matrix

application: model simplification

suppose y = Ax + v, where A ∈ R100×30 and the singular values
σ5, . . . , σ30 are small compared to the noise level

then the terms σi ui viT x, for i = 5, . . . , 30, are substantially smaller than
the noise term v

simplified model:

y = Σ_{i=1}^{4} σi ui viT x + v


Lecture 17

Example: Quantum mechanics

discretization

preservation of probability

example


Quantum mechanics

single particle in the interval [0, 1], mass m

potential V : [0, 1] → R

Ψ : [0, 1] × R+ → C is the (complex-valued) wave function

interpretation: |Ψ(x, t)|² is probability density of particle at position x,
time t

(so ∫₀¹ |Ψ(x, t)|² dx = 1 for all t)

evolution of Ψ is governed by the Schrödinger equation:

iℏ Ψ̇ = ( −(ℏ²/2m) ∂²/∂x² + V ) Ψ = HΨ

where H is the Hamiltonian operator, i = √−1

Discretization

let's discretize position x into N discrete points, k/N , k = 1, . . . , N

the wave function is approximated as a vector Ψ(t) ∈ CN

the ∂²/∂x² operator is approximated as the matrix

∇² = N² [ −2   1               ]
        [  1  −2   1           ]
        [      1  −2   1       ]
        [          ...  ...    ]
        [               1  −2  ]

so w = ∇²v means

wk = ( (vk+1 − vk )/(1/N ) − (vk − vk−1)/(1/N ) ) / (1/N )

the discretized Schrödinger equation is the (complex) linear dynamical
system

Ψ̇ = (−i/ℏ)( V − (ℏ²/2m)∇² ) Ψ = (−i/ℏ) H Ψ

where V now denotes the diagonal matrix of potential samples; hence we
can analyze it using linear dynamical system theory (with complex vectors
& matrices)

the solution is Ψ(t) = e^{−(i/ℏ)tH} Ψ(0); the matrix e^{−(i/ℏ)tH}
propagates the wave function forward in time t seconds
(backward if t < 0)

Preservation of probability

using Ψ̇ = (−i/ℏ)HΨ and H = H T ∈ RN×N ,

d/dt ‖Ψ‖² = d/dt ( Ψ*Ψ )
          = Ψ̇*Ψ + Ψ*Ψ̇
          = ( (−i/ℏ)HΨ )* Ψ + Ψ* ( (−i/ℏ)HΨ )
          = (i/ℏ) Ψ*HΨ − (i/ℏ) Ψ*HΨ
          = 0

so ‖Ψ(t)‖² is constant: the discretized Schrödinger equation preserves
probability

it follows that U (t) = e^{−(i/ℏ)tH} is unitary, meaning U *U = I

unitary is the complex analog of real orthogonal: if U is unitary and
z ∈ CN , then ‖U z‖ = ‖z‖

Eigenvalues & eigenstates

H is symmetric, so its eigenvalues λ1, . . . , λN are real and its
eigenvectors v1, . . . , vN can be chosen real and orthonormal

from Hv = λv ⟺ (−i/ℏ)Hv = (−i/ℏ)λ v we see:

eigenvectors of (−i/ℏ)H are the same as the eigenvectors of H, i.e.,
v1, . . . , vN

eigenvalues of (−i/ℏ)H are (−i/ℏ)λ1, . . . , (−i/ℏ)λN (which are pure
imaginary)

the eigenvectors vk are called eigenstates, and λk is the energy of
eigenstate vk

for the mode Ψ(t) = e^{−(i/ℏ)λk t} vk , the probability density

| Ψm(t) |² = | e^{−(i/ℏ)λk t} vmk |² = | vmk |²

doesn't change with time

Example

(figure: the potential function V (x), a bump of height ≈ 1000 in the
middle of the interval; for this example, we set ℏ = 1, m = 1)

(figure: the four eigenstates with lowest energy, i.e., v1, v2, v3, v4,
plotted versus x)

now let's look at a trajectory of Ψ, with initial wave function Ψ(0)

(top plot: the initial probability density |Ψ(0)|²; bottom plot:
|vk*Ψ(0)|², i.e., the resolution of Ψ(0) into eigenstates)

time evolution, for t = 0, 40, 80, . . . , 320: (plots of |Ψ(t)|²)

the particle rolls half way up the potential bump, stops, then rolls back
down (perfectly elastic collision), and then repeats

(plot: the probability that the particle is in the left half of the well, i.e.,
Σ_{k=1}^{N/2} |Ψk (t)|², versus time t: it oscillates nearly periodically
between values close to 1 and close to 0)


Lecture 18

Controllability and state transfer

state transfer


State transfer

consider ẋ = Ax + Bu (or x(t + 1) = Ax(t) + Bu(t)) over the time
interval [ti, tf ]

we say the input u : [ti, tf ] → Rm steers or transfers the state from x(ti)
to x(tf ) (over time interval [ti, tf ])

(subscripts stand for initial and final)

questions:

where can x(ti) be transferred to at t = tf ?

how quickly can x(ti) be transferred to some xtarget?

how do we find a u that effects the transfer, and how large must it be?

Reachability

we say a point is reachable (in t seconds or epochs) if x(0) = 0 can be
steered to it by some input; the reachable set at time t is, for the
continuous-time system,

Rt = { ∫₀^t e^{(t−τ)A} Bu(τ) dτ | u : [0, t] → Rm }

and for the discrete-time system x(t + 1) = Ax(t) + Bu(t),

Rt = { Σ_{τ=0}^{t−1} A^{t−1−τ} Bu(τ) | u(τ) ∈ Rm }

Rt is a subspace of Rn

Rt ⊆ Rs if t ≤ s
(i.e., can reach more points given more time)

we define the reachable set R as the set of points reachable for some t:

R = ∪_{t≥0} Rt

Reachability for discrete-time LDS

for the discrete-time system, with x(0) = 0,

x(t) = Ct [ u(t − 1) ]
          [    ..    ]
          [   u(0)   ]

where Ct = [ B  AB  · · ·  A^{t−1}B ]

so Rt = range(Ct); by the C-H theorem, each A^k for k ≥ n is a linear
combination of A0, . . . , A^{n−1}

thus we have

Rt = range(Ct),  t < n
Rt = range(C),   t ≥ n

where C = Cn is called the controllability matrix

Controllable system

the system is called reachable or controllable if all states are reachable
(i.e., R = Rn); this is the case if and only if Rank(C) = n

example: x(t + 1) = [ 0  1 ] x(t) + [ 1 ] u(t)
                    [ 1  0 ]        [ 1 ]

controllability matrix is C = [ 1  1 ]
                              [ 1  1 ]

so the system is not controllable; the reachable set is

R = range(C) = { x | x1 = x2 }

General state transfer: with tf > ti,

x(tf ) = A^{tf −ti} x(ti) + C_{tf −ti} [ u(tf − 1) ]
                                      [     ..    ]
                                      [   u(ti)   ]

hence u can transfer x(ti) to x(tf ) = xdes

⟺ xdes − A^{tf −ti} x(ti) ∈ R_{tf −ti}

so general state transfer reduces to a reachability problem;
if the system is controllable, any state transfer can be achieved in ≤ n steps

important special case: driving the state to zero (sometimes called
regulating or controlling the state)

Least-norm input for reachability

(assume the system is reachable) the inputs must satisfy

xdes = Ct [ u(t − 1) ]
          [    ..    ]
          [   u(0)   ]

among all u that steer x(0) = 0 to x(t) = xdes, the one that minimizes

Σ_{τ=0}^{t−1} ‖u(τ)‖²

is given by

[ uln(t − 1) ]
[     ..     ] = CtT ( CtCtT )⁻¹ xdes
[   uln(0)   ]

uln is called the least-norm or minimum energy input that effects the state
transfer

we can express it as

uln(τ) = B T ( AT )^{t−1−τ} ( Σ_{s=0}^{t−1} A^s BB T ( AT )^s )⁻¹ xdes,

for τ = 0, . . . , t − 1

Emin, the minimum value of Σ_{τ=0}^{t−1} ‖u(τ)‖² required to reach
x(t) = xdes, is sometimes called the minimum energy required to reach
x(t) = xdes:

Emin = Σ_{τ=0}^{t−1} ‖uln(τ)‖²
     = ( CtT (CtCtT )⁻¹ xdes )T ( CtT (CtCtT )⁻¹ xdes )
     = xdesT ( CtCtT )⁻¹ xdes
     = xdesT ( Σ_{τ=0}^{t−1} A^τ BB T ( AT )^τ )⁻¹ xdes

Emin(xdes, t) measures how hard it is to reach x(t) = xdes from
x(0) = 0 (i.e., how large a u is required), and so gives the relevant
quantitative measure of reachability (as a function of xdes, t)

the ellipsoid { z | Emin(z, t) ≤ 1 } shows the states reachable at time t
with one unit of energy
(shows directions that can be reached with small inputs, and directions
that can be reached only with large inputs)

Emin as function of t: if t ≥ s then

Σ_{τ=0}^{t−1} A^τ BB T ( AT )^τ ≥ Σ_{τ=0}^{s−1} A^τ BB T ( AT )^τ

hence

( Σ_{τ=0}^{t−1} A^τ BB T (AT )^τ )⁻¹ ≤ ( Σ_{τ=0}^{s−1} A^τ BB T (AT )^τ )⁻¹

so Emin(xdes, t) ≤ Emin(xdes, s): allowing more time never takes more
energy

example: x(t + 1) = [ 1.75  0.8 ] x(t) + [ 1 ] u(t)
                    [ 0.95   0  ]        [ 0 ]

(plot of Emin versus t: decreases rapidly, from ≈ 10 at small t toward a
small limiting value)

(plots of the ellipsoids Emin(x, t) ≤ 1 for t = 3 and t = 10: the t = 10
ellipsoid is much larger, i.e., more states are reachable with unit energy)

the matrix

P = lim_{t→∞} ( Σ_{τ=0}^{t−1} A^τ BB T ( AT )^τ )⁻¹

always exists, and gives the minimum energy required to reach a point
xdes (with no limit on t):

min { Σ_{τ=0}^{t−1} ‖u(τ)‖² | x(0) = 0, x(t) = xdes } = xdesT P xdes

if A is not stable, then P can have a nonzero nullspace:

P z = 0, z ≠ 0 means we can get to z using u's with energy as small as
you like
(u just gives a little kick to the state; the instability carries it out to z
efficiently)

Continuous-time reachability

consider now ẋ = Ax + Bu with x(t) ∈ Rn

the reachable set at time t is

Rt = { ∫₀^t e^{(t−τ)A} Bu(τ) dτ | u : [0, t] → Rm }

fact: for any t > 0, Rt = R = range(C), where

C = [ B  AB  · · ·  A^{n−1}B ]

is the same controllability matrix as in discrete time; so in continuous
time, any reachable point can be reached as fast as you like (with large
enough u)

first let's show for any u (and x(0) = 0) we have x(t) ∈ range(C):
write e^{tA} as a power series,

e^{tA} = I + (t/1!) A + (t²/2!) A² + · · ·

by C-H, express A^n, A^{n+1}, . . . in terms of A0, . . . , A^{n−1} and
collect powers of A:

e^{tA} = α0(t) I + α1(t) A + · · · + αn−1(t) A^{n−1}

therefore

x(t) = ∫₀^t e^{τA} Bu(t − τ) dτ
     = ∫₀^t ( Σ_{i=0}^{n−1} αi(τ) A^i ) Bu(t − τ) dτ
     = Σ_{i=0}^{n−1} A^i B ( ∫₀^t αi(τ) u(t − τ) dτ )
     = C z

where zi = ∫₀^t αi(τ) u(t − τ) dτ

Impulsive inputs

suppose x(0−) = 0 and we apply the input u(t) = δ(k)(t) f , where δ(k)
denotes the kth derivative of δ and f ∈ Rm

then U (s) = s^k f , so

X(s) = ( sI − A )⁻¹ B s^k f
     = ( s⁻¹I + s⁻²A + · · · ) B s^k f
     = ( s^{k−1}I + · · · + sA^{k−2} + A^{k−1} + s⁻¹A^k + · · · ) Bf

where the terms with positive powers of s are impulsive terms; hence

x(t) = impulsive terms + A^k Bf + A^{k+1}Bf (t/1!) + A^{k+2}Bf (t²/2!) + · · ·

in particular, x(0+) = A^k Bf

now consider an input of the form

u(t) = δ(t) f0 + δ̇(t) f1 + · · · + δ(n−1)(t) fn−1

where fi ∈ Rm

by linearity we have

x(0+) = Bf0 + · · · + A^{n−1}Bfn−1 = C [  f0  ]
                                       [  ..  ]
                                       [ fn−1 ]

so with impulsive inputs we can reach any point in range(C) instantly

it can also be shown that any point in range(C) can be reached for any
t > 0 using nonimpulsive inputs

fact: if x(0) ∈ R, then x(t) ∈ R for all t (no matter what u is)

Example

two unit masses connected by a unit spring and unit damper; the input is
the tension u between the masses, and the state is x = [ yT  ẏT ]T

system is

ẋ = [  0   0   1   0 ]     [  0 ]
    [  0   0   0   1 ] x + [  0 ] u
    [ −1   1  −1   1 ]     [  1 ]
    [  1  −1   1  −1 ]     [ −1 ]

can we maneuver the state anywhere, starting from x(0) = 0?
if not, where can we maneuver the state?

controllability matrix is

C = [ B  AB  A²B  A³B ] = [  0   1  −2   2 ]
                          [  0  −1   2  −2 ]
                          [  1  −2   2   0 ]
                          [ −1   2  −2   0 ]

which has rank 2, with

R = range(C) = span{ [ 1; −1; 0; 0 ],  [ 0; 0; 1; −1 ] }

i.e., we can reach only differential motions: equal and opposite
displacements and velocities of the two masses

it's obvious — the internal force does not affect the center of mass
position or the total momentum!
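The rank computation above is a few lines of numpy (building C = [B AB A²B A³B] column-block by column-block):

```python
import numpy as np

A = np.array([[ 0.0,  0.0,  1.0,  0.0],
              [ 0.0,  0.0,  0.0,  1.0],
              [-1.0,  1.0, -1.0,  1.0],
              [ 1.0, -1.0,  1.0, -1.0]])
B = np.array([[0.0], [0.0], [1.0], [-1.0]])

# controllability matrix C = [B  AB  A^2B  A^3B]
blocks, v = [], B
for _ in range(A.shape[0]):
    blocks.append(v)
    v = A @ v
C = np.hstack(blocks)

# rank 2 < n = 4: not controllable; reachable set is range(C)
assert np.linalg.matrix_rank(C) == 2

# range(C) contains only "differential" motions (y1 = -y2, y1dot = -y2dot):
d = np.array([1.0, -1.0, 0.0, 0.0])
assert np.linalg.matrix_rank(np.column_stack([C, d])) == 2
```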

Least-norm input (continuous time, via discretization): assume x(0) = 0;
among inputs steering to x(t) = xdes, we minimize

∫₀^t ‖u(τ)‖² dτ

discretize the system with interval h = t/N ; then

x(t) = [ Bd  AdBd  · · ·  Ad^{N−1}Bd ] [ ud(N − 1) ]
                                       [     ..    ]
                                       [   ud(0)   ]

where

Ad = e^{hA},   Bd = ∫₀^h e^{τA} dτ B

the least-norm discretized input is

udln(k) = BdT ( AdT )^{N−1−k} ( Σ_{i=0}^{N−1} Ad^i Bd BdT ( AdT )^i )⁻¹ xdes

for k = 0, . . . , N − 1; for large N (small h), with τ = kt/N ,

BdT ( AdT )^{N−1−k} ≈ (t/N ) B T e^{(t−τ)AT}

and similarly

Σ_{i=0}^{N−1} Ad^i Bd BdT ( AdT )^i ≈ (t/N ) ∫₀^t e^{t̄A} BB T e^{t̄AT} dt̄

so the two (t/N ) factors cancel, and the least-norm continuous-time input
is approximately

uln(τ) = B T e^{(t−τ)AT} ( ∫₀^t e^{t̄A} BB T e^{t̄AT} dt̄ )⁻¹ xdes,   0 ≤ τ ≤ t

for large N

the minimum energy is

∫₀^t ‖uln(τ)‖² dτ = xdesT Q(t)⁻¹ xdes

where

Q(t) = ∫₀^t e^{τA} BB T e^{τAT} dτ

is the controllability Gramian; one can show that (A, B) is controllable if
and only if Q(s) > 0 for some s > 0 (in which case Q(t) > 0 for all t > 0)
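The Gramian and the minimum-energy formula can be evaluated numerically. A numpy-only sketch using a crude midpoint-rule quadrature and a series-based matrix exponential (both helpers are defined here, not library calls; the 2×2 system is an illustrative damped oscillator):

```python
import numpy as np

def expm_taylor(M, terms=40):
    E, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        E = E + term
    return E

def gramian(A, B, t, steps=2000):
    """Q(t) = int_0^t e^{sA} B B^T e^{sA^T} ds, by midpoint-rule quadrature."""
    h = t / steps
    Q = np.zeros((A.shape[0], A.shape[0]))
    for k in range(steps):
        M = expm_taylor((k + 0.5) * h * A) @ B
        Q += h * (M @ M.T)
    return Q

A = np.array([[0.0, 1.0], [-1.0, -0.5]])   # damped oscillator (illustrative)
B = np.array([[0.0], [1.0]])

Q = gramian(A, B, 5.0)
assert np.min(np.linalg.eigvalsh(Q)) > 0   # controllable pair -> Q(t) > 0

# minimum energy to reach xdes at t = 5:
xdes = np.array([1.0, 0.0])
Emin = xdes @ np.linalg.inv(Q) @ xdes
assert Emin > 0
```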

Minimum energy over infinite horizon

the matrix

P = lim_{t→∞} ( ∫₀ᵗ e^{τA} B Bᵀ e^{τAᵀ} dτ )^{−1}

always exists, and gives the minimum energy required to reach a point x_des (with no limit on t):

min { ∫₀ᵗ ‖u(τ)‖² dτ : x(0) = 0, x(t) = x_des } = x_desᵀ P x_des

if A is not stable, then P can have nonzero nullspace

P z = 0, z ≠ 0 means we can get to z using u's with energy as small as you like (u just gives a little kick to the state; the instability carries it out to z efficiently)
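For stable A, the integral defining Q(t) converges as t → ∞, and its limit W satisfies the Lyapunov equation A W + W Aᵀ + B Bᵀ = 0, so P = W⁻¹ can be computed without quadrature. A sketch using scipy's Lyapunov solver (hypothetical stable system, not from the notes):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, expm

# hypothetical stable system
A = np.array([[0.0, 1.0], [-2.0, -1.0]])
B = np.array([[0.0], [1.0]])

# W = \int_0^inf e^{tau A} B B^T e^{tau A^T} dtau solves A W + W A^T + B B^T = 0
W = solve_continuous_lyapunov(A, -B @ B.T)
P = np.linalg.inv(W)            # minimum energy to reach xdes is xdes^T P xdes

# sanity check against a truncated numerical integral (midpoint rule)
h, N = 1e-3, 20000              # integrate out to t = 20
E = expm(0.5 * h * A)           # e^{tau A} at the first midpoint
M = expm(h * A)
Wnum = np.zeros((2, 2))
for _ in range(N):
    Wnum += h * E @ B @ B.T @ E.T
    E = M @ E
match = np.allclose(W, Wnum, atol=1e-4)
print(match)                    # True
```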

General state transfer

since

x(t_f) = e^{(t_f−t_i)A} x(t_i) + ∫_{t_i}^{t_f} e^{(t_f−τ)A} B u(τ) dτ

we can transfer the state from x(t_i) to x(t_f) if and only if x(t_f) − e^{(t_f−t_i)A} x(t_i) ∈ R:

in zero time with impulsive inputs

in any positive time with non-impulsive inputs

Example

a chain of three unit masses, with tension actuators between adjacent masses: u₁(t) applies equal and opposite forces between masses 1 and 2, and u₂(t) between masses 2 and 3

state is x = [yᵀ ẏᵀ]ᵀ, where y ∈ R³ gives the mass positions

system is

ẋ = [  0   0   0   1   0   0 ]     [  0   0 ]
    [  0   0   0   0   1   0 ]     [  0   0 ]
    [  0   0   0   0   0   1 ] x + [  0   0 ] [ u₁ ]
    [ −2   1   0  −2   1   0 ]     [  1   0 ] [ u₂ ]
    [  1  −2   1   1  −2   1 ]     [ −1   1 ]
    [  0   1  −2   0   1  −2 ]     [  0  −1 ]

objective: steer the state from x(0) = e₁ to x(t_f) = 0

E_min = ∫₀^{t_f} ‖u_ln(τ)‖² dτ vs. t_f :

[plot: E_min vs. t_f (E_min axis 10–50, t_f axis 0–12); E_min decreases as t_f increases]

the corresponding least-norm inputs:

[plots: u₁(t) and u₂(t), 0 ≤ t ≤ 4.5]

and for t_f = 4:

[plots: u₁(t) and u₂(t), 0 ≤ t ≤ 4.5]

output y₁ for u = 0:

[plot: y₁(t), 0 ≤ t ≤ 14]

output y₁ for u = u_ln with t_f = 3:

[plot: y₁(t), 0 ≤ t ≤ 4.5]

[plot: y₁(t), 0 ≤ t ≤ 4.5]

EE263 Autumn 2008-09 Stephen Boyd

Lecture 19

Observability and state estimation

state estimation

discrete-time observability

observability controllability duality

observers for noiseless case

continuous-time observability

least-squares observers

example


State estimation set-up

we consider the discrete-time system

x(t + 1) = Ax(t) + Bu(t) + w(t),   y(t) = Cx(t) + Du(t) + v(t)

w is state disturbance or noise

v is sensor noise or error

A, B, C, and D are known

u and y are observed over time interval [0, t − 1]

w and v are not known, but can be described statistically, or assumed small (e.g., in RMS value)

State estimation problem

estimate x(s), given u(τ), y(τ) for 0 ≤ τ ≤ t − 1

s = 0: estimate initial state

s = t − 1: estimate current state

s = t: estimate (i.e., predict) next state

an algorithm that produces such an estimate is called an observer or state estimator

the estimate of x(s), based on inputs and outputs up to time t − 1, is denoted x̂(s|t − 1) (read, "x̂(s) given t − 1")

Noiseless case

let's look at determining x(0), with no state or measurement noise:

x(t + 1) = Ax(t) + Bu(t),   y(t) = Cx(t) + Du(t)

then we have

[y(0); . . . ; y(t−1)] = O_t x(0) + T_t [u(0); . . . ; u(t−1)]

where

O_t = [ C; CA; . . . ; CA^{t−1} ]

T_t = [ D                                   ]
      [ CB          D                       ]
      [ . . .                               ]
      [ CA^{t−2}B   CA^{t−3}B  · · ·  CB  D ]

hence we have

O_t x(0) = [y(0); . . . ; y(t−1)] − T_t [u(0); . . . ; u(t−1)]

where the right-hand side is known; hence:

we can uniquely determine x(0) if and only if N(O_t) = {0}

the input u does not affect our ability to determine x(0): its effect can be subtracted out
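The matrices O_t and T_t can be assembled directly. A sketch (hypothetical observable system, not from the notes) showing exact recovery of x(0) in the noiseless case:

```python
import numpy as np

# hypothetical observable discrete-time system
rng = np.random.default_rng(1)
A = np.array([[0.9, 0.2], [-0.3, 0.8]])
B = np.array([[1.0], [0.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
n, t = 2, 6
p, m = C.shape[0], B.shape[1]

# O_t = [C; CA; ...; CA^{t-1}]
Ot = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(t)])

# T_t: block lower triangular, D on the diagonal, C A^{i-j-1} B below it
Tt = np.zeros((p * t, m * t))
for i in range(t):
    for j in range(i + 1):
        blk = D if i == j else C @ np.linalg.matrix_power(A, i - j - 1) @ B
        Tt[i*p:(i+1)*p, j*m:(j+1)*m] = blk

# simulate, then solve O_t x(0) = Y - T_t U
x0 = rng.standard_normal((n, 1))
x, ys, us = x0, [], []
for _ in range(t):
    u = rng.standard_normal((m, 1))
    ys.append(C @ x + D @ u)
    us.append(u)
    x = A @ x + B @ u
Y, U = np.vstack(ys), np.vstack(us)
x0_hat, *_ = np.linalg.lstsq(Ot, Y - Tt @ U, rcond=None)
ok = np.allclose(x0_hat, x0)
print(ok)        # True: the initial state is recovered exactly
```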

Observability matrix

by C-H, each CA^k for k ≥ n is a linear combination of C, CA, . . . , CA^{n−1}; hence for t ≥ n, N(O_t) = N(O) where

O = O_n = [ C; CA; . . . ; CA^{n−1} ]

is called the observability matrix

if x(0) can be deduced from u and y over [0, t − 1] for any t, then x(0) can be deduced from u and y over [0, n − 1]

N(O) is called the unobservable subspace; it describes the ambiguity in determining the state from input and output

system is called observable if N(O) = {0}, i.e., rank(O) = n
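A rank test makes the definition concrete. The second system below (two identical, decoupled oscillators measured only through the sum of their positions) is a hypothetical example of an unobservable system:

```python
import numpy as np

def obsv(A, C):
    """Observability matrix O = [C; CA; ...; CA^{n-1}]."""
    n = A.shape[0]
    return np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])

# observable: measuring the position of a damped oscillator
A = np.array([[0.0, 1.0], [-1.0, -0.1]])
C = np.array([[1.0, 0.0]])
r1 = np.linalg.matrix_rank(obsv(A, C))
print(r1)        # 2: observable

# unobservable: two identical decoupled oscillators, sensor sees only the sum
A2 = np.block([[A, np.zeros((2, 2))], [np.zeros((2, 2)), A]])
C2 = np.array([[1.0, 0.0, 1.0, 0.0]])
r2 = np.linalg.matrix_rank(obsv(A2, C2))
print(r2)        # 2 < 4: N(O) is nontrivial (the difference mode is invisible)
```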

Observability-controllability duality

let (Ã, B̃, C̃, D̃) be the dual of system (A, B, C, D), i.e.,

Ã = Aᵀ,   B̃ = Cᵀ,   C̃ = Bᵀ,   D̃ = Dᵀ

the controllability matrix of the dual system is

C̃ = [B̃  ÃB̃  · · ·  Ã^{n−1}B̃] = [Cᵀ  AᵀCᵀ  · · ·  (Aᵀ)^{n−1}Cᵀ] = Oᵀ

the transpose of the observability matrix

similarly we have Õ = Cᵀ

thus, system is observable (controllable) if and only if dual system is controllable (observable)

in fact,

N(O) = range(Oᵀ)^⊥ = range(C̃)^⊥

i.e., unobservable subspace is orthogonal complement of controllable subspace of dual
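The identity C̃ = Oᵀ is easy to confirm numerically (hypothetical system, not from the notes):

```python
import numpy as np

def ctrb(A, B):
    """Controllability matrix [B AB ... A^{n-1}B]."""
    n = A.shape[0]
    return np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])

def obsv(A, C):
    """Observability matrix [C; CA; ...; CA^{n-1}]."""
    n = A.shape[0]
    return np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])

# duality: controllability matrix of the dual (A^T, C^T) equals O^T
A = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [-1.0, -3.0, -3.0]])
C = np.array([[1.0, 1.0, 0.0]])
O = obsv(A, C)
Cdual = ctrb(A.T, C.T)
ok = np.allclose(Cdual, O.T)
print(ok)        # True
```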

Observers for noiseless case

suppose the system is observable (and noiseless); then for t ≥ n, O_t has rank n, so it has a left inverse F , i.e., F O_t = I

we can determine the initial state as

x(0) = F ( [y(0); . . . ; y(t−1)] − T_t [u(0); . . . ; u(t−1)] )

in fact we have

x(τ − t + 1) = F ( [y(τ−t+1); . . . ; y(τ)] − T_t [u(τ−t+1); . . . ; u(τ)] )

i.e., our observer estimates what the state was t − 1 epochs ago, given the past t − 1 inputs & outputs

the observer is a (multi-input, multi-output) finite impulse response filter, with inputs u and y, and output x̂

Invariance of unobservable set

fact: the unobservable subspace N(O) is invariant under A, i.e., if z ∈ N(O), then Az ∈ N(O)

proof: suppose z ∈ N(O), i.e., CA^k z = 0 for k = 0, . . . , n − 1

evidently CA^k(Az) = 0 for k = 0, . . . , n − 2; and by C-H,

CA^{n−1}(Az) = CAⁿz = − Σ_{i=0}^{n−1} α_i CA^i z = 0

where det(sI − A) = sⁿ + α_{n−1}s^{n−1} + · · · + α₀

Continuous-time observability

continuous-time system, with no sensor or state noise:

ẋ = Ax + Bu,   y = Cx + Du

can we deduce the state x from u and y? let's look at derivatives of y:

y = Cx + Du

ẏ = Cẋ + Du̇ = CAx + CBu + Du̇

ÿ = CA²x + CABu + CBu̇ + Dü

and so on

hence we have

[y; ẏ; . . . ; y^{(n−1)}] = O x + T [u; u̇; . . . ; u^{(n−1)}]

where O is the observability matrix and

T = [ D                                   ]
    [ CB          D                       ]
    [ . . .                               ]
    [ CA^{n−2}B   CA^{n−3}B  · · ·  CB  D ]

(the same matrices as in the discrete-time case)

rewrite as

O x = [y; ẏ; . . . ; y^{(n−1)}] − T [u; u̇; . . . ; u^{(n−1)}]

hence if N(O) = {0} we can deduce x(t) from derivatives of u(t), y(t) up to order n − 1; using a left inverse F of O,

x = F ( [y; ẏ; . . . ; y^{(n−1)}] − T [u; u̇; . . . ; u^{(n−1)}] )

impulsive inputs

A converse

suppose z ∈ N(O) (the unobservable subspace), u is any input, and x, y are the corresponding state and output, i.e.,

ẋ = Ax + Bu,   y = Cx + Du

then the state trajectory x̃ = x + e^{tA}z satisfies

dx̃/dt = Ax̃ + Bu,   y = Cx̃ + Du

(since Ce^{tA}z = Σ_k (tᵏ/k!) CAᵏz = 0 for z ∈ N(O))

hence if the system is unobservable, no signal processing of any kind applied to u and y can deduce x

the unobservable subspace N(O) gives a fundamental ambiguity in deducing x from u, y

Least-squares observers

discrete-time system, now with sensor noise:

x(t + 1) = Ax(t) + Bu(t),   y(t) = Cx(t) + Du(t) + v(t)

assuming the system is observable and t ≥ n, the least-squares estimate of the initial state is

x̂_ls(0) = O_t† ( [y(0); . . . ; y(t−1)] − T_t [u(0); . . . ; u(t−1)] )

where O_t† = (O_tᵀ O_t)^{−1} O_tᵀ

interpretation: x̂_ls(0) minimizes the discrepancy between

the output ŷ that would be observed, with input u and initial state x(0) (and no sensor noise), and

the output y that was measured,

measured as Σ_{τ=0}^{t−1} ‖ŷ(τ) − y(τ)‖²

we can express the estimate as

x̂_ls(0) = ( Σ_{τ=0}^{t−1} (Aᵀ)^τ Cᵀ C A^τ )^{−1} Σ_{τ=0}^{t−1} (Aᵀ)^τ Cᵀ ỹ(τ)

where ỹ = y − h ∗ u is the output with the effect of the input subtracted off (h is the impulse response)

since O_t† O_t = I, the estimation error is

x̃(0) = x̂_ls(0) − x(0) = O_t† [v(0); . . . ; v(t−1)]

in particular, x̂_ls(0) = x(0) if the sensor noise is zero

(i.e., the observer recovers the exact state in the noiseless case)
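A sketch of the least-squares observer in action, with and without sensor noise (hypothetical observable system, not from the notes; u = 0 so the T_t term drops out):

```python
import numpy as np

# hypothetical observable discrete-time system, autonomous (u = 0)
rng = np.random.default_rng(2)
A = np.array([[0.95, 0.10], [-0.10, 0.95]])
C = np.array([[1.0, 0.0]])
t = 50

# O_t = [C; CA; ...; CA^{t-1}]
Ot = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(t)])

x0 = np.array([[1.0], [-1.0]])
Y = Ot @ x0 + 0.1 * rng.standard_normal((t, 1))   # y(tau) = C A^tau x0 + v(tau)

# least-squares estimate of the initial state
x0_hat, *_ = np.linalg.lstsq(Ot, Y, rcond=None)
err = np.linalg.norm(x0_hat - x0)
print(err)       # modest, thanks to averaging over t measurements

# with zero sensor noise the estimate is exact
x0_exact, *_ = np.linalg.lstsq(Ot, Ot @ x0, rcond=None)
exact = np.allclose(x0_exact, x0)
print(exact)     # True
```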

now suppose the sensor noise is unknown, but has RMS value no more than α:

(1/t) Σ_{τ=0}^{t−1} ‖v(τ)‖² ≤ α²

then the set of possible estimation errors is an ellipsoid:

x̃(0) ∈ E_unc = { O_t† [v(0); . . . ; v(t−1)] : (1/t) Σ_{τ=0}^{t−1} ‖v(τ)‖² ≤ α² }

whose shape is determined by the matrix

( (1/t) O_tᵀ O_t )^{−1} = ( (1/t) Σ_{τ=0}^{t−1} (Aᵀ)^τ Cᵀ C A^τ )^{−1}

the maximum norm of the estimation error is

‖x̂_ls(0) − x(0)‖ ≤ √t α ‖O_t†‖

Infinite horizon uncertainty

the matrix

P = lim_{t→∞} ( Σ_{τ=0}^{t−1} (Aᵀ)^τ Cᵀ C A^τ )^{−1}

always exists, and gives the limiting uncertainty in estimating x(0) from u, y over longer and longer periods:

if A is stable, P > 0,

i.e., we can't estimate the initial state perfectly even with an infinite number of measurements u(t), y(t), t = 0, . . . (since memory of x(0) fades . . . )

if A is not stable, then P can have nonzero nullspace,

i.e., the initial state estimation error gets arbitrarily small (at least in some directions) as more and more of the signals u and y are observed
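For stable A the sum Σ_τ (Aᵀ)^τ Cᵀ C A^τ converges, and its limit W solves the discrete Lyapunov equation Aᵀ W A − W + Cᵀ C = 0, giving P = W⁻¹. A sketch using scipy's solver (hypothetical stable system, not from the notes):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# hypothetical stable discrete-time system (|eigenvalues| < 1)
A = np.array([[0.5, 0.4], [-0.4, 0.5]])
C = np.array([[1.0, 0.0]])

# W = sum_{tau>=0} (A^T)^tau C^T C A^tau solves A^T W A - W + C^T C = 0
W = solve_discrete_lyapunov(A.T, C.T @ C)

# check against the truncated sum (geometric decay makes 200 terms plenty)
Wnum = np.zeros((2, 2))
M = np.eye(2)
for _ in range(200):
    Wnum += M.T @ C.T @ C @ M
    M = A @ M
match = np.allclose(W, Wnum)
print(match)     # True; P = inv(W) > 0 since A is stable
```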

Example

a particle moves in R² with constant velocity; we take (linear, noisy) range measurements from directions −15°, 0°, 20°, 30°, once per second

range noises IID N(0, 1); can assume RMS value of v is not much more than 2

no assumptions about initial position & velocity

[figure: particle trajectory and the four range sensors]

the system is

x(t + 1) = [ 1 0 1 0 ]            [ k₁ᵀ ]
           [ 0 1 0 1 ] x(t), y(t)=[ ..  ] x(t) + v(t)
           [ 0 0 1 0 ]            [ k₄ᵀ ]
           [ 0 0 0 1 ]

where (x₁, x₂) is the particle position, (x₃, x₄) its velocity, and row kᵢᵀ gives the ith linearized range measurement

range measurements (& noiseless versions):

[plot: the four range measurements vs. t, 0 ≤ t ≤ 120]

estimate of the initial position, x̂(0|t); position estimate error is

√( (x̂₁(0|t) − x₁(0))² + (x̂₂(0|t) − x₂(0))² )

[plot: position estimate error vs. t, 10 ≤ t ≤ 120]

[plot: error measure vs. t, 10 ≤ t ≤ 120]

Continuous-time least-squares state estimation

the least-squares estimate of the initial state x(0), given u(τ), y(τ), 0 ≤ τ ≤ t: choose x̂_ls(0) to minimize the integral square residual

J = ∫₀ᵗ ‖ ỹ(τ) − Ce^{τA} x(0) ‖² dτ

where ỹ = y − h ∗ u is the output with the effect of the input subtracted off

let's expand as J = x(0)ᵀ Q x(0) + 2rᵀ x(0) + s, with

Q = ∫₀ᵗ e^{τAᵀ} Cᵀ C e^{τA} dτ,   r = − ∫₀ᵗ e^{τAᵀ} Cᵀ ỹ(τ) dτ,   s = ∫₀ᵗ ỹ(τ)ᵀ ỹ(τ) dτ

setting ∇_{x(0)} J to zero, we obtain the least-squares observer

x̂_ls(0) = −Q^{−1} r = ( ∫₀ᵗ e^{τAᵀ} Cᵀ C e^{τA} dτ )^{−1} ∫₀ᵗ e^{τAᵀ} Cᵀ ỹ(τ) dτ

the estimation error is

x̃(0) = x̂_ls(0) − x(0) = ( ∫₀ᵗ e^{τAᵀ} Cᵀ C e^{τA} dτ )^{−1} ∫₀ᵗ e^{τAᵀ} Cᵀ v(τ) dτ

therefore if v = 0 then x̂_ls(0) = x(0)
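The integrals Q and r can be approximated by quadrature; in the noiseless case the resulting estimate recovers x(0). A sketch (hypothetical system, midpoint rule; all parameters are arbitrary choices):

```python
import numpy as np
from scipy.linalg import expm

# hypothetical observable system; noiseless output ytilde(tau) = C e^{tau A} x0
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
C = np.array([[1.0, 0.0]])
x0 = np.array([[0.5], [2.0]])
t, N = 3.0, 3000
h = t / N

Q = np.zeros((2, 2))
r = np.zeros((2, 1))
E = expm(0.5 * h * A)            # e^{tau A} at the first midpoint
M = expm(h * A)
for _ in range(N):
    y = C @ E @ x0               # noiseless output sample
    Q += h * E.T @ C.T @ C @ E   # Q = \int e^{tau A^T} C^T C e^{tau A} dtau
    r -= h * E.T @ C.T @ y       # r = -\int e^{tau A^T} C^T ytilde(tau) dtau
    E = M @ E

x0_hat = -np.linalg.solve(Q, r)  # least-squares observer xhat = -Q^{-1} r
ok = np.allclose(x0_hat, x0)
print(ok)                        # True
```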

EE263 Autumn 2008-09 Stephen Boyd

Lecture 20

Some parting thoughts . . .

linear algebra

levels of understanding

what's next?


Linear algebra

comes up in many practical contexts (EE, ME, CE, AA, OR, Econ, . . . )

nowadays the computations are routinely carried out, cf. 10 yrs ago (when it was mostly talked about)

500–1000 variables: easy with general purpose codes

much more possible with special structure, special codes (e.g., sparse, convolution, banded, . . . )

Levels of understanding

Platonic view:

Quantitative view:

must have understanding at one level before moving to next

What's next?

(plus lots of other EE, CS, ICME, MS&E, Stat, ME, AA courses on signal

processing, control, graphics & vision, adaptive systems, machine learning,

computational geometry, numerical linear algebra, . . . )
