
Department of Engineering Sciences and Mathematics
February 2, 2011
Ove Edlund
C0002M - Numerical Analysis

Some Notes on Least Squares, QR-factorization, SVD and Fitting

Contents

1 Introduction
2 The Least squares problem
3 Finding the least squares solution with the normal equations
4 The QR factorization and least squares
5 Calculating the QR-factorization
6 SVD and fitting
  6.1 The Singular Value Decomposition
  6.2 The Rigid Body Movement Problem
  6.3 Fitting Planes and Lines by Orthogonal Distance Regression

1 Introduction

While Chapra (2008) does a good job in general, the book is lacking in the linear algebra part. Doing LU-factorization without a permutation matrix is a bad idea. Cholesky factorization of a symmetric matrix is only possible if the matrix is positive definite. Those particular problems are handled in the slides of C0002M. When it comes to least squares though, I would like the course to go deeper than the book, with a slightly different focus. This document serves as a complement in that respect and is part of the course literature.


Figure 1: The data points and the model function $y = a_1 + a_2 x + a_3/(x+1)$ for different choices of $a_1$, $a_2$ and $a_3$.

2 The Least squares problem

The least squares solution of a system of linear equations is defined as follows.

Definition 1. Given a matrix $Z \in \mathbb{R}^{n \times p}$ and a right hand side $y \in \mathbb{R}^n$, the least squares solution $a \in \mathbb{R}^p$ is given by
$$\min_{a} \| y - Z a \|_2 .$$

What we are trying to do is to find some kind of solution to $Z a = y$ when the equality cannot be fulfilled exactly. So we minimize the size of the difference between the left hand side and the right hand side. The size that we minimize is measured with the $\ell_2$-norm. Other norms could be used, but then we would not find the least squares solution, and other choices also require a much greater computational effort.
A typical situation where a least squares problem shows up is when a curve is fitted to some data points, i.e. when we do regression. For instance, consider the data points

    x   0   0.5   1   2   5
    y   1   3     3   5   11

to which we are trying to fit a model function with unknown parameters $a_1$, $a_2$ and $a_3$:
$$y = a_1 + a_2 x + a_3 \frac{1}{x+1} .$$
Figure 1 shows the data points and the model function for some different choices of the unknown parameters. We want to find the choice of parameters that gives the best fit of the model function to the data points. To accomplish this, we plug the data points into the model function and receive a system of linear equations:

$$\begin{aligned}
1  &= a_1 + a_2 \cdot 0   + a_3/(0+1) \\
3  &= a_1 + a_2 \cdot 0.5 + a_3/(0.5+1) \\
3  &= a_1 + a_2 \cdot 1   + a_3/(1+1) \\
5  &= a_1 + a_2 \cdot 2   + a_3/(2+1) \\
11 &= a_1 + a_2 \cdot 5   + a_3/(5+1)
\end{aligned}$$
or, in matrix form,
$$\begin{bmatrix}
1 & 0   & 1     \\
1 & 0.5 & 1/1.5 \\
1 & 1   & 1/2   \\
1 & 2   & 1/3   \\
1 & 5   & 1/6
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}
=
\begin{bmatrix} 1 \\ 3 \\ 3 \\ 5 \\ 11 \end{bmatrix} .$$

If we could solve this system, we would get a function curve that passed through all the data points. It is however not possible to solve the system, so we do the second best: we find the least squares solution that minimizes the difference between the left hand side and the right hand side. The following command sequence in Matlab finds our solution and makes a plot of it, as shown in Figure 2.
>> x = [0 0.5 1 2 5]'
x =
0
0.5000
1.0000
2.0000
5.0000
>> y = [1 3 3 5 11]'
y =
1
3
3
5
11
>> Z = [x.^0 x 1./(x+1)]
Z =
    1.0000         0    1.0000
    1.0000    0.5000    0.6667
    1.0000    1.0000    0.5000
    1.0000    2.0000    0.3333
    1.0000    5.0000    0.1667
>> a = Z\y
a =
1.5772
1.8792
-0.3221
>> xs = -0.2:0.01:5.2;
>> plot(xs,a(1)+a(2)*xs+a(3)./(xs+1),x,y,'o')
>> xlabel('x'), ylabel('y')

As seen in Figure 2 the least squares solution fits nicely to the general behavior of the data points, without passing through them. Also note that the least squares solution is found easily with \ in Matlab.
The success of all this is due to the fact that the unknowns $a_1$, $a_2$ and $a_3$ are weights in a linear combination of basis functions. This is also true when fitting polynomials, for example. With this in mind we can express a general model

$$y = a_1 \varphi_1(x) + a_2 \varphi_2(x) + \cdots + a_p \varphi_p(x) \tag{1}$$

Figure 2: The least squares solution.

with known basis functions $\varphi_k(x)$. In the previous example we had $\varphi_1(x) = 1$, $\varphi_2(x) = x$ and $\varphi_3(x) = 1/(x + 1)$. Inserting the data points $(x_i, y_i)$ into the model generates the system
$$\begin{bmatrix}
\varphi_1(x_1) & \varphi_2(x_1) & \dots & \varphi_p(x_1) \\
\varphi_1(x_2) & \varphi_2(x_2) & \dots & \varphi_p(x_2) \\
\vdots         & \vdots         &       & \vdots         \\
\varphi_1(x_n) & \varphi_2(x_n) & \dots & \varphi_p(x_n)
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix}
=
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$
which is solved in the least squares sense to find the weights $a_k$. If the model function does not have the form (1) it may still be possible to transform it into the right shape. This is the case for the model functions $y = a_1 e^{a_2 t}$ and $y = a_1 t^{a_2}$, where the form (1) is obtained by taking the logarithm of the expressions.
In other cases, for example with $y = a_1 \sin(a_2 t) + a_3$, there is no cure, and more computationally demanding methods are necessary to find the least squares solution. There is a Matlab command lsqcurvefit to handle those non-linear cases.
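To make the logarithm trick concrete, here is a minimal Matlab sketch that fits $y = a_1 e^{a_2 t}$ by noting that $\ln y = \ln a_1 + a_2 t$ has the form (1). The data vectors t and y below are made up purely for illustration.

t = [0 1 2 3 4]';
y = [2.1 3.4 5.9 10.2 17.5]';        % hypothetical measurements
Z = [ones(size(t)) t];               % basis functions 1 and t for ln(y)
c = Z \ log(y);                      % least squares fit of the transformed model
a1 = exp(c(1))                       % recover a1 from ln(a1)
a2 = c(2)

Note that this minimizes the residuals of $\ln y$ rather than of $y$ itself, so the result is close to, but not identical with, the non-linear least squares fit.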

3 Finding the least squares solution with the normal equations

There are several methods for finding the least squares solution. The one appearing most frequently in linear algebra textbooks is the normal equations.

Theorem 1 (The normal equations). Given the matrix $Z \in \mathbb{R}^{n \times p}$ and the right hand side $y \in \mathbb{R}^n$, the solution set of the least squares problem
$$\min_{a \in \mathbb{R}^p} \| y - Z a \|_2$$
is identical to the solution set of the normal equations
$$Z^T Z \, a = Z^T y .$$

Proof. See e.g. Lay (2003).

So finding the least squares solution simply reduces to solving a system of linear equations with system matrix $Z^T Z$ and right hand side $Z^T y$. If $Z$ has full column rank it holds that $Z^T Z$ is

- symmetric: $(Z^T Z)^T = Z^T (Z^T)^T = Z^T Z$, and
- positive definite: let $q = Z x$. Then $x^T Z^T Z \, x = (Z x)^T Z x = q^T q = \| q \|_2^2 > 0$, since $q = Z x \neq 0$ whenever $x \neq 0$ (because $Z$ has full column rank).

With $Z^T Z$ being a symmetric and positive definite matrix, it is obviously favorable to use a Cholesky factorization to solve the normal equations. Continuing with the previous example, we thus find the least squares solution with the normal equations in this way in Matlab:
>> Matrix = Z'*Z
Matrix =
    5.0000    8.5000    2.6667
    8.5000   30.2500    2.3333
    2.6667    2.3333    1.8333
>> rhs = Z'*y
rhs =
   23.0000
   69.5000
    8.0000
>> U = chol(Matrix)
U =
    2.2361    3.8013    1.1926
         0    3.9749   -0.5535
         0         0    0.3237
>> a = U\(U'\rhs)
a =
    1.5772
    1.8792
   -0.3221

The restriction to full column rank makes sense from a numerical perspective, since the numerical algorithms we use depend on the existence of a unique solution. It is however possible to handle rank-deficient problems gracefully using the singular value decomposition (SVD), though we use it for other purposes in these notes.
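As a side note, Matlab's SVD-based pseudoinverse pinv returns the least squares solution of minimum norm, which is well defined even when $Z$ is rank-deficient. A small sketch, reusing the $Z$ and $y$ from the example (here $Z$ has full rank, so the result coincides with Z\y):

a_pinv = pinv(Z)*y;                  % minimum-norm least squares solution
% The same thing done by hand with the economy-size SVD, dropping
% singular values that are negligible compared to the largest one:
[U,S,V] = svd(Z,0);
s = diag(S);
r = sum(s > max(size(Z))*eps(s(1))); % numerical rank
a_svd = V(:,1:r)*((U(:,1:r)'*y)./s(1:r));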
One can show that the condition number
$$\operatorname{Cond}(Z) = \max_{x \neq 0} \frac{\| Z x \|_2}{\| x \|_2} \Big/ \min_{x \neq 0} \frac{\| Z x \|_2}{\| x \|_2}$$
describes the relative perturbation sensitivity in the least squares solution $a$, when the residual $e = y - Z a$ is small. Moreover,
$$\operatorname{Cond}(Z^T Z) = \bigl( \operatorname{Cond}(Z) \bigr)^2 ,$$
so the normal equations have a relative sensitivity that is squared compared to the original problem formulation. From this we draw the conclusion that any numerical method using the normal equations will be unstable, since the rounding errors will correspond to $\bigl( \operatorname{Cond}(Z) \bigr)^2$, while the mathematical problem has condition number $\operatorname{Cond}(Z)$. This bad numerical behaviour of the normal equations indeed does happen when we use the Cholesky factorization. We need better options!
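The squared conditioning is easy to check numerically. A small sketch using the $Z$ from the example (cond computes the $\ell_2$ condition number from the singular values):

c1 = cond(Z);          % condition number of the original problem
c2 = cond(Z'*Z);       % condition number of the normal equations
disp([c1^2 c2])        % the two numbers agree up to rounding errors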

4 The QR factorization and least squares

We get a stable numerical algorithm by using the QR-factorization. To understand the factorization we need to recapture some special classes of matrices and some of their properties.
Definition 2 (Orthonormal columns). A matrix $Q \in \mathbb{R}^{n \times p}$ with $p \leq n$ has orthonormal columns if all columns in $Q$ are orthogonal to every other column, and are normalized. More specifically, for
$$Q = [\, q_1 \; q_2 \; \dots \; q_p \,] ,$$
where $q_i$ are the columns of $Q$, it holds that $q_i^T q_k = 0$ for all $i \neq k$, and $\| q_i \|_2 = 1$ for all $i$.

The following properties hold for matrices $Q \in \mathbb{R}^{n \times p}$ with orthonormal columns:

- $Q^T Q = I_p$, as a consequence of the definition.
- $Q\, Q^T$ is the transformation that does orthogonal projection from $\mathbb{R}^n$ onto $\operatorname{Col}(Q)$ (Lay 2003).
The point being made here is that $Q\, Q^T$ is not the identity matrix. If we let $p = n$ the situation changes though:

Definition 3 (Orthogonal matrix). A square matrix with orthonormal columns is referred to as an orthogonal matrix. Thus if $Q$ has orthonormal columns and $Q \in \mathbb{R}^{n \times n}$, then $Q$ is an orthogonal matrix.

For orthogonal matrices it holds that $Q^T Q = Q\, Q^T = I_n$. (The last part of the equality is true since $\operatorname{Col}(Q) = \mathbb{R}^n$ and the orthogonal projection from $\mathbb{R}^n$ onto $\mathbb{R}^n$ is the unit transformation $I_n$.) From this it is also clear that if $Q$ is an orthogonal matrix, then $Q^T$ is orthogonal as well. It is also obvious that $Q^{-1} = Q^T$, a property that only holds for orthogonal matrices.
Now that we have consulted our memory banks, we are ready for the QR-factorization.

Theorem 2 (QR-factorization). Any matrix $Z \in \mathbb{R}^{n \times p}$ with $p \leq n$ can be factorized as
$$Z = Q R ,$$
where $Q \in \mathbb{R}^{n \times p}$ has orthonormal columns, and $R \in \mathbb{R}^{p \times p}$ is upper triangular.

The proof is by construction. It is always possible to construct this factorization by the Gram-Schmidt orthogonalization procedure, which is bad to use with floating point numbers, or by Householder reflections, which is good (Demmel 1997, Higham 1996). We will take a look at Householder reflections in Section 5.

We use this factorization for the least squares problem instead of the normal equations, as stated in the following theorem:

Theorem 3 (QR-factorization and least squares). Given the matrix $Z \in \mathbb{R}^{n \times p}$ and the right hand side $y \in \mathbb{R}^n$, the solution set of the least squares problem
$$\min_{a \in \mathbb{R}^p} \| y - Z a \|_2$$
is identical to the solution set of
$$R \, a = Q^T y ,$$
where the matrix $Q \in \mathbb{R}^{n \times p}$ with orthonormal columns and the upper triangular $R \in \mathbb{R}^{p \times p}$ form a QR-factorization of $Z$.
So we get a system matrix that is upper triangular. We easily solve this system with
backwards substitution. Yet again, we go to our example and do this in Matlab:
>> [Q,R] = qr(Z,0)
Q =
   -0.4472   -0.4277    0.7104
   -0.4472   -0.3019   -0.1043
   -0.4472   -0.1761   -0.4041
   -0.4472    0.0755   -0.4888
   -0.4472    0.8302    0.2868
R =
   -2.2361   -3.8013   -1.1926
         0    3.9749   -0.5535
         0         0    0.3237
>> rhs = Q'*y
rhs =
  -10.2859
    7.6480
   -0.1043
>> a = R\rhs
a =
    1.5772
    1.8792
   -0.3221

Obviously this works as stated in the theorem; we get the same solution as before. Note that, to get the QR-factorization described in this document, the one we are interested in for solving least squares problems, we have to add an extra 0 argument and call qr(Z,0).
In order to prove Theorem 3, we first need to establish a property of the $\ell_2$-norm.

Lemma 1. Multiplying a vector with an orthogonal matrix does not change its $\ell_2$-norm. Thus if $Q$ is orthogonal it holds that
$$\| Q x \|_2 = \| x \|_2 .$$

Proof. $\| Q x \|_2^2 = (Q x)^T Q x = x^T Q^T Q \, x = x^T x = \| x \|_2^2$.

Using this lemma we prove Theorem 3.


Proof of Theorem 3. Given the QR-factorization $Z = Q R$ we do an orthogonal extension of $Q$, i.e. we add additional orthonormal columns to it to get an orthogonal matrix $[\, Q \;\; \widehat{Q} \,] \in \mathbb{R}^{n \times n}$. The columns of $\widehat{Q}$ represent the added orthonormal columns. Note that $\widehat{Q}^T Q = 0$. Also note that $[\, Q \;\; \widehat{Q} \,]^T$ also is an orthogonal matrix. Now
$$\begin{aligned}
\| y - Z a \|_2^2 = \| y - Q R \, a \|_2^2
&= \bigl\| [\, Q \;\; \widehat{Q} \,]^T ( y - Q R \, a ) \bigr\|_2^2
= \left\| \begin{bmatrix} Q^T ( y - Q R \, a ) \\ \widehat{Q}^T ( y - Q R \, a ) \end{bmatrix} \right\|_2^2 \\
&= \left\| \begin{bmatrix} Q^T y - Q^T Q R \, a \\ \widehat{Q}^T y - \widehat{Q}^T Q R \, a \end{bmatrix} \right\|_2^2
= \left\| \begin{bmatrix} Q^T y - R \, a \\ \widehat{Q}^T y \end{bmatrix} \right\|_2^2
= \| Q^T y - R \, a \|_2^2 + \| \widehat{Q}^T y \|_2^2
\geq \| \widehat{Q}^T y \|_2^2 .
\end{aligned}$$
So the minimum value of $\| y - Z a \|_2^2$ is $\| \widehat{Q}^T y \|_2^2$, and this minimum is realized when $\| Q^T y - R \, a \|_2^2 = 0$, i.e. when $Q^T y - R \, a = 0$.

5 Calculating the QR-factorization

This section gives a rough sketch of how to calculate the QR-factorization in a way that is numerically stable.

The Householder reflection matrix
$$H = I - 2 \, \frac{v \, v^T}{v^T v} , \qquad H \in \mathbb{R}^{n \times n} ,$$
is orthogonal ($H^T H = H \, H^T = I$) and symmetric ($H^T = H$) for any vector $v \in \mathbb{R}^n$.


Now, given the matrix $Z \in \mathbb{R}^{n \times p}$ with first column $z_1$, and by choosing
$$v = \begin{bmatrix} z_{11} - \| z_1 \|_2 \\ z_{21} \\ z_{31} \\ \vdots \\ z_{n1} \end{bmatrix} ,$$
the Householder reflection $H$, applied on $Z$, will zero out the first column, except the first entry. The first column of $H Z$ will be
$$\begin{bmatrix} \| z_1 \|_2 \\ 0 \\ \vdots \\ 0 \end{bmatrix} .$$
Just as in Gaussian elimination, we have transformed everything below the diagonal to zero. We continue in the same way with the second, third, etc. column, but regard only the entries on the diagonal and below when we produce $v$.
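A minimal Matlab sketch of a single Householder step, using the $Z$ from the earlier example and the sign choice in the formula above (the prototype routine below uses the opposite, numerically safer sign):

z1 = Z(:,1);
v = z1;
v(1) = v(1) - norm(z1);                  % v = z1 - ||z1||*e1
H = eye(size(Z,1)) - 2*(v*v')/(v'*v);    % Householder reflection
HZ = H*Z;                                % first column is now (||z1||, 0, ..., 0)'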

By applying $p$ such Householder reflections to $Z$ we can transform it to an upper triangular form
$$H_p \cdots H_3 H_2 H_1 Z = \widetilde{R} .$$
If we collect the $H_k$'s on the other side of the equality, we get the QR-factorization
$$Z = \underbrace{H_1 H_2 H_3 \cdots H_p}_{\widetilde{Q}} \, \widetilde{R} .$$
Picking the first $p$ columns in $\widetilde{Q}$ as $Q$, and the first $p$ rows in $\widetilde{R}$ as $R$, gives the QR-factorization we use for least-squares problems.
Below is a prototype QR-factorization routine for Matlab, where the description
above is translated into code:
function [Q,R] = QRslow(A, zero)
%QRslow A perfect way to do QR-factorization, except that
%   the method is far from optimally implemented.
%   The calling sequence
%      [Q,R] = QRslow(A)
%   returns an orthogonal matrix Q and an upper triangular
%   matrix R of the same dimensions as A, such that A = Q*R.
%   The calling sequence
%      [Q,R] = QRslow(A,0)
%   returns the economy size QR-factorization, where Q has
%   orthonormal columns and the same dimensions as A, and R
%   is a square upper triangular matrix, such that A = Q*R.

if nargin == 2 && zero ~= 0
    error('Use QRslow(A,0) for compact form');
end
[n,p] = size(A);
Q = eye(n);
for i = 1:p
    % Householder vector built from column i, diagonal entry and below
    v = zeros(n,1);
    v(i:n) = A(i:n,i);
    v(i) = v(i) + sign(v(i))*norm(v);   % sign chosen to avoid cancellation
    % Apply the reflection to A and accumulate it into Q
    H = eye(n) - 2*(v*v')/(v'*v);
    A = H*A;
    Q = Q*H;
end
if nargin == 2
    % Economy size: keep the first p columns of Q and the first p rows of R
    Q = Q(:,1:p);
    R = triu(A(1:p,:));
else
    R = triu(A);
end
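A quick usage check of QRslow against the built-in qr, using the $Z$ and $y$ from the example (the two factorizations may in principle differ in the signs of corresponding columns and rows, but both must reproduce $Z$):

[Q1,R1] = QRslow(Z,0);
[Q2,R2] = qr(Z,0);
disp(norm(Q1*R1 - Z))          % both residuals are at rounding error level
disp(norm(Q2*R2 - Z))
a = R1\(Q1'*y)                 % least squares solution via QRslow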


6 SVD and fitting

These are some notes on how to use the singular value decomposition (SVD) for solving some fitting problems. The author of these notes is Inge Söderkvist.
6.1 The Singular Value Decomposition

The singular value decomposition of a matrix $A \in \mathbb{R}^{m \times n}$ is
$$A = U S V^T ,$$
where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal matrices, and $S \in \mathbb{R}^{m \times n}$ is a diagonal matrix containing the singular values $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_r \geq 0$, $r = \min(m, n)$.
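In Matlab the decomposition is computed with svd. A minimal sketch with an arbitrary made-up matrix, just to show the calling sequence:

A = [1 2; 3 4; 5 6];        % arbitrary 3-by-2 example matrix
[U,S,V] = svd(A);           % U is 3-by-3, S is 3-by-2, V is 2-by-2
sigma = diag(S)             % singular values in decreasing order
norm(U*S*V' - A)            % of the order of rounding errors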
6.2 The Rigid Body Movement Problem

Assume that we have $n$ landmarks in a rigid body and let $\{x_1, \dots, x_n\}$ be the 3-D positions of these landmarks before the movement and $\{y_1, \dots, y_n\}$ be the positions after the movement. We want to determine a rotation matrix $R$ and a translation vector $d$ that map the points $x_i$ to the points $y_i$, $i = 1, \dots, n$. Because of measurement errors the mapping is not exact and we use the following least-squares problem
$$\min_{R \in \Omega, \; d} \; \sum_{i=1}^{n} \| R \, x_i + d - y_i \|_2^2 , \tag{2}$$
where
$$\Omega = \{ R \mid R^T R = R \, R^T = I_3 ;\; \det(R) = 1 \}$$
is the set of orthogonal rotation matrices. Problem (2) is linear with respect to the vector $d$ but it is non-linear with respect to the rotation matrix $R$ (that is because of the orthogonality condition on $R$). Introducing $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$, $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$, and the matrices
$$A = [\, x_1 - \bar{x}, \dots, x_n - \bar{x} \,] , \qquad B = [\, y_1 - \bar{y}, \dots, y_n - \bar{y} \,] ,$$
the problem of determining the rotation matrix becomes
$$\min_{R \in \Omega} \| R A - B \|_F , \tag{3}$$
where the Frobenius norm of a matrix $Z$ is defined as $\| Z \|_F^2 = \sum_{i,j} z_{i,j}^2$. In e.g. (Arun, Huang & Blostein 1987, Hanson & Norris 1981) it is shown that this Orthogonal Procrustes problem can be solved using the singular value decomposition of the matrix $C = B A^T$. An algorithm is presented below.


Algorithm 1. Algorithm for solving problem (2) by using the SVD-method.

1. $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$, $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$
2. $A = [\, x_1 - \bar{x}, \dots, x_n - \bar{x} \,]$, $B = [\, y_1 - \bar{y}, \dots, y_n - \bar{y} \,]$
3. $C = B A^T$
4. $U S V^T = C$ (computation of the singular value decomposition of $C$)
5. $R = U \operatorname{diag}(1, 1, \det(U V^T)) \, V^T$
6. $d = \bar{y} - R \, \bar{x}$

The solution to problem (3) is unique if $\sigma_2(C)$ is nonzero. (More strictly, the solution is not unique if $\sigma_2(C)$ equals zero, or if $\sigma_2(C) = \sigma_3(C)$ and $\det(U V^T) = -1$. But the latter situation implies that a reflection is the best orthogonal matrix describing the movement. This situation will seldom occur when studying rigid bodies and indicates that something is wrong.)

Algorithm 1 can be used also in 2-D settings when a movement in the plane is to be determined.
More about the rigid body movement problem and its sensitivity with respect to perturbations in the positions of the landmarks can be found in (Söderkvist 1993, Söderkvist & Wedin 1993, Söderkvist & Wedin 1994).
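Below is a minimal Matlab sketch of Algorithm 1. The landmark data are synthetic and only serve to make the example self-contained.

% Synthetic example: n landmarks, a known rotation and translation
n = 6;
X = rand(3,n);                                        % positions before the movement
Rtrue = expm([0 -0.3 0.2; 0.3 0 -0.1; -0.2 0.1 0]);   % a rotation matrix
Y = Rtrue*X + [1; 2; 3];                              % positions after the movement

% Algorithm 1
xbar = mean(X,2);  ybar = mean(Y,2);
A = X - xbar;  B = Y - ybar;                          % centred coordinates
C = B*A';
[U,S,V] = svd(C);
R = U*diag([1 1 det(U*V')])*V';                       % closest proper rotation
d = ybar - R*xbar;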
6.3 Fitting Planes and Lines by Orthogonal Distance Regression

Assume that we want to find the plane that is as close as possible to a set of $n$ 3-D points $(p_1, \dots, p_n)$, and that the closeness is measured by the square sum of the orthogonal distances between the plane and the points.

Let the position of the plane be represented by a point $c$ belonging to the plane, and let the unit vector $n$ be the normal to the plane determining its direction. The orthogonal distance between a point $p_i$ and the plane is then $(p_i - c)^T n$. Thus the plane can be found by solving
$$\min_{c, \, \| n \| = 1} \; \sum_{i=1}^{n} \bigl( (p_i - c)^T n \bigr)^2 . \tag{4}$$
Solving this for $c$ gives $c = \frac{1}{n} \sum_{i=1}^{n} p_i$, see e.g. (Gander & Hrebicek 1993). Introducing the $3 \times n$ matrix
$$A = [\, p_1 - c, \; p_2 - c, \; \dots, \; p_n - c \,] ,$$
problem (4) can be formulated as
$$\min_{\| n \| = 1} \; \| A^T n \|_2^2 . \tag{5}$$
Using the singular value decomposition $A = U S V^T$ of $A$, we have
$$\| A^T n \|_2^2 = \| V S^T U^T n \|_2^2 = \| S^T U^T n \|_2^2 = (\sigma_1 y_1)^2 + (\sigma_2 y_2)^2 + (\sigma_3 y_3)^2 ,$$
where $y$ is the unit vector $y = U^T n$. Thus, $\| A^T n \|_2^2$ is minimized for $y = (0, 0, 1)^T$, or equivalently, for $n = U(:, 3)$. The minimal value of $\sum_{i=1}^{n} ((p_i - c)^T n)^2$ is $\sigma_3^2$. To summarize, we present the following algorithm for fitting the plane.

Algorithm 2. Algorithm for fitting a plane using orthogonal distance regression and singular value decomposition.

1. $c = \frac{1}{n} \sum_{i=1}^{n} p_i$
2. $A = [\, p_1 - c, \; \dots, \; p_n - c \,]$
3. $U S V^T = A$ (computation of the singular value decomposition of $A$)
4. $n = U(:, 3)$, i.e., the normal is given as the third column of $U$.

Since the normal is given by $U(:, 3)$, it follows from the orthogonality of $U$ that the plane is spanned by the two first columns of $U$.

Similarly, we can obtain the best fitted line as the first column of $U$. The square sum of distances between the best plane and the points is given by $\sigma_3^2$, and the square sum of distances between the best line and the points is given by $\sigma_2^2 + \sigma_3^2$. A similar technique can also be used for fitting a line in 2-D.
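A minimal Matlab sketch of Algorithm 2, with a made-up 3-by-n matrix P holding one point per column:

P = rand(3,10);                 % hypothetical data points, one per column
c = mean(P,2);                  % a point in the plane (the centroid)
A = P - c;                      % centred points
[U,S,V] = svd(A);
normal = U(:,3);                % unit normal of the best fitting plane
direction = U(:,1);             % direction of the best fitting line
s = diag(S);
planeErr = s(3)^2;              % square sum of distances to the plane
lineErr = s(2)^2 + s(3)^2;      % square sum of distances to the line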

References
Arun, K. S., Huang, T. S. & Blostein, S. D. (1987), 'Least-squares fitting of two 3-D point sets', IEEE Trans. Patt. Anal. Machine Intell. 9(5), 698–700.

Chapra, S. C. (2008), Applied Numerical Methods with MATLAB for Engineers and Scientists, second edn, McGraw-Hill.

Demmel, J. W. (1997), Applied Numerical Linear Algebra, SIAM.

Gander, W. & Hrebicek, J. (1993), Solving Problems in Scientific Computing using Matlab and Maple, Springer Verlag.

Hanson, R. J. & Norris, M. J. (1981), 'Analysis of measurements based on the singular value decomposition', SIAM J. Sci. Stat. Comput. 2, 363–373.

Higham, N. J. (1996), Accuracy and Stability of Numerical Algorithms, SIAM.

Lay, D. C. (2003), Linear Algebra and its Applications, third edn, Addison-Wesley.

Söderkvist, I. (1993), 'Perturbation analysis of the orthogonal Procrustes problem', BIT 33, 687–694.

Söderkvist, I. & Wedin, P.-Å. (1993), 'Determining the movements of the skeleton using well-configured markers', Journal of Biomechanics 26(12), 1473–1477.

Söderkvist, I. & Wedin, P.-Å. (1994), 'On condition numbers and algorithms for determining a rigid body movement', BIT 34, 424–436.

