Introduction
While Chapra (2008) does a good job in general, the book is lacking in the linear algebra
part. Doing LU-factorization without a permutation matrix is a bad idea. Cholesky
factorization of a symmetric matrix is only possible if the matrix is positive definite.
Those particular problems are handled in the slides of C0002M. When it comes
to least squares, though, I would like the course to go deeper than the book, with a
slightly different focus. This document serves as a complement in that respect and
is part of the course literature.
Figure 1: The data points and the model function y = a_1 + a_2 x + a_3/(x + 1) for different choices of a_1, a_2 and a_3.
We insert the data points into the model function and receive a system of linear equations:
\[
\begin{aligned}
1 &= a_1 + a_2 \cdot 0 + a_3/(0 + 1) \\
3 &= a_1 + a_2 \cdot 0.5 + a_3/(0.5 + 1) \\
3 &= a_1 + a_2 \cdot 1 + a_3/(1 + 1) \\
5 &= a_1 + a_2 \cdot 2 + a_3/(2 + 1) \\
11 &= a_1 + a_2 \cdot 5 + a_3/(5 + 1)
\end{aligned}
\]
In matrix form this reads
\[
\begin{pmatrix}
1 & 0 & 1/1 \\
1 & 0.5 & 1/1.5 \\
1 & 1 & 1/2 \\
1 & 2 & 1/3 \\
1 & 5 & 1/6
\end{pmatrix}
\begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}
=
\begin{pmatrix} 1 \\ 3 \\ 3 \\ 5 \\ 11 \end{pmatrix}
\]
If we could solve this system, we would get a function curve that passed through all the data
points. It is, however, not possible to solve the system, so we do the second best thing: we
find the least squares solution that minimizes the difference between the left hand and
the right hand side. The following command sequence in Matlab finds our solution
and makes a plot of it as shown in Figure 2.
>> x = [0 0.5 1 2 5]'
x =
0
0.5000
1.0000
2.0000
5.0000
>> y = [1 3 3 5 11]'
y =
1
3
3
5
11
>> Z = [x.^0 x 1./(x+1)]
Z =
    1.0000         0    1.0000
    1.0000    0.5000    0.6667
    1.0000    1.0000    0.5000
    1.0000    2.0000    0.3333
    1.0000    5.0000    0.1667
>> a = Z\y
a =
1.5772
1.8792
-0.3221
>> xs = -0.2:0.01:5.2;
>> plot(xs,a(1)+a(2)*xs+a(3)./(xs+1),x,y,'o')
>> xlabel('x'), ylabel('y')
As seen in Figure 2, the least squares solution fits nicely to the general behavior of the
data points, without passing through them. Also note that the least squares solution
is found easily with \ in Matlab.
The success of all this is due to the fact that the unknowns a_1, a_2 and a_3 are weights in a
linear combination of basis functions. This is also true for fitting polynomials, for
example. With this in mind we can express a general model
\[
y = a_1 \varphi_1(x) + a_2 \varphi_2(x) + \dots + a_p \varphi_p(x) \qquad (1)
\]
with known basis functions \varphi_k(x). In the previous example we had \varphi_1(x) = 1,
\varphi_2(x) = x and \varphi_3(x) = 1/(x + 1). Inserting the data points (x_i, y_i) into the model
generates the system
\[
\begin{pmatrix}
\varphi_1(x_1) & \varphi_2(x_1) & \dots & \varphi_p(x_1) \\
\varphi_1(x_2) & \varphi_2(x_2) & \dots & \varphi_p(x_2) \\
\vdots & \vdots & & \vdots \\
\varphi_1(x_n) & \varphi_2(x_n) & \dots & \varphi_p(x_n)
\end{pmatrix}
\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{pmatrix}
=
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
\]
which is solved in a least squares sense to find the weights a_k. If the model function does
not have the form (1), it may still be possible to transform it to the right shape. This
is the case for the model functions y = a_1 e^{a_2 t} and y = a_1 t^{a_2}, where the form (1) is
realized by taking the logarithm of the expressions.
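As a minimal sketch of this transformation (the data vectors below are hypothetical, not taken from the text), fitting y = a_1 e^{a_2 t} reduces to an ordinary linear least squares problem in log y:

t = [0 1 2 3 4]';                % hypothetical data
y = [2.1 5.3 14.9 40.8 108.2]';  % hypothetical data, roughly 2*exp(t)
c = [t.^0 t] \ log(y);           % least squares fit of log y = log a1 + a2*t
a1 = exp(c(1));                  % back-transform the intercept
a2 = c(2);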
In other cases, for example with y = a_1 sin(a_2 t) + a_3, there is no such cure, and more
computationally demanding methods are necessary to find the least squares solution.
There is a Matlab command lsqcurvefit to handle those non-linear cases.
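A minimal sketch of such a non-linear fit (it assumes the Optimization Toolbox is available and that data vectors t and y are given; the starting guess is hypothetical):

model = @(a,t) a(1)*sin(a(2)*t) + a(3);  % the non-linear model above
a0 = [1 1 0];                            % hypothetical starting guess
a = lsqcurvefit(model, a0, t, y);        % iterative non-linear least squares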
There are several methods for finding the least squares solution. The one appearing
most frequently in linear algebra textbooks is the normal equations.
Theorem 1 (The normal equations) Given the matrix Z \in \mathbb{R}^{n \times p} and the right hand
side y \in \mathbb{R}^n, the solution set of the least squares problem
\[
\min_{a \in \mathbb{R}^p} \|y - Z a\|_2
\]
coincides with the solution set of the normal equations Z^T Z\, a = Z^T y.
The restriction to full column rank makes sense from a numerical perspective,
since the numerical algorithms we use depend on the existence of a unique solution.
It is, however, possible to handle rank-deficient problems gracefully using the singular
value decomposition (SVD), though we use it for other purposes in these notes.
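As a small aside (an assumed illustration, not from the text), a rank-deficient problem can be handled with the SVD-based pseudoinverse, which picks the minimum-norm least squares solution:

a = pinv(Z)*y;   % minimum-norm least squares solution via the SVD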
One can show that the condition number
\[
\mathrm{Cond}(Z) = \frac{\displaystyle\max_{x \neq 0} \|Z x\|_2 / \|x\|_2}{\displaystyle\min_{x \neq 0} \|Z x\|_2 / \|x\|_2}
\]
describes the relative perturbation sensitivity in the least squares solution a, when
the residual e = y - Z a is small. Moreover
\[
\mathrm{Cond}(Z^T Z) = \left(\mathrm{Cond}(Z)\right)^2
\]
so the normal equations have a relative sensitivity that is squared compared to the
original problem formulation. From this we draw the conclusion that any numerical
method using the normal equations will be unstable, since the rounding errors will
correspond to \left(\mathrm{Cond}(Z)\right)^2, while the mathematical problem has condition number
\mathrm{Cond}(Z). This bad numerical behaviour of the normal equations indeed does
happen when we use the Cholesky factorization. We need better options!
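A minimal numerical illustration of this squaring effect (an assumed example reusing the data from the earlier session):

x = [0 0.5 1 2 5]';
Z = [x.^0 x 1./(x+1)];
cond(Z)       % condition number of the original least squares problem
cond(Z'*Z)    % roughly the square of cond(Z)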
Obviously this works as stated in the theorem. We get the same solution as before.
Note that, to get the QR-factorization described in this document, the one we are
interested in for solving least squares problems, we have to add an extra 0 argument:
qr(Z,0).
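As a small sketch (assuming the same Z and y as before), the least squares solution can then be computed from the economy size factorization:

[Q,R] = qr(Z,0);    % economy size: Q is n-by-p, R is p-by-p
a = R \ (Q'*y)      % same solution as Z\y, up to rounding errors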
In order to prove Theorem 3, we first need to establish a property of the l_2-norm.
Lemma 1 Multiplying a vector with an orthogonal matrix does not change its l_2-norm. Thus, if
Q is orthogonal it holds that
\[
\|Q x\|_2 = \|x\|_2
\]
Proof \|Q x\|_2^2 = (Q x)^T Q x = x^T Q^T Q x = x^T x = \|x\|_2^2
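A tiny numerical check of the lemma (an assumed example, not from the text):

Q = orth(randn(4));    % a random 4-by-4 orthogonal matrix
z = randn(4,1);
norm(Q*z) - norm(z)    % should be of the order of rounding errors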
Using this lemma we prove Theorem 3
\[
\begin{aligned}
\|y - Z a\|_2^2 = \|y - Q R a\|_2^2
&= \left\| \begin{bmatrix} Q^T \\ \widehat{Q}^T \end{bmatrix} (y - Q R a) \right\|_2^2
= \left\| \begin{bmatrix} Q^T (y - Q R a) \\ \widehat{Q}^T (y - Q R a) \end{bmatrix} \right\|_2^2 \\
&= \left\| \begin{bmatrix} Q^T y - Q^T Q R a \\ \widehat{Q}^T y - \widehat{Q}^T Q R a \end{bmatrix} \right\|_2^2
= \left\| \begin{bmatrix} Q^T y - R a \\ \widehat{Q}^T y \end{bmatrix} \right\|_2^2 \\
&= \|Q^T y - R a\|_2^2 + \|\widehat{Q}^T y\|_2^2
\end{aligned}
\]
So the minimum value of \|y - Z a\|_2^2 is \|\widehat{Q}^T y\|_2^2, and this minimum is realized when
\|Q^T y - R a\|_2^2 = 0, i.e. when Q^T y - R a = 0.
This section gives a rough sketch of how to calculate the QR-factorization in a way
that is numerically stable.
The Householder reflection matrix
\[
H = I - 2\,\frac{v\, v^T}{v^T v}, \qquad H \in \mathbb{R}^{n \times n}
\]
is orthogonal and symmetric. With the choice
\[
v = \begin{pmatrix} z_{11} \pm \|z_1\|_2 \\ z_{21} \\ z_{31} \\ \vdots \\ z_{n1} \end{pmatrix},
\]
where z_1 is the first column of Z and the sign is chosen equal to the sign of z_{11} to avoid cancellation,
the Householder reflection H, applied on Z, will zero out the first column, except the
first entry. The first column of H Z will be
\[
\begin{pmatrix} \mp \|z_1\|_2 \\ 0 \\ \vdots \\ 0 \end{pmatrix}
\]
Just as in Gaussian elimination, we have transformed everything below the diagonal to
zero. We continue in the same way with the second, third, etc. column, but regard
only the entries on the diagonal and below when we produce v.
Picking the first p columns of the full Q as Q, and the first p rows of the full R as R, gives the
economy size QR-factorization we use for least-squares problems.
Below is a prototype QR-factorization routine for Matlab, where the description
above is translated into code:
function [Q,R] = QRslow(A, zero)
%QRslow  A perfect way to do QR-factorization, except that
%   the method is far from optimally implemented.
%   The calling sequence
%      [Q,R] = QRslow(A)
%   returns an orthogonal matrix Q and an upper triangular
%   matrix R of the same dimensions as A, such that A = Q*R.
%   The calling sequence
%      [Q,R] = QRslow(A,0)
%   returns the economy size QR-factorization, where Q has
%   orthonormal columns and the same dimensions as A, and R
%   is a square upper triangular matrix, such that A = Q*R.
if nargin == 2 && zero ~= 0
    error('Use QRslow(A,0) for compact form');
end
[n,p] = size(A);
Q = eye(n);
for i = 1:p
    % Householder vector built from column i, diagonal and below
    v = zeros(n,1);
    v(i:n) = A(i:n,i);
    v(i) = v(i) + sign(v(i))*norm(v);
    % Apply the reflection and accumulate the orthogonal factor
    H = eye(n) - 2*(v*v')/(v'*v);
    A = H*A;
    Q = Q*H;
end
if nargin == 2
    Q = Q(:,1:p);          % economy size factors
    R = triu(A(1:p,:));
else
    R = triu(A);           % full size upper triangular factor
end
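A possible usage check (an assumed example reusing Z and y from the earlier session): the prototype routine should reproduce the backslash solution.

[Q,R] = QRslow(Z,0);
a = R \ (Q'*y)      % should agree with Z\y up to rounding errors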
These are some notes on how to use the singular value decomposition (SVD) for solving some fitting
problems. The author of these notes is Inge Söderkvist.
6.1 The rigid body movement problem
Assume that we have n landmarks in a rigid body and let {x1 , . . . , xn } be the 3-D
positions of these landmarks before the movement and {y1 , . . . , yn } be the positions
after the movement. We want to determine a rotation matrix R and a translation
vector d that map the points xi to the points yi , i = 1, . . . , n. Because of measurement errors the mapping is not exact and we use the following least-squares problem
\[
\min_{R \in \Omega,\; d} \;\sum_{i=1}^{n} \|R\, x_i + d - y_i\|_2^2, \qquad (2)
\]
where
\[
\Omega = \{\, R \mid R^T R = R R^T = I_3;\ \det(R) = 1 \,\}
\]
is the set of orthogonal rotation matrices. Problem (2) is linear with respect to the
vector d but it is non-linear with respect to the rotation matrix R (that is because of
the orthogonality condition of R). Introducing \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i,
and the matrices
\[
A = [\,x_1 - \bar{x}, \ldots, x_n - \bar{x}\,], \qquad B = [\,y_1 - \bar{y}, \ldots, y_n - \bar{y}\,],
\]
the problem of determining the rotation matrix becomes
\[
\min_{R \in \Omega} \|R A - B\|_F, \qquad (3)
\]
where the Frobenius norm of a matrix Z is defined as \|Z\|_F^2 = \sum_{i,j} z_{i,j}^2. In e.g.
(Arun, Huang & Blostein 1987, Hanson & Norris 1981) it is shown that this Orthogonal
Procrustes problem can be solved using the singular value decomposition of the matrix
C = B A^T. An algorithm is presented below.
Algorithm 1 Algorithm for solving the rigid body movement problem using singular value decomposition.
1. \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
2. A = [\,x_1 - \bar{x}, \ldots, x_n - \bar{x}\,], \quad B = [\,y_1 - \bar{y}, \ldots, y_n - \bar{y}\,]
3. C = B A^T
4. U S V^T = C (computation of the singular value decomposition of C)
5. R = U\,\mathrm{diag}(1, 1, \det(U V^T))\, V^T
6. d = \bar{y} - R\,\bar{x}
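A minimal Matlab sketch of Algorithm 1 (assuming, as an illustration, that X and Y are 3-by-n matrices whose columns hold the landmark positions before and after the movement; the centering uses implicit expansion):

xbar = mean(X,2);  ybar = mean(Y,2);   % centroids
A = X - xbar;  B = Y - ybar;           % centered landmark matrices
C = B*A';
[U,S,V] = svd(C);
R = U*diag([1 1 det(U*V')])*V';        % closest proper rotation
d = ybar - R*xbar;                     % translation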
The solution to problem (3) is unique if \sigma_2(C) is nonzero. (More strictly, the solution is not
unique if \sigma_2(C) equals zero or if \sigma_2(C) = \sigma_3(C) and \det(U V^T) = -1.
But the latter situation implies that a reflection is the best orthogonal matrix describing the movement. This situation will seldom occur when studying rigid bodies and
indicates that something is wrong.)
Algorithm 1 can be used also in 2-D settings when a movement in the plane is to
be determined.
More about the rigid body movement problem and its sensitivity with respect to
perturbations in the positions of the landmarks can be found in (Söderkvist 1993,
Söderkvist & Wedin 1993, Söderkvist & Wedin 1994).
6.3 Fitting a plane to a set of points
Assume that we want to find the plane that is as close as possible to a set of n 3-D points (p_1, \ldots, p_n) and that the closeness is measured by the square sum of the
orthogonal distances between the plane and the points.
Let the position of the plane be represented by a point c belonging to the plane
and let the unit vector n be the normal to the plane determining its direction. The
orthogonal distance between a point p_i and the plane is then (p_i - c)^T n. Thus the
plane can be found by solving
\[
\min_{c,\; \|n\|_2 = 1} \;\sum_{i=1}^{n} \left((p_i - c)^T n\right)^2. \qquad (4)
\]
Introducing the matrix
\[
A = [\,p_1 - c,\ p_2 - c,\ \ldots,\ p_n - c\,],
\]
problem (4) can be formulated as
\[
\min_{\|n\|_2 = 1} \|A^T n\|_2^2. \qquad (5)
\]
Algorithm 2 Algorithm for fitting a plane using orthogonal distance regression and singular value decomposition.
1. c = \frac{1}{n}\sum_{i=1}^{n} p_i
2. A = [\,p_1 - c, \ldots, p_n - c\,]
3. U S V^T = A (computation of the singular value decomposition of A)
4. n = U(:, 3), i.e., the normal is given as the third column of U.
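A minimal Matlab sketch of Algorithm 2 (assuming, as an illustration, that P is a 3-by-n matrix whose columns are the points p_i; the variable names are chosen to avoid clashing with the number of points n):

c = mean(P,2);          % centroid of the points
A = P - c;              % centered points
[U,S,V] = svd(A);
nvec = U(:,3);          % unit normal of the best fitting plane
dvec = U(:,1);          % direction of the best fitting line (cf. the remark below)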
Since the normal is given by U(:, 3), it follows from the orthogonality of U that the
plane is spanned by the two first columns of U.
Similarly, we can obtain the best fitted line as the first column of U. The square
sum of distances between the best plane and the points is given by \sigma_3^2, and the
square sum of distances between the best line and the points is given by \sigma_2^2 + \sigma_3^2.
A similar technique can also be used for fitting a line in 2-D.
References
Arun, K. S., Huang, T. S. & Blostein, S. D. (1987), 'Least-squares fitting of two 3-D
point sets', IEEE Trans. Patt. Anal. Machine Intell. 9(5), 698--700.
Chapra, S. C. (2008), Applied Numerical Methods with MATLAB for Engineers and Scientists,
second edn, McGraw-Hill.
Demmel, J. W. (1997), Applied Numerical Linear Algebra, SIAM.
Gander, W. & Hrebicek, J. (1993), Solving Problems in Scientific Computing using Matlab
and Maple, Springer Verlag.
Hanson, R. J. & Norris, M. J. (1981), 'Analysis of measurements based on the singular
value decomposition', SIAM J. Sci. Stat. Comput. 2, 363--373.
Higham, N. J. (1996), Accuracy and Stability of Numerical Algorithms, SIAM.
Lay, D. C. (2003), Linear Algebra and its Applications, third edn, Addison-Wesley.
Söderkvist, I. (1993), 'Perturbation analysis of the orthogonal Procrustes problem',
BIT 33, 687--694.
Söderkvist, I. & Wedin, P.-Å. (1993), 'Determining the movements of the skeleton
using well-configured markers', Journal of Biomechanics 26(12), 1473--1477.
Söderkvist, I. & Wedin, P.-Å. (1994), 'On condition numbers and algorithms for
determining a rigid body movement', BIT 34, 424--436.