Anda di halaman 1dari 15

Matrix norms and error analysis

Steven L. Carnie
Here we summarize the important properties of the vector and matrix
norms we need for 620-381.
In doing error analysis for problems involving vectors and matrices (e.g.
solving linear systems), we need to measure the size of the errors in a vector
or matrix. There are two common kinds of error analysis in numerical linear
algebra: componentwise and normwise. In general, normwise error analysis
is easier (but less precise). For that we need to introduce vector norms and
matrix norms.

Vector norms

From 620122/142/211 you should have seen the idea of a vector norm. Here
we work with real vector spaces Rn but everything goes over to complex
spaces (replace transpose by Hermitian transpose, symmetric matrix by Hermitian, orthogonal matrix by unitary etc. where necessary).
Definition 1 The norm of a vector space V is a function k k : V 7 R with
the properties:
1. kxk 0 x V where kxk = 0 x = 0 (norms are positive for
nonzero vectors)
2. kxk = ||kxk (scaling a vector uniformly scales the norm by the same
amount)
3. kx + yk kxk + kxk x, y V (triangle inequality)
C Example The 3 most common vector norms are:
P
1. kxk1 = |xi | (the 1-norm)
1

2. kxk2 = (

x2i )1/2 =

3. kxk = maxi |xi | (the -norm)

which are all special cases of the p-norm:
X
kxkp = (
|xi |p )1/p
Exercise 1 Plot the set of all unit vectors in R2 in a) the 1-norm b) the
-norm.
The 2-norm is special because its value doesnt change under orthogonal
transformations.
p

p
kQxk2 = (Qx)T (Qx) = xT QT Qx = xT x = kxk2
since QT Q = I for an orthogonal matrix. Its also the only norm with a close
connection to the inner product (dot product) between vectors.
In finite-dimensional spaces, it doesnt matter much precisely which norm
you use, since they cant differ by more than a factor n from each other. This
property is called norm equivalence.
Theorem 1
kxk2 kxk1
kxk kxk2

nkxk2
nkxk

kxk kxk1 nkxk

So you may as well use whichever one is convenient.
Some sensible properties that some norms have are:
monotone
|x| |y| kxk kyk
where |x| |y| means a componentwise inequality i.e. |xi | |yi | i.
absolute
k|x|k = kxk
where |x| is the vector with components given by the absolute values
of the components of x.
2

It is not obvious, but true, that these two properties are equivalent i.e. a
norm that is monotone is absolute and vice versa.
Not every vector norm has these properties but the p-norms do.
We use vector norms to measure the size of errors in, for example, the
RHS b of a linear system.

Matrix norms

Similarly we will want to measure the size of errors in the coefficient matrix
A of a linear system. For that we need matrix norms. A matrix norm is
just a vector norm on the m n dimensional vector space of m n matrices.
Hence
Definition 2 A matrix norm is a function k k : Mmn 7 R with the properties:
1. kAk 0 A Mmn where kAk = 0 A = 0 (norms are positive for
nonzero matrices)
2. kAk = ||kAk (scaling a matrix uniformly scales the norm by the
same amount)
3. kA + Bk kAk + kBk A, B Mmn (triangle inequality)
C Example The following are matrix norms:
P
1. kAkF = ( a2ij )1/2 (the Frobenius norm)
2. kAkM = maxi,j |aij | (the max-norm)
Exercise 2 Are these norms monotone?
Hint: are they absolute?
Well mostly use not these norms, but a common class of matrix norms,
called subordinate matrix norms or induced matrix norms or (Demmel) operator norms:

kAk = max
x6=0

kAxk
kxk

for any vector norm. This is equivalent to:

kAk = max kAxk
kxk=1

In words, the (subordinate) norm of a matrix is the norm of the largest image
under the map A of a unit vector.
Exercise 3 Prove that any subordinate norm is a matrix norm
C Example The subordinate norms well use are those corresponding to
the vector norms listed above:
P
1. kAk1 = maxj i |aij | (the maximum column sum)
2. kAk2 (the 2-norm )
P
3. kAk = maxi j |aij | (the maximum row sum)
These are the only subordinate p-norms that are easy to compute.
P
Exercise 4 Prove that kAk = maxi j |aij |
The subordinate norms have some useful submultiplicative properties:
kAxk kAkkxk
by definition; and
kABk kAkkBk
Any matrix norm with the latter property is called consistent.
Similarly, we have norm equivalence for matrix norms, since they are
within a factor of n of each other:
Theorem 2

1
kAk2 kAk1 nkAk2
n

1
kAk kAk2 nkAk
n

1
kAk kAk1 nkAk
n
So you may as well use whichever one is convenient.
4

2.1

Theorem 3
kAk2 =

max (AT A)

Proof:
kAk2 = max
x6=0

kAxk2
(xT AT Ax)1/2
= max
x6=0
kxk2
kxk2

Since AT A is symmetric, it has an orthogonal diagonalisation (the spectral

decomposition) AT A = QQT where is a diagonal matrix containing the
eigenvalues of AT A, all real. Then
kAk2 = max
x6=0

(xT (QQT )x)1/2
= max
x6=0
kxk2
kQT xk2

since orthogonal transformations dont change the 2-norm of a vector.

Since AT A is positive definite, all the eigenvalues are positive.
s
sP
P 2
p
(y T y)1/2
y
i yi2
P 2 max P i2 = max
kAk2 = max
= max
y6=0
y6=0
kyk2
yi
yi
which can be attained by choosing y to be the jth column vector of the
identity matrix if, say, j is the largest eigenvalue. 
Well use the matrix 2-norm when we come to Least Squares fitting.

Sensitivity

Now we know how to measure size in our various vector spaces, we ask: how
do small changes in the coefficient matrix and RHS of a linear system affect
the solution? Since we can change a matrix in many different ways, we just
ask that the change is arbitrary but that the norm of the difference is small,
relative to the norm of the matrix i.e. we consider small normwise relative
changes to the matrix. This means we are doing a worst case analysis since
were looking for the worst that can happen.

We first ask: how do small (relative normwise) changes to A, b affect the

solution x? We have
(A + A)
x = b + b
= x + x, and
where x
Ax = b
Subtracting these gives
Ax = b A
x
so
x = A1 (b A
x)
Taking norms, using submultiplicative property and triangle inequality, gives
kxk kA1 k(kbk + kAkk
xk)

(1)

so the relative (normwise) forward error is

kA1 kkbk
kxk
kA1 k kAk +
xk
xk
k
k
which is a bit unusual in that we have divided by the norm of the computed solution, not the true solution. This can be expressed in terms of the
(normwise) relative errors of the inputs by multiplying top and bottom by
kAk:
kxk
kAk
kbk
kA1 k kAk(
+
)
k
xk
kAk
kAk k
xk
On the right hand side, we have the relative (normwise) error in A and
(almost) in b (since b = Ax, not A
x). On the left hand side, we have
(almost) the relative (normwise) error in x. Therefore the quantity
(A) kA1 k kAk

(2)

is the (normwise) relative condition number (for inversion). It measures how

sensitive the linear system is to changes in A, b.
If you dont like the almosts in the above derivation, its possible but a
bit more complicated to derive
(A)
kAk kbk
kxk
(

+
)
kAk
kxk
kbk
1 (A) kAk kAk
6

(3)

provided (A) kAk

< 1. This condition ensures that the matrix A + A is
kAk
nonsingular.
Proof: Suppose A + A is singular. Then (A + A)x = 0 for some
nonzero x. Then
Ax = Ax
x = A1 Ax
so
kxk kA1 k kAk kxk = (A)

kAk
kxk
kAk

therefore (A) kAk

1. Hence the stated condition ensures that A + A
kAk
is nonsingular. 
1 so that
For small enough kAk, we will have (A) kAk
kAk
kxk
kAk kbk
. (A)(
+
)
kxk
kAk
kbk

(4)

So far we havent specified the matrix norm were working in. If thats
important we specify it by a subscript e.g. 2 (A) if were using the matrix
2-norm.

3.1

Properties of (A)

1.
(A) 1
for any square matrix.
Proof: Since I = A1 A, for any consistent norm kIk = 1 kA1 k kAk =
(A). 
This means relative errors in the solution can never be smaller than
those of the inputs, unlike the situation in evaluating a function, with
relative condition number | xf 0 (x)/f (x) |.
We call a matrix with (A) = 1 perfectly-conditioned. If (A)  1, we
call the problem ill-conditioned.
Rule of Thumb: if (A) 10k , expect to lose k digits of accuracy in
the answer (compared to the error in the inputs)

2.
(A) =

max kAxk
min kAxk

for any subordinate norm.

Since kAk measures the largest dilation of a unit vector, the largest
dilation under A1 must be the reciprocal of the smallest dilation under
A. 
Special case:
s
2 (A) =

max (AT A)
min (AT A)

3. (A) measures how close the matrix A is to a singular matrix.

Proof: We showed above that (A) kAk
1 if A + A is singular i.e.
kAk
1
kAk

(A)
kAk
So if (A) is small (near 1), A cannot be close to a singular matrix.
Conversely if (A) is large, A could be close to a singular matrix. In
fact, in any p-norm one can prove that
kAkp
1
= min
p (A)
kAkp
i.e. if (A) is large, A is close to a singular matrix, 
Geometrically, a matrix being ill-conditioned corresponds to a system of
hyperplanes that are almost parallel i.e. you wouldnt have to tilt them
much to make them parallel and hence have no or infinitely many solutions.
In statistics, this is called collinearity.
(A) is computed in Matlab with the cond command. See help cond to
check which norms you can ask for. Its not cheap to compute (A) since
its more expensive to find A1 than to solve your system. So instead one
usually estimates its value. Typically you just want to know if its getting
too big. In Matlab one can use either condest to estimate 1 (A) or rcond
which estimates its reciprocal.

3.2

Students often think that since det(A) = 0 A is singular, a small deter1

minant is a signal that a matrix is nearly singular. Instead, its (A)
that is
a better measure. The determinant fails on 2 grounds:
it gives false positives.
Consider the matrix


A=

1001 1000
1
1

Its determinant is 1 but its condition number

106 and its clearly

0
0
close to singular (just add the matrix
which has norm
0.001 0
103 , which is small compared to the norm of A 103 ).
it gives false negatives.
Consider the matrix

A = 0.1I20

0.1 0 0
0 0.1 0
..
..
..
..
.
.
.
.
0 0 0.1

2020

Its determinant is 1020 but its condition number = 1 since max kAxk =
min kAxk = 0.1 i.e its perfectly-conditioned.
The condition number measures how close a linear system is to singularity, not the determinant.
If you try to solve a system in Matlab using \ and the estimated condition
number is too large, you get a warning :
Warning: Matrix is close to singular or badly scaled.
Results may be inaccurate. RCOND = 9.144066e-19.
You then take your life in your hands ...

Backward error analysis

Now that we understand the sensitivity of linear systems, we need to determine how to ensure a small backward error i.e. how do we make a stable
algorithm?
9

4.1

The residual measures the backward error

The first question to ask is: how do I measure the backward error?
Recall that in root-finding the residual f (xn ) was a measure of the (absolute) backward error since xn was the exact root of the nearby function
f(x) f (x) f (xn ). A similar thing happens here.
we can compute the residual
From a computed solution x
r = A
xb
Then assuming b = 0, we have
(A + A)
x=b
A
x = r
krk kAk k
xk
kAk

krk
k
xk

so the relative normwise backward error satisfies:

kAk
krk

kAk
kAk k
xk
where we have defined the relative residual . Therefor a large relative residual guarantees a large (normwise) relative backward error. In fact, the bound
can be attained so a small relative residual guarantees a small backward error.
A stable method produces small relative residuals.

4.2

Error analysis

At this stage, we quote the result of bounding the backward error of common
algorithms using the standard model of floating point operations.
Notation: the quantity nu/(1 nu) crops up so often we give it a symbol.
n

nu
1 nu

Recall that in double precision u is so small that for any reasonable value of
n, n nu.
10

Theorem 4 If the n n triangular system

Tx = b
satisfies
is solved by substitution, the computed solution x
(T + T)
x=b
where
| T | n | T | nu | T |
i.e. it solves a system with a nearby matrix (and the correct b!).
Solving triangular systems by forward/backsubstitution is componentwise
backward stable
In general, componentwise stability is a more stringent requirement than
normwise stability, since for any monotone norm
| T | n | T | kTk n kTk
Solving triangular systems by forward/backsubstitution is normwise backward stable which is the usual usage of the term backward stable. So the
final stages of solving a linear system are no problem; any problems must
arise in the factorization stage.
Theorem 5 If the n n system
Ax = b
satisfies
is solved by Gauss elimination, the computed solution x
(A + A)
x=b
where
|| U
|
| A | 2n | L

(5)

which is not what were after. We would like a bound for A involving | A |.
|| U
| can be much bigger
The problem with Gauss elimination is that | L
than | A |.
U
had only positive components, there would be no
If for some reason L,
problem since then
|| U
|=| L
U
|=| A | +O(n )
|L
but in general we dont know that is true (and its not).
11

4.3

Gauss elimination with partial pivoting (GEPP)

|| U
|
The purpose of a pivoting strategy is to choose pivots that make | L
some modest multiple of | A |. Partial pivoting ensures that
| 1
|L
|?
componentwise, but what can we say about | U
Going to a normwise analysis, we get
kUk

kAk 2n kLk
for a monotone norm (such as the 1, F or infinity norms). Following Demmel,
let us define the growth factor using the max-norm:
M kUk
M /kAkM
= kLk
then, for the 1 or infinity norms we get
kAk 2n2 n kAk
where we have used
kAkM kAk1 nkAkM
kAkM kAk nkAkM
Hence the normwise backward error is determined by the growth factor.
For Gauss elimination without pivoting, there is no bound for the growth
factor.


1
C Example The matrix
for  1 has 2 which can be
1 1
very large.
For Gauss elimination with partial pivoting, the situation is rather complicated.
There are matrices that have = 2n1 for them GEPP is not backward stable.
These matrices appear to be very rare in practise. For most matrices,
is very modest in size e.g < 10
12

For random matrices e.g. matrices with components drawn from a

standard normal distribution, n1/2
These results lead experts to say that, for practical purposes, GEPP is (normwise) backward stable. Thats why its the default direct method for solving
dense linear systems.
SUMMARY:
1. Gauss elimination without pivoting is not backward stable.
2. GEPP is, for practical purposes, (normwise) backward stable.
3. A backward stable method produces small (relative) residuals.
4. This does not imply small (forward) errors if (A) >> 1 i.e
your problem is ill-conditioned.

Beyond GEPP

What if youre a nervous sort of person, who wants more reassurance about
backward stability than GEPP gives you? There are 3 options open to you:
1. Gauss elimination with complete pivoting
2. iterative refinement
3. QR factorization

5.1

Gauss elimination with complete pivoting

Instead of just looking for the largest (in magnitude) element in the pivot
column, look for the largest element in the remaining submatrix to the right
and below the pivot. Then do row and column swaps to bring that element
into the pivot position. This is called complete pivoting and costs about n3
operations i.e. its expensive compared to partial pivoting.
Then we have the following result:
n1/2+log(n)/4

13

A 1968 conjecture that n was disproved in 1991 but its still thought
that
n
for complete pivoting i.e GE with complete pivoting is backward stable.

5.2

Iterative refinement

A cheaper alternative is to do a single step of iterative refinement in the same

precision. This means: since Ax = b and A
x = b r then subtracting we
get
) = r
A(x x
which implies if we solved the system Ad = r then we could get the exact
+ d! This would be true in exact arithmetic, but we only get
solution x = x
a computed solution to Ad = r. So instead we regard the updated solution
+ d as an improved solution.
y=x
We could then keep on iterating this process, but for matrices that are
not too ill-conditioned or badly scaled, it has been proved that one step of
iterative refinement is enough to produce small componentwise residuals:
| Ay b | n+1 | A | | y |
which is about as small as one could hope for! This step is cheap since we
can re-use the LU factors of A when solving for d.

5.3

QR factorization

Finally, you could instead use an alternative matrix factorization, the QR

factorization, that well meet when we treat least squares problems. This
has great (normwise) backward stability:
Theorem 6 If the n n system
Ax = b
satisfies
is solved by QR factorization, the computed solution x
(A + A)
x=b
14

where
kAkF n2 cn kAkF

(6)

where cn means cnu/(1 cnu) for some small integer c.

The componentwise backward stability is not as good, but this can be
improved by 1 step of iterative refinement, as above.
QR factorization is about twice as expensive as LU factorization but its
backward stability behaviour is impeccable.

References
1. J.W.Demmel, Applied Numerical Linear Algebra, SIAM, 1997.
2. G.Golub and C. Van Loan, Matrix Computations, Johns Hopkins Press,
1989.
3. N.J.Higham, Accuracy and Stability in Numerical Algorithms, SIAM,
1996.

15