6.1 Introduction
Chapters 4 and 5 studied linear first and second order ordinary differential and
difference equations (ODEs) with constant coefficients. The differential equations
involved derivatives of continuous time functions. The difference equations in-
volved delayed discrete time sequences. Both types of ODEs involve memory elements
(differentiators and delay elements); i.e., their solution at the current time depends on
current and past values of the input or forcing term and past values of the indeterminate.
This Chapter starts our journey in the study of systems of linear algebraic equations, or
simply a linear system of equations. Our goal is to build tools to solve simultaneous
equations like:
\[
\begin{aligned}
4x_1 + 6x_2 + 9x_3 &= 6 \\
6x_1 \hphantom{{}+ 6x_2} - 2x_3 &= 20 \\
5x_1 - 8x_2 + x_3 &= 10.
\end{aligned} \tag{6.1.1}
\]
In this system, there are three unknown quantities, namely, the variables $x_1$, $x_2$, and $x_3$, and known quantities like the numbers 6, 20, and 10 on the right hand side of the equations.
In this example, the number of equations is the same as the number of unknowns.
To write these three equations in three unknowns explicitly is not very onerous. If all we had to do in life was solve such a few linear algebraic equations in so few unknowns, writing them as above and solving them by traditional methods would suffice. But, if we have instead to solve a system of one thousand equations in one thousand unknowns, just writing them down would be tedious, let alone solving them by traditional methods. With computers, we can solve linear systems with many more equations, thousands, hundreds of thousands, if not millions of linear algebraic equations. It is possible to represent these equations in
a much more compact form by using vectors and matrices. Once introduced, we operate
directly with vectors and matrices. The language of vectors and matrices simplifies the
writing of these large systems of equations and, very importantly, helps us focus on the underlying concepts without being overwhelmed by the details of so many equations.
The word algebraic means that there are no memory elements, no differentiators (really,
integrators) nor delays. These linear systems of algebraic equations appear, even if under
disguise, in most practical applications, and it is never too much to emphasize their rel-
evance in science and engineering. We actually already saw that, when solving first and
second order ODEs in Chapters 4 and 5, we had to solve linear systems of algebraic equa-
tions when imposing the initial conditions. Granted, with first order ODEs, these
linear systems of algebraic equations were trivially reduced to a single linear equation in
a single unknown. With second order ODEs, we did have to solve two linear algebraic
equations in two unknowns, which better justifies the use of the word system when refer-
ring to these equations.
Linear systems of algebraic equations also serve as the underlying thread for studying many other important concepts in Linear Algebra – vectors, matrices, eigenvalues
and eigenvectors, vector spaces, among others.
This Chapter sets the basic machinery for studying systems of linear algebraic equations.
The Chapter introduces vectors and matrices. Section 6.2 focuses on vectors, while Section 6.3 introduces calculus with vectors. Section 6.4 introduces matrices and Section 6.5 calculus with matrices and special types of matrices. Section 6.6 introduces important functions of matrices, like the determinant and trace, and revisits the concept of the inverse of a matrix introduced in Section 6.4. Finally, Section 6.7 lists a few illustrative problems.
Vectors can be column vectors and row vectors as we consider next in Subsections 6.2.1
and 6.2.2.
Consider, for example, the collection of numbers
\[
\begin{matrix} 2 \\ -3 \\ \frac{\pi}{2} \end{matrix}
\]
Usually, we bracket such a collection of numbers and prefer the notation
\[
\begin{bmatrix} 2 \\ -3 \\ \frac{\pi}{2} \end{bmatrix}. \tag{6.2.1}
\]
These (6.2.1) or (6.2.2) represent a vector. We adopt square brackets as in (6.2.1), although,
on occasion, we will also use (6.2.2). Vectors are commonly represented by boldface lower
case letters or symbols like v:
\[
\mathbf{v} = \begin{bmatrix} 2 \\ -3 \\ \frac{\pi}{2} \end{bmatrix}. \tag{6.2.3}
\]
Other common notations are $\underline{v}$ or $\vec{v}$, i.e., we underline the letter or we place an arrow on
top of the symbol representing the vector. In this case, we do not boldface the symbol.
The elements 2, −3, and $\frac{\pi}{2}$ are the entries, elements, coordinates, or components of the vector. They are ordered from top to bottom, so 2 is the first entry or entry 1, while $\frac{\pi}{2}$ is entry 3.
The entries of the column vector v in (6.2.1) can be thought of as the coordinates
of a point in a three dimensional Cartesian or Euclidean space R3 , and the
vector v becomes a point in R3 . With this interpretation of v, which has 3
entries, as an element of R3 (or of C3 , for that matter), we refer to the vector v
in (6.2.1), as a 3-dimensional or 3D vector.
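To make this concrete, here is a minimal sketch (not part of the text's development; NumPy is assumed as a matter of convenience) that stores the 3D vector of (6.2.1) as an array:

```python
import numpy as np

# The 3D vector from (6.2.1): entries 2, -3, and pi/2.
v = np.array([2.0, -3.0, np.pi / 2])

print(v)        # [ 2.         -3.          1.57079633]
print(v.shape)  # (3,) -- three entries, so v is a point in R^3
```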
Consider next a vector with four entries, so it is a 4D-vector. In this vector, the entries are complex numbers, so we have
\[
\mathbf{b} = \begin{bmatrix} 1+j \\ 4 \\ e^{j\frac{\pi}{3}} \\ 22 \end{bmatrix} \in \mathbb{C}^{4}. \tag{6.2.5}
\]
A general nD column vector a is represented by
\[
\mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}. \tag{6.2.7}
\]
There are several important points to note in (6.2.7). First, the entries in (6.2.7)
are labeled by a symbol that is the unbolded version of the boldface letter rep-
resenting the vector; for example, entry 2 of the vector a is represented by a2 .
Second, the dimension n is a positive integer but left unspecified; we can read
it from the vector as the subindex n of the last entry an or by counting the
number of entries of the vector a. These are common and intuitive notations
that we will usually follow. If the entries are real valued unknown constants,
the vector a is a point in the Cartesian or Euclidean space Rn . We note this by
stating that a ∈ Rn and the vector a is n-dimensional or is a nD-vector. Fig-
ure 6.1 illustrates a 2D-vector graphically as a point on the plane, where the
first coordinate v1 is placed in the horizontal axis and v2 in the vertical axis.
\[
\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}
\]
In this five dimensional vector, at time t, the entries are the values taken by the
functions in each entry of the vector evaluated at that (instantaneous) time t.
The range where the vector of functions is defined is indicated in (6.2.8) to
be R+ .
The functions may be real valued, complex valued, or may take values in some
other set. Note that we use the notation where the symbol indicating the vec-
tor is now also a function, with the indexing variable or independent variable t
taking values in some set; in (6.2.8), the independent variable t takes values in
the positive reals.
We consider an example. Let
\[
\mathbf{v}(t) = \begin{bmatrix} \cos 2\pi t \\ \sin 2\pi t \end{bmatrix}, \quad t \in [0, 1].
\]
" 1 #
1 √
v = √12 .
4 2
Remark 6.2.1 (Notation: Vector of functions). There is a certain level of ambiguity in the
notation. By f (t) we mean the vector f (t) at a particular (fixed value) t. However, when we
let t vary in its domain, say, t ∈ R+ , then f (t) in (6.2.8) really stands for a collection or
family of vectors each indexed by its value of t. We say the vector is a vector of functions, or
a function vector. Sometimes, to be more precise, we indicate this as (ft )t∈R+ or simply (ft ), if
the range is understood from the context.
As an example of a vector of sequences, consider
\[
\mathbf{x}[k] = \begin{bmatrix} k^{2} + 2k + 3 \\[4pt] \left(\frac{1}{3}\right)^{k} \\[4pt] 2 \\[4pt] \cos\frac{2\pi k}{16} \end{bmatrix}, \quad k \in \mathbb{N},
\]
while at k = 1, we get
\[
\mathbf{x}[1] = \begin{bmatrix} 6 \\[4pt] \frac{1}{3} \\[4pt] 2 \\[4pt] \cos\frac{\pi}{8} \end{bmatrix}.
\]
These examples provide quite a variety of vectors; the entries of a vector may be numbers,
unknown constants, variables, functions, or sequences; the entries may be integer valued,
real valued, or complex valued.
Unless specifically stated, or it is understood otherwise from the context, vectors are com-
monly taken as column vectors.
All the comments made for column vectors can be repeated for row vectors. In particular,
the entries of the row vector (6.2.10) can be thought of as the coordinates of a point in the
Cartesian or Euclidean space R3 . The points in R3 are then (row) vectors.
A row vector of dimension n has n entries arranged horizontally. We say that the row
vector has one row and n columns.
The reasons for this will become clearer when we study matrices.
6.3.1 Equality
Two vectors are equal if their entries are equal. For example, equality among two vectors
as in
\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ \alpha \\ 1+j \end{bmatrix}
\]
It is clear that equality of vectors being pointwise or entrywise implies that the two vec-
tors have to be of the same dimension. In the example above, both vectors are 4D-vectors.
It is interesting to note that, while the other entries of the left vector are known constants (because the corresponding entries of the right vector are specified numbers), entry 3 in the vector on the left remains a variable, since the corresponding entry in the vector on the right is an unknown variable α.
If the entries of the two vectors are functions, then equality of the two vectors implies
that the functions of each entry are identical, i.e., equal for all values of the independent
variable where the functions are defined. For example,
\[
\begin{bmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{bmatrix} = \begin{bmatrix} f(t) \\ g(t) \\ h(t) \end{bmatrix}, \quad t \in T,
\]
means that the entries of the 3D-vectors are identical as functions of time, x1 (t) ≡ f (t),
x2 (t) ≡ g(t), and x3 (t) ≡ h(t), i.e., they are equal ∀t ∈ T .
Remark 6.3.1 (Equality of row vectors). We have presented the equality of column vectors.
The equality of row vectors is defined like equality of column vectors as entrywise equality.
We can only consider equality of row vectors that have the same dimension.
then their sum h(t) is the vector whose entries are the entrywise sum of the
corresponding entries of each of the vectors. Let the vectors f (t) and g(t) be
nD-vectors. Their sum is:
\[
\mathbf{f}(t) + \mathbf{g}(t) = \begin{bmatrix} f_1(t) \\ f_2(t) \\ \vdots \\ f_n(t) \end{bmatrix} + \begin{bmatrix} g_1(t) \\ g_2(t) \\ \vdots \\ g_n(t) \end{bmatrix}
= \begin{bmatrix} f_1(t) + g_1(t) \\ f_2(t) + g_2(t) \\ \vdots \\ f_n(t) + g_n(t) \end{bmatrix}.
\]
We can similarly define the sum of vectors of sequences:
\[
\mathbf{f}[k] + \mathbf{g}[k] = \begin{bmatrix} f_1[k] \\ f_2[k] \\ \vdots \\ f_n[k] \end{bmatrix} + \begin{bmatrix} g_1[k] \\ g_2[k] \\ \vdots \\ g_n[k] \end{bmatrix}
= \begin{bmatrix} f_1[k] + g_1[k] \\ f_2[k] + g_2[k] \\ \vdots \\ f_n[k] + g_n[k] \end{bmatrix}.
\]
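As a quick numerical illustration of the entrywise definition (a sketch with hypothetical numeric entries; NumPy assumed):

```python
import numpy as np

# Two 3D vectors (think of sequence vectors evaluated at a fixed k).
f = np.array([1.0, 2.0, 3.0])
g = np.array([10.0, 20.0, 30.0])

# Entrywise sum, matching the definition above.
h = f + g
print(h)  # [11. 22. 33.]
```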
Remark 6.3.2 (Addition of row vectors). We have presented the sum of column vectors.
The sum of row vectors is defined, like the sum of column vectors, as the entrywise sum.
We can only consider the addition of row vectors that have the same dimension.
Commutativity The addition of vectors is commutative:
\[
\mathbf{v} + \mathbf{u} = \mathbf{u} + \mathbf{v}.
\]
Zero vector: Unit of addition There is a vector, the zero vector 0, that is the unit element
of the addition of vectors. Let the nD-vector:
\[
\mathbf{0} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}.
\]
Then
\[
\mathbf{0} + \mathbf{v} = \mathbf{v} + \mathbf{0} = \mathbf{v},
\]
for any nD-vector v.
Inverse with respect to addition and subtraction For every vector v, there is a vector u such that the sum with the original vector v is the zero vector. The vector u is the
negative of v, so that:
v + (−v) = 0.
since factoring out the scalar, the number 2 in the above example, is the same as the product of the vector by the two scalars 2 and $\frac{1}{2}$, and then bringing the scalar $\frac{1}{2}$ inside the vector.
\[
\alpha(\beta\mathbf{v}) = (\alpha\beta)\mathbf{v}.
\]
αv = vα.
Distributivity The product of a scalar by a sum of vectors distributes, and the product of a sum of scalars by a vector also distributes:
\[
\alpha(\mathbf{u} + \mathbf{v}) = \alpha\mathbf{u} + \alpha\mathbf{v}, \qquad (\alpha + \beta)\mathbf{v} = \alpha\mathbf{v} + \beta\mathbf{v}.
\]
Conjugation of a vector is also entrywise; for example,
\[
\begin{bmatrix} -1 + e^{j\left(k\frac{2\pi}{N}\right)} \\ -63 \\ \vdots \\ 2 + 45j \end{bmatrix}^{*} = \begin{bmatrix} -1 + e^{-j\left(k\frac{2\pi}{N}\right)} \\ -63 \\ \vdots \\ 2 - 45j \end{bmatrix}.
\]
The vectors are of dimension n because we stated so before the example. The reader
should get used to these liberties and subtleties with the notation.
We now consider vector transposition. The transpose of a column vector is a row vector with the same dimension and the same entries as the column vector: the first (leftmost) entry of the row vector is the first entry of the column vector, the second entry of the row vector is the second entry of the column vector, and so on, until the last entry of the row vector is the last entry of the column vector.
The transpose of a row vector is the column vector whose entries are the entries of the original row vector, now placed from top to bottom in the same order as we scan the row vector from left to right.
Transposition is indicated by superindexing the vector with the letter T. For example, we transpose an nD-column vector to obtain an nD-row vector:
\[
\begin{bmatrix} 2 \\ -5 \\ \vdots \\ 31 \end{bmatrix}^{T} = [\,2\ \ {-5}\ \ \cdots\ \ 31\,].
\]
This example transposes a row vector of dimension m to obtain a column vector of the
same dimension m.
The Hermitian of a vector v, denoted $\mathbf{v}^{H}$, is the conjugate transpose:
\[
\mathbf{v}^{H} = (\mathbf{v}^{*})^{T} = (\mathbf{v}^{T})^{*}.
\]
This says that the order by which conjugation and transposition are taken is immaterial.
If the vector v is real valued, conjugation has no effect and
\[
\mathbf{v}^{H} = \mathbf{v}^{T};
\]
if v is a scalar, transposition has no effect and
\[
v^{H} = v^{*}.
\]
The Hermitian of a column vector is a row vector whose entries are conjugated.
Similarly, the Hermitian of a row vector is a column vector. For example, for a sequence
`D-vector, i.e., a vector of ` sequences:
\[
[\,x_1[k]\ \ x_2[k]\ \cdots\ x_\ell[k]\,]^{H} = \begin{bmatrix} x_1[k]^{*} \\ x_2[k]^{*} \\ \vdots \\ x_\ell[k]^{*} \end{bmatrix}.
\]
We conclude that the Hermitian of a column vector of dimension ` is a row vector of di-
mension `; and, similarly, the Hermitian of a row vector of dimension ` is a column vector
of dimension `.
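A small sketch of the Hermitian as conjugation followed by transposition, with hypothetical complex entries (NumPy assumed):

```python
import numpy as np

# A 3D complex column vector (hypothetical entries).
v = np.array([1 + 1j, 2 - 3j, 4j])

v_col = v.reshape(3, 1)   # 3 x 1 column vector
v_H = v_col.conj().T      # Hermitian: 1 x 3 row vector of conjugates

print(v_H)  # [[1.-1.j 2.+3.j 0.-4.j]]
```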
v = α1 v1 + · · · + αn vn , (6.3.1)
where α1 , · · · , αn are scalars, e.g., integers, real valued, or complex valued. These coeffi-
cients can be zero, in which case the l.c. is zero. The number of vectors can also be zero,
in which case, again, the vector v is zero.
It is straightforward to verify that the linear combination of vectors is in fact linear, since the
linear combination of two vectors, where each vector is given by a linear combination of
vectors {v1 , · · · , vn } and {w1 , · · · , wm }, respectively, is itself a linear combination of the
vectors {v1 , · · · , vn , w1 , · · · , wm }. Consider the two vectors:
v = α1 v1 + · · · + αn vn (6.3.2)
w = β1 w1 + · · · + βm wm , (6.3.3)
γ1 v + γ2 w = γ1 (α1 v1 + · · · + αn vn ) + γ2 (β1 w1 + · · · + βm wm )
= γ1 α1 v1 + · · · + γ1 αn vn + γ2 β1 w1 + · · · + γ2 βm wm .
We go from the first equality to the second by distributing the products of the scalars γ1 and γ2 with respect to the sums of vectors in parentheses. The resulting vector is
clearly a linear combination of the vectors {v1 , · · · , vn , w1 , · · · , wm }.
\[
\sum_{i=1}^{n} \alpha_i = 1.
\]
This illustrates two facts. That we can express a more complicated vector like
\[
\mathbf{v} = \begin{bmatrix} 31 \\[4pt] -\frac{j}{42} \\[4pt] 52 \\[4pt] -e^{j\frac{\pi}{3}} \end{bmatrix}
\]
6.3.6.1 Limit
Consider the nD-vector (of functions) x(t), t ∈ T ⊂ R. Let t0 ∈ T . The limit of the vector
of functions is given by
\[
\lim_{t \to t_0} \mathbf{x}(t) = \lim_{t \to t_0} \begin{bmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_n(t) \end{bmatrix}
= \begin{bmatrix} \lim_{t \to t_0} x_1(t) \\ \lim_{t \to t_0} x_2(t) \\ \vdots \\ \lim_{t \to t_0} x_n(t) \end{bmatrix}.
\]
The limit of the vector of functions is the vector of the limits of the functions in each entry of the vector. To prove this result, we would need concepts not yet available, like the distance between vectors, which formalizes the notion of vectors being close to each other. Instead, we take a pragmatic view and simply take this as the definition of the limit with vectors. On the other hand, this is intuitive: since the limit is defined entrywise, as the entries of the vectors get closer and closer to their limits, we expect the vectors themselves (say, as points in a Cartesian space) to get closer and closer to the limit vector.
For example,
\[
\lim_{t \to 1} \mathbf{x}(t) = \lim_{t \to 1}\, \left[\, \sin(2\pi t)\ \ \ t-1\ \ \ 2t^{2}\ \ \ e^{-t} \,\right]
= \left[\, \lim_{t \to 1}\sin(2\pi t)\ \ \ \lim_{t \to 1}(t-1)\ \ \ \lim_{t \to 1}2t^{2}\ \ \ \lim_{t \to 1}e^{-t} \,\right]
= \left[\, 0\ \ \ 0\ \ \ 2\ \ \ e^{-1} \,\right].
\]
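The worked example can be checked entrywise with a symbolic tool; a sketch assuming SymPy is available:

```python
import sympy as sp

t = sp.symbols('t')

# The row vector of functions from the example above.
x = [sp.sin(2*sp.pi*t), t - 1, 2*t**2, sp.exp(-t)]

# The limit of the vector is the vector of entrywise limits.
limits = [sp.limit(entry, t, 1) for entry in x]
print(limits)  # [0, 0, 2, exp(-1)]
```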
Integration of a vector of functions is likewise entrywise:
\[
\int_{t_i}^{t_f} \mathbf{x}(t)\, dt = \begin{bmatrix} \int_{t_i}^{t_f} x_1(t)\, dt \\[4pt] \int_{t_i}^{t_f} x_2(t)\, dt \\[2pt] \vdots \\[2pt] \int_{t_i}^{t_f} x_n(t)\, dt \end{bmatrix}.
\]
The limits of the summation in (6.3.4) can be arbitrary and define the range over which the
terms are summed; the example sums the vectors vn from n = 0 to some arbitrary value N .
We now consider Taylor series of a vector of functions. The concept is best illustrated by
an example.
\[
e^{\rho} = \sum_{n=0}^{\infty} \frac{\rho^{n}}{n!}, \quad \forall \rho \in \mathbb{C}.
\]
The Taylor series of $\frac{1}{1-\rho}$ is valid for $|\rho| < 1$, while the Taylor series of $e^{\rho}$ is
valid for any value of ρ in the complex plane since eρ is an entire function, see
Definition 3.5.3 in Chapter 3. When we consider the two functions as entries
of the same vector v, we restrict the domain of validity to the intersection of
the domains in which the Taylor series of each entry is valid. For the above
example, the two domains are |ρ| < 1 and C. This intersection is
{ρ ∈ C : |ρ| < 1} ∩ C = {ρ ∈ C : |ρ| < 1},
abbreviated by |ρ| < 1. So, the Taylor series of the vector v is given by:
\[
\begin{aligned}
\mathbf{v}(\rho) &= \begin{bmatrix} \sum_{n=0}^{\infty} \rho^{n} \\[4pt] \sum_{n=0}^{\infty} \frac{\rho^{n}}{n!} \end{bmatrix}, && |\rho| < 1 \\
&= \sum_{n=0}^{\infty} \begin{bmatrix} \rho^{n} \\[4pt] \frac{\rho^{n}}{n!} \end{bmatrix}, && |\rho| < 1 \\
&= \sum_{n=0}^{\infty} \mathbf{v}_{n}(\rho), && |\rho| < 1.
\end{aligned}
\]
Remark 6.3.4 (Product of vectors). Note that we did not define the product of two vectors. This is very different from the other operations we considered in this Section. We will take it up when we address the product of matrices in the next Section.
6.4 Matrices
In Section 6.2, we introduced vectors. We now study matrices, building on what we
learned about vectors. We start by motivating our study of matrices by revisiting the
system of three linear algebraic equations in three unknowns given at the beginning of
the Chapter by (6.1.1). We repeat it herein for easy reference:
\[
\begin{aligned}
4x_1 + 6x_2 + 9x_3 &= 6 \\
6x_1 \hphantom{{}+ 6x_2} - 2x_3 &= 20 \\
5x_1 - 8x_2 + x_3 &= 10.
\end{aligned}
\]
As we mentioned in the introduction to the Chapter, a group of equations like these is referred to as a linear system of algebraic equations, or simply a linear system of equations, where the unknown quantities are the variables $x_1$, $x_2$, and $x_3$.
To write this system of three equations compactly, we organize the coefficients of the
unknown variables x1 , x2 , x3 in an array or rectangle form as follows:
\[
\begin{matrix} 4 & 6 & 9 \\ 6 & & -2 \\ 5 & -8 & 1 \end{matrix}
\]
The missing entry in this array can be taken to be 0, i.e., in the second equation we can
introduce
0 · x2
to obtain
\[
\begin{matrix} 4 & 6 & 9 \\ 6 & 0 & -2 \\ 5 & -8 & 1 \end{matrix}
\]
To make sure no entries are lost, we customarily use square brackets to delineate the
rectangular form as follows
\[
\begin{bmatrix} 4 & 6 & 9 \\ 6 & 0 & -2 \\ 5 & -8 & 1 \end{bmatrix}. \tag{6.4.1}
\]
We will usually stick with square brackets. The array enclosed by square brackets in
Equation (6.4.1) is an example of a matrix. Matrices will be represented by capital boldface
letters:
\[
\mathbf{A} = \begin{bmatrix} 4 & 6 & 9 \\ 6 & 0 & -2 \\ 5 & -8 & 1 \end{bmatrix}. \tag{6.4.2}
\]
Returning to the system of three equations in three unknowns, the three unknowns and
the three known terms are collected also in vectors
\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \quad \text{and} \quad \mathbf{b} = \begin{bmatrix} 6 \\ 20 \\ 10 \end{bmatrix}.
\]
The compact notation for the system of three equations in three unknowns is then:
Ax = b.
To make sense of this compact notation, we need to understand what A and Ax stand for.
The equality is an equality of vectors, since the right-hand side is a vector. This Section
considers the matrix A.
We identify important quantities related to matrices by working with the matrix A in (6.4.2). The nine elements arranged neatly in the matrix A are the
elements or entries of A. If we scan the entries of A in lexicographic order, starting at
the top left corner, and from left to right, the first three elements 4, 6, and 9 make up
the first horizontal line and are the first row of the matrix A. The three elements below 6,
0, −2 are the second row, and, finally, the three last elements 5, −8, and 1 are the third row.
We can also scan the matrix A in a different order: going down from the top left element, we read the elements 4, 6, 5. These three elements organized vertically are the first column
of the matrix A. The three elements to the right of this first column, 6, 0, and −8 are the
second column. Finally, the last three elements to the right of the second column, 9, −2,
and 1 are the third column of the matrix.
6.4.1 Representations
In this Subsection, we will introduce the dimensions of a matrix and discuss several dif-
ferent ways of defining and describing matrices. They are useful in different settings; it
is important to realize that the same object can have several different equivalent descrip-
tions.
6.4.1.1 Dimension
Matrices can be large or small. The smallest one (in terms of dimensions) is the scalar.
Sometimes it pays to look at scalars as matrices. As a matrix, the scalar α could be written
as:
[α]. (6.4.3)
We seldom write a scalar as in (6.4.3), but here it is useful just to note that the scalar is a
matrix with a single row and a single column. We say a scalar is a 1 × 1 matrix, or has
dimensions 1 × 1, or has dimension one.
Beyond scalars, Section 6.2 introduced vectors. We saw in Section 6.2 that nD-vectors can
be of two types – row vectors and column vectors. As matrices, nD-row vectors have
dimension 1 × n, i.e., they are a matrix with a single row and n columns. Likewise, mD-
column vectors are matrices with dimensions m × 1, i.e., matrices with m rows and a
single column.
More generally, if the matrix A has m rows and n columns, the dimensions of A are:
A:m×n
The dimensions of A are read “m times n.” We emphasize that the first number m indi-
cates the number of rows in the matrix and the second number n indicates the number of
columns of the matrix. The matrix A in (6.4.2) has dimensions 3 × 3 with m = 3 rows and
n = 3 columns.
The next three Subsections consider three alternative representations of matrices. We start
with the scalar representation, the most common one and the easiest one since it explicitly
shows the array format with which we introduced matrices.
The entries of the matrix are scalars:
\[
\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}. \tag{6.4.4}
\]
We will refer to (6.4.4), when needed, as the scalar representation of the matrix A.
In the m × n matrix A in (6.4.4) there are mn entries. The vertical and horizontal dots
represent unspecified elements, rows, and columns of the matrix A.
The elements of the matrix are indexed by two indices. For example, the second entry be-
low the top left entry, the entry, a21 , is indexed by the two indices 2 and 1. The first index,
2, indicates the index of the row, in this case, the second row. Unless otherwise stated,
the rows are labeled increasingly from top to bottom. The second index, 1, indicates the
index of the column, in this case, the first column; unless otherwise stated, the columns
are labeled in increasing order from left to right. Usually, we start the indexing of rows
and columns with the number 1. On occasion, it is more suggestive to start the indexing
of both rows and columns from 0.
The generic entry aij is the entry at the crossing of row i and column j; again, the sub-
scripts i and j in the generic element aij of the matrix A denote, respectively, the row
and column position of the entry aij; they are often referred to as the row index and the column index.
In (6.4.4), the matrix A is written explicitly by listing exhaustively all the elements of the
matrix. We can write it more compactly once we recognize the generic element aij of the
matrix as:
\[
\mathbf{A} = [a_{ij}]_{1 \le i \le m,\; 1 \le j \le n}
\]
or simply
A = [aij ],
where the dimensions m × n of the matrix A are assumed known.
Matrix values As for vectors, the entries aij of a matrix A may be in Z, Q, R, C, or may
be generic variables, known or unknown, functions, or sequences.
Each of the columns can of course be identified as a column vector. For our
example, we have the three column vectors:
\[
\mathbf{a}_1 = \begin{bmatrix} 4 \\ 6 \\ 5 \end{bmatrix}, \quad \mathbf{a}_2 = \begin{bmatrix} 6 \\ 0 \\ -8 \end{bmatrix}, \quad \mathbf{a}_3 = \begin{bmatrix} 9 \\ -2 \\ 1 \end{bmatrix}.
\]
We can use these vectors to write the matrix A more compactly as:
\[
\mathbf{A} = [\,\mathbf{a}_1\ \mathbf{a}_2\ \mathbf{a}_3\,].
\]
The column vectors in A, namely, a1 , a2 , and a3 have dimension 3 × 1 in this
example, or simply 3.
We now consider the general column vector representation of a matrix. Let the matrix A
with (scalar) representation and dimensions be as follows,
A = [aij ] : mA × nA (6.4.5)
It is important to realize what the dimensions of the matrix tell about its structure, as
we now discuss. For example, from the dimensions of the matrix we can realize that the
matrix has nA column vectors and each column vector is of dimension mA . The column
vectors of the matrix A are:
\[
\mathbf{a}_1 = \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m_A 1} \end{bmatrix}, \ \cdots, \ \mathbf{a}_j = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{m_A j} \end{bmatrix}, \ \cdots, \ \mathbf{a}_{n_A} = \begin{bmatrix} a_{1 n_A} \\ a_{2 n_A} \\ \vdots \\ a_{m_A n_A} \end{bmatrix}.
\]
We can then get the column representation of the matrix A as:
A = [a1 · · · aj · · · anA ] (6.4.6)
Returning to the matrix A in (6.4.2), its rows are the row vectors
\[
\mathbf{b}_1^{T} = [\,4\ \ 6\ \ 9\,], \qquad \mathbf{b}_2^{T} = [\,6\ \ 0\ \ {-2}\,], \qquad \mathbf{b}_3^{T} = [\,5\ \ {-8}\ \ 1\,].
\]
We can now write the matrix A more compactly using its rows. We get for our
example:
\[
\mathbf{A} = \begin{bmatrix} \mathbf{b}_1^{T} \\ \mathbf{b}_2^{T} \\ \mathbf{b}_3^{T} \end{bmatrix}.
\]
Note that the row vectors are stacked one below the other, not side by side like in the column representation. We also know that each row vector is of dimension 3 since the matrix has 3 columns.
We now consider the general case. The matrix A is given in Equation (6.4.5) and we
assume it is real valued. It is of dimensions mA × nA . From this we can conclude it has
mA rows, each row of dimension nA. Let the rows of A be $\mathbf{f}_1^{T}, \cdots, \mathbf{f}_{m_A}^{T}$:
\[
\mathbf{A} = \begin{bmatrix} \mathbf{f}_1^{T} \\ \vdots \\ \mathbf{f}_i^{T} \\ \vdots \\ \mathbf{f}_{m_A}^{T} \end{bmatrix} \tag{6.4.12}
\]
Remark 6.4.1 (Row vector representation of complex matrices). In (6.4.7) the row vec-
tors of A are represented as the transpose of column vectors fi , so that the row vector repre-
sentation of A is expressed in terms of fiT . When the matrix is complex valued, it is more
common to define the rows of A as the Hermitian of column vectors. Then, the row represen-
tation of A is usually written as:
\[
\mathbf{A} = \begin{bmatrix} \mathbf{f}_1^{H} \\ \vdots \\ \mathbf{f}_i^{H} \\ \vdots \\ \mathbf{f}_{m_A}^{H} \end{bmatrix}. \tag{6.4.13}
\]
Unless otherwise specified, we will consider that the matrices are real valued and work with
the row representation in (6.4.12) rather than in (6.4.13). However, whenever the matrix is
complex valued, we should represent the row representation by (6.4.13).
Note that we refer to the blocks using our common notation: matrices are cap-
ital bold faced roman letters; column vectors are lower case bold faced roman
letters; and scalars are lower case letters. We also chose to represent the third
block, a row vector, as the transpose of a column vector. These notations are a
matter of choice; we use them simply for consistency.
General case: block matrix representation We consider a general block matrix form by ex-
tending in a straightforward way Example 6.4.3. The matrix A is in block form:
\[
\mathbf{A} = \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} & \cdots & \mathbf{A}_{1\ell} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{A}_{k1} & \mathbf{A}_{k2} & \cdots & \mathbf{A}_{k\ell} \end{bmatrix} \tag{6.4.14}
\]
The matrix has k` blocks and these blocks have to have consistent dimensions: the blocks
in block row i all have to have the same number of rows, say mi ; and the blocks in block
column j all have to have the same number of columns, say nj . So, we have that the block
entry Aij has the following dimensions:
Aij : mi × nj .
6.4.2 Examples
We consider a few examples of matrices. The zero and identity matrices are two examples of very important matrices in applications. We will also discuss the Fourier matrix, as an example of a matrix with complex entries; rotation matrices; matrices of functions, whose entries are functions of a variable, for example, time; and polynomial and rational matrices.
\[
I_{ij} = \delta_{ij}, \quad 1 \le i, j \le n,
\]
where the symbol δij is the Kronecker symbol, equal to one if i = j and equal to zero if i ≠ j. In words, the elements of the identity matrix I are zero,
except the n elements with the same row and column indices, in which case
the element is one.
For example, the identity matrix with 4 rows and 4 columns, i.e., the 4 × 4
identity matrix I4 , is given by:
\[
\mathbf{I}_4 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.
\]
\[
\mathbf{F}_n = \frac{1}{\sqrt{n}} \left[ e^{-j\frac{2\pi}{n}k\ell} \right], \quad k, \ell = 0, \cdots, n-1. \tag{6.4.15}
\]
This expression shows that the generic element k` of Fn is the complex expo-
nential
\[
F_{k\ell} = \frac{1}{\sqrt{n}}\, e^{-j\frac{2\pi}{n}k\ell}, \quad 0 \le k, \ell \le n-1.
\]
The quantities
\[
\Omega_\ell = \pm\frac{2\pi}{n}\ell, \quad \ell = 0, \cdots, n-1
\]
Note that the factor $\frac{1}{\sqrt{n}}$ is for normalization purposes; not all authors include it.
\[
\begin{aligned}
\mathbf{F}_2 &= \frac{1}{\sqrt{2}} \left[ e^{-j\frac{2\pi}{2}k\ell} \right], \quad k, \ell = 0, 1 \\
&= \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & e^{-j\pi} \end{bmatrix} \\
&= \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.
\end{aligned}
\]
\[
\begin{aligned}
\mathbf{F}_4 &= \frac{1}{\sqrt{4}} \left[ e^{-j\frac{2\pi}{4}k\ell} \right], \quad k, \ell = 0, \cdots, 3 \\
&= \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & e^{-j\frac{2\pi}{4}} & e^{-j2\frac{2\pi}{4}} & e^{-j3\frac{2\pi}{4}} \\ 1 & e^{-j2\frac{2\pi}{4}} & e^{-j4\frac{2\pi}{4}} & e^{-j6\frac{2\pi}{4}} \\ 1 & e^{-j3\frac{2\pi}{4}} & e^{-j6\frac{2\pi}{4}} & e^{-j9\frac{2\pi}{4}} \end{bmatrix} \\
&= \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -j & -1 & j \\ 1 & -1 & 1 & -1 \\ 1 & j & -1 & -j \end{bmatrix}.
\end{aligned}
\]
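A short sketch that builds $\mathbf{F}_n$ numerically from (6.4.15) and reproduces $\mathbf{F}_4$ above (NumPy assumed; the function name fourier_matrix is hypothetical):

```python
import numpy as np

def fourier_matrix(n: int) -> np.ndarray:
    """Normalized n x n Fourier matrix F_n of (6.4.15)."""
    k = np.arange(n).reshape(n, 1)   # row index as a column
    l = np.arange(n).reshape(1, n)   # column index as a row
    return np.exp(-2j * np.pi * k * l / n) / np.sqrt(n)

F4 = fourier_matrix(4)
# 2 * F4 should match the bracketed matrix of 1, -1, j, -j entries above.
print(np.round(2 * F4))
```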
An important parametric matrix, i.e., where the entries are in terms of a pa-
rameter θ, is the rotation matrix on the plane. This is a 2 × 2 matrix R:
\[
\mathbf{R}(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.
\]
For example, the matrix that rotates a vector counterclockwise by $\frac{\pi}{3}$ is:
\[
\mathbf{R}\!\left(\frac{\pi}{3}\right) = \begin{bmatrix} \frac{1}{2} & -\frac{\sqrt{3}}{2} \\[4pt] \frac{\sqrt{3}}{2} & \frac{1}{2} \end{bmatrix}.
\]
A related parametric matrix is the reflection matrix
\[
\begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix},
\]
which reflects vectors on the plane about the line at angle θ.
functions, see also Chapter 5, Section 5.6, on system functions and transfer func-
tions. These are matrices whose entries are polynomial or rational functions,
see Chapter 2 on rational functions. An example of a polynomial matrix in the complex variable z is:
\[
\mathbf{H}(z) = \begin{bmatrix} z-1 & z^{3} - 2z^{2} + z - 1 & z^{2} - 2z + 3 \\ z^{2} - 1 & (z^{2} - 5)(z^{3} - 1) & z^{2} - 2z + 11 \end{bmatrix}, \quad z \in \mathbb{C}.
\]
An example of a rational matrix in the complex variable z is the following 2 × 3 matrix:
\[
\mathbf{H}(z) = \begin{bmatrix} \dfrac{z}{z-1} & \dfrac{z^{2}+1}{z^{3}-2z^{2}+z-1} & \dfrac{z^{2}-2z+3}{z^{3}+3z-2} \\[10pt] \dfrac{z^{2}-5}{z^{2}-1} & \dfrac{z+3}{z^{3}-1} & \dfrac{2}{z^{2}-2z+11} \end{bmatrix}, \quad z \in D \subset \mathbb{C}, \tag{6.4.16}
\]
Remark 6.4.2 (Region of convergence). In matrix (6.4.16), the entries are functions in the
complex variable z. In fact, and on purpose, we chose these functions to be rational functions
of z. Note that to define the matrix H(z) we need not only the expression as given above but
to define a domain D ⊂ C where it converges. For matrices of rational functions, this domain is usually the inside of a circle, the outside of a circle, or an annulus (the region between two circles); or, alternatively, a left half-plane (the region to the left of a vertical line in the complex plane), a right half-plane (the region to the right of a vertical line), or the region of the plane between two vertical lines. This we will not address here since it would take us too far
afield. We just state the warning. This topic is discussed in detail in areas like Controls or
Signal Processing.
where A is n × n.
The upper triangular part of a n-dimension square matrix are the elements
above the main diagonal of the matrix. The lower triangular part of a n-
dimension square matrix are the elements below the main diagonal.
In many applications, matrices are not only rectangular, but one of the dimensions is much larger than the other dimension. The terminology for such matrices is suggestive. If m ≫ n, the matrix A is called a tall rectangular matrix, with many more rows than columns; in the opposite case, m ≪ n, the matrix has many more columns than rows.
Diagonal matrices have nonzero entries only on the main diagonal:
\[
\mathbf{D} = \begin{bmatrix} d_{11} & 0 & \cdots & 0 \\ 0 & d_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_{nn} \end{bmatrix}. \tag{6.4.17}
\]
We can write it in more compact notation as:
D = diag[d11 · · · dnn ]. (6.4.18)
The writing in (6.4.18) is shorthand for (6.4.17).
Given a vector d:
\[
\mathbf{d} = \begin{bmatrix} d_1 \\ \vdots \\ d_n \end{bmatrix},
\]
the diagonal matrix of the vector is the diagonal matrix whose diagonal entries
are the elements of the vector d; it is represented as
\[
\mathbf{D} = \operatorname{diag}(\mathbf{d}) = \begin{bmatrix} d_1 & & & \text{\Large $0$} \\ & d_2 & & \\ & & \ddots & \\ \text{\Large $0$} & & & d_n \end{bmatrix}.
\]
The symbol 0 above the diagonal means that all entries above the diagonal are
zero. The same applies for the symbol 0 below the diagonal.
The ∗ in the upper triangular part of the matrix means that we do not care
about the actual values of the entries, they may be arbitrary. The 0 in the lower
triangular part means that all entries below the diagonal are zero.
To fix notation, we consider the matrices A and B with the scalar representations and
dimensions as follows,
A = [aij ] : mA × nA (6.5.1)
B = [bij ] : mB × nB (6.5.2)
We also write their decompositions in columns, rows, and blocks. The column represen-
tations of A and B are:
A = [a1 · · · anA ] (6.5.3)
B = [b1 · · · bnB ] (6.5.4)
The nA columns a1 , · · · , anA of the matrix A in its column representation (6.5.3) are of
course column vectors of dimension mA . Similarly for B.
You should get used to figuring out from the context whether a vector is a column vector or a row vector.
We now consider several operations we can perform with matrices. We will express them
using the scalar entries representation of a matrix, as well as the column and the row
representations of the matrix.
6.5.1 Equality
We investigate what equality of matrices means:
A=B
Matrix equality is interpreted as entrywise equality. Therefore, matrix equality requires
the same number of columns and number of rows, as well as the same number of entries
in each matrix:
mA = mB = m
nA = nB = n.
Given that the matrices have the same dimensions, the number of entries is the same
mA nA = mB nB = mn.
The equality of matrices is expressed in terms of the scalar entries of the matrices:
\[
a_{ij} = b_{ij}, \quad 1 \le i \le m,\ 1 \le j \le n.
\]
With the column representation, equality requires
\[
\mathbf{a}_j = \mathbf{b}_j, \quad 1 \le j \le n,
\]
and with the row representation,
\[
\mathbf{f}_i^{T} = \mathbf{g}_i^{T}, \quad 1 \le i \le m.
\]
Finally, with the block representation, if both matrices A and B have the same number of
blocks and (consistent) block decomposition, if we take in (6.5.7) and (6.5.8) m = k and
n = `, equality of the matrices requires:
Aij = Bij , 1 ≤ i ≤ k, 1 ≤ j ≤ `.
6.5.2 Addition
We consider the addition of two matrices A and B. Addition of matrices is defined entry-
wise as we see in this Section.
Let C be the sum of the two matrices A and B. We investigate the conditions when it is
possible to add the two matrices and how to compute their sum.
We consider the scalar, column, and row representations of the two matrices A and B
given by (6.5.1)–(6.5.6).
For it to be possible to add A and B, we need the dimensions of the two matrices A and B to
be the same, i.e., to satisfy the following relations:
mA = mB = m
nA = nB = n,
i.e., A and B have the same number of rows m and the same number of columns n.
Assuming they have the same dimensions, let the matrix C be the addition of A and B:
C = A + B : mC × nC
The elements of the matrix C, the sum of the two matrices A and B, are given by the
entrywise sum:
cij = aij + bij , 1 ≤ i ≤ m, 1 ≤ j ≤ n.
From here we confirm that the number of rows and number of columns in A, B, and C
must be:
mC = mA = mB = m
nC = nA = nB = n.
In other words, to add two matrices, the matrices need to be of the same dimensions
m × n, and the resulting matrix is of the same dimension m × n.
In terms of the column representation, the columns of the sum matrix C are:
cj = aj + bj , 1 ≤ j ≤ nC = n.
\[
\mathbf{C} = [\,\mathbf{c}_1 \cdots \mathbf{c}_{n_C}\,] = [\,\mathbf{a}_1 + \mathbf{b}_1\ \cdots\ \mathbf{a}_n + \mathbf{b}_n\,],
\]
where of course nC = nA = nB = n.
Finally, in terms of block representation, the addition of the two matrices A and B with
the decompositions (6.5.7)–(6.5.8) is the matrix C with block representation consistent
with the block representations of A and B, whose ij block Cij is given by:
\[
\mathbf{C}_{ij} = \mathbf{A}_{ij} + \mathbf{B}_{ij}.
\]
For example, let, with the indicated block partitions,
\[
\mathbf{A} = \left[\begin{array}{cc|c} 1 & 2 & 5 \\ 3 & 4 & 6 \\ \hline 7 & 8 & 9 \end{array}\right]
\quad \text{and} \quad
\mathbf{B} = \left[\begin{array}{cc|c} 10 & 11 & 14 \\ 12 & 13 & 15 \\ \hline 16 & 17 & 18 \end{array}\right].
\]
Then C is given by
\[
\mathbf{C} = \mathbf{A} + \mathbf{B} = \left[\begin{array}{c|c} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 10 & 11 \\ 12 & 13 \end{bmatrix} & \begin{bmatrix} 5 \\ 6 \end{bmatrix} + \begin{bmatrix} 14 \\ 15 \end{bmatrix} \\[10pt] \hline \begin{bmatrix} 7 & 8 \end{bmatrix} + \begin{bmatrix} 16 & 17 \end{bmatrix} & \begin{bmatrix} 9 \end{bmatrix} + \begin{bmatrix} 18 \end{bmatrix} \end{array}\right]
= \begin{bmatrix} 11 & 13 & 19 \\ 15 & 17 & 21 \\ 23 & 25 & 27 \end{bmatrix},
\]
where each block of C is the sum of the corresponding blocks of A and B.
Before leaving this Subsection, we remark again that, while adding scalars is always pos-
sible, addition of matrices is not always possible; we can add square or rectangular ma-
trices, but the matrices need to have the same dimensions.
Associativity The addition of matrices is associative:
\[
\mathbf{A} + \mathbf{B} + \mathbf{C} = \mathbf{A} + (\mathbf{B} + \mathbf{C}) = (\mathbf{A} + \mathbf{B}) + \mathbf{C}.
\]
Commutativity The addition of matrices is commutative:
\[
\mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A}.
\]
Zero matrix: Unit with respect to addition There is a matrix, the zero matrix 0, that is the identity element of the addition of matrices:
\[
\mathbf{A} + \mathbf{0} = \mathbf{0} + \mathbf{A} = \mathbf{A}.
\]
Inverse with respect to addition For every matrix A, there is a matrix B such that the
sum with the original matrix A is the zero matrix. The matrix B is the negative
of A, so that:
A + (−A) = 0.
For example, if
\[
\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix},
\]
then the negative of A is given by:
\[
-\mathbf{A} = \begin{bmatrix} -1 & -2 \\ -3 & -4 \end{bmatrix}.
\]
Clearly the addition of these two matrices is the 2 × 2 zero matrix.
The properties of addition of matrices follow trivially from the corresponding properties
of the addition of scalars, since addition of matrices when defined is obtained from the
addition of their entries.
6.5.3 Product
We consider multiplication of two matrices, including the special case of multiplication
of a matrix by a scalar, the product of two vectors that we did not introduce in Section 6.2,
and then the product of general matrices. We will learn several important things. The
first is that, like with addition of matrices, multiplication of matrices only exists in very
special circumstances that we will need to examine carefully. The second is that given
two matrices A and B we may be able to multiply A by B on the right, i.e., to compute
the product AB, but we may not be able to multiply A by B on the left, i.e., to compute
the product BA. So, now order matters (more on this later), and the product of matrices
may not be commutative, in general. Finally, when we can multiply the two matrices A
and B, the dimensions of the resulting matrix C have to be carefully determined from the
dimensions of A and B.
We start with multiplication of a matrix by a scalar and then vector multiplication (row
vector by column vector and column vector by row vector). Only then, we will address
the general case.
\[
\mathbf{C} = \alpha\mathbf{A} = [\alpha a_{ij}], \quad 1 \le i \le m_A,\ 1 \le j \le n_A
\]
\[
= \begin{bmatrix} \alpha a_{11} & \cdots & \alpha a_{1 n_A} \\ \alpha a_{21} & \cdots & \alpha a_{2 n_A} \\ \vdots & \ddots & \vdots \\ \alpha a_{m_A 1} & \cdots & \alpha a_{m_A n_A} \end{bmatrix}.
\]
In terms of the row and column representations, the rows and columns of A are simply
multiplied by the scalar α. For example, the row representation is:
\[
\alpha\mathbf{A} = \begin{bmatrix} \alpha\mathbf{f}_1^{T} \\ \vdots \\ \alpha\mathbf{f}_{m_A}^{T} \end{bmatrix}. \tag{6.5.12}
\]
Finally, for a matrix A in block form (6.5.7), the multiplication by a scalar α simply multi-
plies each block Aij by the scalar to obtain the block αAij .
\[
\alpha\mathbf{A} = \alpha \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} & \cdots & \mathbf{A}_{1\ell} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{A}_{k1} & \mathbf{A}_{k2} & \cdots & \mathbf{A}_{k\ell} \end{bmatrix} \tag{6.5.14}
\]
\[
= \begin{bmatrix} \alpha\mathbf{A}_{11} & \alpha\mathbf{A}_{12} & \cdots & \alpha\mathbf{A}_{1\ell} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha\mathbf{A}_{k1} & \alpha\mathbf{A}_{k2} & \cdots & \alpha\mathbf{A}_{k\ell} \end{bmatrix}. \tag{6.5.15}
\]
6.5.3.2 Row vector by column vector: Scalar product
We begin with the simplest case: the multiplication of a row vector aT by a column vector b.
Remark 6.5.1 (Scalar product). In this Subsection we work with the row vector aT that is
the transpose of the column vector a. The reason for writing the row vector as the transpose
of a column vector is because the operation of multiplying the row vector aT with the column
vector b is also known as the scalar product of the two column vectors a and b. The scalar
product is the product of two vectors and not the product of a vector by a scalar. The scalar
product of two vectors is also known as the inner product, the dot product, or the internal
product of the vectors. We will come back to the scalar product in Chapter 8.
The next Example illustrates how to multiply a row vector by a column vector with a 3-
dimensional example.
We first multiply pointwise the entries of the row vector aT with the corre-
sponding entries of the column vector b, and then we accumulate the elemen-
twise products so obtained. This is illustrated below:
\[
\mathbf{a}^{T} \cdot \mathbf{b} = [\,1\ \ 2\ \ 3\,] \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} = 1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6 = 4 + 10 + 18 = 32.
\]
The result of the scalar product of a 3D row vector by a 3D column vector is a
scalar. This justifies the name of the product.
We were able to multiply the row vector aT by the column vector b because the
number of columns of aT , which is 3, is the same as the number of rows of the
column vector b, which is 3. The result is the scalar c = 32.
In general, let the nD row vector $\mathbf{a}^{T} = [\,a_1\ a_2\ \cdots\ a_n\,]$ and the nD column vector
\[
\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}.
\]
The multiplication of aT by b is the scalar c:
\[
c = \mathbf{a}^{T}\mathbf{b} = [\,a_1\ a_2\ \cdots\ a_n\,] \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} = a_1 b_1 + \cdots + a_n b_n = \sum_{i=1}^{n} a_i b_i. \tag{6.5.16}
\]
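A minimal numerical sketch of (6.5.16), reproducing the worked example above (NumPy assumed):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Scalar (inner) product a^T b: multiply entrywise, then accumulate.
c = np.dot(a, b)   # equivalently a @ b
print(c)           # 32, matching the worked example
```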
Remark 6.5.2 (Outer product). It is important to note again that in this Subsection we also
work with the row vector aT ; like before, we emphasize that this row vector is the transpose of
the column vector a. The reason to use this notation is because the operation of multiplying
the column vector b by the row vector aT is also known as the outer product of the two column
vectors b and a. We will come back to the outer product in a later Chapter.
The next Example illustrates how to multiply a column vector by a row vector with a 3-
dimensional example.
\[
\mathbf{b}\,\mathbf{a}^{T} = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} [\,1\ \ 2\ \ 3\,]
= \begin{bmatrix} 4 \times 1 & 4 \times 2 & 4 \times 3 \\ 5 \times 1 & 5 \times 2 & 5 \times 3 \\ 6 \times 1 & 6 \times 2 & 6 \times 3 \end{bmatrix}
= \begin{bmatrix} 4 & 8 & 12 \\ 5 & 10 & 15 \\ 6 & 12 & 18 \end{bmatrix}.
\]
The result of the outer product of a 3D column vector by a 3D row vector is a
3 × 3 matrix.
The second equation tells us how we computed it. We multiplied the first ele-
ment, i.e., the first row, of b, which is 4, by the first element, i.e., the first col-
umn, of the vector aT , which is 1, and placed the result as the element c11 = 4
of the resulting matrix C. Note the indices (1, 1) of c11 , the first 1 goes with
the first row of the vector b that we are using in computing it, and the second
index 1 goes with the first column of the entry of vector aT used to compute c11 .
This is a mb × na matrix C.
As just seen, multiplying the column vector b by the row vector aT (column vector times
row vector) is ALWAYS possible because the number of columns of b is 1 and equals the
number of rows of aT , which is also 1. The result is a matrix with dimensions mb × na .
This contrasts with the multiplication of a row vector aT by a column vector b (row vector
times column vector) that is possible only when the number of columns of aT is equal to
the number of rows of b, leading to a scalar.
In neither case is the result a vector! In the first example it is a scalar. In the second
example, it is a matrix – no vector to be seen.
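The same vectors illustrate the contrast numerically; a sketch assuming NumPy:

```python
import numpy as np

a = np.array([1, 2, 3])   # a^T is the row vector [1 2 3]
b = np.array([4, 5, 6])   # b is the column vector

# Outer product b a^T: a 3 x 3 matrix with entry (i, j) = b_i * a_j.
C = np.outer(b, a)
print(C)
# [[ 4  8 12]
#  [ 5 10 15]
#  [ 6 12 18]]

# Inner product a^T b: a scalar, not a vector.
print(np.dot(a, b))  # 32
```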
We will consider the matrix-matrix product in scalar, row, column, and block representa-
tions.
Matrix product: scalar representation Like in the previous Subsections, we start with
an example.
dimensions are the same. Figure 6.5 illustrates these facts, as well as the dimensions
of the resulting matrix C, with the two matrices displayed.
We collect from these two numerical examples and for the record the conditions under
which we can multiply two matrices:
\[
\mathbf{C} = \underbrace{\mathbf{A}}_{m_A \times n_A} \cdot \underbrace{\mathbf{B}}_{m_B \times n_B}.
\]
We check that
\[
(\text{no. of columns of } \mathbf{A})\ \ n_A = m_B\ \ (\text{no. of rows of } \mathbf{B}).
\]
If this is true then we can multiply the two matrices and the dimensions of the resulting
matrix C are:
C : mA × nB
i.e., the number of rows mC of C is the number of rows mA of A, and the number of
columns nC of C is the number of columns nB of B.
We now state the general rule to multiply two matrices in scalar representation.
\[
\mathbf{C} = \mathbf{A} \cdot \mathbf{B} \tag{6.5.21}
\]
\[
= \begin{bmatrix} \sum_{\ell=1}^{n} a_{1\ell} b_{\ell 1} & \cdots & \sum_{\ell=1}^{n} a_{1\ell} b_{\ell n_B} \\ \vdots & \ddots & \vdots \\ \sum_{\ell=1}^{n} a_{m_A \ell} b_{\ell 1} & \cdots & \sum_{\ell=1}^{n} a_{m_A \ell} b_{\ell n_B} \end{bmatrix} : m_A \times n_B \tag{6.5.22}
\]
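As a sketch (NumPy assumed; the helper name matmul is hypothetical), the rule (6.5.22) can be implemented directly with the triple index pattern and checked on the matrices used in the worked example of (6.5.34) below:

```python
import numpy as np

def matmul(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Multiply A (m_A x n_A) by B (m_B x n_B) entry by entry, per (6.5.22)."""
    mA, nA = A.shape
    mB, nB = B.shape
    assert nA == mB, "no. of columns of A must equal no. of rows of B"
    C = np.zeros((mA, nB))
    for i in range(mA):
        for j in range(nB):
            # c_ij = sum over the common index l of a_il * b_lj
            C[i, j] = sum(A[i, l] * B[l, j] for l in range(nA))
    return C

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])
B = np.array([[7., 10., 13.],
              [8., 11., 14.],
              [9., 12., 15.]])
print(matmul(A, B))
# [[ 50.  68.  86.]
#  [122. 167. 212.]]   -- matches (6.5.34)
```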
For the example above, the rows of A are
\[
\mathbf{a}_1^{T} = [\,1\ \ 2\ \ 3\,], \qquad \mathbf{a}_2^{T} = [\,4\ \ 5\ \ 6\,].
\]
General case: Matrix (in row representation) times matrix (in column representation).
We indicate the general case. Consider the multiplication of the two matrices:
\[
\mathbf{C} = \mathbf{A}\mathbf{B} \tag{6.5.26}
\]
\[
= \begin{bmatrix} \mathbf{a}_1^{T} \\ \mathbf{a}_2^{T} \\ \vdots \\ \mathbf{a}_{m_A}^{T} \end{bmatrix} [\,\mathbf{b}_1\ \mathbf{b}_2\ \cdots\ \mathbf{b}_{n_B}\,]. \tag{6.5.27}
\]
Performing the calculations, we get:
\[
\mathbf{C} = \begin{bmatrix} \mathbf{a}_1^{T}\mathbf{b}_1 & \mathbf{a}_1^{T}\mathbf{b}_2 & \cdots & \mathbf{a}_1^{T}\mathbf{b}_{n_B} \\ \mathbf{a}_2^{T}\mathbf{b}_1 & \mathbf{a}_2^{T}\mathbf{b}_2 & \cdots & \mathbf{a}_2^{T}\mathbf{b}_{n_B} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{a}_{m_A}^{T}\mathbf{b}_1 & \mathbf{a}_{m_A}^{T}\mathbf{b}_2 & \cdots & \mathbf{a}_{m_A}^{T}\mathbf{b}_{n_B} \end{bmatrix}. \tag{6.5.28}
\]
We now consider the matrix A in column representation and the matrix B in row repre-
sentation.
\[
\mathbf{g}_1^{T} = [\,7\ \ 10\ \ 13\,], \qquad \mathbf{g}_2^{T} = [\,8\ \ 11\ \ 14\,], \qquad \mathbf{g}_3^{T} = [\,9\ \ 12\ \ 15\,].
\]
We now write the multiplication of the matrices A and B using the column
representation of A and the row representation of B:
\[
\begin{aligned}
\mathbf{A}\mathbf{B} &= [\,\mathbf{f}_1\ \mathbf{f}_2\ \mathbf{f}_3\,] \begin{bmatrix} \mathbf{g}_1^{T} \\ \mathbf{g}_2^{T} \\ \mathbf{g}_3^{T} \end{bmatrix} && (6.5.31) \\
&= \mathbf{f}_1\mathbf{g}_1^{T} + \mathbf{f}_2\mathbf{g}_2^{T} + \mathbf{f}_3\mathbf{g}_3^{T} && (6.5.32) \\
&= \begin{bmatrix} 1 \\ 4 \end{bmatrix}[\,7\ 10\ 13\,] + \begin{bmatrix} 2 \\ 5 \end{bmatrix}[\,8\ 11\ 14\,] + \begin{bmatrix} 3 \\ 6 \end{bmatrix}[\,9\ 12\ 15\,] \\
&= \begin{bmatrix} 7 & 10 & 13 \\ 28 & 40 & 52 \end{bmatrix} + \begin{bmatrix} 16 & 22 & 28 \\ 40 & 55 & 70 \end{bmatrix} + \begin{bmatrix} 27 & 36 & 45 \\ 54 & 72 & 90 \end{bmatrix} && (6.5.33) \\
&= \begin{bmatrix} 7+16+27 & 10+22+36 & 13+28+45 \\ 28+40+54 & 40+55+72 & 52+70+90 \end{bmatrix} \\
&= \begin{bmatrix} 50 & 68 & 86 \\ 122 & 167 & 212 \end{bmatrix} && (6.5.34)
\end{aligned}
\]
Of course the result in (6.5.34) is the same as the result we obtained in (6.5.20).
Each of the terms in (6.5.33) is a matrix and the product of the matrices using
the column representation of A and the row representation of B leads to the
multiplication being given as the sum of three matrices. The number three is
exactly the number of columns of A that equals the number of rows of B, i.e.,
we can multiply the matrices because:
nA = mB = 3. (6.5.35)
This condition (6.5.35) is the same as we obtained before in (6.5.25), as we
should expect.
General rule: Matrix (in column representation) times matrix (in row representation).
The general rule to multiply two matrices, the first in column representation and the
second in row representation is:
\[
\mathbf{C} = \mathbf{A}\mathbf{B} = [\,\mathbf{a}_1\ \mathbf{a}_2\ \cdots\ \mathbf{a}_{n_A}\,] \begin{bmatrix} \mathbf{b}_1^{T} \\ \mathbf{b}_2^{T} \\ \vdots \\ \mathbf{b}_{m_B}^{T} \end{bmatrix}
= \mathbf{a}_1\mathbf{b}_1^{T} + \mathbf{a}_2\mathbf{b}_2^{T} + \cdots + \mathbf{a}_{n_A}\mathbf{b}_{m_B}^{T} \tag{6.5.36}
\]
with nA = mB . Each of the terms in the sum is a matrix with dimensions mA × nB .
This result (6.5.36) follows easily from (6.5.21) as we show now. Repeat Equation (6.5.21)
to obtain successively:
\[
\begin{aligned}
\mathbf{C} &= \mathbf{A} \cdot \mathbf{B} && (6.5.37) \\
&= \begin{bmatrix} a_{11} & \cdots & a_{1\ell} & \cdots & a_{1 n_A} \\ \vdots & & \vdots & & \vdots \\ a_{m_A 1} & \cdots & a_{m_A \ell} & \cdots & a_{m_A n_A} \end{bmatrix} \cdot \begin{bmatrix} b_{11} & \cdots & b_{1 n_B} \\ \vdots & & \vdots \\ b_{\ell 1} & \cdots & b_{\ell n_B} \\ \vdots & & \vdots \\ b_{m_B 1} & \cdots & b_{m_B n_B} \end{bmatrix} && (6.5.38) \\
&= \begin{bmatrix} \sum_{\ell=1}^{n} a_{1\ell} b_{\ell 1} & \cdots & \sum_{\ell=1}^{n} a_{1\ell} b_{\ell n_B} \\ \vdots & \ddots & \vdots \\ \sum_{\ell=1}^{n} a_{m_A \ell} b_{\ell 1} & \cdots & \sum_{\ell=1}^{n} a_{m_A \ell} b_{\ell n_B} \end{bmatrix} && (6.5.39) \\
&= \sum_{\ell=1}^{n} \begin{bmatrix} a_{1\ell} b_{\ell 1} & \cdots & a_{1\ell} b_{\ell n_B} \\ \vdots & \ddots & \vdots \\ a_{m_A \ell} b_{\ell 1} & \cdots & a_{m_A \ell} b_{\ell n_B} \end{bmatrix} && (6.5.40) \\
&= \sum_{\ell=1}^{n} \begin{bmatrix} a_{1\ell} \\ \vdots \\ a_{m_A \ell} \end{bmatrix} [\, b_{\ell 1}\ \cdots\ b_{\ell n_B} \,] && (6.5.41)
\end{aligned}
\]
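A numerical check of the outer-product form (6.5.36)/(6.5.41) against the entrywise product, using the example matrices from earlier (a sketch; NumPy assumed):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])
B = np.array([[7., 10., 13.],
              [8., 11., 14.],
              [9., 12., 15.]])

# AB as a sum of outer products: column l of A times row l of B.
C = sum(np.outer(A[:, l], B[l, :]) for l in range(A.shape[1]))
print(np.allclose(C, A @ B))  # True
```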
Matrix product: block representation We now consider the product of two matrices A
and B in block form. This is trickier than the previous three cases and care should be
taken to make sure that the block decompositions of A and B allow the product of the
indicated subblocks. This is best illustrated working with specific cases.
For example, let A and B be block diagonal:
\[
\mathbf{A} = \operatorname{diag}[\mathbf{A}_{11} \cdots \mathbf{A}_{nn}], \qquad \mathbf{B} = \operatorname{diag}[\mathbf{B}_{11} \cdots \mathbf{B}_{nn}].
\]
Then
\[
\mathbf{C} = \mathbf{A} \cdot \mathbf{B} = \operatorname{diag}[\mathbf{A}_{11} \cdot \mathbf{B}_{11} \cdots \mathbf{A}_{nn} \cdot \mathbf{B}_{nn}].
\]
The dimensions of the square blocks Aii are not constrained by the dimen-
sions of other diagonal square blocks of A, and, likewise, the dimensions of
the square blocks Bii are not constrained by the dimensions of other diago-
nal square blocks of B. However, the dimensions of corresponding blocks Aii
and Bii are the same. The dimensions of the square blocks Cii are the dimen-
sions of the blocks Aii and Bii .
The question here is not how to multiply the two matrices. This we have learned in the
previous Subsections. The issue is to perform this product in such a way to exhibit a new
structure of the product matrix. This may be useful in applications.
Remark 6.5.4 (Matrix-vector products). Because we usually assume the vectors to be col-
umn vectors, when we refer to matrix vector products, or a matrix multiplying a vector,
we assume implicitly that the matrix multiplies the vector on the left. Unless the context
states otherwise, or we mention it explicitly, matrix-vector product will assume the matrix
multiplies the vector on the left.
We give next three examples that are important in applications that illustrate the use of
these products.
Example 6.5.13 I Linear combination revisited: Matrix-vector product
The first example of the product of a matrix A in column representation by a
matrix B in row representation is when B is a column vector b.
Let matrix A be given in vector form and the (column) vector b, where we
assume that the number of columns of A, nA , and the number of entries of b,
mb , are equal, nA = mb :
\[
\mathbf{C} = \mathbf{A}\mathbf{b} = [\,\mathbf{f}_1\ \mathbf{f}_2\ \cdots\ \mathbf{f}_{n_A}\,] \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_{m_b} \end{bmatrix} \tag{6.5.42}
\]
\[
= b_1 \mathbf{f}_1 + b_2 \mathbf{f}_2 + \cdots + b_{m_b} \mathbf{f}_{n_A}. \tag{6.5.43}
\]
If we recall the linear combination of vectors given by (6.3.1) in Section 6.3.5, we
recognize that (6.5.43) is the linear combination of the columns f1 , f2 , · · · , fnA ,
of A. We can state this in a different way, by interpreting a linear combination
of vectors as the product of a matrix, whose columns are the vectors, by a (col-
umn) vector whose entries are the coefficients of the linear combination.
Once again, we encounter the case where the same object is interpreted in dif-
ferent ways. We should get used to this–looking at similar objects from differ-
ent perspectives.
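A short sketch verifying this interpretation with hypothetical numbers (NumPy assumed):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])
b = np.array([2., -1., 3.])

# A b is the linear combination of the columns of A with coefficients b.
lincomb = b[0] * A[:, 0] + b[1] * A[:, 1] + b[2] * A[:, 2]
print(np.allclose(A @ b, lincomb))  # True
```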
We now consider the product of the matrices A and B, where B is in column vector form.
We now consider the same product of the matrices A and B, but now A is in row vector
form.
Computing powers of square matrices from the definition is then, in general, computationally
heavy. We will see in due time speedier ways to compute successive powers of a matrix.
If both matrices are square and have the same dimensions, the product AB and the prod-
uct BA are both well defined, but in general the resulting matrices are different.
These examples illustrate that matrix multiplication may not enjoy the same properties as multiplication of numbers (scalars). We collect here the main properties of matrix multiplication.
Associativity It is associative:
A · B · C = (A · B) · C = A · (B · C)
Multiplication by zero matrix 0 . A · 0 = 0, but the product of two matrices can be zero
without either factor being a zero matrix, as for example in the following:
\[
\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.
\]
Distributivity of scalar and sum of matrices We have, for a scalar α:
\[
\alpha(\mathbf{A} + \mathbf{B}) = \alpha\mathbf{A} + \alpha\mathbf{B}.
\]
Distributivity of sum of scalars and product by matrix We have, for α and β scalars:
(α + β)A = αA + βA.
Distributivity of matrix product over matrix sum The product distributes over the sum:
\[
\mathbf{A} \cdot (\mathbf{B} + \mathbf{C}) = \mathbf{A} \cdot \mathbf{B} + \mathbf{A} \cdot \mathbf{C}, \qquad (\mathbf{A} + \mathbf{B}) \cdot \mathbf{C} = \mathbf{A} \cdot \mathbf{C} + \mathbf{B} \cdot \mathbf{C},
\]
as long as the dimensions are appropriate so that all indicated products are valid.
Multiplication by identity For the identity matrix of appropriate dimensions:
\[
\mathbf{A} \cdot \mathbf{I}_n = \mathbf{A}, \qquad \mathbf{I}_n \cdot \mathbf{A} = \mathbf{A}.
\]
Identity The identity matrix I is the identity element of the multiplication of square matrices.
Inverse Given a square matrix A, we ask whether there is a matrix B such that
\[
\mathbf{A} \cdot \mathbf{B} = \mathbf{I} \tag{6.5.48}
\]
\[
\mathbf{B} \cdot \mathbf{A} = \mathbf{I} \tag{6.5.49}
\]
If such B exists (and it might not), then B is the inverse of A. It is represented by A−1 .
Result 6.5.1 (Matrix inverse is unique). The inverse A−1 of a matrix A when it
exists is unique.
Proof I The proof is simple. Let B and C be two inverses of the matrix A and I the
identity matrix. Then
\[
\mathbf{B} = \mathbf{I}\mathbf{B} = (\mathbf{C}\mathbf{A})\mathbf{B} = \mathbf{C}(\mathbf{A}\mathbf{B}) = \mathbf{C}\mathbf{I} = \mathbf{C}.
\]
In Section 6.6.3, we consider when the inverse of a square matrix exists and, if it
does exist, how to compute it.
The next result shows that the inverse of the inverse is the original matrix.
Result 6.5.2 (Inverse of inverse). The inverse of the inverse A−1 of a matrix A is the
original matrix A.
\[
\mathbf{A}^{-1}\mathbf{A} = \mathbf{I}, \qquad \mathbf{A}\mathbf{A}^{-1} = \mathbf{I}.
\]
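A numerical sketch, using the matrix A of (6.4.2), which is invertible (NumPy assumed):

```python
import numpy as np

A = np.array([[4., 6., 9.],
              [6., 0., -2.],
              [5., -8., 1.]])

A_inv = np.linalg.inv(A)

# Both products give the identity, as in (6.5.48)-(6.5.49).
print(np.allclose(A @ A_inv, np.eye(3)))  # True
print(np.allclose(A_inv @ A, np.eye(3)))  # True
```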
Except for the existence of the inverse of a square matrix, all the other properties listed
above are straightforward to verify and follow from the corresponding properties for
products of scalars.
6.5.4.1 Conjugation
Given A, possibly with complex valued entries, its conjugate, represented by A∗ , is the
matrix whose entries are the complex conjugates of the entries of A, i.e.:
\[
\mathbf{A}^{*} = [\,a_{ij}^{*}\,],
\]
where A = [aij ]. The dimensions of A∗ and A are the same. If the entries of a matrix are
real valued, then conjugation does not affect the matrix, leaving it invariant
A = A∗
whenever A is real valued (this is short hand to say that the entries of A are real valued).
The transpose of the matrix A, represented by $\mathbf{A}^{T}$, is the matrix B whose entries are
\[
\mathbf{B} = [b_{ij}] = [a_{ji}],
\]
with dimensions
\[
m_B = n_A \quad \text{and} \quad n_B = m_A.
\]
As we saw in Section 6.2, the transpose of a row vector becomes a column vector:
\[
\mathbf{a}^{T} = [\,a_1\ a_2\ \cdots\ a_{n_a}\,]^{T} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_{n_a} \end{bmatrix}.
\]
Likewise, the transpose of a column vector becomes a row vector:
\[
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_{m_a} \end{bmatrix}^{T} = [\,a_1\ a_2\ \cdots\ a_{m_a}\,].
\]
Transposition of a matrix in row representation. We transpose A : mA × nA in row format:
\[
\mathbf{A} = \begin{bmatrix} \mathbf{f}_1^{T} \\ \vdots \\ \mathbf{f}_{m_A}^{T} \end{bmatrix}.
\]
Then
\[
\mathbf{A}^{T} = [\,\mathbf{f}_1\ \cdots\ \mathbf{f}_{m_A}\,].
\]
We assume that the matrices have compatible dimensions, so all products are well defined.
Note that the first line of the equation defines the Π notation.
Step 1: We prove first for the transpose of the product of two matrices.
For ease of notation, we consider the two matrices A and B, which are mA × nA and
mB × nB . We assume that nA = mB . Write the row and column representations of A
and B:
\[
\mathbf{A} = \begin{bmatrix} \mathbf{a}_1^{T} \\ \vdots \\ \mathbf{a}_{m_A}^{T} \end{bmatrix}, \qquad \mathbf{B} = [\,\mathbf{b}_1\ \cdots\ \mathbf{b}_{n_B}\,].
\]
The generic entry of C = AB is
\[
c_{ij} = \mathbf{a}_i^{T}\mathbf{b}_j.
\]
The generic entry of D = CT is
\[
d_{ji} = c_{ij} = \mathbf{a}_i^{T}\mathbf{b}_j = \mathbf{b}_j^{T}\mathbf{a}_i.
\]
The last equation follows from the second because the transpose of a scalar is the same
scalar. But, the last equation is simply:
\[
\mathbf{D} = \mathbf{C}^{T} = \begin{bmatrix} \mathbf{b}_1^{T} \\ \vdots \\ \mathbf{b}_{n_B}^{T} \end{bmatrix} [\,\mathbf{a}_1\ \cdots\ \mathbf{a}_{m_A}\,] = \mathbf{B}^{T}\mathbf{A}^{T},
\]
as desired. This proves that the transpose of the product of two matrices is the product of
the transposed matrices by reverse order.
Step 2: The induction step assumes that the Result is true for the product of n−1 matrices:
Step 3: We now prove for the product of n matrices. From associativity of the product of
matrices and from the transpose of the product of two matrices:
as we needed to prove.
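A quick numerical spot check of the reverse-order rule on random matrices (a sketch; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))

# (AB)^T = B^T A^T: the transposes appear in reverse order.
print(np.allclose((A @ B).T, B.T @ A.T))  # True
```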
Symmetric matrices. A matrix A is said to be symmetric if it equals its transpose:
\[
\mathbf{A} = \mathbf{A}^{T}.
\]
A symmetric matrix is necessarily square. For a scalar a, viewed as a 1 × 1 matrix, transposition trivially has no effect: $a^{T} = a$.
Skew symmetric matrices. A matrix A is said to be skew symmetric if it equals the negative
of its transpose:
A = −AT .
A skew symmetric matrix is square; this is proven by an argument similar to showing that a symmetric matrix is square. It follows also that the diagonal elements of a skew symmetric matrix are zero.
Orthogonal matrix A square matrix A is an orthogonal matrix if its inverse is its transpose,
i.e.:
\[
\mathbf{A}^{T} \cdot \mathbf{A} = \mathbf{A} \cdot \mathbf{A}^{T} = \mathbf{I},
\]
where I is the identity matrix.
Recall from Example 6.4.7 the rotation matrix R(θ). For this matrix, we
get:
\[
\mathbf{R}(\theta)^{T} \cdot \mathbf{R}(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}^{T} \cdot \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
= \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \cdot \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
= \mathbf{I}_2,
\]
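The same verification, numerically (a sketch assuming NumPy; the function name rotation is hypothetical):

```python
import numpy as np

def rotation(theta: float) -> np.ndarray:
    """2 x 2 counterclockwise rotation matrix R(theta)."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

R = rotation(np.pi / 3)
print(np.allclose(R.T @ R, np.eye(2)))  # True: R is orthogonal
```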
Unitary matrix A square matrix A is a unitary matrix if its inverse is its Hermitian, i.e.:
\[
\mathbf{A}^{H} \cdot \mathbf{A} = \mathbf{A} \cdot \mathbf{A}^{H} = \mathbf{I}.
\]
For example, one can verify for the Fourier matrix computed earlier that
\[
\mathbf{F}_4^{H} \cdot \mathbf{F}_4 = \mathbf{I}_4.
\]
It is worth noting that the Fourier matrix is unitary as shown and sym-
metric, but NOT Hermitian.
6.5.6 Limits, derivatives, integration, delay, and Taylor series with matrices
Just like we did with vectors, we can consider sophisticated operations with matrices like
limits, differentiation, integration, delay, and Taylor series. These are all defined entry-
wise. We consider these very briefly here, since it is a straightforward extension of the
concepts for vectors in Section 6.3.6.
6.5.6.1 Limit
Consider the m × n matrix (of functions) A(t), t ∈ T ⊂ R. Let t0 ∈ T . The limit of the
matrix of functions is defined entrywise and given by
\[
\lim_{t \to t_0} \mathbf{A}(t) = \lim_{t \to t_0} \begin{bmatrix} A_{11}(t) & \cdots & A_{1n}(t) \\ \vdots & \ddots & \vdots \\ A_{m1}(t) & \cdots & A_{mn}(t) \end{bmatrix}
= \begin{bmatrix} \lim_{t \to t_0} A_{11}(t) & \cdots & \lim_{t \to t_0} A_{1n}(t) \\ \vdots & \ddots & \vdots \\ \lim_{t \to t_0} A_{m1}(t) & \cdots & \lim_{t \to t_0} A_{mn}(t) \end{bmatrix}.
\]
The limit of the matrix of functions is the matrix of the limits of the functions in each entry
of the matrix. To prove this result, we need to introduce concepts that we do not have yet
available like distance between matrices that formalize the notion of matrices being close
to each other.
For example, let
\[
\mathbf{A}(t) = \frac{1}{2} \begin{bmatrix} e^{-2t} + e^{-t} & -2e^{-2t} + 3e^{-t} \\ 4e^{-2t} + 3e^{-t} & e^{-2t} + 5e^{-t} \end{bmatrix}.
\]
Then,
\[
\lim_{t \to 0} \mathbf{A}(t) = \frac{1}{2} \begin{bmatrix} 2 & 1 \\ 7 & 6 \end{bmatrix}.
\]
Differentiation of a matrix of functions is also entrywise. For the same matrix,
\[
\frac{d\mathbf{A}(t)}{dt} = \frac{1}{2} \begin{bmatrix} -2e^{-2t} - e^{-t} & 4e^{-2t} - 3e^{-t} \\ -8e^{-2t} - 3e^{-t} & -2e^{-2t} - 5e^{-t} \end{bmatrix}.
\]
Integration of a matrix of functions is likewise entrywise:
\[
\int_{t_i}^{t_f} \mathbf{A}(t)\, dt = \begin{bmatrix} \int_{t_i}^{t_f} A_{11}(t)\, dt & \cdots & \int_{t_i}^{t_f} A_{1n}(t)\, dt \\ \vdots & \ddots & \vdots \\ \int_{t_i}^{t_f} A_{m1}(t)\, dt & \cdots & \int_{t_i}^{t_f} A_{mn}(t)\, dt \end{bmatrix}.
\]
The advance or delay of a matrix of sequences is also an operation that is applied entry-
wise. For example, let:
\[
\mathbf{A}[k+1] = \begin{bmatrix} A_{11}[k+1] & \cdots & A_{1n}[k+1] \\ \vdots & \ddots & \vdots \\ A_{m1}[k+1] & \cdots & A_{mn}[k+1] \end{bmatrix}.
\]
We can define the Taylor series of a matrix of functions, again, entrywise. Since it is a
straightforward extension of the Taylor series of vectors, we refer to Section 6.3.6.5 for
details.
Consider, for example, the 3 × 3 matrix
\[
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}.
\]
From this matrix we can obtain other matrices by discarding rows or columns. These so obtained matrices are called submatrices of the original matrix. For example, if we eliminate row 2 and column 1, we get the 2 × 2 submatrix:
\[
\begin{bmatrix} 2 & 3 \\ 8 & 9 \end{bmatrix}.
\]
The determinant of a square matrix A is denoted by |A| or det A.
Note that the notation | · | for the determinant of a matrix is the same notation that we
used to indicate the magnitude or absolute value of a scalar. Even though the notation
is the same, the two concepts are very different and should not be confused. The context
should disambiguate which one is meant.
Definition 6.6.1 is recursive, because the determinant of order n, i.e., of a square matrix
of dimension n, is expressed in terms of determinants of order n − 1. We also see that
eventually the determinant of order 2 is expressed in terms of determinants of order 1,
i.e., in terms of scalars.
Minor and cofactor. The determinant Mij in (6.6.2) is called the minor associated with the
element aij in the matrix A. The quantity Aij given in (6.6.2) is the cofactor of the element
aij in the matrix A.
In Equation (6.6.1), we expanded the determinant in terms of row i. We could have used
any other row. We could also have defined the determinant by an expansion
in terms of a column of the matrix. We then have any of the following 2n possible
expressions for the determinant:
$$|A| = \sum_{j=1}^{n} a_{ij} A_{ij}, \quad 1 \le i \le n, \qquad (6.6.3)$$
$$|A| = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} M_{ij}, \quad 1 \le i \le n, \qquad (6.6.4)$$
or
$$|A| = \sum_{i=1}^{n} a_{ij} A_{ij}, \quad 1 \le j \le n, \qquad (6.6.5)$$
$$|A| = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} M_{ij}, \quad 1 \le j \le n. \qquad (6.6.6)$$
The important point is that all these 2n expressions are equivalent and lead to the same
value for the determinant. This provides the opportunity to choose the expansion that is
simplest to compute. We will not prove this.
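For illustration, here is a minimal sketch of the recursive cofactor expansion (6.6.3) along the first row. It is meant only to mirror the definition; its cost grows as n!, so it is impractical except for small matrices:

# Recursive cofactor expansion of the determinant along the first row.
def det(A):
    n = len(A)
    if n == 1:
        return A[0][0]  # a determinant of order 1 is the scalar itself
    total = 0
    for j in range(n):
        # minor: delete row 0 and column j
        minor = [row[:j] + row[j+1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

print(det([[1, 2], [3, 4]]))                   # -2
print(det([[2, 1, 4], [3, 2, 1], [1, 3, 2]]))  # 25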
We first compute the minors and cofactors associated with the first row:
$$M_{11} = a_{22}, \qquad A_{11} = (-1)^{1+1} a_{22} = a_{22},$$
$$M_{12} = a_{21}, \qquad A_{12} = (-1)^{1+2} a_{21} = -a_{21}.$$
Expanding by the first row,
$$|A| = a_{11} A_{11} + a_{12} A_{12} = a_{11} a_{22} - a_{12} a_{21}.$$
It is left as an exercise that we get the same expression if we expand the de-
terminant by the second row, or by the first column, or by the second column.
We first compute the minors and cofactors associated with the first row:
$$M_{11} = \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} = a_{22} a_{33} - a_{23} a_{32}, \qquad A_{11} = (-1)^{1+1} M_{11} = a_{22} a_{33} - a_{23} a_{32},$$
$$M_{12} = \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} = a_{21} a_{33} - a_{23} a_{31}, \qquad A_{12} = (-1)^{1+2} M_{12} = -(a_{21} a_{33} - a_{23} a_{31}),$$
$$M_{13} = \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} = a_{21} a_{32} - a_{22} a_{31}, \qquad A_{13} = (-1)^{1+3} M_{13} = a_{21} a_{32} - a_{22} a_{31}.$$
Expanding by the first row,
$$|A| = a_{11} (a_{22} a_{33} - a_{23} a_{32}) - a_{12} (a_{21} a_{33} - a_{23} a_{31}) + a_{13} (a_{21} a_{32} - a_{22} a_{31})$$
$$= a_{11} a_{22} a_{33} + a_{12} a_{23} a_{31} + a_{21} a_{32} a_{13} - a_{13} a_{22} a_{31} - a_{21} a_{12} a_{33} - a_{11} a_{23} a_{32}.$$
This expression is also easy to memorize (it is the rule of Sarrus). The determinant of a 3 × 3 matrix has six terms: the
product of the diagonal elements, a11 a22 a33; plus the product of the elements of
the first upper diagonal (the diagonal immediately above the main diagonal)
times the element in the lower left corner, a12 a23 a31; plus the product of the
elements in the first lower diagonal times the element in the top right corner,
a21 a32 a13; minus the product of the elements on the counter diagonal, a13 a22 a31;
minus the product of the elements in the first upper counter diagonal times the
element in the lower right corner, a21 a12 a33; minus the product of the elements
in the first lower counter diagonal times the element in the top left corner, a11 a23 a32.
It is left as an exercise to show that one gets the same expression if we expand
the determinant using the second or third row, or the first, second, or third
column.
Definition 6.6.2 (Singular and non-singular matrices). A square matrix A is singular if
its determinant is zero, |A| = 0. A matrix A is non-singular if its determinant is non-zero,
|A| ≠ 0.
We now study some important properties of determinants. We start with the determinant
of a few structured matrices. These are left as exercises.
Matrix with a row multiplied by a scalar The determinant of the matrix C that is the
matrix A but with row i multiplied by a scalar α is the determinant of A multiplied
by the scalar α:
|C| = α|A|
This is easily seen by expanding the determinant of A by row i, realizing that all
elements of this row are multiplied by α, while the cofactors are left invariant.
Matrix multiplied by a scalar The determinant of the matrix C = αA, the matrix A with
every entry multiplied by the scalar α, is
$$|C| = |\alpha A| = \alpha^n |A|,$$
where n is the dimension of the square matrix A. This can be seen by induction,
using the previous result.
Matrix with zero row or column The determinant of matrix A with a row (or column)
of zeros is zero. This is easily seen by expanding the determinant by the row (or
column) of zeros.
Matrix with 2 rows (2 columns) interchanged Interchange 2 rows or columns, then the
determinant of the matrix is multiplied by −1. The result can be proved by induc-
tion. We will not prove this result.
Matrix with repeated row (or column) The determinant of a matrix A with a repeated row
(or column) is zero. This follows from the previous result: interchanging the two equal
rows leaves the matrix, and hence its determinant, unchanged; but, by the previous
result, the interchange changes the sign of the determinant. The only number equal to
its own negative is zero.
Matrix where one row multiplied by scalar is added to another If we replace a row by
the row obtained by adding to it another row multiplied by a scalar, the determi-
nant is invariant. A similar result holds if instead we multiply a column by a scalar
and add the product to another column.
To prove this property, consider that we multiply row i by α and add it to row j.
Expand the determinant by row j. In the expansion, the elements of row j are
now ajk + αaik, while the cofactors remain the same. Hence the determinant is now:
$$\sum_{k=1}^{n} (a_{jk} + \alpha a_{ik}) A_{jk} = \det(A) + \alpha \det(A_1),$$
where det(A1) is the determinant of a matrix with a repeated row, so it is zero.
Transpose of a matrix The determinant of the transpose AT of a square matrix A equals
the determinant of A:
$$\det A^T = \det A.$$
We sketch the proof. The proof follows by realizing that column i of AT is row i
of A and, similarly, row j of AT is column j of A. So, expanding the determinant of
AT by column i is the same as expanding the determinant of A by row i. Note that
the cofactors of each expansion are the same.
Hermitian of a matrix The determinant of the Hermitian AH of a square matrix A is the
conjugate of the determinant of the original matrix:
$$\det A^H = (\det A)^*.$$
The proof follows because the Hermitian is the conjugate transpose. Transposition does
not alter the determinant, by the previous property, so all that is left is conjugation.
Since the determinant is a sum of products of entries of A, the determinant of the
Hermitian of A is the sum of products of the conjugate entries of A; but this sum is
the conjugate of the sum of the same products of entries of A, and so the determinant
of the Hermitian matrix is the conjugate of the determinant of the original matrix.
Diagonal matrix The determinant of a diagonal matrix is the product of its diagonal
entries. Let D = Diag([d11 · · · dnn]). Then
$$\det D = d_{11} d_{22} \cdots d_{nn}.$$
Triangular matrix Example 6.4.15 introduced upper and lower triangular matrices. It
is easy to show that the determinant of a triangular matrix is the product of its
diagonal entries. We get then that if L is lower triangular its determinant is
$$\det L = \ell_{11} \ell_{22} \cdots \ell_{nn}.$$
These results can be easily proved by induction. We sketch the proof for the lower
triangular matrix L. Expanding the determinant in terms of the last column, us-
ing (6.6.5) or (6.6.6), all terms are zero except the term ℓnn Lnn, where Lnn is the
cofactor of the (n, n) entry of the matrix L. But this cofactor is of order n − 1, and the
corresponding minor is the determinant of a lower triangular matrix of dimension n − 1.
So, the result follows by the induction step.
Product of 2 triangular matrices The determinant of the product of two triangular ma-
trices of the same type (both lower triangular or both upper triangular) is given by
the product of the determinants of each matrix.
We sketch the main steps of the proof for lower triangular matrices. For upper tri-
angular matrices the proof follows similarly.
The product of two lower triangular matrices is lower triangular. The diagonal el-
ements of the product matrix are the product of the corresponding entries of each
lower triangular factor. Then the result follows from the previous result on the de-
terminant of a lower triangular matrix.
Product of two square matrices This result generalizes the previous result to arbitrary
square matrices.
The determinant of the product of two square matrices of the same dimension is
given by the product of the determinants of each matrix.
|AB| = |A||B|.
The proof is easy if we know that we can reduce an arbitrary square matrix to a
triangular matrix by a method that does not change the determinant. This method
is Gauss elimination and will be studied in Chapter 7. So, the proof of the statement
on the determinant of the product of two matrices follows from the result for the
determinant of the product of two triangular matrices, once we reduce each matrix
to a triangular matrix by Gauss elimination.
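A quick numerical spot-check of |AB| = |A||B| on random matrices (a sanity check, not a proof):

# Check the product rule for determinants on random matrices.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))  # True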
Inverse of a matrix The determinant of the inverse A−1 of a non-singular square matrix A
is the inverse of the determinant of the original matrix:
$$\left| A^{-1} \right| = |A|^{-1}. \qquad (6.6.7)$$
To see this, start from
$$A^{-1} A = I.$$
Computing the determinant of the product on the left-hand-side gives the product
of the determinants. The determinant of the right-hand-side is the determinant of
the identity matrix. This determinant is the product of the diagonal entries, which
are all one, so it is one. Then:
$$\left| A^{-1} \right| |A| = 1,$$
from which:
$$\left| A^{-1} \right| = |A|^{-1}.$$
Since (6.6.7) expresses the determinant of the inverse matrix A−1 in terms of the
inverse of the determinant of the original matrix A, the existence of the inverse A−1
requires that |A| ≠ 0, i.e., that A is non-singular. Result 6.6.2 below shows that this is
actually a necessary and sufficient condition for the inverse of a matrix to exist.
Orthogonal matrix The determinant of an orthogonal matrix A is
$$|A| = \pm 1.$$
This follows from the definition of orthogonal matrices:
$$A A^T = A^T A = I.$$
Then
$$|A|^2 = 1,$$
and so
$$|A| = \pm 1.$$
Unitary matrix The determinant of a unitary matrix A satisfies
$$|\det A| = 1.$$
In other words, its magnitude1 is one. This follows from the definition of
unitary matrices:
$$A A^H = A^H A = I.$$
Then
$$\det(A) \det\left( A^H \right) = |\det(A)|^2 = 1,$$
and so
$$|\det A| = 1.$$
Again, note that | · | represents here the magnitude of the complex number and not
the determinant.
6.6.2 Trace
The trace of an n-dimensional square matrix A is the sum of its diagonal entries:
$$\operatorname{tr} A = \sum_{i=1}^{n} a_{ii}.$$
1 Here, | · | refers to the absolute value of the scalar det A. This is why we often prefer the alternative
notation for the determinant, namely, det(·).
A number of facts about the trace can be proven easily from the definition. The traces of
the identity matrix and of the zero matrix are
$$\operatorname{tr} I_n = n, \qquad \operatorname{tr} 0 = 0.$$
Since transposition does not change the diagonal entries,
$$\operatorname{tr} A = \operatorname{tr} A^T.$$
Likewise, the trace of the matrix A and the trace of its Hermitian AH,
$$\operatorname{tr} A^H = \operatorname{tr} \left( A^T \right)^* = (\operatorname{tr} A)^*, \qquad (6.6.8), (6.6.9)$$
are the conjugates of each other, since their diagonal entries are conjugates of each other.
The trace is linear:
$$\operatorname{tr}(\alpha A + \beta B) = \alpha \operatorname{tr} A + \beta \operatorname{tr} B.$$
The trace of a scalar α, viewed as a 1 × 1 matrix, is the scalar itself: tr(α) = α.
$$\operatorname{tr}\left( b a^T \right) = \operatorname{tr}\left( a^T b \right) = a^T b, \qquad (6.6.10)$$
since aT b is a scalar, and assuming the vectors a and b have the same dimension. In
words, (6.6.10) states that the trace of the outer product of two vectors with the same
dimension (the LHS of (6.6.10)) is the scalar product of the vectors (the RHS of (6.6.10)).
Property (6.6.10) for vectors extends to matrices. We get the very interesting property of
the trace:
$$\operatorname{tr}(A \cdot B) = \operatorname{tr}(B \cdot A).$$
We assume that A and B have compatible dimensions so both products make sense. This
property can be proved by direct evaluation of the LHS and the RHS. We provide an al-
ternative proof, based on the fact that the trace of the outer product of two vectors with
the same dimensions is the scalar product of the vectors.
Write A in column format, A = [f1 · · · fn], and B in row format, with rows g1T, ..., gnT, so
that A · B = Σi fi giT. By linearity of the trace and the property of the trace of the outer
product of vectors,
$$\operatorname{tr}(A \cdot B) = \operatorname{tr}\left( \sum_{i=1}^{n} f_i g_i^T \right) = \sum_{i=1}^{n} \operatorname{tr}\left( f_i g_i^T \right) = \sum_{i=1}^{n} \operatorname{tr}\left( g_i^T f_i \right) = \operatorname{tr}(B A).$$
The last step follows directly by computing the diagonal elements of BA with B given in
row format and A given in column format. This proves the result.
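A quick numerical check of tr(AB) = tr(BA); note that the two factors need not be square, only of compatible dimensions:

# Check the cyclic property of the trace on non-square factors.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))
B = rng.standard_normal((5, 3))

print(np.isclose(np.trace(A @ B), np.trace(B @ A)))  # True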
The inverse of an n-dimensional square matrix A may not exist; when it exists, as we saw
below (6.5.48), it is commonly represented by the symbol A−1. We then have, if the inverse
of A exists,
AA−1 = In = A−1 A.
Remark 6.6.1 (Inverse and unit element of matrix multiplication). As we observed, the
inverse A−1 of the n-dimensional square matrix A when it exists plays the role for the prod-
uct of square matrices that the inverse of a scalar plays with respect to the product of scalars.
Their product is the unit element I of multiplication of square matrices.
There are two main differences between the inverse of a matrix and the matrix product on the
one hand, and the inverse of a scalar and scalar multiplication on the other hand: 1) the scalar
inverse is defined for every nonzero scalar, while the inverse A−1 is defined only for square
matrices A with nonzero determinant; and 2) with the scalar multiplicative inverse, we need
to check a single condition, namely, that the product of the nonzero scalar and
its inverse is one, while with the matrix multiplicative inverse, assuming the given square
matrix is invertible, to make sure that a matrix B is the inverse of another matrix A we need
to check two conditions, the left product and the right product.
There are several ways to compute the inverse of a matrix. We will discuss one here and
consider another method in Chapter 7.
Result 6.6.1 (Inverse of matrix A). Let the n-dimensional square matrix A = [aij] be
non-singular. Then its inverse is
$$A^{-1} = \frac{1}{\det A} \operatorname{Adj}(A), \qquad (6.6.12)$$
where Adj(A) is the adjoint of A (sometimes called the adjugate matrix of A, because the
term adjoint of a matrix is also used for another purpose). This matrix is the transpose of
the matrix of the cofactors Aij of A:
$$\operatorname{Adj}(A) = [A_{ij}]^T.$$
Proof I The proof of this result follows by computing the diagonal elements and the
off-diagonal elements of the left and right multiplication of the matrix A and the inverse
given by (6.6.12). We consider the right multiplication only; the left multiplication follows
similarly.
Consider the element (i, j) of the product AA−1, where A−1 is given by (6.6.12):
$$\left[ A A^{-1} \right]_{ij} = \sum_{\ell=1}^{n} a_{i\ell} \frac{1}{\det A} \left[ \operatorname{Adj}(A) \right]_{\ell j} = \frac{1}{\det A} \sum_{\ell=1}^{n} a_{i\ell} (-1)^{j+\ell} M_{j\ell} = \delta_{ij}.$$
In the second equation, Mjℓ is the minor associated with element (j, ℓ). The last equation
follows from the definition of the determinant and corresponding properties. In fact, if
j = i, the sum on the right-hand-side is det A. If j ≠ i, then the sum is the expansion
of the determinant of a matrix with a repeated row, namely, row i, which is zero. In both
cases we are assuming that det A ≠ 0. Clearly, A−1 given by (6.6.12) is unique.
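The following sketch implements (6.6.12) directly, building the adjugate entrywise from the cofactors. The helper adjugate_inverse is ours, for illustration only; it is far less efficient than the methods of Chapter 7:

# Inverse via the adjugate, A^{-1} = Adj(A)/det(A), built from cofactors.
import numpy as np

def adjugate_inverse(A):
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    d = np.linalg.det(A)
    assert abs(d) > 1e-12, "A must be non-singular (det A != 0)"
    C = np.empty((n, n))  # matrix of cofactors
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T / d  # Adj(A) is the transpose of the cofactor matrix

A = np.array([[4., 6., 9.], [6., 0., -2.], [5., -8., 1.]])
print(np.allclose(adjugate_inverse(A) @ A, np.eye(3)))  # True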
Result 6.6.2 (Inverse of the matrix A and its determinant). The inverse A−1 of the n-
dimensional square matrix A exists if and only if
$$\det A \ne 0.$$
Proof I The proof of Result 6.6.1 shows that the nonzero condition on the determinant
of the n-dimensional square matrix A is both a necessary and sufficient condition for the
inverse given by (6.6.12) to exist.
Remark 6.6.2 (Inverse and determinant). The nonzero determinant condition for matrices
takes the place of the nonzero condition on a scalar for its inverse to exist.
It is usually computationally expensive to compute the inverse of a matrix directly
through Result 6.6.1; for generic n × n matrices, this is an O(n3) operation, i.e., it requires
on the order of n3 floating point operations (flops). For example, inverting a 1000 × 1000
matrix takes on the order of 109 flops. While any respectable laptop nowadays can carry
out this computation in about a second or a fraction of a second, inverting much larger
matrices this way quickly becomes impractical.
We recall a few facts already known about the determinant of structured matrices and
add a few additional facts about their inverses.
If D = Diag([d11 · · · dnn]) with all dii ≠ 0, its inverse is diagonal, with
$$\left[ D^{-1} \right]_{ii} = d_{ii}^{-1}.$$
If U is upper triangular, U−1 is upper triangular, with
$$\left[ U^{-1} \right]_{ii} = u_{ii}^{-1}.$$
If L is lower triangular, L−1 is lower triangular, with
$$\left[ L^{-1} \right]_{ii} = l_{ii}^{-1}.$$
In words: The inverses of diagonal, upper triangular, or lower triangular matrices when
they exist, keep their characteristic, i.e., the inverse of a diagonal matrix, upper triangu-
lar matrix, or lower triangular matrix is diagonal, upper triangular, or lower triangular,
respectively, and the diagonal elements of the inverses are the inverses of the correspond-
ing diagonal entries of the original matrix.
Result 6.6.4 (Inverse of transpose of a matrix). If the inverse A−1 exists, then the inverse
of the transpose AT exists and is given by:
−1 T
AT = A−1 .
Proof I The proof follows because since the determinant of the transpose of a matrix AT
is the determinant of the original matrix, then the inverse of AT exists if the inverse of A
exists. Now:
$$\left( A A^{-1} \right)^T = I^T = I, \qquad \left( A A^{-1} \right)^T = \left( A^{-1} \right)^T A^T.$$
The second equation follows from transposing the product of the two matrices in the LHS
of the first equation. Equality of the RHS of the two equations leads to the desired result.
Result 6.6.5 (Inverse of powers of a matrix). If the inverse A−1 exists, then:
$$\left( A^n \right)^{-1} = \left( A^{-1} \right)^n.$$
Result 6.6.6 (Inverse of product of two square matrices). If A−1 and B−1 exist, then:
$$(AB)^{-1} = B^{-1} A^{-1}.$$
Proof I First note that if the inverse of each matrix A and B exists, then the inverse of
their product exists, since the determinant of the product is the product of the determi-
nants, and this product is nonzero since each factor is nonzero. Multiplying AB on the
right by B−1A−1:
$$A B B^{-1} A^{-1} = A \left( B B^{-1} \right) A^{-1} = A A^{-1} = I.$$
In words, the inverse of the product of two matrices is the product in reverse order of the
inverses of each matrix, assuming that both inverses exist. Again, we should also show that
multiplying AB on the left by B−1A−1 leads to the identity I.
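A quick numerical check of (AB)−1 = B−1A−1 (random matrices are almost surely invertible):

# Check the reverse-order rule for the inverse of a product.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

print(np.allclose(np.linalg.inv(A @ B),
                  np.linalg.inv(B) @ np.linalg.inv(A)))  # True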
6.7 Problems
1. Assume the dimensions of the matrices A, B, C, and D are:
Give the dimensions of the following. For each, provide a brief justification.
(d) Let the matrix K be given by K = AT E. Specify the values of m and n for
K to be well defined.
(a) Let:
C = AT + B.
D = A · B.
Let U be an upper triangular matrix and let
$$L = \begin{bmatrix} -1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 1 & 0 & -2 & 0 \\ 2 & 0 & 1 & -3 \end{bmatrix}.$$
Determine which, if any, of the following matrices is upper triangular, lower triangular,
or a full matrix. A full matrix is a matrix with no structure, i.e., an arbitrary matrix.
(a) A = LU
(b) B = UL
(c) C = L2
(d) D = U2
7. Let:
$$A = [a_{ij}] = \begin{bmatrix} a_1^H \\ \vdots \\ a_n^H \end{bmatrix} : n \times m \qquad (6.7.1)$$
(b) Explain:
i. The entries cpq of C in terms of the entries aij of A and bkl of B.
ii. The entries cpq of C in terms of the products of the vectors aiH and bj.
iii. The columns cq of C in terms of the matrix A and the columns bj of B.
iv. The rows dp of C in terms of the rows aiH of A and the matrix B.
v. Verify (i) through (iv) with the matrices
$$A = \begin{bmatrix} 1 & 2 \\ 1 & -1 \\ 1 & 3 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 2 & -1 \\ -1 & 1 & 2 \end{bmatrix}. \qquad (6.7.4)$$
(a) Write explicitly AH in terms of {aij } and {ai }; BH in terms of {bkl } and {bl };
and C in terms of {cpq }, {cp } and {dq }.
(b) Show that
CH = BH AH (6.7.5)
(c) Generalize (6.7.5) to the product of K pairs of matrices
$$C = \prod_{k=1}^{K} A_k B_k = A_K B_K \cdots A_1 B_1, \qquad (6.7.6)$$
i.e., show
$$C^H = \prod_{k=0}^{K-1} B_{K-k}^H A_{K-k}^H = B_1^H A_1^H \cdots B_K^H A_K^H, \qquad (6.7.7)$$
where the dimensions of all the Ak and Bk are such that all matrix products
are well-defined.
(d) Simplify (6.7.7) when the matrices Ak and Bk are all real-valued.
10. Let A and B be n × n matrices. Determine if the following is true or false. Justify
your answers.
(a) (A + B)2 = A2 + 2AB + B2
(b) (A + B)(A − B) = A2 − B2
11. Consider the matrices:
214 51 6 000
A = 3 2 1 , B = 9 2 −3 C = 2 3 4
132 −1 3 7 000
(a) Verify that AC = BC, C 6= 0.
(b) Can you cancel out matrix C? If yes, prove it; if not justify.
12. Let
$$X = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \quad \text{and} \quad Y = \begin{bmatrix} -j & 1 & -j \end{bmatrix}$$
(a) Compute and find the dimensions of
$$Z = XY.$$
(b) Compute and find the dimensions of
$$W = YX.$$
(c) Are Z and W symmetric? If yes, justify; if not, say why not.
(d) Are Z and W Hermitian? If yes, justify; if not, say why not.
(e) Compute tr Z and tr W.
13. Let
$$x = \begin{bmatrix} x \\ y \end{bmatrix}, \qquad A = \begin{bmatrix} a & \frac{1}{2} b \\ \frac{1}{2} b & c \end{bmatrix}.$$
Compute the quadratic form
Q(x, y) = xT Ax.
Express Q(x, y) using the matrix
X = xxT .
14. In Signal Processing, data, possibly complex valued, is usually collected as N vec-
tors xn of dimension M and then grouped in the matrix
X = [x1 · · · xN ].
Two products of this data matrix arise in many Signal Processing applications: the
Grammian of the data matrix,
$$G = X^H X,$$
and the matrix
$$H = X X^H.$$
Consider the matrix B obtained from A by a similarity transformation:
$$B = P A P^{-1}.$$
Remark 6.7.1. These are significant properties and show that the determinant and the
trace are invariant to conjugation or similarity transformations.
(b) If the matrix C is invertible, find the inverse C−1 of the matrix C, i.e., find
$$G = C^{-1}.$$
$$D = \operatorname{Diag}[d_1 \cdots d_N] = \begin{bmatrix} d_1 & 0 & \cdots & \cdots & 0 \\ 0 & d_2 & \ddots & & \vdots \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ 0 & \cdots & \cdots & 0 & d_N \end{bmatrix}$$
Determine by induction:
23. Determine:
(b) The determinant of A, the (3, 2) minor of A, the (4, 2) cofactor of A, and the
element [A−1]24 of the inverse, if the inverse exists, of the matrix:
$$A = \begin{bmatrix} 1 & -1 & 0 & 2 \\ -2 & 1 & 2 & -1 \\ 0 & 2 & 0 & 1 \\ 1 & 0 & 2 & 2 \end{bmatrix}$$
24. Given a generic, possibly complex valued, matrix A consider the product
B = AAH
Determine which of the following statements is true, false, or not enough informa-
tion. Provide a short justification for your answer.
(a) |B| = |A| |AT|
(b) |B| = |A|2
(c) Cannot determine |B| in terms of |A| and AT.
(d) Not enough information to determine |B| in terms of |A| and AT.
25. Consider the square matrices A and B. Determine the determinant
$$|C| = \left| A^2 B^H \right|$$
in terms of |A| and |B|.
26. Determine the determinant and the inverse of the rotation matrix about the y-axis:
$$R_y(\theta) = \begin{bmatrix} \cos\theta & 0 & \sin\theta & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\theta & 0 & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
Is this matrix orthogonal? Is this a unitary matrix?
27. Consider the matrix
$$A = \begin{bmatrix} 1 & -5 & 62 & 2 \\ 0 & -2 & 13 & -1 \\ 0 & 0 & -1 & 18 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
Is this matrix invertible? If not, explain why not; if yes, compute the elements [A−1]33
and [A−1]34 of the inverse.
28. Consider the matrices A and B as well as their transposes AT and BT:
$$A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} -1 & 1 & 1 & -1 \\ 1 & 1 & 1 & -1 \\ 1 & 1 & -1 & 1 \end{bmatrix}.$$
Determine
$$A^2 - A = 2I.$$
$$A^T A = A A^T = I.$$
$$Z = XY$$
is orthogonal.
32. The discrete Fourier matrix F of dimension n is a square n-dimensional matrix with
wide application. It is introduced in Example 6.4.6 and Equation 6.4.15. Multiply-
ing an n-dimensional vector v by F computes the so called discrete Fourier trans-
form (DFT) of v, which is usually represented by the corresponding capital letter V,
although it is a vector. Hence:
V = Fv.
Below, when you are asked to plot the DFT vector V of a vector v, plot its real part
and its imaginary part as a function of Ω` , i.e., draw two plots, one that plots the
real part of the components of V and the other that plots the imaginary part of the
components of V. Label the horizontal axis of both plots by the values of the fre-
quencies Ω` .
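To experiment with these DFTs numerically, one can build the Fourier matrix directly (we assume here the common unnormalized convention $F_{k\ell} = e^{-j 2\pi k\ell/n}$; check Equation 6.4.15 for the exact convention used in these notes):

# Compute DFTs by multiplying with an explicitly constructed Fourier matrix.
import numpy as np

n = 4
k, l = np.meshgrid(np.arange(n), np.arange(n))
F = np.exp(-2j * np.pi * k * l / n)

v0 = np.ones(4)                    # the dc signal of part (a)
v1 = np.array([1., 1., -1., -1.])  # the signal of part (d)
print(np.round(F @ v0, 10))        # [4, 0, 0, 0]: all energy at frequency 0
print(np.round(F @ v1, 10))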
(a) Compute V0 , the DFT of the vector v0 = [1 1 1 1]T , i.e., of the constant signal
with amplitude 1. A constant signal is often referred to as a dc-signal. Plot V0 ,
the DFT of v0 . Interpret your solution and your plot.
(b) Compute Vc, the DFT of the vector $v_c = \begin{bmatrix} 1 & \cos\frac{2\pi}{4} & \cos\left(\frac{2\pi}{4} 2\right) & \cos\left(\frac{2\pi}{4} 3\right) \end{bmatrix}^T$, and
plot Vc, the DFT of vc. Interpret your solution and your plot.
(c) Compute Vs, the DFT of the vector $v_s = \begin{bmatrix} 0 & \sin\frac{2\pi}{4} & \sin\left(\frac{2\pi}{4} 2\right) & \sin\left(\frac{2\pi}{4} 3\right) \end{bmatrix}^T$, and
plot Vs. Interpret your solution and your plot.
(d) Compute V1, the DFT of the vector $v_1 = \begin{bmatrix} 1 & 1 & -1 & -1 \end{bmatrix}^T$. Plot V1, the DFT
of v1. Interpret your solution and your plot.
(e) Compute the conjugate, the transpose, and the Hermitian of F4 and compare
each with F4 .
(f) Compute the inverse of F4 . Interpret your result.
Chapter 7
Gauss and Gauss-Jordan Elimination

THIS chapter presents Gauss elimination and Gauss-Jordan elimination, two
fundamental methods to solve linear algebraic systems of equations. Gauss elim-
ination operates with so called elementary operations to reduce, in a first step,
the forward step, the system of equations to a canonical form (upper triangular). When
coupled with a second step, back substitution, it then solves explicitly for the unknowns or
variables in the linear system of equations.
In Gauss-Jordan elimination, the first step is the forward step of Gauss elimination. The
second step further operates with elementary operations to reduce the linear system to a
diagonal canonical form for which the solution of the linear system of algebraic equations
is trivial.
Gauss elimination is very useful. With matrices in triangular form we know that the de-
terminant is the product of the diagonal elements. So, by reducing the original matrix
to a triangular form, the determinant of this triangular matrix is computed simply by
multiplying its scalar diagonal entries. What is interesting is that the determinant of the
original matrix relates easily to the determinant of its reduced triangular form.
Besides the determinant, Gauss and Gauss-Jordan elimination provide a speedy method
to compute the inverse of a matrix and an efficient method to solve a linear system of al-
gebraic equations. But, they have broader applications and are very useful in computing
the rank of a matrix, determining if a set of vectors is linearly dependent or independent,
finding a basis for certain vector subspaces associated with a matrix, among many other
applications. We do not know as yet what some of these concepts are or why they are
important; we will learn about them in subsequent Chapters. But we should have an ap-
preciation for Gauss and Gauss-Jordan elimination–they are simple, effective, and useful.
In the sequel, often, we refer only to Gauss elimination, but similar comments may apply
to Gauss-Jordan, even if not made explicitly.
7.1 Introduction
We start by motivating Gauss elimination in the context of solving a set of simultaneous
linear algebraic equations. Consider the equations:
4x1 + 6x2 + 9x3 = 6
6x1 − 2x3 = 20 (7.1.1)
5x1 − 8x2 + x3 = 10.
These are three equations; there are also three unknowns, the variables x1 , x2 , and x3 .
These unknowns are fixed parameters, i.e., they are not functions of time. Hence, there
are no delays or advances or derivatives, so, the equations in (7.1.1) are algebraic. In this
course, unless otherwise stated, we assume that the unknowns x1 , x2 , and x3 are com-
bined linearly by coefficients drawn from the reals. For example, in the first equation
in (7.1.1) the variable x1 is multiplied by the coefficient 4, the second variable x2 by the co-
efficient 6, and, finally, the third variable x3 is multiplied by the coefficient 9. Because only
powers of degree one of the variables appear, the equations in (7.1.1) are linear. There are
also known quantities, the 6, 20, and 10 on the right-hand-side (RHS) of the equations–
these are referred to as independent terms. The independent terms are also drawn from
the reals. A set of such simultaneous equations like (7.1.1) is referred to as a system of lin-
ear algebraic equations, linear systems for short, when their algebraic nature is implicit from
the context.
These linear systems of algebraic equations arise in many applications. In fact, one can
speculate that a good fraction of engineering and science problems involve in some form
or another solving a linear system of algebraic equations. This system may not be ap-
parent in the original formulation, but after appropriate manipulations, the solution may
reduce to solving such a linear system.
In complex engineering applications the number of equations and the number of un-
knowns may be very large. So, the field of scientific computing develops efficient meth-
ods to solve large systems of linear equations. By large, we may mean hundreds of thou-
sands or even more. Of course, in this course, we will focus on concepts, and we will
usually be concerned with much smaller systems, systems with only a few equations
and a few unknowns. But do not be fooled by the apparent simplicity of these systems.
The concepts and methods we learn while studying these simple small systems are very
powerful, and with the help of a computer we can successfully address the much more
realistic Big Data problems of today.
In (7.1.1), we observe that the number of equations and the number of unknowns (or vari-
ables) is the same. This is not necessarily always the case, and we will consider problems
where the number of equations is larger than the number of unknowns, as well as prob-
lems where the number of equations is smaller than the number of unknowns.
The canonical form of a linear system of algebraic equations has the linear combination of
the unknowns on the left-hand-side (LHS) of the equations and the known independent
terms on the RHS. On the LHS, the unknowns are displayed in all the equations of the
system in the same order, as we move from left to right in each equation. For example,
if in (7.1.1), we order the unknowns as x1 , x2 , and x3 , each equation will display first the
term in x1 , then the term in x2 , and, finally, the term in x3 . If in an equation a term is miss-
ing, like in the second equation the term in x2 is missing, we interpret its coefficient as
being zero and simply move to the next term. With longhand writing of these equations
(possible only when you have a few equations as in this case), it is mnemonic to display
the equations so their structure is preserved, which means we align vertically the terms
by the unknowns. So, if a term is missing, we leave enough blank space so the previous
and next terms are still aligned.
A solution to the system of equations is a set of numerical values such that when we sub-
stitute the unknowns by these values we obtain an identity, i.e., the LHS of each equation
results in exactly the corresponding independent term. There are basically three issues
regarding solving a system of linear equations: 1) is there a solution or solutions to the
system of equations–this is the existence question; 2) if existence is answered affirmatively,
is the solution unique–this is the unicity question; and 3) if there are one or several solu-
tions, can we find them–this is the how to solve question. We will see that Gauss elimina-
tion helps with addressing all three questions.
We now resort to Chapter 6 and rewrite with vectors and matrices a generic system of
linear equations. We illustrate first with the simple example in (7.1.1). The coefficients of
the system are grouped into a matrix A, the unknowns in a vector x, and the independent
terms in a vector b:
$$A = \begin{bmatrix} 4 & 6 & 9 \\ 6 & 0 & -2 \\ 5 & -8 & 1 \end{bmatrix} \qquad (7.1.2)$$
$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \qquad b = \begin{bmatrix} 6 \\ 20 \\ 10 \end{bmatrix}.$$
With this notation, the system (7.1.1) is written compactly as
$$A x = b. \qquad (7.1.3)$$
From the matrix format (7.1.2) for (7.1.1), it is clear that, if the matrix A is square, i.e., the
number of equations mA equals the number of unknowns mx , and if A is invertible, the
solution to (7.1.1) is conceptually simple and given by:
x = A−1 b. (7.1.4)
The solution (7.1.4) is deceptively simple, because determining if the matrix A is invert-
ible using for example Result 6.6.2 in Chapter 6 requires checking if A is non singular,
i.e., detA 6= 0, a nontrivial task using the direct methods of Chapter 6. After knowing
that A is invertible, solving as in (7.1.4) still requires inverting the matrix A. Finding the
determinant and inverting a (square) matrix of dimension n is in general computationally
a problem of order n3 , i.e., requires on the order of n3 floating point operations. To have a
feeling for what this entails, consider the example where the matrix A is 1000 × 1000, i.e.,
mA = mx = 1000, and we are interested in solving a system of one thousand equations in
one thousand unknowns. Inverting a 1000 × 1000 matrix requires on the order of 10^9
floating point operations (flops). With a 1 GHz clock computer that can perform in
parallel, per clock cycle, one addition and one multiplication, solving such a system will
require on the order of
one second. Granted that a computer may be much faster (not in clock cycles but because
of multicores) and so, instead of one second, the faster computer may require only a frac-
tion of a second to invert the matrix and solve the linear system (7.1.3) using (7.1.4). But
applications are commonly also much larger than 103 equations, and so solving a very
large general linear system like in (7.1.4) is expensive. Fortunately, in very large applica-
tions, the equations are sparse, i.e., most entries of A are zero. By exploiting this sparse
structure it is then possible to expedite very considerably the solution of the linear system,
and it is practical to consider systems with many thousands, if not millions, of equations
and unknowns.
Here, we consider Gauss elimination, which is a method that is systematic and that can
be easily programmed in a computer. We motivate the method by first working directly
with solving a linear system of equations and then reformulating this example in terms
of Gauss elimination. This we do in the next Section.
7.2 Gauss and Gauss-Jordan Elimination: Examples
This section illustrates, by working through examples, the technique and concepts in Gauss
and Gauss-Jordan elimination. The method is simple and systematic. In Subsection 7.2.3,
we show how row operations on matrices can be interpreted as premultiplication by
certain simple matrices, the elementary matrices. In subsequent Sections, we consider the
general methodology of Gauss and Gauss-Jordan elimination and also discuss shortcuts
often taken in practice.
7.2.1 Solving a linear system with elementary operations
Consider the linear system of three equations in three unknowns:
x1 + x2 + x3 = 1
2x1 + x2 − x3 = −1     (7.2.1)
x1 + 2x2 − 2x3 = 2
This system is in canonical form since the independent known terms are all on the RHS,
the unknowns are all on the LHS and, in each equation, the unknowns are written in the
same order from left to right.
A solution are three known values, say, α1 , α2 , and α3 such that when we replace x1 = α1 ,
x2 = α2 , and x3 = α3 in (7.2.1) we obtain an identity, i.e., the LHS of each equation is
identical to the RHS of the equation:
α1 + α2 + α3 = 1
2α1 + α2 − α3 = −1
α1 + 2α2 − 2α3 = 2
Linear systems: Elementary operations. To solve (7.2.1), we operate with the equations in one
of three ways: 1) multiply one equation by a nonzero scalar, for example, multiply the sec-
ond equation by 1/2; 2) interchange two equations, for example, the first equation with the
third equation; 3) replace one equation by the equation obtained by multiplying another
equation by a nonzero scalar and adding the resulting equation to the original equation,
for example, replace the third equation by the result of multiplying the second equation
by 1/2 and subtracting it from the third equation.
We can easily verify that the solution of the original system of equations is not modified
by substituting the original system by the system of equations that results by application
of one or more of these operations. These operations are called elementary operations. By
judiciously applying the elementary operations, we can reduce the system to another sys-
tem in a form that is much easier to solve. Each of these operations is called elimination,
and the systematic process that accomplishes it is called Gaussian elimination.
The goal is to reduce the system of equations to a triangular form. We explain this with an
example and break the solution into successive steps.
1. First pivot. We start by choosing the equation in (7.2.1) for which the coefficient of
the first variable x1 is the largest. In the example we are working on, this is the
second equation. The coefficient of the first variable in the second equation is two,
larger than the coefficients of x1 in the first and third equations. This largest coeffi-
cient becomes the first pivot we choose.
We next interchange the first and second equation, so that the equation with the first
pivot becomes now the first equation. This equation with the first pivot is the first
pivot equation. This interchange of equations is an elementary operation and does
not change the solution of the system of three equations. We get:
2x1 + x2 − x3 = −1
x1 + x2 + x3 = 1 (7.2.2)
x1 + 2x2 − 2x3 = 2
Finally, we divide the first equation by the pivot. Again, this is an elementary oper-
ation and does not change the solution.
This normalizes the coefficient of the first variable, variable x1 , in the now first equa-
tion to be one. The system becomes:
(1/2)   x1 + (1/2)x2 − (1/2)x3 = −1/2
        x1 + x2 + x3 = 1                      (7.2.3)
        x1 + 2x2 − 2x3 = 2
The number in parentheses at the far left of the first equation is there to remind
us that the first equation is the result of normalizing this equation by the pivot,
i.e., of multiplying the equation by 1/2, the inverse of the pivot, which is 2, the first
coefficient of the first equation in system (7.2.2).
2. Elimination. We now use the first equation to zero out the coefficients of the first
unknown x1 in the equations below the first equation, i.e., in the second and third
equations. Because the coefficients of the first unknown x1 are one in both equations,
one and two, we zero out the coefficient in x1 in the second equation by subtracting
the first equation from the second equation and writing the result as the second
equation. The same is done with the third equation. We subtract the first equation
from the third equation and write the result in the third equation. Subtracting two
equations is an elementary operation, so the net result is that the resulting system
still has the same solution as the original system. We obtain the system below:
x1 + (1/2)x2 − (1/2)x3 = −1/2
     (1/2)x2 + (3/2)x3 = 3/2                  (7.2.4)
     (3/2)x2 − (3/2)x3 = 5/2
At the end of this step, the coefficient of x1 in the first equation is one, and the
coefficients of x1 in all equations below the first are zero.
3. Recursion. Note that the second and third equations are now a subsystem of two
equations in two unknowns. The unknown x1 has been eliminated from the second
and third equations. We now ignore the first equation and work with the subsystem
formed by the two equations, the second and third, in two unknowns and repeat the
previous step:
(a) Choose the second pivot by inspecting the coefficients of the second variable x2
in both the second and third equations and choosing the largest. Clearly this is
the coefficient 3/2 in the third equation. This becomes pivot number two, and the
equation becomes the second pivot equation.
(b) Interchange the second and third equations, so that the equation with the sec-
ond pivot is now the second equation.
(c) Normalize the second equation by the second pivot, so that the coefficient of
the second variable in the second equation is 1.
(d) Zero out the coefficients of the second variable in all equations below the sec-
ond (there is only one equation, which is the third equation) by multiplying
the second equation by the coefficient of the second variable x2 of the third
equation and subtracting the resulting equation from the third. The resulting
equation replaces the third equation.
We perform these steps to obtain successively the systems below. First we inter-
change the order of the second and third equations:

        x1 + (1/2)x2 − (1/2)x3 = −1/2
             (3/2)x2 − (3/2)x3 = 5/2          (7.2.5)
             (1/2)x2 + (3/2)x3 = 3/2

Next, we normalize the second equation by the second pivot, 3/2:

        x1 + (1/2)x2 − (1/2)x3 = −1/2
(2/3)        x2 − x3 = 5/3                    (7.2.6)
             (1/2)x2 + (3/2)x3 = 3/2

Now zero out the coefficient of the second unknown x2 in the third equation:

        x1 + (1/2)x2 − (1/2)x3 = −1/2
(1/2)        x2 − x3 = 5/3                    (7.2.7)
                  2x3 = 2/3
4. We now pick the third pivot. Since we have only one equation left, the third, we
look in this equation for the first unknown whose coefficient is nonzero. Clearly,
this is unknown x3 , whose coefficient is 2. This is the third pivot, and we normalize
this pivot to one, by multiplying the third equation by the inverse of the pivot. The
third equation is the third pivot equation.
        x1 + (1/2)x2 − (1/2)x3 = −1/2
             x2 − x3 = 5/3                    (7.2.8)
(1/2)             x3 = 1/3
5. Leading coefficients, leading equations, and triangular form. At the end of step 2, the coef-
ficients of x1 below the first equation have been zeroed out. At the end of step 3, the
coefficients of x2 below the second equation have been zeroed out (the coefficient
of x2 in the third equation). The resulting system at the end of step 4 is in (7.2.8)
and has a very distinct form: it is in triangular form, actually upper triangular, and
the leading coefficients in each equation, i.e., the first nonzero coefficient in each
equation as we move from left to right, are all normalized to one. The unknowns
corresponding to the leading coefficient in each equation are referred to as leading
variables or leading unknowns.
We take note of the three pivots found in steps 1, 3a, and this step 4: 2, 3/2, and 2,
respectively.
There are now two ways we can proceed. The first is by back substitution, which
is explained next. The second method is Gauss-Jordan elimination that continues
reducing the system to a canonical form that is simple to solve. We consider this
after back substitution.
6. Back substitution. With the system in triangular form it is easy now to find the solu-
tion. We work backwards from the last equation, the third equation. The method is
called back substitution.
We start from the bottom equation, the third equation, whose LHS has only one
variable x3 with coefficient one. So, this third equation solves for the variable x3 . In
the second equation, we move the term in x3 to the RHS. Finally, in the first equation,
we move both the term in x3 and the term in x2 to the RHS. We obtain successively:
x1 = −1/2 − (1/2)x2 + (1/2)x3
x2 = 5/3 + x3                                 (7.2.9)
x3 = 1/3
We now can back substitute the value of x3 from the last equation in the second and
first equations, and the value of x2 determined from the second equation in the first
equation. This determines the values of the three unknowns and solves the system
of three linear equations in (7.2.1). The solution is:
x1 = − 43
x2 = 2 (7.2.10)
x3 = 13
7. Gauss-Jordan elimination. Instead of back substitution, we can continue with elementary
operations to further reduce the triangular system (7.2.8).
8. We operate with the third equation to zero out the coefficients of the unknown x3 in
the second and first equations:

             x1 + (1/2)x2 = −1/3
                  x2 = 2                      (7.2.12)
(−1)(−1/2)        x3 = 1/3

The numbers in parentheses on the left of the third equation represent the scalars by
which we multiply the third equation before subtracting it from the second equation
and then the first equation.
The result of this step is to zero out the coefficients of the unknown x3 in both the
second and first equations. Note that (7.2.12) is the result of elementary operations
applied to the original system.
9. We now operate with the second equation to zero out the coefficient of the un-
known x2 in the first equation. We achieve this by replacing the first equation with
the result of multiplying the second equation by 12 and subtracting it from the first
equation. We get:
        x1 = −4/3
(1/2)   x2 = 2                                (7.2.13)
        x3 = 1/3
Comparing this with (7.2.10), we confirm that we reached the same solution for the
system of equations (7.2.1). Again this is obtained through elementary operations.
Diagonal system. The system in (7.2.13) has a very special structure: it is diagonal. The
LHS of the first equation has only the first unknown x1, the LHS of the second equation
has only the second unknown x2, and the LHS of the third equation has only the third
unknown x3. Further, the coefficients of the unknowns on the LHS are all normalized
to 1.
We note that in both the Gauss elimination step, the forward step, and in the back-
ward step of Gauss-Jordan elimination, we have used only elementary operations
and the same (normalized) pivot coefficients.
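The procedure above is systematic enough to program directly. The following is a compact sketch (ours, for illustration) of Gauss elimination with partial pivoting followed by back substitution; applied to the system (7.2.1), it reproduces the solution (7.2.10):

# Gauss elimination with partial pivoting, then back substitution.
import numpy as np

def gauss_solve(A, b):
    A = A.astype(float)
    b = b.astype(float)
    n = len(b)
    # Forward step: reduce [A | b] to upper triangular form with unit pivots.
    for k in range(n):
        p = k + np.argmax(np.abs(A[k:, k]))  # pivot row: largest entry in column k
        A[[k, p]] = A[[p, k]]                # interchange rows (elementary operation)
        b[[k, p]] = b[[p, k]]
        piv = A[k, k]
        A[k] /= piv                          # normalize the pivot row
        b[k] /= piv
        for i in range(k + 1, n):            # zero out column k below the pivot
            b[i] -= A[i, k] * b[k]
            A[i] -= A[i, k] * A[k]
    # Back substitution (pivots are already normalized to 1).
    x = np.zeros(n)
    for i in reversed(range(n)):
        x[i] = b[i] - A[i, i+1:] @ x[i+1:]
    return x

A = np.array([[1, 1, 1], [2, 1, -1], [1, 2, -2]])
b = np.array([1, -1, 2])
print(gauss_solve(A, b))  # [-1.3333... 2. 0.3333...], i.e., (-4/3, 2, 1/3)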
7.2.2 Gauss and Gauss-Jordan elimination with matrices
With the notation of Chapter 6, the system (7.1.1) is written as:
$$\underbrace{\begin{bmatrix} 4 & 6 & 9 \\ 6 & 0 & -2 \\ 5 & -8 & 1 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{x} = \underbrace{\begin{bmatrix} 6 \\ 20 \\ 10 \end{bmatrix}}_{b} \qquad (7.2.15)$$
We introduce needed notation. We call augmented matrix the matrix A concatenated with
the column vector of independent terms b:
$$A = [A \mid b] \qquad (7.2.16)$$
$$= \begin{bmatrix} 4 & 6 & 9 & \mid & 6 \\ 6 & 0 & -2 & \mid & 20 \\ 5 & -8 & 1 & \mid & 10 \end{bmatrix} \qquad (7.2.17)$$
We apply to the augmented matrix A the elimination steps in Gauss elimination to reduce
it to an upper triangular form. This follows closely the same steps we illustrated when
solving the linear system (7.2.1). The matrix with upper triangular form obtained at the
end of Gauss elimination is the row echelon form of the matrix we start with.
Once we obtain the row echelon form of A, we can proceed as in the back substitution
step 6, or with the Gauss-Jordan elimination in 6, see Subsection 7.2.1. If we continue with
the Gauss-Jordan elimination, the resulting matrix is the reduced row echelon form.
The elementary operations introduced in Section 7.2.1 in the context of linear systems of
algebraic equations are now reinterpreted as operating on the rows (or columns) of a
matrix.
Matrix operations: Elementary operations. We can operate with the rows (or columns) of a
matrix in one of three ways: 1) multiply one row by a nonzero scalar, for example, mul-
tiply the second row of a matrix by 1/2; 2) interchange two rows, for example, the first row
with the third row of the matrix; 3) replace one row by the row obtained by multiplying
another row by a nonzero scalar and adding the resulting row to the original row, for
example, replace the third row of a matrix by the result of multiplying the second row by
1/2 and subtracting it from the third row.
These elementary operations were stated in terms of rows. We can restate them in terms
of columns, for example, the second elementary operation would be interchange two
columns. In these notes we will work with row elementary operations, unless otherwise
stated.
These matrix elementary operations on the rows or columns of a matrix are herein pre-
sented with the goal of solving a linear system of equations. It turns out that they pre-
serve important properties of a matrix as we will see in due time. Used in the context of
Gauss elimination (reducing the matrix to an upper triangular form) and in the context
of Gauss-Jordan elimination (reducing the matrix to a diagonal form), they provide for
square matrices an alternative way to compute the determinant of the matrix or, in case
the matrix is invertible, inverting the matrix. We will revisit these topics and explain them
in more detail below.
We first identify the largest entry in the first column of A. This is the second entry,
a 6, and it becomes the first pivot. We interchange the first and second rows, so that
the pivot is the first entry of the first column. We get:
$$\begin{bmatrix} 6 & 0 & -2 & \mid & 20 \\ 4 & 6 & 9 & \mid & 6 \\ 5 & -8 & 1 & \mid & 10 \end{bmatrix} \qquad (7.2.19)$$
We now normalize the entries of the first row so that the first entry is one. We
multiply the row by 61 , the inverse of the pivot:
$$(1/6) \begin{bmatrix} 1 & 0 & -1/3 & \mid & 10/3 \\ 4 & 6 & 9 & \mid & 6 \\ 5 & -8 & 1 & \mid & 10 \end{bmatrix} \qquad (7.2.20)$$
The factor 1/6 in parentheses on the left of the matrix indicates that we normalized
the first row in (7.2.19) to obtain the first row in (7.2.20).
2. Elimination. Next we use the first row to zero the remaining entries of the first col-
umn. To zero the first entry of the second row, we multiply the first row by 4,
subtract it from the second row, and replace the second row by the result. We also
zero the first entry of the third row by multiplying the first row by 5, subtracting the
result from the third row and replacing this row by the result:
$$(4)(5) \begin{bmatrix} 1 & 0 & -1/3 & \mid & 10/3 \\ 0 & 6 & 31/3 & \mid & -22/3 \\ 0 & -8 & 8/3 & \mid & -20/3 \end{bmatrix} \qquad (7.2.21)$$
3. Recursion. Now, we repeat step 1 but with the submatrix formed by rows two and
three, and by columns two, three, and four. First, we identify the second pivot
by scanning column two, starting from row two and look for the largest entry (in
absolute value). This is entry (3, 2) of the third row. The pivot is now −8. We
exchange rows two and three to obtain:
$$\begin{bmatrix} 1 & 0 & -1/3 & \mid & 10/3 \\ 0 & -8 & 8/3 & \mid & -20/3 \\ 0 & 6 & 31/3 & \mid & -22/3 \end{bmatrix} \qquad (7.2.22)$$
4. We normalize row two by multiplying the row by the inverse of the second pivot,
−1/8:
$$(-1/8) \begin{bmatrix} 1 & 0 & -1/3 & \mid & 10/3 \\ 0 & 1 & -1/3 & \mid & 5/6 \\ 0 & 6 & 31/3 & \mid & -22/3 \end{bmatrix} \qquad (7.2.23)$$
5. We zero out all entries in the second column, below row two. This is only the entry
(3, 2) in the third row. To achieve this, we multiply row two by 6, subtract it from
row 3, and replace the result in row 3:
$$(6) \begin{bmatrix} 1 & 0 & -1/3 & \mid & 10/3 \\ 0 & 1 & -1/3 & \mid & 5/6 \\ 0 & 0 & 37/3 & \mid & -37/3 \end{bmatrix} \qquad (7.2.24)$$
The resulting matrix is in upper triangular form. This is the end of Gauss elimination,
and the matrix obtained is the row echelon form of A.
6. We normalize the third row by multiplying it by 3/37, the inverse of the third pivot:
$$(3/37) \begin{bmatrix} 1 & 0 & -1/3 & \mid & 10/3 \\ 0 & 1 & -1/3 & \mid & 5/6 \\ 0 & 0 & 1 & \mid & -1 \end{bmatrix} \qquad (7.2.25)$$
We could now proceed with the backward step or with the Gauss-Jordan method. We
proceed with Gauss-Jordan.
7. Gauss-Jordan elimination. We use the third row to zero out the entry (2, 3) in the
second row. We multiply the third row by −1/3, subtract it from row two, and replace
the result in row two:
$$(-1/3) \begin{bmatrix} 1 & 0 & -1/3 & \mid & 10/3 \\ 0 & 1 & 0 & \mid & 1/2 \\ 0 & 0 & 1 & \mid & -1 \end{bmatrix} \qquad (7.2.26)$$
8. Reduced row echelon form. We repeat the previous step, but now to zero out the en-
try (1, 3). We multiply again row three by −1/3, subtract it from row one, and replace
the result in row one:
$$(-1/3) \begin{bmatrix} 1 & 0 & 0 & \mid & 3 \\ 0 & 1 & 0 & \mid & 1/2 \\ 0 & 0 & 1 & \mid & -1 \end{bmatrix} \qquad (7.2.27)$$
The resulting matrix on the left of the vertical dashed line (the original matrix A) is
now reduced to a matrix with a very special form. In this example, it is the three
dimensional identity matrix. This is the reduced row echelon form of the original aug-
mented matrix A. We come back to it in Subsection 7.3.
We also take note of the three pivots: 6, −8, and 37/3.
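For reference, a computer algebra system can reproduce the reduced row echelon form (7.2.27) in one call (a sketch with sympy; note that rref returns the reduced matrix together with the indices of the pivot columns, not the pivot values):

# Reduced row echelon form of the augmented matrix (7.2.17).
import sympy as sp

Aug = sp.Matrix([[4, 6, 9, 6],
                 [6, 0, -2, 20],
                 [5, -8, 1, 10]])

R, pivot_cols = Aug.rref()
print(R)           # Matrix([[1, 0, 0, 3], [0, 1, 0, 1/2], [0, 0, 1, -1]])
print(pivot_cols)  # (0, 1, 2)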
7.2.3 Row operations as premultiplication by elementary matrices
Each of the elementary row operations above can be interpreted as premultiplication by a
simple matrix, an elementary matrix. The interchange of rows two and three in step 3 is the
premultiplication of the matrix in (7.2.21) by a permutation matrix:
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & -1/3 & \mid & 10/3 \\ 0 & 6 & 31/3 & \mid & -22/3 \\ 0 & -8 & 8/3 & \mid & -20/3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -1/3 & \mid & 10/3 \\ 0 & -8 & 8/3 & \mid & -20/3 \\ 0 & 6 & 31/3 & \mid & -22/3 \end{bmatrix} \qquad (7.2.28)$$
The normalization of row two in step 4 is the premultiplication by a diagonal matrix:
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & -1/8 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -1/3 & \mid & 10/3 \\ 0 & -8 & 8/3 & \mid & -20/3 \\ 0 & 6 & 31/3 & \mid & -22/3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -1/3 & \mid & 10/3 \\ 0 & 1 & -1/3 & \mid & 5/6 \\ 0 & 6 & 31/3 & \mid & -22/3 \end{bmatrix} \qquad (7.2.29)$$
Finally, the third elementary operation is to multiply a row by a scalar, add it to another
row, and replace the last row by the result. An example is in step 5 in Section 7.2.2 when
we zero out the element in entry (3, 2) by multiplying row two by the scalar 6 and sub-
tracting the result from row three and replacing row three by the result. This is interpreted
as pre-multiplying by an elementary matrix as illustrated below:
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -6 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -1/3 & \mid & 10/3 \\ 0 & 1 & -1/3 & \mid & 5/6 \\ 0 & 6 & 31/3 & \mid & -22/3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -1/3 & \mid & 10/3 \\ 0 & 1 & -1/3 & \mid & 5/6 \\ 0 & 0 & 37/3 & \mid & -37/3 \end{bmatrix} \qquad (7.2.30)$$
Combining the three elementary matrices:
$$\underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -6 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1/8 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}}_{E_G} \begin{bmatrix} 1 & 0 & -1/3 & \mid & 10/3 \\ 0 & 6 & 31/3 & \mid & -22/3 \\ 0 & -8 & 8/3 & \mid & -20/3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -1/3 & \mid & 10/3 \\ 0 & 1 & -1/3 & \mid & 5/6 \\ 0 & 0 & 37/3 & \mid & -37/3 \end{bmatrix} \qquad (7.2.31)$$
This is the result of Gauss elimination. The matrix EG is the product of three elementary
matrices and reduced the matrix in (7.2.21) to the matrix in (7.2.31). We now continue
with Gauss-Jordan elimination on the matrix in (7.2.31):
$$\begin{bmatrix} 1 & 0 & -1/3 & \mid & 10/3 \\ 0 & 1 & -1/3 & \mid & 5/6 \\ 0 & 0 & 37/3 & \mid & -37/3 \end{bmatrix}$$
We first normalize the diagonal entry of the third row. Then we need to zero out the entry
(2, 3) of the resulting matrix and, finally, the entry (1, 3) of the resulting matrix.
We get
$$\underbrace{\begin{bmatrix} 1 & 0 & 1/3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1/3 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3/37 \end{bmatrix}}_{E_{GJ}} \begin{bmatrix} 1 & 0 & -1/3 & \mid & 10/3 \\ 0 & 1 & -1/3 & \mid & 5/6 \\ 0 & 0 & 37/3 & \mid & -37/3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & \mid & 3 \\ 0 & 1 & 0 & \mid & 1/2 \\ 0 & 0 & 1 & \mid & -1 \end{bmatrix} \qquad (7.2.32)$$
This matrix is the reduced row echelon form of the matrix in (7.2.21). The matrix on
the left of the dashed line is the identity matrix. The matrix EGJ reduced the ma-
trix in (7.2.31), the result of Gauss elimination on matrix (7.2.21), to the matrix in (7.2.32).
7.3 Gauss, Gauss-Jordan Elimination: General Case
There are three types of elementary matrices. We consider explicitly row operations. The
elementary matrices for column operations are essentially the transposes of the row ele-
mentary matrices. We now list the three elementary operations O1, O2, and O3.
O1: Product of a row of a matrix by a nonzero scalar. Multiplying row i of a matrix A by
a nonzero scalar αi is obtained by pre-multiplying A by the matrix that is the identity
matrix IM with its ith row eiT replaced by αi eiT. The elementary matrix is:
$$E = \begin{bmatrix} 1 & & & & \\ & \ddots & & & \\ & & \alpha_i & & \\ & & & \ddots & \\ & & & & 1 \end{bmatrix} \leftarrow \text{row } i \qquad (7.3.3)$$
O2: Interchange of two rows of a matrix. Interchanging rows i and j of A is obtained by
pre-multiplying A by the matrix obtained from the identity matrix IM by interchanging
its rows i and j:
$$E_{O_2} = \begin{bmatrix} 1 & & & & \\ & 0 & \cdots & 1 & \\ & \vdots & \ddots & \vdots & \\ & 1 & \cdots & 0 & \\ & & & & 1 \end{bmatrix} \leftarrow \text{rows } i, j \qquad (7.3.4)$$
O3: Replacing a row by its linear combination with another row. Replace row j by the
row obtained by multiplying row i by a nonzero scalar αij and adding the resulting
row to row j. The elementary matrix that accomplishes this is the identity matrix with
the additional entry αij in row j, column i:
$$E = \begin{bmatrix} 1 & & & & \\ & \ddots & & & \\ & \alpha_{ij} & \ddots & & \\ & & & \ddots & \\ & & & & 1 \end{bmatrix} \leftarrow \text{row } j \qquad (7.3.5)$$
It is convenient to write this elementary matrix in terms of the matrix Eij, which is zero
everywhere except for a single entry equal to 1 in row j, column i:
$$E_{ij} = \begin{bmatrix} 0 & & & & \\ & \ddots & & & \\ & 1 & \ddots & & \\ & & & \ddots & \\ & & & & 0 \end{bmatrix} \leftarrow \text{row } j \qquad (7.3.6)$$
With this notation, the elementary matrix in (7.3.5) is
$$E = I + \alpha_{ij} E_{ij}. \qquad (7.3.7)$$
Each elementary operation is invertible, and its inverse is again an elementary operation
of the same type. The corresponding elementary matrices are therefore invertible.
1. Inverse of O1, product of a row of a matrix by a nonzero scalar. The inverse of multiplying
row i by the nonzero scalar αi is multiplying row i by 1/αi. In terms of elementary
matrices:
$$E_{O_1} = \begin{bmatrix} 1 & & & & \\ & \ddots & & & \\ & & \alpha_i & & \\ & & & \ddots & \\ & & & & 1 \end{bmatrix} \;\Longrightarrow\; E_{O_1}^{-1} = \begin{bmatrix} 1 & & & & \\ & \ddots & & & \\ & & \frac{1}{\alpha_i} & & \\ & & & \ddots & \\ & & & & 1 \end{bmatrix} \leftarrow \text{row } i \qquad (7.3.16)$$
2. Inverse of O2 interchange of two rows of a matrix. The inverse of O2 is the same opera-
tion, i.e., the inverse of the interchange of rows i and j is the interchange of rows i
and j. The matrix representing this elementary operation is exactly the same as
in (7.3.4):
$$E_{O_2}^{-1} = E_{O_2}. \qquad (7.3.17)$$
3. Inverse of O3, replacing a row by its linear combination with another row. The inverse
of O3 is another elementary operation O3 where, instead of the scalar αij as
in (7.3.5), we use −αij, the negative of the scalar. The matrix representing this in-
verse elementary operation is the same as in (7.3.5) with αij replaced by −αij.
Using (7.3.7) as the representation for this elementary matrix, the inverse is, when
i ≠ j:
$$E_{O_3}^{-1} = I - \alpha_{ij} E_{ij}, \qquad (7.3.18)$$
where we recall Eij is the zero matrix, except for its single entry 1, as in (7.3.6). The
matrix in (7.3.18) is the inverse of the matrix in (7.3.7), as is easily verified since
Eij2 = 0 when i ≠ j.
The leading coefficient of a row is the first nonzero entry of the row, as we scan the row
from left to right.
Definition 7.3.1 (Row echelon form). An M × N matrix A is in row echelon form iff:
1. All nonzero rows, i.e., rows where at least one element is different from zero, precede any
zero row, i.e., any row where all entries are zero.
3. The leading coefficient of a row is to the right of the leading coefficient of the row above
it.
Note that in a matrix in row echelon form all the zero rows are at the bottom below the
nonzero rows, and the entries in a column below an entry that is a leading coefficient are
zero.
While a matrix in row echelon form has zeros below a leading coefficient, a matrix in re-
duced row echelon form has zeros below and above a leading coefficient. Also, if a column
has no leading coefficients, then its entries may be zero or nonzero. They can be nonzero
only if they are on a nonzero row and then to the right of the leading coefficient in this
row. In other words, a column with no leading coefficients may be a column of zeros, i.e.,
a zero column, see the second column of the matrix B1 in Example 7.3.2.
Remark 7.3.1 (Column echelon form). We have introduced in Definition 7.3.1 the row
echelon form. If we operate on columns, we obtain the column echelon form. A matrix is in
column echelon form, if its transpose is in row echelon form.
Note that A1 and A2 have a zero row at the bottom. The leading coefficient in the nonzero
rows is 1. The leading coefficient in a nonzero row is to the right of the leading coefficients
of the rows above. For example, in A2 the leading coefficient in row 3 is the entry (3, 5),
which is to the right of the leading coefficients (1, 1) and (2, 3) in rows one and two.
This matrix A3 modifies matrix A2 because entry (1, 2) is zero. The second column of the
matrix is zero. Still all leading rows are above the zero rows and the leading entry of a
nonzero row is to the right of the leading entry of the row above.
This matrix A4 is not in row echelon form since the leading coefficient of row 3 is to the
left of the leading coefficient of the row above it (and actually, also of the leading coef-
ficient of the first row). In this case, a row echelon form could be obtained by circularly
shifting the rows, i.e., by interchanging the rows such that row 3 becomes row 1, row 1
becomes row 2, and, finally, row 2 becomes row 3.
This matrix fails being in row echelon form since a row of zeros precedes a nonzero row.
This is simple to fix by interchanging rows three and four.
In the above examples, we did normalize the leading coefficients to 1 and used when
necessary the elementary operation of row interchange.
Both, matrices B1 and B2 are in row echelon form and all the entries of a column above a
leading coefficient of a row are zero.
We discuss here the general structure of the row echelon form of a matrix. We assume
that, if there is a column of zeros in the original matrix, it is not the first column.
From the definition and the examples worked out, including A1 through A3 , the general
structure of the row echelon form of a matrix is given by:
$$A \;\Longrightarrow\; \begin{bmatrix} 1 & * & \cdots & \cdots & * \\ 0 & \ddots & & & \vdots \\ \vdots & & \ddots & & * \\ 0 & \cdots & \cdots & \cdots & 0 \end{bmatrix} \qquad (7.3.19)$$
with r top (leading) rows and n − r bottom rows of zeros.
First note that in the row echelon form, if there are rows of zeros they are at the bottom.
These are indicated by the n − r bottom rows in (7.3.19) that are zero.
In the top r rows, we distinguish a trapezoidal structure, where ∗ indicates don't-care entries (they can take any value, as long as in each leading row the leading entry is nonzero), to the left of which there is a triangular-shaped zero block. The diagonal bounds, and belongs to, the trapezoidal block. It starts from the entry (1, 1). Unless A = 0, entry (1, 1) is the first pivot. This pivot is assumed to have been normalized to 1. This diagonal is a 45° diagonal, extending to the entry (r, r).
The top r rows are the leading rows; each of these rows has a leading entry that is a 1 (assuming we normalized all the pivots to one), and each leading entry is to the right of the leading entry of the row above. So, the entries of the 45° diagonal may be zero, except the first entry (1, 1), which has to be nonzero. The structure of the row echelon form of A
can then be simply represented as:
$$
A \;\Longrightarrow\; \left[\begin{array}{c} U \\ \hline 0 \end{array}\right]. \qquad (7.3.20)
$$
Although the block U is trapezoidal with r rows, we still refer to it as upper triangular.
Its r rows are the r leading rows of the row echelon form. We will refer to U as an upper
triangular block.
The structure follows from the structure of the row echelon form in (7.3.19) or (7.3.20).
The block matrix U in (7.3.20) is upper triangular, but its diagonal, except for the first
entry, may have zeros (if normalized, the leading entry of the first row, i.e., (1, 1) is a 1),
and each leading entry in a row of U is to the right of the leading entry of the row above
it. After Gauss-Jordan, all entries above a leading entry of U are zeroed out. The block U
is reduced to a block that has a “jagged” diagonal. We represent this block by D.
We discuss further the structure of a reduced row echelon form with respect to the fol-
lowing example.
$$
A \;\Longrightarrow\; U \;\Longrightarrow\; D =
\begin{bmatrix} 1 & 2 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}.
$$
The D block in this case is the whole 3 × 5 matrix, since there are no zero rows. Note that D is not a diagonal matrix, and not all elements above the leading entries of D are zero, as illustrated in the example above: there may be nonzero entries to the right of the leading element of a leading row; see rows one (elements (1, 2) and (1, 4)) and two (entry (2, 4)). If a diagonal entry is zero, then all subsequent diagonal entries are zero (diagonal entry (2, 2) is zero, so diagonal entry (3, 3) is also zero), and the leading elements of the leading rows are then to the right of the diagonal element of that row (for example, in row two the leading element is (2, 3), to the right of the diagonal entry (2, 2), and, similarly, in row three the leading element is (3, 5), to the right of the diagonal entry (3, 3)). With this in mind, the general structure of a reduced row echelon form is:
$$
A \;\Longrightarrow\;
\left[\begin{array}{ccccc}
\cdot & \cdot & \cdots & \cdots & \cdot \\
0 & \ddots & & & \vdots \\
\vdots & & \ddots & & \vdots \\
0 & \cdots & 0 & \cdot & \cdot \\
\hline
0 & \cdots & \cdots & \cdots & 0
\end{array}\right]
\qquad (7.3.21)
$$
$$
= \left[\begin{array}{c} D \\ \hline 0 \end{array}\right] \qquad (7.3.22)
$$
This is not universally accepted notation; as we said, it serves only the purpose of describing the two Algorithms 1 and 2. Let the matrix A be $M \times N$. Let $A_{ij}$ be the subblock of A with all entries $A_{k\ell}$ with $i \leq k \leq M$ and $j \leq \ell \leq N$. For example, $A_{11} = A$, and the subblock $A_{MN}$ is the single entry $A_{MN}$.
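In 0-based Python indexing this subblock is a simple slice; a minimal sketch (the name subblock is ours, and its arguments keep the text's 1-based convention):

import numpy as np

def subblock(A, i, j):
    """Return the subblock A_{ij}: all entries A_{k,l} with i <= k <= M
    and j <= l <= N, where i and j are 1-based as in the text."""
    return A[i - 1:, j - 1:]

A = np.arange(12).reshape(3, 4)
assert (subblock(A, 1, 1) == A).all()       # A_{11} = A
assert subblock(A, 3, 4).item() == A[2, 3]  # A_{MN} is the single entry A_{MN}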
We consider now these shortcuts to Gauss and Gauss-Jordan elimination. They avoid two of the steps introduced in Gauss elimination: exchanging rows and normalizing pivots to 1.
We indicated that, when scanning the entries of a column to search for the pivot, i.e., the entry of the column that is largest in absolute value, we may need to interchange the rows of the matrix, i.e., perform elementary operation O2. Say, when scanning the first column, the entry largest in absolute value is (3, 1); we then interchange rows 1 and 3 so that the pivot is this entry (3, 1).
By Subsection 7.3.1.1, when exchanging rows, the matrix representing this elementary op-
eration is not triangular, and so the product of elementary matrices involved in Gauss or
Gauss-Jordan elimination is not a triangular matrix. For reasons that will become clearer
later, we may want to preserve the triangular nature of the product of these elementary
matrices, and so Gauss elimination is sometimes simplified to not include this step.
The second shortcut has to do with the normalization of a leading row (i.e., a row that has a leading coefficient or pivot) by the inverse of the pivot. Often, this leads to fractions and unnecessarily complicates the arithmetic of the successive steps of Gauss elimination. To avoid this, the normalization is skipped, and we keep the pivots as they are. The resulting row echelon form then does not have leading coefficients of 1.
When this is the case, before performing Gauss-Jordan we do then need to normalize the leading coefficients of the row echelon form to 1.
The net result is that the row echelon form of a matrix may not be unique: the final form obtained may depend on the exact steps taken and the order in which they are taken. The reduced row echelon form, except for the normalization by the pivots, is always unique.
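The two shortcuts, and the resulting non-uniqueness, can be exhibited with a minimal Python sketch of Gauss elimination over the rationals; the flags normalize and exchange_rows are our own names and simply switch the O1 and O2 steps on or off:

from fractions import Fraction

def gauss_echelon(rows, normalize=True, exchange_rows=True):
    """Reduce a matrix (list of rows) to a row echelon form.
    normalize:      apply O1 steps (scale each leading row so its pivot is 1).
    exchange_rows:  apply O2 steps (bring the largest-magnitude entry up as pivot)."""
    A = [[Fraction(x) for x in row] for row in rows]
    m, n = len(A), len(A[0])
    r = 0                                   # index of the next leading row
    for c in range(n):                      # scan columns left to right
        if r == m:
            break
        if exchange_rows:
            p = max(range(r, m), key=lambda i: abs(A[i][c]))
            if A[p][c] == 0:
                continue                    # no pivot in this column
            A[r], A[p] = A[p], A[r]         # O2: row exchange
        elif A[r][c] == 0:
            continue                        # shortcut: no O2; assumes a nonzero natural pivot
        if normalize:
            piv = A[r][c]
            A[r] = [x / piv for x in A[r]]  # O1: normalize the pivot to 1
        for i in range(r + 1, m):           # O3: zero out below the pivot
            f = A[i][c] / A[r][c]
            A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
    return A

M = [[1, -2, -2], [5, -2, 4], [3, 2, 8], [2, -4, -4]]
print(gauss_echelon(M, normalize=True,  exchange_rows=False))
print(gauss_echelon(M, normalize=False, exchange_rows=False))
# Both outputs are valid row echelon forms of the same matrix, but they differ:
# the first has pivots normalized to 1, the second keeps the pivots 1 and 8.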
We consider several examples that illustrate the different possibilities that may arise when solving a system of linear algebraic equations: 1) a solution exists and is unique; 2) a solution exists but is not unique; and 3) a solution does not exist.
Given a linear system, we illustrate with examples how Gauss elimination provides a
simple way to determine which of these three cases actually occurs. When the solution
exists, cases 1) and 2), then either back substitution or Gauss-Jordan will then lead to the
unique solution, case 1), or to the family of solutions, case 2). When the solution does not
exist, case 3), then we stop the computation at the end of Gauss elimination, and there is
no back substitution, nor Gauss-Jordan step.
Consider the linear system:
$$
\begin{aligned}
x_1 - 2x_2 &= -2 \\
5x_1 - 2x_2 &= 4 \\
3x_1 + 2x_2 &= 8 \\
2x_1 - 4x_2 &= -4.
\end{aligned}
\qquad (7.4.1)
$$
Equation (7.4.2) identifies the system matrix A, the vector of unknowns x, and the vector
of independent terms b.
$$\bar A = [\,A \mid b\,] \qquad (7.4.3)$$
$$
= \left[\begin{array}{rr:r}
1 & -2 & -2 \\
5 & -2 & 4 \\
3 & 2 & 8 \\
2 & -4 & -4
\end{array}\right]. \qquad (7.4.4)
$$
We now apply Gauss elimination to reduce $\bar A$ to triangular form. We will apply one of the shortcuts described in Subsection 7.3.4, avoiding the elementary operation O2, i.e., we will not interchange rows.
Gauss elimination proceeds from left to right and top to bottom. Under each arrow, we
indicate the elementary matrix that multiplies (on the left) the matrix to the left of the
arrow to obtain the matrix on the right of the arrow.
$$
\bar A = \left[\begin{array}{rr:r} 1&-2&-2\\ 5&-2&4\\ 3&2&8\\ 2&-4&-4 \end{array}\right]
\underset{E_1}{\Longrightarrow}
\left[\begin{array}{rr:r} 1&-2&-2\\ 0&8&14\\ 3&2&8\\ 2&-4&-4 \end{array}\right]
\underset{E_2}{\Longrightarrow}
\left[\begin{array}{rr:r} 1&-2&-2\\ 0&8&14\\ 0&8&14\\ 2&-4&-4 \end{array}\right]
\underset{E_3}{\Longrightarrow}
\left[\begin{array}{rr:r} 1&-2&-2\\ 0&8&14\\ 0&8&14\\ 0&0&0 \end{array}\right]
\underset{E_4}{\Longrightarrow}
\left[\begin{array}{rr:r} 1&-2&-2\\ 0&1&\frac{7}{4}\\ 0&8&14\\ 0&0&0 \end{array}\right]
\underset{E_5}{\Longrightarrow}
\left[\begin{array}{rr:r} 1&-2&-2\\ 0&1&\frac{7}{4}\\ 0&0&0\\ 0&0&0 \end{array}\right]
$$
We applied Gauss elimination in a simplified way. In the first column, entry (2, 1) is the largest in absolute value, so we should have exchanged rows one and two, but, as discussed in Subsection 7.3.4, we choose not to apply this step. This means the first pivot is the entry (1, 1), which is a 1.
The first elementary matrix is an O3 elementary matrix and subtracts five times the first row from the second row. The second elementary matrix is also an O3 elementary matrix and subtracts three times the first row from the third row. The third elementary matrix is also an O3 elementary matrix and subtracts two times the first row from the fourth row. These three operations have zeroed out all entries in the first column below the leading coefficient of the first equation. It is interesting to note that after the third elementary operation the fourth row is identically zero.
We now repeat these steps with the subblock $A_{22}$. Next, pick as pivot the entry (2, 2); the pivot is 8. The fourth elementary matrix is an O1 elementary matrix and normalizes the second row by one over this pivot. We now use this second row to zero out all entries in the second column below the leading entry of the second row. Since entry (4, 2) is already zero, the only entry left is (3, 2).
The fifth elementary matrix is again an O3 elementary matrix and subtracts eight times the second row from the third row.
We should now repeat the previous steps with the subblock $A_{33}$. Scanning the third row, there is no nonzero entry. Since the fourth row is also a zero row, Gauss elimination has terminated.
The bottom matrix is in row echelon form as can be verified: there are two leading rows,
the leading entries are normalized to one, and the zero rows are at the bottom of the ma-
trix.
Writing the row echelon form,
$$
\left[\begin{array}{rr:r}
1 & -2 & -2 \\
0 & 1 & \frac{7}{4} \\
0 & 0 & 0 \\
0 & 0 & 0
\end{array}\right],
$$
we see that, of the original four equations, two are zero, and the two remaining equations can be solved for the two unknowns $x_1$ and $x_2$.
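As a numerical check, the chain of elementary operations just described can be replayed in Python (a sketch; the helper o3 builds the O3 matrices used above):

import numpy as np

Abar = np.array([[1., -2, -2], [5, -2, 4], [3, 2, 8], [2, -4, -4]])  # (7.4.4)

def o3(m, i, j, alpha):
    """O3 elementary matrix: add alpha times row j to row i (0-based)."""
    E = np.eye(m)
    E[i, j] = alpha
    return E

E1 = o3(4, 1, 0, -5.0)          # subtract 5 x row 1 from row 2
E2 = o3(4, 2, 0, -3.0)          # subtract 3 x row 1 from row 3
E3 = o3(4, 3, 0, -2.0)          # subtract 2 x row 1 from row 4
E4 = np.diag([1, 1/8, 1, 1.])   # O1: normalize row 2 by the pivot 8
E5 = o3(4, 2, 1, -8.0)          # subtract 8 x row 2 from row 3

ref = E5 @ E4 @ E3 @ E2 @ E1 @ Abar
print(ref)   # rows [1 -2 -2], [0 1 7/4], and two zero rows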
We note that, while finding the row echelon form of the augmented matrix $\bar A$, we also found the row echelon form of the system matrix A. The row echelon form of the system matrix A is, by inspection, the matrix to the left of the dashed column of the row echelon form:
$$
\begin{bmatrix} 1 & -2 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}.
$$
With respect to the structure of the row echelon form as given by (7.3.19), the block U in (7.3.19) is, for the original system matrix and for the augmented matrix, given by:
$$U_A = \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix} : \; 2 \times 2$$
$$U_{\bar A} = \begin{bmatrix} 1 & -2 & -2 \\ 0 & 1 & \frac{7}{4} \end{bmatrix} : \; 2 \times 3.$$
If we compare the number of leading rows $r_A$ in the row echelon form of the system matrix A with the number $r_{\bar A}$ of leading rows in the row echelon form of the augmented matrix $\bar A$, we conclude that they are the same:
$$r_A = r_{\bar A} = 2. \qquad (7.4.5)$$
This is the condition for the existence of the solution, as the next example will also show. Also, we see that the number of leading rows $r_A$ in the row echelon form of the system matrix A equals the number $r_x$ of unknowns in the system equations:
$$r_A = r_x = 2. \qquad (7.4.6)$$
This is the condition for the solution, when it exists, to be unique.
We now modify this example, changing only the independent term of the fourth equation from −4 to −3. Equation (7.4.8) identifies the system matrix A, the vector of unknowns x, and the vector of independent terms b.
$$\bar A = [\,A \mid b\,] \qquad (7.4.9)$$
$$
= \left[\begin{array}{rr:r}
1 & -2 & -2 \\
5 & -2 & 4 \\
3 & 2 & 8 \\
2 & -4 & -3
\end{array}\right]. \qquad (7.4.10)
$$
$$
\left[\begin{array}{rr:r} 1&-2&-2\\ 5&-2&4\\ 3&2&8\\ 2&-4&-3 \end{array}\right]
\underset{E_1}{\Longrightarrow}
\left[\begin{array}{rr:r} 1&-2&-2\\ 0&8&14\\ 3&2&8\\ 2&-4&-3 \end{array}\right]
\underset{E_2}{\Longrightarrow}
\left[\begin{array}{rr:r} 1&-2&-2\\ 0&8&14\\ 0&8&14\\ 2&-4&-3 \end{array}\right]
\underset{E_3}{\Longrightarrow}
\left[\begin{array}{rr:r} 1&-2&-2\\ 0&8&14\\ 0&8&14\\ 0&0&1 \end{array}\right]
$$
$$
\underset{E_4}{\Longrightarrow}
\left[\begin{array}{rr:r} 1&-2&-2\\ 0&1&\frac{7}{4}\\ 0&8&14\\ 0&0&1 \end{array}\right]
\underset{E_5}{\Longrightarrow}
\left[\begin{array}{rr:r} 1&-2&-2\\ 0&1&\frac{7}{4}\\ 0&0&0\\ 0&0&1 \end{array}\right]
\underset{E_6}{\Longrightarrow}
\left[\begin{array}{rr:r} 1&-2&-2\\ 0&1&\frac{7}{4}\\ 0&0&1\\ 0&0&0 \end{array}\right],
$$
where the elementary matrices under the arrows are
$$
E_1 = \begin{bmatrix} 1&0&0&0\\ -5&1&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{bmatrix},\quad
E_2 = \begin{bmatrix} 1&0&0&0\\ 0&1&0&0\\ -3&0&1&0\\ 0&0&0&1 \end{bmatrix},\quad
E_3 = \begin{bmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ -2&0&0&1 \end{bmatrix},
$$
$$
E_4 = \begin{bmatrix} 1&0&0&0\\ 0&\frac{1}{8}&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{bmatrix},\quad
E_5 = \begin{bmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&-8&1&0\\ 0&0&0&1 \end{bmatrix},\quad
E_6 = \begin{bmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&0&1\\ 0&0&1&0 \end{bmatrix}.
$$
The first five steps repeat the steps in Example 7.4.1, but note that after step 3, in the resulting (fourth) matrix, the fourth row is not identically zero: entry (4, 3) is now a one. After the fifth step, in the resulting (sixth) matrix, the matrix before last, there is a zero row that is not below all leading rows. So the sixth step, the last step in this case, exchanges rows three and four, so that the zero row is below all leading rows.
We inspect the row echelon form, the last matrix obtained at the end of Gauss elimination:
$$
\left[\begin{array}{rr:r}
1 & -2 & -2 \\
0 & 1 & \frac{7}{4} \\
0 & 0 & 1 \\
0 & 0 & 0
\end{array}\right].
$$
If we write explicitly the equation corresponding to the third row of the row echelon form, it is
$$0\,x_1 + 0\,x_2 = 1.$$
Clearly there are no possible values of $x_1$ and $x_2$ that can satisfy this equation. The original system has no solution.
With respect to the structure of the row echelon form as given by (7.3.19), the block U in (7.3.19) is, for the original system matrix and for the augmented matrix, given by:
$$U_A = \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix} : \; 2 \times 2$$
$$U_{\bar A} = \begin{bmatrix} 1 & -2 & -2 \\ 0 & 1 & \frac{7}{4} \\ 0 & 0 & 1 \end{bmatrix} : \; 3 \times 3.$$
In this example, the number of leading rows $r_A$ in the row echelon form of the system matrix A and the number $r_{\bar A}$ of leading rows in the row echelon form of the augmented matrix $\bar A$ are not the same; in fact:
$$2 = r_A < r_{\bar A} = 3. \qquad (7.4.11)$$
This is the condition for the system to have no solution.
Consider now the linear system:
$$
\begin{aligned}
x_1 - 2x_2 &= -2 \\
5x_1 - 10x_2 &= -10 \\
3x_1 - 6x_2 &= -6 \\
2x_1 - 4x_2 &= -4.
\end{aligned}
\qquad (7.4.12)
$$
Equation (7.4.13) identifies the system matrix A, the vector of unknowns x, and the vector
of independent terms b.
$$\bar A = [\,A \mid b\,] \qquad (7.4.14)$$
$$
= \left[\begin{array}{rr:r}
1 & -2 & -2 \\
5 & -10 & -10 \\
3 & -6 & -6 \\
2 & -4 & -4
\end{array}\right]. \qquad (7.4.15)
$$
We obtain successively:
$$
\left[\begin{array}{rr:r} 1&-2&-2\\ 5&-10&-10\\ 3&-6&-6\\ 2&-4&-4 \end{array}\right]
\underset{E_1}{\Longrightarrow}
\left[\begin{array}{rr:r} 1&-2&-2\\ 0&0&0\\ 3&-6&-6\\ 2&-4&-4 \end{array}\right]
\underset{E_2}{\Longrightarrow}
\left[\begin{array}{rr:r} 1&-2&-2\\ 0&0&0\\ 0&0&0\\ 2&-4&-4 \end{array}\right]
\underset{E_3}{\Longrightarrow}
\left[\begin{array}{rr:r} 1&-2&-2\\ 0&0&0\\ 0&0&0\\ 0&0&0 \end{array}\right],
$$
where $E_1$, $E_2$, and $E_3$ are the same O3 elementary matrices as in the previous examples, subtracting 5, 3, and 2 times the first row from rows two, three, and four, respectively.
The row echelon form corresponds to the equations:
$$
\begin{aligned}
x_1 - 2x_2 &= -2 \\
0x_1 + 0x_2 &= 0 \\
0x_1 + 0x_2 &= 0 \\
0x_1 + 0x_2 &= 0.
\end{aligned}
$$
The last three equations are trivial; they are satisfied by arbitrary values of $x_1$ and $x_2$. The system (7.4.12) is reduced to the first row, since there is only one leading row in $\bar A$. So, the solution to the original system (7.4.12) of four equations in two unknowns is
$$x_1 = -2 + 2x_2.$$
This equation represents a family of solutions: a line in the two-dimensional plane $(x_1, x_2)$. The variable $x_2$ is referred to as a free variable because it can take any value. The family of solutions is parameterized by the free variable; each value of the free variable then determines the value of $x_1$.
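As a quick numerical check of this family of solutions (the sample values chosen for the free variable are arbitrary):

import numpy as np

A = np.array([[1., -2], [5, -10], [3, -6], [2, -4]])
b = np.array([-2., -10, -6, -4])

for x2 in (-1.0, 0.0, 2.5):          # arbitrary values of the free variable
    x = np.array([-2 + 2 * x2, x2])  # x1 = -2 + 2 x2
    assert np.allclose(A @ x, b)     # every member of the family solves (7.4.12)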
With respect to the structure of the row echelon form as given by (7.3.19), the block U in (7.3.19) is, for the original system matrix and for the augmented matrix, given by:
$$U_A = \begin{bmatrix} 1 & -2 \end{bmatrix} : \; 1 \times 2$$
$$U_{\bar A} = \begin{bmatrix} 1 & -2 & -2 \end{bmatrix} : \; 1 \times 3.$$
In this example, we see that the number of leading rows $r_A$ in the row echelon form of A and the number of leading rows $r_{\bar A}$ in the row echelon form of $\bar A$ are equal, so the solution exists, but this number $r_A$ is strictly smaller than the number of unknowns $r_x$:
$$r_A < r_x = 2. \qquad (7.4.16)$$
This is the condition for the solution to exist but not be unique.
For a generic linear system of algebraic equations
$$Ax = b, \qquad (7.4.17)$$
like with ordinary differential and difference equations, there are structural questions regarding solving the linear system that we would like to address even before we attempt to solve it. These structural questions are:
Existence of solution: Even before we solve the system, it is important to know whether or not a solution exists. If the system has no solution, it is inconsistent. If the existence question is answered positively, the system is consistent.
Unicity of solution: For a consistent system, the solution may be unique or there may be more than one solution. If the solution exists but is not unique, the system is said to be degenerate. With degenerate systems, the examples showed that there is an uncountable set of solutions, a family of solutions. We will see in Chapter 10 that the solution set has nice algebraic properties and nice structure.
When the vector of independent terms is zero, the system is homogeneous:
$$Ax = 0. \qquad (7.4.18)$$
Gauss elimination is a means to answer the existence and unicity structural questions for generic inhomogeneous systems of algebraic equations (7.4.17) and the question of unicity of solutions for the homogeneous system of equations (7.4.18).
By Gauss elimination, the augmented matrix
$$\bar A = [\,A \mid b\,]$$
and the system matrix A associated with the linear system are reduced to a triangular form, the row echelon form. The row echelon form has the generic structure given in Section 7.3.2.1 by (7.3.19) or by (7.3.20):
$$\bar A \;\Longrightarrow\; \left[\begin{array}{c} U_{\bar A} \\ \hline 0 \end{array}\right] \qquad (7.4.19)$$
$$A \;\Longrightarrow\; \left[\begin{array}{c} U_A \\ \hline 0 \end{array}\right], \qquad (7.4.20)$$
where
$$U_A : r_A \times n$$
$$U_{\bar A} : r_{\bar A} \times (n+1).$$
The response to the structural questions can now be given in terms of the row dimensions $r_A$ and $r_{\bar A}$ of $U_A$ and $U_{\bar A}$, respectively, and of the number of unknowns $r_x$. These conditions are summarized by:
No solution: $r_A < r_{\bar A}$.
Existence of solution: $r_A = r_{\bar A}$.
Unique solution: $r_A = r_x$.
Family of solutions: $r_A < r_x$. In this case, the family of solutions is parameterized by the free variables (determined, for example, by solving the backward step). The number of free variables is $r_x - r_A$.
We will have occasion to come back to these conditions and cast them in terms of other
relevant parameters of the augmented and system matrices.
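In exact arithmetic, the number of leading rows of a row echelon form equals the rank of the matrix, so these conditions can be checked with a few lines of Python; a minimal sketch (the function name classify is ours):

import numpy as np

def classify(A, b):
    """Classify the linear system A x = b by comparing r_A, r_Abar, and r_x."""
    r_A = np.linalg.matrix_rank(A)
    r_Abar = np.linalg.matrix_rank(np.column_stack([A, b]))
    r_x = A.shape[1]                 # number of unknowns
    if r_A < r_Abar:
        return "no solution"
    if r_A == r_x:
        return "unique solution"
    return f"family of solutions with {r_x - r_A} free variable(s)"

A = np.array([[1., -2], [5, -2], [3, 2], [2, -4]])
print(classify(A, np.array([-2., 4, 8, -4])))      # unique solution  (Example 7.4.1)
print(classify(A, np.array([-2., 4, 8, -3])))      # no solution      (Example 7.4.2)
A3 = np.array([[1., -2], [5, -10], [3, -6], [2, -4]])
print(classify(A3, np.array([-2., -10, -6, -4])))  # family, 1 free variable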
We assume now that A is square and that its first entry is not zero. We saw in Subsection 7.3.1.1 that the reduction of a matrix A to the row echelon form is achieved by left multiplication by elementary matrices. If there are $\ell$ such elementary steps, then:
$$\underbrace{E_\ell \cdots E_1}_{E\,=\,\text{product of elementary matrices}} A = U \qquad (7.4.22)$$
$$EA = U. \qquad (7.4.23)$$
We saw in Subsection 7.3.1.2 that the elementary matrices are invertible, so:
$$A = E^{-1} U. \qquad (7.4.24)$$
If there are no row exchanges, i.e., no elementary operations of type O2, the elementary matrices $E_i$, $i = 1, \ldots, \ell$, are all lower triangular, their product is lower triangular, and the inverse of their product is lower triangular. Represent this inverse by L, i.e.:
$$L = E^{-1}.$$
Then (7.4.24) becomes:
$$A = LU. \qquad (7.4.25)$$
The three matrices under the first three elementary matrix factors on the left are the in-
verses (by reverse order) of these elementary matrices. Multiplying these three matrices
and writing them on the RHS, we get the desired LU factorization
$$
\begin{bmatrix} 3 & 5 & 5 \\ 2 & -2 & 6 \\ 1 & -1 & 2 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 \\ \frac{2}{3} & 1 & 0 \\ \frac{1}{3} & \frac{1}{2} & 1 \end{bmatrix}
\begin{bmatrix} 3 & 5 & 5 \\ 0 & -\frac{16}{3} & \frac{8}{3} \\ 0 & 0 & -1 \end{bmatrix}.
$$
This decomposition is not unique; given a square matrix A, there are many possible decompositions like (7.4.25), because neither L nor U is unique. However, the decomposition is made unique by normalizing the diagonal entries of L to ones. This entails not normalizing the leading rows of the row echelon form by the pivots, as we now explain.
Assume that the steps in Gauss elimination involve neither row-exchange elementary operations O2 nor normalization by the inverse of the pivots (elementary operations O1). Then E is the product of O3 elementary matrices given by (7.3.7), with inverses given by (7.3.18). Before stating the result, we look at the example of multiplying two such inverses:
$$
\begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 3 & 0 & 1 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 0 & 1 \end{bmatrix}.
$$
This is very intuitive. First recall that the LHS of this equality is the product of the inverses of two O3 elementary operations, which means the original O3 operations are applied in reverse order. The first matrix factor on the LHS is the inverse of the O3 operation where we had subtracted from row two the first row multiplied by 2; this Gauss elimination step zeros out the entry (2, 1) of the original matrix. Likewise, the second matrix factor is the inverse of the O3 operation where we had subtracted from row three the first row multiplied by 3; this Gauss elimination step zeros out the entry (3, 1) of the original matrix.
Now, looking at the result of the product of these two inverse matrices, the matrix on the RHS, we see that, besides the diagonal of ones, the nonzero entries are the entries (2, 1) and (3, 1), which save the negatives of the scalars (−2 and −3) used to zero out these entries (2, 1) and (3, 1) in the original matrix.
We now see this more generally. Consider two successive O3 elementary operations
$$(I + \alpha_{i_2 j_2} E_{i_2 j_2})(I + \alpha_{i_1 j_1} E_{i_1 j_1}).$$
We assume that $i_1 \neq i_2 \neq j_1 \neq j_2$. The inverse of this product is:
$$
[(I + \alpha_{i_2 j_2} E_{i_2 j_2})(I + \alpha_{i_1 j_1} E_{i_1 j_1})]^{-1}
= (I - \alpha_{i_1 j_1} E_{i_1 j_1})(I - \alpha_{i_2 j_2} E_{i_2 j_2})
= I - \alpha_{i_1 j_1} E_{i_1 j_1} - \alpha_{i_2 j_2} E_{i_2 j_2},
$$
since, for $i_1 \neq i_2 \neq j_1 \neq j_2$,
$$E_{i_1 j_1} E_{i_2 j_2} = 0.$$
In summary, when only O3 elementary operations are used to reduce a square matrix A
to row echelon form in Gauss elimination (no row exchange nor normalization by the
pivots), the structure of the lower triangular matrix in the LU decomposition (7.4.25) is
readily available:
$$L = I - \sum_{i,j:\; i > j} \alpha_{ij} E_{ij}. \qquad (7.4.26)$$
The scalars $\alpha_{ij}$ are those used in the O3 elementary operations of Gauss elimination to zero out the entry (i, j) below the diagonal of the square matrix A. If a given entry (i, j) below the diagonal of the square matrix A was zero to start with, then the corresponding $\alpha_{ij} = 0$.
The diagonal entries of L in (7.4.26) are all ones. With this normalization, the LU decom-
position1 in (7.4.25) of matrix A is unique. It is referred to as the LU decomposition of the
square matrix A.
We have shown this decomposition to hold for square matrices, where the row echelon
form is obtained by Gauss elimination with no O1 nor O2 elementary operations. The
result holds more generally for arbitrary rectangular matrices by allowing for a permuta-
tion matrix. We will not comment further on this issue.
¹ In the expression "LU decomposition," LU is an adjective to the noun decomposition, and so it is not boldfaced. In other words, it does not stand for the matrix factors L and U; rather, it stands for the first letters of "Lower" and "Upper" triangular factors.
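The construction (7.4.26) translates into a short algorithm. A minimal Python sketch (Doolittle-style: O3 operations only, so it assumes the pivots it encounters are nonzero; the function name lu_no_pivot is ours):

import numpy as np

def lu_no_pivot(A):
    """LU decomposition by Gauss elimination with O3 operations only.
    L stores the negatives of the multipliers alpha_ij, per (7.4.26)."""
    U = A.astype(float).copy()
    n = A.shape[0]
    L = np.eye(n)
    for j in range(n - 1):                # eliminate below the pivot (j, j)
        assert U[j, j] != 0, "zero pivot: this shortcut needs O2 exchanges"
        for i in range(j + 1, n):
            alpha = -U[i, j] / U[j, j]    # scalar of the O3 operation
            U[i, :] += alpha * U[j, :]    # zero out entry (i, j)
            L[i, j] = -alpha              # L = I - sum alpha_ij E_ij
    return L, U

A = np.array([[3., 5, 5], [2, -2, 6], [1, -1, 2]])
L, U = lu_no_pivot(A)
print(L)   # [[1, 0, 0], [2/3, 1, 0], [1/3, 1/2, 1]]
print(U)   # [[3, 5, 5], [0, -16/3, 8/3], [0, 0, -1]]
assert np.allclose(L @ U, A)

On the 3 × 3 matrix of the factorization displayed above, this reproduces the same factors L and U.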
Remark 7.4.1 (Mnemonic on LU decomposition). With the insight provided on the ma-
trix L, Gauss elimination can be slightly modified to easily obtain the entries of L.
We illustrate this with the first three steps of Example 7.4.2, where we used Gauss elimination to reduce to row echelon form the augmented matrix $\bar A$ of the linear system in that example. We reproduce these three steps here for easy recall:
$$
\bar A = \left[\begin{array}{rr:r} 1&-2&-2\\ 5&-2&4\\ 3&2&8\\ 2&-4&-3 \end{array}\right]
\;\underset{\begin{bmatrix} 1&0&0&0\\ -5&1&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{bmatrix}}{\Longrightarrow}\;
\left[\begin{array}{rr:r} 1&-2&-2\\ 0&8&14\\ 3&2&8\\ 2&-4&-3 \end{array}\right]
\;\underset{\begin{bmatrix} 1&0&0&0\\ 0&1&0&0\\ -3&0&1&0\\ 0&0&0&1 \end{bmatrix}}{\Longrightarrow}\;
\left[\begin{array}{rr:r} 1&-2&-2\\ 0&8&14\\ 0&8&14\\ 2&-4&-3 \end{array}\right]
\;\underset{\begin{bmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ -2&0&0&1 \end{bmatrix}}{\Longrightarrow}\;
\left[\begin{array}{rr:r} 1&-2&-2\\ 0&8&14\\ 0&8&14\\ 0&0&1 \end{array}\right]
$$
These three steps only use elementary operation O3, and they zero out the three entries in the first column below entry (1, 1). The three factors used can be read off from the elementary matrices below the arrows: −5, −3, and −2. The first column of L is then $[1\ 5\ 3\ 2]^T$. The mnemonic is to register these entries of L, as Gauss elimination progresses, by replacing the zeroed-out entries with these scalars in parentheses. For this example:
$$
\bar A = \left[\begin{array}{rr:r} 1&-2&-2\\ 5&-2&4\\ 3&2&8\\ 2&-4&-3 \end{array}\right]
\Longrightarrow
\left[\begin{array}{rr:r} 1&-2&-2\\ (5)&8&14\\ 3&2&8\\ 2&-4&-3 \end{array}\right]
\Longrightarrow
\left[\begin{array}{rr:r} 1&-2&-2\\ (5)&8&14\\ (3)&8&14\\ 2&-4&-3 \end{array}\right]
\Longrightarrow
\left[\begin{array}{rr:r} 1&-2&-2\\ (5)&8&14\\ (3)&8&14\\ (2)&0&1 \end{array}\right]
$$
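This mnemonic translates directly into code: overwrite each entry as it is zeroed out with the multiplier that zeroed it, so a single array packs the row echelon form together with the multipliers. A minimal Python sketch under the same no-exchange, no-normalization assumptions (the function name lu_inplace is ours):

import numpy as np

def lu_inplace(A):
    """Gauss elimination that stores, in each entry it zeroes out, the
    multiplier used, exactly as in the mnemonic; returns the packed array."""
    M = A.astype(float).copy()
    m, n = M.shape
    for j in range(min(m - 1, n)):       # pivot column j, pivot M[j, j]
        if M[j, j] == 0:                 # shortcut assumes nonzero pivots
            continue
        for i in range(j + 1, m):
            mult = M[i, j] / M[j, j]     # multiplier that zeroes entry (i, j)
            M[i, j:] -= mult * M[j, j:]  # O3 row operation
            M[i, j] = mult               # register the multiplier "in parentheses"
    return M

Abar = np.array([[1., -2, -2], [5, -2, 4], [3, 2, 8], [2, -4, -3]])
print(lu_inplace(Abar))
# The first column stores the multipliers 5, 3, 2, matching the (5), (3), (2)
# entries in the display above; the sketch then continues past the first column
# and also stores the multiplier 1 used to zero out entry (3, 2).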
This says that if the determinant of the square matrix A is nonzero, $|A| \neq 0$, then the diagonal entries of U are all nonzero, so they are the leading entries of the row echelon form of A, and so they are the pivots used in the Gauss elimination of A.
Consider the matrix
$$
A = \begin{bmatrix} 1 & 0 & -\frac{1}{3} \\ 0 & 6 & \frac{1}{3} \\ 0 & -8 & \frac{8}{3} \end{bmatrix}. \qquad (7.4.27)
$$
By (7.2.31) and (7.2.32), this matrix A is reduced to the identity matrix, i.e.,
$$
\underbrace{\begin{bmatrix} 1&0&\frac{1}{3}\\ 0&1&0\\ 0&0&1 \end{bmatrix}
\begin{bmatrix} 1&0&0\\ 0&1&\frac{1}{3}\\ 0&0&1 \end{bmatrix}
\begin{bmatrix} 1&0&0\\ 0&1&0\\ 0&0&\frac{3}{7} \end{bmatrix}}_{E_{GJ}}
\underbrace{\begin{bmatrix} 1&0&0\\ 0&1&0\\ 0&-6&1 \end{bmatrix}
\begin{bmatrix} 1&0&0\\ 0&-\frac{1}{8}&0\\ 0&0&1 \end{bmatrix}
\begin{bmatrix} 1&0&0\\ 0&0&1\\ 0&1&0 \end{bmatrix}}_{E_{G}}
\begin{bmatrix} 1&0&-\frac{1}{3}\\ 0&6&\frac{1}{3}\\ 0&-8&\frac{8}{3} \end{bmatrix}
=
\begin{bmatrix} 1&0&0\\ 0&1&0\\ 0&0&1 \end{bmatrix}.
$$
In other words,
$$E_{GJ}\, E_{G}\, A = I.$$
Since the inverse of a square matrix, when it exists, is unique (see Result 6.6.1 in Chapter 6), we have
$$
\begin{bmatrix} 1&0&-\frac{1}{3}\\ 0&6&\frac{1}{3}\\ 0&-8&\frac{8}{3} \end{bmatrix}^{-1}
= E_{GJ}\, E_{G}
=
\begin{bmatrix} 1&0&\frac{1}{3}\\ 0&1&0\\ 0&0&1 \end{bmatrix}
\begin{bmatrix} 1&0&0\\ 0&1&\frac{1}{3}\\ 0&0&1 \end{bmatrix}
\begin{bmatrix} 1&0&0\\ 0&1&0\\ 0&0&\frac{3}{7} \end{bmatrix}
\begin{bmatrix} 1&0&0\\ 0&1&0\\ 0&-6&1 \end{bmatrix}
\begin{bmatrix} 1&0&0\\ 0&-\frac{1}{8}&0\\ 0&0&1 \end{bmatrix}
\begin{bmatrix} 1&0&0\\ 0&0&1\\ 0&1&0 \end{bmatrix}.
$$
Since each factor on the right hand side has been computed as we apply Gauss-Jordan
elimination, this procedure provides a method to compute the inverse of the invertible
matrix A.
Rather than performing the multiplication of the factors $E_{GJ} E_G$, we can compute the inverse of the matrix by performing Gauss-Jordan elimination on the matrix
$$[\,A \mid I\,],$$
i.e., on the matrix that is the concatenation of the matrix A, whose inverse we want to compute and is assumed to exist, and the identity matrix I. We can readily conclude that Gauss-Jordan elimination reduces $[\,A \mid I\,]$ to $[\,I \mid A^{-1}\,]$.
If the matrix A is not invertible, then Gauss-Jordan elimination leads at some step to a row of zeros; this is an indicator that the matrix is not invertible. We can conclude this because the matrix on the left has a row of zeros, so its determinant is zero, while the determinant $|E_{GJ} E_G| \neq 0$, since it is a product of invertible matrices.
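A minimal Python sketch of this procedure (the function name gj_inverse is ours; it uses O2 row exchanges for pivoting and reports the zero-pivot indicator just described):

import numpy as np

def gj_inverse(A):
    """Invert A by Gauss-Jordan elimination on the concatenation [A | I]."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])
    for c in range(n):
        p = c + int(np.argmax(np.abs(M[c:, c])))  # O2: pick the largest pivot
        if np.isclose(M[p, c], 0.0):
            raise ValueError("matrix is not invertible: zero pivot column")
        M[[c, p]] = M[[p, c]]                     # O2: row exchange
        M[c] /= M[c, c]                           # O1: normalize the pivot row
        for i in range(n):                        # O3: zero the rest of column c
            if i != c:
                M[i] -= M[i, c] * M[c]
    return M[:, n:]                               # right half is A^{-1}

A = np.array([[1., 0, -1/3], [0, 6, 1/3], [0, -8, 8/3]])  # matrix of (7.4.27)
Ainv = gj_inverse(A)
assert np.allclose(A @ Ainv, np.eye(3))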
7.5 Problems
1. Consider the matrix $\bar A$
$$
\bar A = \begin{bmatrix} 1 & -1 & 1 & 2 \\ 3 & 1 & 1 & 0 \\ 2 & -1 & 1 & -1 \end{bmatrix}.
$$
(a) Write the linear system of algebraic equations for which $\bar A$ is the augmented matrix. Identify explicitly the system matrix A, the vector of dependent variables x, and the vector of independent terms b.
(b) Apply Gauss elimination to $\bar A$, representing each step by a corresponding elementary matrix $E_i$.
(c) Determine the row echelon forms of A and $\bar A$.
(d) Determine if the solution of the system exists, and if it exists if it is unique or
not.
(e) If S is the set of solutions for the linear system in Part 1a, determine its cardi-
nality.
(f) If the cardinality of S is one, determine the solution of the linear system in
Part 1a; if the cardinality of S is greater than one, determine the family of solu-
tions of the linear system in Part 1a, i.e., the generic xs ∈ S.
(g) Determine the LU-factorization of A.
Hint: Although in the Lecture we considered the LU-factorization of square matrices, this factorization is valid for arbitrary matrices and can be determined by Gauss elimination, as seen in this example.
(a) Write the augmented matrix $\bar A$ of the linear system (7.5.1) and identify the system matrix, the vector of dependent variables, and the vector of independent terms.
(b) Reduce the augmented matrix to its row echelon form. Identify the row echelon
form of the system matrix.
(c) Determine from your answer to Part 4b if the linear system (7.5.1) has a solu-
tion, a unique solution, or a family of solutions. Determine the unique solution
or the family of solutions if either exists. Justify your answer.
(d) Perform back substitution on the row echelon form of the augmented matrix
determined in Part 4b.
(e) Perform Gauss-Jordan on the row echelon form determined in Part 4b.
(f) Adjoin to (7.5.1) a fourth equation
$$x_1 + 2x_2 + 3x_3 = -1. \qquad (7.5.2)$$
Let $\tilde A$ and $\bar{\tilde A}$ be the system and augmented matrices of the new system obtained by augmenting the system (7.5.1) with this equation (7.5.2). Repeat Parts 4b and 4c.
$$
\begin{aligned}
x_1 - 2x_2 - 2x_3 &= 1 \\
2x_1 + x_2 + x_3 &= 3 \\
2x_1 + x_2 + 2x_3 &= 3
\end{aligned}
\qquad (7.5.5)
$$
8. Consider the system of linear algebraic equations with real-valued coefficients:
$$
\begin{aligned}
x_1 + x_2 + \lambda x_3 + x_4 &= 1 \\
x_1 + \lambda x_2 + x_3 - x_4 &= 2 \\
\lambda x_1 + x_2 + x_3 \phantom{\; + x_4} &= 3
\end{aligned}
\qquad (7.5.6)
$$
(a) Determine, if any, value or values of λ so that (7.5.6) has a single solution.
Justify your answer.
(b) Determine, if any, value or values of λ so that (7.5.6) has no solution. Justify
your answer.
(c) Determine, if any, value or values of λ so that (7.5.6) has a family of solutions.
Justify your answer.