
Contents

1 Set Theory 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.1 Sets of numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.2 Sets of points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Subsets: Inclusion and Equality of Sets . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Special Sets: Empty Set, Universal Set, Power Set, and Intervals . . . . . . . 8
1.5 Graphical Representation: Venn Diagrams . . . . . . . . . . . . . . . . . . . . 9
1.6 Set Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.6.1 Operations on sets: Definitions . . . . . . . . . . . . . . . . . . . . . . 10
1.6.1.1 Union . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.6.1.2 Intersection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.6.1.3 Complement . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.6.1.4 Difference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.6.1.5 Symmetric difference . . . . . . . . . . . . . . . . . . . . . . 12
1.6.2 Operations on sets: Properties . . . . . . . . . . . . . . . . . . . . . . 13
1.6.2.1 Commutativity . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.6.2.2 Associativity . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.6.2.3 Distributivity . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.6.2.4 De Morgan’s Laws . . . . . . . . . . . . . . . . . . . . . . . . 15
1.6.2.5 Union identity set . . . . . . . . . . . . . . . . . . . . . . . . 15
1.6.2.6 Intersection identity set . . . . . . . . . . . . . . . . . . . . . 15
1.7 Algebra of Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.8 Cartesian Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.9 Sequences of Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.10 Cardinality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.10.1 Cardinality: Countable sets . . . . . . . . . . . . . . . . . . . . . . . . 34
1.10.2 Cardinality: Uncountable sets . . . . . . . . . . . . . . . . . . . . . . . 38
1.10.2.1 Cantor’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . 38
1.10.2.2 Cardinality of the reals . . . . . . . . . . . . . . . . . . . . . 43
1.10.3 Arithmetic of the cardinal numbers . . . . . . . . . . . . . . . . . . . 46
1.10.3.1 Ordering and equality . . . . . . . . . . . . . . . . . . . . . . 47

1.10.3.2 Addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
1.10.3.3 Subtraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
1.10.3.4 Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . 51
1.10.3.5 Division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
1.10.3.6 Exponential . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
1.10.3.7 Operating with cardinals . . . . . . . . . . . . . . . . . . . . 53
1.11 Historical Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
1.12 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
1.A Relations and Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
1.A.1 Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
1.A.2 Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

2 Complex Numbers and Elementary Functions 69


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.2 Representations of Complex Numbers . . . . . . . . . . . . . . . . . . . . . . 70
2.2.1 Rectangular representation . . . . . . . . . . . . . . . . . . . . . . . . 70
2.2.2 Planar or graphical representation . . . . . . . . . . . . . . . . . . . . 71
2.2.3 Polar representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
2.2.3.1 Switching between polar and rectangular representations . 73
2.2.4 Exponential representation: Euler’s formula . . . . . . . . . . . . . . 76
2.2.5 Real and purely imaginary numbers . . . . . . . . . . . . . . . . . . . 78
2.3 Arithmetic of Complex Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 79
2.3.1 Equality of complex numbers . . . . . . . . . . . . . . . . . . . . . . . 79
2.3.2 Complex conjugation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
2.3.3 Addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
2.3.4 Multiplication or product . . . . . . . . . . . . . . . . . . . . . . . . . 83
2.3.5 Subtraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
2.3.6 Division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
2.3.6.1 Multiplicative inverse . . . . . . . . . . . . . . . . . . . . . . 87
2.3.6.2 Division of complexes . . . . . . . . . . . . . . . . . . . . . . 89
2.3.7 Field of complex numbers . . . . . . . . . . . . . . . . . . . . . . . . . 90
2.4 Principal Value of Functions of Complex Variables . . . . . . . . . . . . . . . 91
2.5 Integer Powers of Complexes . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
2.6 The Root Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
2.6.1 nth-roots of unity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
2.6.2 nth-roots of complexes . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
2.6.3 Rational roots of complexes . . . . . . . . . . . . . . . . . . . . . . . . 99
2.7 The Exponential and the Natural Logarithm . . . . . . . . . . . . . . . . . . 100
2.7.1 Exponential function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
2.7.2 The natural logarithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
2.7.3 Exponential and complex logarithm . . . . . . . . . . . . . . . . . . . 104
2.8 Arbitrary Powers of Complex Numbers . . . . . . . . . . . . . . . . . . . . . 105
2.9 Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

2.9.1 Linear and quadratic polynomials . . . . . . . . . . . . . . . . . . . . 110


2.9.2 The Fundamental Theorem of Algebra . . . . . . . . . . . . . . . . . . 112
2.9.3 Operating with polynomials . . . . . . . . . . . . . . . . . . . . . . . 112
2.9.3.1 Equality of polynomials . . . . . . . . . . . . . . . . . . . . . 112
2.9.3.2 Addition of polynomials . . . . . . . . . . . . . . . . . . . . 113
2.9.3.3 Multiplication of polynomials . . . . . . . . . . . . . . . . . 113
2.10 Rational Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
2.11 Trigonometric and Hyperbolic Functions . . . . . . . . . . . . . . . . . . . . 115
2.12 Phasors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
2.12.1 Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
2.12.2 Periodic signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
2.12.3 Complex notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
2.12.4 Phasors in the complex plane . . . . . . . . . . . . . . . . . . . . . . . 123
2.13 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
2.A Trigonometric and Hyperbolic Formulas of Interest . . . . . . . . . . . . . . 132

3 Calculus with Complex Functions 133


3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
3.2 Neighborhoods and Domains . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
3.2.1 Distance between complex numbers . . . . . . . . . . . . . . . . . . . 134
3.2.2 Balls, open sets, and domains . . . . . . . . . . . . . . . . . . . . . . . 136
3.3 Limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
3.4 Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
3.5 Differentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
3.5.1 Analyticity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
3.5.2 Cauchy-Riemann conditions . . . . . . . . . . . . . . . . . . . . . . . 146
3.6 Taylor Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
3.7 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

4 First Order Ordinary Differential and Difference Equations 158


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
4.2 A First Order Linear Ordinary Differential Equation with Constant Coeffi-
cients: Continuous Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
4.2.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
4.2.2 Example of solution of a 1st order ODE and IVP . . . . . . . . . . . . 162
4.3 A 1st Order Linear Ordinary Difference Equation with Constant Coeffi-
cients: Discrete Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
4.4 First Order Continuous Time ODE: General Case . . . . . . . . . . . . . . . . 173
4.5 First Order Discrete Time ODE: General Case . . . . . . . . . . . . . . . . . . 176
4.6 Properties of ODEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
4.7 Guessing Method or Method of Substitution . . . . . . . . . . . . . . . . . . 182
4.7.1 Continuous time ODE . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
4.7.2 Discrete time ODE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184

4.8 Method of Variation of Parameters or Method of Variation of Constants . . 185


4.8.1 1st order ODE: Continuous time . . . . . . . . . . . . . . . . . . . . . 185
4.8.2 1st order ODE: Discrete time . . . . . . . . . . . . . . . . . . . . . . . . 188
4.9 Qualitative Behavior: Asymptotics and Stability . . . . . . . . . . . . . . . . 190
4.10 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
4.A Signals of Interest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
4.A.1 Continuous time signals . . . . . . . . . . . . . . . . . . . . . . . . . . 194
4.A.1.1 Constant signal . . . . . . . . . . . . . . . . . . . . . . . . . . 194
4.A.1.2 Continuous time step function . . . . . . . . . . . . . . . . . 194
4.A.1.3 Continuous time exponential function . . . . . . . . . . . . 195
4.A.1.4 Continuous time trigonometric signals . . . . . . . . . . . . 196
4.A.1.5 Generalized functions . . . . . . . . . . . . . . . . . . . . . . 197
4.A.2 Discrete time signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
4.A.2.1 Discrete time constant signal . . . . . . . . . . . . . . . . . . 200
4.A.2.2 Discrete time step function . . . . . . . . . . . . . . . . . . . 200
4.A.2.3 Discrete time exponential function . . . . . . . . . . . . . . 200
4.A.2.4 Discrete trigonometric signal . . . . . . . . . . . . . . . . . . 201
4.A.2.5 Discrete delta signal . . . . . . . . . . . . . . . . . . . . . . . 201
4.B Leibnitz Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
4.C Method of Substitution: Tables of Particular Solutions . . . . . . . . . . . . . 201

5 Second Order Ordinary Differential and Difference Equations 204


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
5.2 Preliminaries: General Method of Solution . . . . . . . . . . . . . . . . . . . 205
5.2.1 Continuous time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
5.2.2 Discrete time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
5.2.3 Four-step solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
5.3 Homogeneous ODEs and IVP . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
5.3.1 Continuous time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
5.3.1.1 Examples of solutions to homogeneous ODEs and IVPs . . 214
5.3.1.2 General case: Continuous time . . . . . . . . . . . . . . . . . 225
5.3.2 Discrete time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
5.3.2.1 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
5.3.2.2 General case . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
5.4 Particular Solution–the Guessing Method or Method of Substitution . . . . 241
5.4.1 Guessing method: Continuous time . . . . . . . . . . . . . . . . . . . 243
5.4.2 Guessing method: Discrete time . . . . . . . . . . . . . . . . . . . . . 248
5.5 Inhomogeneous ODE and IVP . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
5.5.1 General solution 2nd order continuous time ODE . . . . . . . . . . . 253
5.5.2 General solution 2nd order discrete time ODE . . . . . . . . . . . . . 255
5.6 System Function: Exponential Forcing . . . . . . . . . . . . . . . . . . . . . . 257
5.6.1 Continuous time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
5.6.2 Discrete time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261

5.7 Properties of Zero-Input Response and Qualitative Behavior . . . . . . . . . 263


5.7.1 Qualitative behavior: Continuous time zero-input response . . . . . 264
5.7.2 Qualitative behavior: Discrete time zero-input response . . . . . . . 269
5.8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273

6 Vectors and Vector Calculus 281


6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
6.2 Vectors: Column and Row Vectors . . . . . . . . . . . . . . . . . . . . . . . . 282
6.2.1 Column vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
6.2.2 Row vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
6.2.3 Vector dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
6.3 Calculus with Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
6.3.1 Equality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
6.3.2 Vector Addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
6.3.2.1 Properties of vector addition . . . . . . . . . . . . . . . . . . 291
6.3.3 Product of a vector by a scalar . . . . . . . . . . . . . . . . . . . . . . 292
6.3.4 Vector conjugation, transposition, and Hermitian . . . . . . . . . . . 293
6.3.4.1 Vector conjugation . . . . . . . . . . . . . . . . . . . . . . . . 293
6.3.4.2 Vector transposition . . . . . . . . . . . . . . . . . . . . . . . 294
6.3.4.3 Vector Hermitian . . . . . . . . . . . . . . . . . . . . . . . . . 295
6.3.5 Linear combination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
6.3.6 Limits, derivatives, integration, delay, and Taylor series . . . . . . . . 298
6.3.6.1 Limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
6.3.6.2 Derivative of a vector of functions . . . . . . . . . . . . . . . 299
6.3.6.3 Integral of a vector of functions . . . . . . . . . . . . . . . . 299
6.3.6.4 Advance and delay . . . . . . . . . . . . . . . . . . . . . . . 300
6.3.6.5 Taylor series . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
6.4 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
6.4.1 Representations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
6.4.1.1 Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
6.4.1.2 Scalar representation . . . . . . . . . . . . . . . . . . . . . . 305
6.4.1.3 Column vector representation . . . . . . . . . . . . . . . . . 306
6.4.1.4 Row vector representation . . . . . . . . . . . . . . . . . . . 307
6.4.1.5 Block matrix representation . . . . . . . . . . . . . . . . . . 309
6.4.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
6.5 Calculus with Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
6.5.1 Equality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
6.5.2 Addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
6.5.2.1 Addition: Properties. . . . . . . . . . . . . . . . . . . . . . . 324
6.5.3 Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
6.5.3.1 Matrix-scalar product . . . . . . . . . . . . . . . . . . . . . . 325
6.5.3.2 Row vector by column vector: Scalar product . . . . . . . . 326
6.5.3.3 Column vector by row vector: Outer product . . . . . . . . 328

6.5.3.4 Matrix-matrix product . . . . . . . . . . . . . . . . . . . . . 330


6.5.3.5 Matrix-matrix products where one is either in row or col-
umn form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
6.5.3.6 Powers: square matrices . . . . . . . . . . . . . . . . . . . . 339
6.5.3.7 Product of Matrices: Properties . . . . . . . . . . . . . . . . 340
6.5.4 Matrix Conjugation, Transposition, and Hermitian . . . . . . . . . . 343
6.5.4.1 Conjugation . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
6.5.4.2 Transposition and symmetric matrices . . . . . . . . . . . . 343
6.5.4.3 Hermitian and Hermitian matrices . . . . . . . . . . . . . . 346
6.5.5 Matrices with special structure . . . . . . . . . . . . . . . . . . . . . . 347
6.5.6 Limits, derivatives, integration, delay, and Taylor series with matrices . . 349
6.5.6.1 Limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
6.5.6.2 Derivative of a matrix of functions . . . . . . . . . . . . . . 350
6.5.6.3 Integral of a matrix of functions . . . . . . . . . . . . . . . . 350
6.5.6.4 Advance and delay . . . . . . . . . . . . . . . . . . . . . . . 351
6.5.6.5 Taylor series . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
6.6 Functions of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
6.6.1 Determinant of a Square Matrix . . . . . . . . . . . . . . . . . . . . . 352
6.6.2 Trace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
6.6.3 Inverse of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
6.7 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365

7 Gauss and Gauss-Jordan Elimination 376


7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
7.2 Gauss and Gauss-Jordan Elimination: Examples . . . . . . . . . . . . . . . . 380
7.2.1 Solving Linear Systems . . . . . . . . . . . . . . . . . . . . . . . . . . 380
7.2.2 Reducing Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
7.2.3 Elementary Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
7.3 Gauss, Gauss-Jordan Elimination: General Case . . . . . . . . . . . . . . . . 391
7.3.1 Row elementary operations: elementary matrices . . . . . . . . . . . 392
7.3.1.1 Products of elementary matrices . . . . . . . . . . . . . . . . 393
7.3.1.2 Inverses of elementary matrices . . . . . . . . . . . . . . . . 394
7.3.2 Row and reduced row echelon forms . . . . . . . . . . . . . . . . . . 395
7.3.2.1 Row echelon form: Structure . . . . . . . . . . . . . . . . . . 398
7.3.2.2 Reduced row echelon form: Structure . . . . . . . . . . . . . 399
7.3.3 Gauss, Gauss-Jordan elimination: Pseudo-code . . . . . . . . . . . . 399
7.3.4 Shortcuts to Gauss, Gauss-Jordan elimination . . . . . . . . . . . . . 400
7.3.4.1 No row exchange . . . . . . . . . . . . . . . . . . . . . . . . 401
7.3.4.2 No normalization by pivots . . . . . . . . . . . . . . . . . . 401
7.4 Gauss Elimination: Applications . . . . . . . . . . . . . . . . . . . . . . . . . 401
7.4.1 Linear systems: Conditions for solution . . . . . . . . . . . . . . . . . 402
7.4.1.1 Conditions for solution: Examples . . . . . . . . . . . . . . 402
7.4.1.2 General conditions for solution . . . . . . . . . . . . . . . . 409

7.4.2 LU decomposition of square matrices . . . . . . . . . . . . . . . . . . 411


7.4.3 Determinant of square matrices . . . . . . . . . . . . . . . . . . . . . . 415
7.4.4 Inverse of square matrices . . . . . . . . . . . . . . . . . . . . . . . . . 415
7.5 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417

7 Vector Spaces 377


7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
7.2 Fields and Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
7.2.1 Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
7.2.2 Vector space over a field . . . . . . . . . . . . . . . . . . . . . . . . . . 380
7.2.3 Vector space: Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 382
7.3 Linear Combination, Span, and Linear Independence . . . . . . . . . . . . . 389
7.3.1 Linear combination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
7.3.2 Linear span . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
7.3.3 Linear independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
7.3.3.1 Linear independence: Test for n vectors in Rm . . . . . . . . 401
7.3.3.2 Linear independence: Test for n polynomial vectors . . . . 403
7.4 Basis and Dimension of Vector Spaces . . . . . . . . . . . . . . . . . . . . . . 405
7.4.1 Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 406
7.4.2 Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410
7.5 Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
7.5.1 Definition and examples of subspaces . . . . . . . . . . . . . . . . . . 417
7.6 Composition of Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
7.6.1 Intersection of subspaces . . . . . . . . . . . . . . . . . . . . . . . . . 424
7.6.2 Sum of subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
7.6.3 Direct sum of subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . 428
7.7 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436

8 Geometry of Vector Spaces 443


8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443
8.2 Inner product space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444
8.2.1 Inner product and inner product space . . . . . . . . . . . . . . . . . 444
8.3 Distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
8.4 Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
8.4.1 Schwarz inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
8.4.2 Triangle inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465
8.5 Angle and orthogonality between two vectors . . . . . . . . . . . . . . . . . 466
8.5.1 Angle between two vectors . . . . . . . . . . . . . . . . . . . . . . . . 466
8.6 Orthogonal Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
8.6.1 Orthogonality and linear independence . . . . . . . . . . . . . . . . . 468
8.6.2 Orthogonal complements . . . . . . . . . . . . . . . . . . . . . . . . . 469
8.7 Gram-Schmidt Orthogonalization . . . . . . . . . . . . . . . . . . . . . . . . . 470
8.8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471

10 Systems of Algebraic Linear Equations 524


10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 524
10.2 Rank of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 525
10.2.1 Rank and Gauss elimination . . . . . . . . . . . . . . . . . . . . . . . 530
10.2.2 Rank and linear systems of algebraic equations . . . . . . . . . . . . 534
10.3 Subspaces Associated with a matrix A . . . . . . . . . . . . . . . . . . . . . . 535
10.3.1 Column space R(A) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 542
10.3.2 Row space R(A^T) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545
10.3.3 Null space N (A) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 547
10.3.4 Left null space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551
10.3.5 Structure of subspaces and Algebra . . . . . . . . . . . . . . . . . . . 553
10.4 Linear systems and matrix subspaces . . . . . . . . . . . . . . . . . . . . . . . 555
10.4.1 Existence and unicity of solution and matrix subspaces . . . . . . . . 556
10.4.2 Linear systems: Method of solution . . . . . . . . . . . . . . . . . . . 559
10.4.3 Step 1: General homogeneous solution . . . . . . . . . . . . . . . . . 559
10.4.4 Step 2: Particular solution . . . . . . . . . . . . . . . . . . . . . . . . . 562
10.4.5 Step 3: General solution . . . . . . . . . . . . . . . . . . . . . . . . . . 563
10.5 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563

11 Matrices: Spectral Analysis 568


11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 568
11.2 Spectral Analysis: Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . 569
11.2.1 Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569
11.2.2 Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574
11.2.3 Spectral analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 578
11.2.4 Diagonalizable matrices: applications . . . . . . . . . . . . . . . . . . 580
11.3 Left and Right Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 583
11.4 Special Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 588
11.4.1 Orthogonal and unitary matrices . . . . . . . . . . . . . . . . . . . . . 589
11.4.2 Symmetric and Hermitian matrices . . . . . . . . . . . . . . . . . . . 591
11.4.3 Positive definite matrices . . . . . . . . . . . . . . . . . . . . . . . . . 595
11.4.4 Similarity – Similar matrices . . . . . . . . . . . . . . . . . . . . . . . 597
11.5 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 600

12 nth Order Homogeneous ODEs and IVPs 597


12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597
12.2 nth Order ODEs and IVPs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 598
12.3 Solution space of nth Order Homogeneous CT ODEs . . . . . . . . . . . . . . 605
12.3.1 Linear space of solutions . . . . . . . . . . . . . . . . . . . . . . . . . . 605
12.3.2 Wronskian and linear dependence of functions . . . . . . . . . . . . . 607
12.3.3 Wronskian and linear dependence of solutions . . . . . . . . . . . . . 610
12.4 Solution of nth Order Homogeneous CT Linear ODE . . . . . . . . . . . . . . 611
12.5 nth Order Homogeneous DT Linear ODE . . . . . . . . . . . . . . . . . . . . 616

12.5.0.1 Operator Notation: Discrete Time . . . . . . . . . . . . . . . 616


12.5.1 nth Order Homogeneous DT Linear ODE: General Solution . . . . . . 616
12.5.2 nth Order Homogeneous DT Linear ODE: Space of Solutions . . . . . 616
12.6 nth Order Homogeneous CT Linear IVP . . . . . . . . . . . . . . . . . . . . . 617
12.6.0.1 Zero input response: Interpretation of the solution to the
homogeneous CT IVP . . . . . . . . . . . . . . . . . . . . . . 619
12.7 Examples of Homogeneous ODEs and IVPs . . . . . . . . . . . . . . . . . . . 620

13 Linear Systems of Ordinary Differential and Difference Equations 614


13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 614
Chapter 6

Vectors and Vector Calculus

6.1 Introduction
Chapters 4 and 5 studied linear first and second order ordinary differential and difference equations (ODEs) with constant coefficients. The differential equations involved derivatives of continuous time functions. The difference equations involved delayed discrete time sequences. Both types of ODEs involve memory elements (differentiators and delay elements); i.e., their solution at the current time depends on current and past values of the input or forcing term and past values of the indeterminate.

This Chapter starts our journey in the study of systems of linear algebraic equations, or simply linear systems of equations. Our goal is to build tools to solve simultaneous equations like:

$$
\begin{aligned}
4x_1 + 6x_2 + 9x_3 &= 6 \\
6x_1 - 2x_3 &= 20 \\
5x_1 - 8x_2 + x_3 &= 10
\end{aligned}
\tag{6.1.1}
$$

In this system, there are unknown quantities, namely, the variables x1 , x2 , and x3 , and
known quantities like the numbers 6, 20, and 10 on the right hand side of the equations.
In this example, the number of equations is the same as the number of unknowns.
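Although systematic solution methods only arrive with Gauss elimination in Chapter 7, it can help to keep a computational counterpart in mind. The following minimal sketch, in Python with numpy (an illustrative assumption, not something the text relies on), stores the coefficients and the right-hand side of (6.1.1) and solves the system numerically:

```python
import numpy as np

# Coefficients of (6.1.1); the missing x2 term in the second equation is entered as 0.
A = np.array([[4.0,  6.0,  9.0],
              [6.0,  0.0, -2.0],
              [5.0, -8.0,  1.0]])
b = np.array([6.0, 20.0, 10.0])

x = np.linalg.solve(A, b)      # numerical solution [x1, x2, x3]
print(x)
print(np.allclose(A @ x, b))   # True: the solution satisfies all three equations
```

The same two arrays scale to thousands of equations, which is precisely the point of the compact vector and matrix notation developed in this Chapter.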

Writing these three equations in three unknowns explicitly is not very onerous. If all we had to do in life was to solve a few linear algebraic equations in a few unknowns, writing them as above and solving them by traditional methods would suffice. But if we have instead to solve a system of one thousand equations in one thousand unknowns, just writing them down would be tedious, let alone solving them by traditional methods. With computers, we can solve linear systems with many more equations: thousands, hundreds of thousands, if not millions of linear algebraic equations. It is possible to represent these equations in a much more compact form by using vectors and matrices. Once introduced, we operate directly with vectors and matrices. The language of vectors and matrices simplifies the writing of these large systems of equations and, very importantly, helps focus on the underlying concepts, without being overwhelmed by the details of so many equations.

The word algebraic means that there are no memory elements, no differentiators (really, integrators), nor delays. These linear systems of algebraic equations appear, even if in disguise, in most practical applications, and their relevance in science and engineering cannot be overemphasized. We actually already saw this: when solving first and second order ODEs in Chapters 4 and 5, we had to solve linear systems of algebraic equations when imposing the initial conditions. Granted, with first order ODEs, these linear systems of algebraic equations were trivially reduced to a single linear equation in a single unknown. With second order ODEs, however, we did have to solve two linear algebraic equations in two unknowns, which better justifies the use of the word system when referring to these equations.

Linear systems of algebraic equations also serve as the underlying thread for studying many other important concepts in Linear Algebra: vectors, matrices, eigenvalues and eigenvectors, and vector spaces, among others.

This Chapter sets up the basic machinery for studying systems of linear algebraic equations. The Chapter introduces vectors and matrices. Section 6.2 focuses on vectors, while Section 6.3 introduces calculus with vectors. Section 6.4 introduces matrices, and Section 6.5 calculus with matrices and special types of matrices. Section 6.6 introduces important functions of matrices like the determinant and the trace, and revisits the concept of the inverse of a matrix introduced in Section 6.4. Finally, Section 6.7 lists a few illustrative problems.

6.2 Vectors: Column and Row Vectors


We are all familiar with scalar numerical quantities, for example, −3, the transcendental
number π, the Neper number e, or the imaginary symbol j. Numerical valued scalars
can be integer valued, real valued, or complex valued. We gave in Chapter 1 examples
of sets of scalars like the natural numbers, N, the integers, Z, the reals, R, or the complex
numbers, C. A numerically valued scalar quantity can have a specific fixed known nu-
merical value, or can represent a fixed unknown numerical value, or its numerical value
can be varying. Numerically valued scalars may be referred to by different designations,
including scalars, variables, or parameters.

Cartesian products of N sets, X1 × · · · × XN, also introduced in Chapter 1, extended sets of scalars to sets of N-tuples, where an N-tuple x is a concatenation of N scalars, x = (x1, · · · , xN). An element xn is a component or coordinate of the N-tuple x. For example, 2-tuples are ordered pairs, x = (x1, x2); e.g., 2-tuples of real numbers are ordered pairs x = (x1, x2) ∈ R2, like (3.1, −2.33). When the sets Xn are identical and numerical sets, an alternative representation for the N-tuple is that of a vector.

Vectors can be column vectors or row vectors, as we consider next in Subsections 6.2.1 and 6.2.2.

6.2.1 Column vectors


We consider for example the three numbers 2, −3, and π/2. Suppose that we wanted to make repeated references to these numbers. We can choose to define a set whose elements are these numbers. We take a different alternative here. We group them as below

$$
\begin{matrix} 2 \\ -3 \\ \frac{\pi}{2} \end{matrix}
$$

Usually, we bracket such a collection of numbers and prefer the notation

$$
\begin{bmatrix} 2 \\ -3 \\ \frac{\pi}{2} \end{bmatrix}. \tag{6.2.1}
$$

Instead of square brackets, it is also common to use parentheses:

$$
\begin{pmatrix} 2 \\ -3 \\ \frac{\pi}{2} \end{pmatrix}. \tag{6.2.2}
$$

Either (6.2.1) or (6.2.2) represents a vector. We adopt square brackets as in (6.2.1), although, on occasion, we will also use (6.2.2). Vectors are commonly represented by boldface lower case letters or symbols like v:

$$
\mathbf{v} = \begin{bmatrix} 2 \\ -3 \\ \frac{\pi}{2} \end{bmatrix}. \tag{6.2.3}
$$

Other common notations are $\underline{v}$ or $\vec{v}$, i.e., we underline the letter or we place an arrow on top of the symbol representing the vector. In this case, we do not boldface the symbol.

Example 6.2.1 I Column vectors


When the elements are listed as in (6.2.1), or in (6.2.2), with the vector elements
stacked vertically, the vector is a column vector. We will see in Example 6.2.2
that we can organize the elements of the vector in a different way.

The elements 2, −3, and π/2 are the entries, elements, coordinates, or components of the vector. They are ordered from top to bottom, so 2 is the first entry or entry 1, while π/2 is entry 3.

The entries of the column vector v in (6.2.1) can be thought of as the coordinates
of a point in a three dimensional Cartesian or Euclidean space R3 , and the
vector v becomes a point in R3 . With this interpretation of v, which has 3
entries, as an element of R3 (or of C3 , for that matter), we refer to the vector v
in (6.2.1), as a 3-dimensional or 3D vector. 

Example 6.2.2 I Constant vectors


In (6.2.1), the entries are constant and real valued. Another example is

$$
\mathbf{b} = \begin{bmatrix} 1+j \\ 4 \\ e^{j\frac{\pi}{3}} \\ 22 \end{bmatrix}. \tag{6.2.4}
$$

This vector now has four entries, so it is a 4D-vector. In this vector, the entries are complex numbers, so, we have

$$
\mathbf{b} = \begin{bmatrix} 1+j \\ 4 \\ e^{j\frac{\pi}{3}} \\ 22 \end{bmatrix} \in \mathbb{C}^{4}. \tag{6.2.5}
$$

Example 6.2.3 I Vectors of unknown constants or variables


In the above examples, the entries of the vectors are fixed known numbers.
They can be arbitrary (unknown) constants or variables like

$$
\mathbf{a} = \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \in \mathbb{C}^{2}. \tag{6.2.6}
$$

The 2D vector a has arbitrary, unspecified complex valued entries.

Sometimes, even the number of entries or dimension of the vector is unknown, or taken to be a generic unspecified integer. In this case, it is common notation to represent the dimension by a letter, say k, n, or N. The vector is then represented by

$$
\mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}. \tag{6.2.7}
$$

There are several important points to note in (6.2.7). First, the entries in (6.2.7)
are labeled by a symbol that is the unbolded version of the boldface letter rep-
resenting the vector; for example, entry 2 of the vector a is represented by a2 .
Second, the dimension n is a positive integer but left unspecified; we can read
it from the vector as the subindex n of the last entry an or by counting the
number of entries of the vector a. These are common and intuitive notations
that we will usually follow. If the entries are real valued unknown constants,
the vector a is a point in the Cartesian or Euclidean space Rn . We note this by
stating that a ∈ Rn and that the vector a is n-dimensional or is an nD-vector. Figure 6.1 illustrates a 2D-vector graphically as a point on the plane, where the first coordinate v1 is placed on the horizontal axis and v2 on the vertical axis.

$$
\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}
$$
Figure 6.1 Graphical representation of a vector.

Just like a complex number is a point on a plane, the complex plane, an n-dimensional complex vector, i.e., a vector with n entries that are complex numbers, is a point in the nD-complex space Cn, or it is

$$
\mathbf{z} = \begin{bmatrix} \operatorname{Re} z_1 \\ \operatorname{Im} z_1 \\ \vdots \\ \operatorname{Re} z_n \\ \operatorname{Im} z_n \end{bmatrix},
$$

a point in the 2nD real valued space R2n.
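As a small numerical illustration of this identification (a Python/numpy sketch with arbitrary entries), a 3D complex vector can be flattened into the corresponding 6D real vector of interleaved real and imaginary parts:

```python
import numpy as np

z = np.array([1 + 2j, 2 - 0.5j, 3 + 0j])          # a point in C^3

# Interleave Re z_i and Im z_i to obtain the corresponding point in R^6.
z_as_real = np.column_stack([z.real, z.imag]).ravel()
print(z_as_real)    # [ 1.   2.   2.  -0.5  3.   0. ]
```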



Example 6.2.4 I Vectors of functions


We consider vectors where the entries are the values taken by functions or dis-
crete sequences at particular points in time. In this example, we consider func-
tions of continuous time t. For example, at each time t, the vector of functions
takes the value

$$
\mathbf{f}(t) = \begin{bmatrix} f(t) \\ g(t) \\ h(t) \\ p(t) \\ q(t) \end{bmatrix}, \quad t \in \mathbb{R}^{+}. \tag{6.2.8}
$$

In this five dimensional vector, at time t, the entries are the values taken by the
functions in each entry of the vector evaluated at that (instantaneous) time t.
The range where the vector of functions is defined is indicated in (6.2.8) to
be R+ .

The functions may be real valued, complex valued, or may take values in some
other set. Note that we use the notation where the symbol indicating the vec-
tor is now also a function, with the indexing variable or independent variable t
taking values in some set; in (6.2.8), the independent variable t takes values in
the positive reals.

We consider an example:

$$
\mathbf{v}(t) = \begin{bmatrix} \cos 2\pi t \\ \sin 2\pi t \end{bmatrix}, \quad t \in [0, 1].
$$

This vector is a vector of two functions; it is a 2D-vector. As t ∈ [0, 1], we have not just one vector but a whole family of vectors. For example, if t = 0, we get

$$
\mathbf{v}(0) = \begin{bmatrix} 1 \\ 0 \end{bmatrix}.
$$

At t = 1/8,

$$
\mathbf{v}\!\left(\tfrac{1}{8}\right) = \begin{bmatrix} \tfrac{1}{\sqrt{2}} \\[2pt] \tfrac{1}{\sqrt{2}} \end{bmatrix}.
$$

As t ranges over [0, 1], this 2D-vector describes a circle of radius 1 that is traversed counterclockwise, starting from the point (1, 0).
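A short numerical sketch (Python with numpy, purely illustrative) that samples this family of vectors at a few values of t:

```python
import numpy as np

t = np.array([0.0, 1/8, 1/4, 1/2])                     # a few instants in [0, 1]
v = np.stack([np.cos(2*np.pi*t), np.sin(2*np.pi*t)])   # column i is v(t[i])

print(v[:, 0])   # v(0)   = [1, 0]
print(v[:, 1])   # v(1/8) ≈ [0.7071, 0.7071]
print(v[:, 2])   # v(1/4) = [0, 1], up to rounding
```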

Remark 6.2.1 (Notation: Vector of functions). There is a certain level of ambiguity in the
notation. By f (t) we mean the vector f (t) at a particular (fixed value) t. However, when we
let t vary in its domain, say, t ∈ R+ , then f (t) in (6.2.8) really stands for a collection or
family of vectors each indexed by its value of t. We say the vector is a vector of functions, or
a function vector. Sometimes, to be more precise, we indicate this as (ft )t∈R+ or simply (ft ), if
the range is understood from the context. 

Example 6.2.5 I Vectors of sequences


Instead of continuous time, the entries of the vector may be the values of dis-
crete time sequences like in:

$$
\mathbf{x}[k] = \begin{bmatrix} x_1[k] \\ x_2[k] \\ x_3[k] \\ x_4[k] \end{bmatrix}, \quad k \in \mathbb{Z}. \tag{6.2.9}
$$

We again note that we used in (6.2.9) the common notation we referred to


above. The entries are represented by the unbolded version of the boldface symbol representing the vector. The entries are also subscripted, so the first entry of the vector x[k] is x1[k].

A specific example is:

$$
\mathbf{x}[k] = \begin{bmatrix} k^2 + 2k + 3 \\[2pt] \left(\tfrac{1}{2}\right)^{k} \\[2pt] \tfrac{3}{2} \\[2pt] \cos\left(\tfrac{2\pi}{16}k\right) \end{bmatrix}, \quad k \in \mathbb{N}.
$$

At k = 0 this is the vector

$$
\mathbf{x}[0] = \begin{bmatrix} 3 \\ 1 \\ \tfrac{3}{2} \\ 1 \end{bmatrix},
$$

while at k = 1, we get

$$
\mathbf{x}[1] = \begin{bmatrix} 6 \\ \tfrac{1}{2} \\ \tfrac{3}{2} \\ \cos\tfrac{\pi}{8} \end{bmatrix}.
$$
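A quick way to check such evaluations is to code the vector of sequences directly; a minimal Python/numpy sketch (illustrative only, with the entries as written above):

```python
import numpy as np

def x(k):
    # Vector of sequences of the example, evaluated at integer k.
    return np.array([k**2 + 2*k + 3,
                     (1/2)**k,
                     3/2,
                     np.cos(2*np.pi*k/16)])

print(x(0))   # [3.   1.   1.5  1.  ]
print(x(1))   # [6.   0.5  1.5  0.9239...]   (last entry is cos(pi/8))
```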


These examples provide quite a variety of vectors; the entries of a vector may be numbers,
unknown constants, variables, functions, or sequences; the entries may be integer valued,
real valued, or complex valued.

6.2.2 Row vectors


We now organize the entries of the vector horizontally, i.e., in a row. Considering the same example as in (6.2.1), we then have

$$
\mathbf{c} = \begin{bmatrix} 2 & -3 & \frac{\pi}{2} \end{bmatrix}. \tag{6.2.10}
$$
Vector c is a row vector. This terminology distinguishes the vector in (6.2.10) from the
column vector (6.2.1), or (6.2.3). Like with column vectors, the entries of a row vector are
also referred to as components or coordinates of the vector.

Unless specifically stated, or it is understood otherwise from the context, vectors are com-
monly taken as column vectors.

All the comments made for column vectors can be repeated for row vectors. In particular,
the entries of the row vector (6.2.10) can be thought of as the coordinates of a point in the
Cartesian or Euclidean space R3 . The points in R3 are then (row) vectors.

6.2.3 Vector dimensions


A column vector of dimension n has n entries arranged vertically. We say that the column
vector has n rows and one column.

A row vector of dimension n has n entries arranged horizontally. We say that the row
vector has one row and n columns.

Sometimes, we indicate the dimension of an nD-column vector by n × 1. Likewise, we indicate the dimension of an nD-row vector by 1 × n.

The reasons for this will become clearer when we study matrices.

6.3 Calculus with Vectors


We can operate with scalars. We can equate, add, or multiply scalars, among other op-
erations. We now consider calculus with vectors. As a generic rule, with an important
exception, we operate with vectors by operating entrywise or componentwise.

6.3.1 Equality
Two vectors are equal if their entries are equal. For example, equality among two vectors
as in

$$
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ \alpha \\ 1+j \end{bmatrix}
$$

means that x1 = 1, x2 = 0, x3 = α, and x4 = 1 + j.

It is clear that equality of vectors being pointwise or entrywise implies that the two vec-
tors have to be of the same dimension. In the example above, both vectors are 4D-vectors.
It is interesting to note that while the other entries of the left vector, all except the third, are known constants (because the corresponding entries of the right vector specify numerical values), entry 3 of the vector on the left is a variable, since the corresponding entry in the vector on the right is an unknown variable α.

If the entries of the two vectors are functions, then equality of the two vectors implies
that the functions of each entry are identical, i.e., equal for all values of the independent
variable where the functions are defined. For example,
   
x1 (t) f (t)
 x2 (t)  =  g(t) , t ∈ T,
x3 (t) h(t)

means that the entries of the 3D-vectors are identical as functions of time, x1 (t) ≡ f (t),
x2 (t) ≡ g(t), and x3 (t) ≡ h(t), i.e., they are equal ∀t ∈ T .

Remark 6.3.1 (Equality of row vectors). We have presented the equality of column vectors.
The equality of row vectors is defined like equality of column vectors as entrywise equality.
We can only consider equality of row vectors that have the same dimension. 
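In code, this entrywise notion of equality, including the requirement that the dimensions match, can be checked directly. A minimal sketch in Python with numpy, with illustrative values:

```python
import numpy as np

x = np.array([1, 0, 2.5, 1 + 1j])
y = np.array([1, 0, 2.5, 1 + 1j])

print(np.array_equal(x, y))       # True: same dimension and equal entries
print(np.array_equal(x, y[:3]))   # False: vectors of different dimensions
```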

6.3.2 Vector Addition


Given two vectors of the same dimension x and y, their sum z is defined pointwise or
entrywise. For example

$$
\begin{bmatrix} 3 \\ 5 \\ -2 \end{bmatrix} + \begin{bmatrix} 1.5 \\ 62 \\ -\pi \end{bmatrix} = \begin{bmatrix} 4.5 \\ 67 \\ -2-\pi \end{bmatrix}.
$$

Example 6.3.1 I Addition of vectors: Graphical representation


Consider the addition of the following two vectors,

$$
\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}, \quad \mathbf{u} = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}.
$$

The graphical representation of the addition u + v is given in Figure 6.2.

Figure 6.2 Graphical representation of the addition of two vectors.

Example 6.3.2 I Addition of complex vectors


If the vectors have complex entries, their sum is as always defined entrywise,
but the sum of the entries is now the sum of two complex numbers.

$$
\begin{bmatrix} 1-j \\ 52 + j\cos\frac{\pi}{3} \end{bmatrix} + \begin{bmatrix} 1 \\ -2-3j \end{bmatrix} = \begin{bmatrix} 2-j \\ 50 + j\left(-3 + \cos\frac{\pi}{3}\right) \end{bmatrix}.
$$
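The entrywise sum above is easy to verify numerically; a minimal Python/numpy sketch reproducing this example:

```python
import numpy as np

a = np.array([1 - 1j, 52 + 1j*np.cos(np.pi/3)])
b = np.array([1 + 0j, -2 - 3j])

print(a + b)   # [2.-1.j  50.-2.5j], i.e., [2-j, 50+j(cos(pi/3)-3)]
```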

Example 6.3.3 I Addition of vectors of functions


Likewise, if the vectors f(t) and g(t) to be summed are vectors of functions, then their sum h(t) is the vector whose entries are the entrywise sums of the corresponding entries of each of the vectors. Let the vectors f(t) and g(t) be nD-vectors. Their sum is:

$$
\mathbf{f}(t) + \mathbf{g}(t) = \begin{bmatrix} f_1(t) \\ f_2(t) \\ \vdots \\ f_n(t) \end{bmatrix} + \begin{bmatrix} g_1(t) \\ g_2(t) \\ \vdots \\ g_n(t) \end{bmatrix} = \begin{bmatrix} f_1(t) + g_1(t) \\ f_2(t) + g_2(t) \\ \vdots \\ f_n(t) + g_n(t) \end{bmatrix}.
$$

We can similarly define the sum of vectors of sequences:

$$
\mathbf{f}[k] + \mathbf{g}[k] = \begin{bmatrix} f_1[k] \\ f_2[k] \\ \vdots \\ f_n[k] \end{bmatrix} + \begin{bmatrix} g_1[k] \\ g_2[k] \\ \vdots \\ g_n[k] \end{bmatrix} = \begin{bmatrix} f_1[k] + g_1[k] \\ f_2[k] + g_2[k] \\ \vdots \\ f_n[k] + g_n[k] \end{bmatrix}.
$$


Remark 6.3.2 (Addition of row vectors). We have presented the sum of column vectors. The sum of row vectors is defined like the sum of column vectors, as an entrywise sum. We can only consider the addition of row vectors that have the same dimension.

6.3.2.1 Properties of vector addition


When it is well defined, i.e., the vectors have the same dimensions, the addition of vectors
has the following properties:
Associativity Addition of vectors is associative:
v + u + w = (v + u) + w
= v + (u + w).
This property is inherited from the associativity of the addition of scalars, the entries
of the vectors.

Commutativity Addition of vectors is commutative:

v + u = u + v.

Again, this follows from the commutativity of the addition of scalars.

Zero vector: Unit of addition There is a vector, the zero vector 0, that is the unit element
of the addition of vectors. Let the nD-vector:

$$
\mathbf{0} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}.
$$

The entries of 0 are all zero. Clearly

0 + v = v + 0 = v,

and 0 is the unit of addition.

Inverse with respect to addition and subtraction For every vector v, there is a vector u such that the sum with the original vector v is the zero vector. The vector u is the negative of v, written −v, so that:

v + (−v) = 0.

6.3.3 Product of a vector by a scalar


The product of a vector v by a scalar α is the vector whose components are the compo-
nents of v multiplied by the scalar α. In other words, the product of a vector by a scalar
is the entrywise or pointwise multiplication of the vector entries by the scalar.

Example 6.3.4 I Product of vector by scalar


We consider an example:

$$
(1-j)\begin{bmatrix} 3 \\ e^{j\frac{\pi}{6}} \\ -1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3(1-j) \\ (1-j)e^{j\frac{\pi}{6}} \\ -(1-j) \\ 0 \end{bmatrix} = \begin{bmatrix} 3\sqrt{2}\,e^{-j\frac{\pi}{4}} \\ \sqrt{2}\,e^{-j\frac{\pi}{12}} \\ \sqrt{2}\,e^{j\frac{3\pi}{4}} \\ 0 \end{bmatrix}.
$$

Division by a nonzero scalar α is the same as the product by 1/α, so from the product by a scalar we can also divide by a nonzero scalar. This means, in particular, that we can factor a scalar out from the entries of a vector. For example:

$$
\begin{bmatrix} 2 \\ 2 \\ \vdots \\ 2 \end{bmatrix} = 2 \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix},
$$

since factoring out the scalar, the number 2 in the above example, is the same as taking the product of the vector by the two scalars 2 and 1/2, and then bringing the scalar 1/2 inside the vector.
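Both the entrywise product by a scalar and the factoring out of a common scalar are one-liners numerically; a small numpy sketch with the values of this example:

```python
import numpy as np

v = np.array([3, np.exp(1j*np.pi/6), -1, 0])
print((1 - 1j) * v)                      # entrywise product by the scalar 1 - j

w = np.array([2.0, 2.0, 2.0])
print(np.allclose(w, 2 * np.ones(3)))    # True: [2, 2, 2] = 2 * [1, 1, 1]
```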

Properties of product of a vector by a scalar The product of vectors by scalars enjoys


properties inherited from the corresponding properties for the product of scalars. We list
these.

Associativity The product of scalars with a vector is associative. It is associative with respect to the product of the vector by several scalars:

α(βv) = (αβ)v.

Commutativity The product of a vector by a scalar is commutative:

αv = vα.

Distributivity The product of a scalar by a sum of vectors distributes, and the product of
a sum of scalars by a vector also distributes:

α(x1 + x2 + · · · + xM) = αx1 + αx2 + · · · + αxM,

(α1 + · · · + αm)x = α1x + · · · + αmx.

6.3.4 Vector conjugation, transposition, and Hermitian


We consider three important operations on vectors, namely, vector conjugation, transpo-
sition, and Hermitian.

6.3.4.1 Vector conjugation


Complex conjugation of a vector is defined entrywise. The complex conjugate of a vector v
is the vector with the same dimension as v and whose entries are the complex conjugate
of the entries of the original vector:

$$
\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}^{*} = \begin{bmatrix} v_1^{*} \\ v_2^{*} \\ \vdots \\ v_n^{*} \end{bmatrix}.
$$

Note that complex conjugation of a vector v is indicated, as usual, by superindexing the vector with the symbol ∗, i.e., by v∗. In the equation above, the dots represent unspecified entries of the vectors.

As an example, we conjugate the following nD-vector:

$$
\begin{bmatrix} -1 + e^{j\left(k\frac{2\pi}{N}\right)} \\ -63 \\ \vdots \\ 2 + 45j \end{bmatrix}^{*} = \begin{bmatrix} -1 + e^{-j\left(k\frac{2\pi}{N}\right)} \\ -63 \\ \vdots \\ 2 - 45j \end{bmatrix}.
$$

The vectors are of dimension n because we stated so before the example. The reader
should get used to these liberties and subtleties with the notation.

6.3.4.2 Vector transposition

We now consider vector transposition. The transpose of a column vector is a row vector
with the same dimension and the same entries as the column vector, where the first entry
(left most entry) of the row vector is the first entry of the column vector, the second entry
of the row vector is the second entry of the column vector and this continues, till the last
entry of the row vector is the last entry of the column vector.

The transpose of a row vector is the column vector whose entries are the same entries of the original row vector, now placed from top to bottom in the same order as the entries of the row vector, as we scan it from left to right.

Transposition is indicated by superindexing the vector with the letter T. For example, we transpose an nD-column vector to obtain an nD-row vector:

$$
\begin{bmatrix} 2 \\ -5 \\ \vdots \\ 31 \end{bmatrix}^{T} = \begin{bmatrix} 2 & -5 & \cdots & 31 \end{bmatrix}.
$$

Likewise, an example of transposing a row vector is given by:

$$
\begin{bmatrix} f_1(t) & f_2(t) & \cdots & f_m(t) \end{bmatrix}^{T} = \begin{bmatrix} f_1(t) \\ f_2(t) \\ \vdots \\ f_m(t) \end{bmatrix}.
$$

This example transposes a row vector of dimension m to obtain a column vector of the
same dimension m.

In other words, the transpose of a column vector of dimension n is a row vector of dimension n; said differently, transposition of an n × 1 vector gives a 1 × n vector, i.e., a row vector of dimension n.

Similarly, the transpose of a row vector of dimension n is a column vector of dimension n; that is, transposition of a 1 × n vector gives an n × 1 vector, i.e., a column vector of dimension n.

6.3.4.3 Vector Hermitian


The Hermitian of a vector v is the complex conjugate transpose of the vector; it is indi-
cated by vH . We have, combining the two previous operations of conjugation and trans-
position:

$$
\mathbf{v}^{H} = (\mathbf{v}^{*})^{T} = (\mathbf{v}^{T})^{*}.
$$

This says that the order by which conjugation and transposition are taken is immaterial.

If the vector is real valued, then

vH = vT .

If the vector is a scalar, i.e., a 1 × 1 vector,

vH = v∗;

the Hermitian of a scalar is the complex conjugate of the scalar.

As an example, we compute the Hermitian of an nD-column vector to obtain an nD-row vector:

$$
\begin{bmatrix} 12-j \\ -23+j \\ \vdots \\ 37e^{j\frac{\pi}{5}} \end{bmatrix}^{H} = \begin{bmatrix} 12+j & -23-j & \cdots & 37e^{-j\frac{\pi}{5}} \end{bmatrix}.
$$

Similarly, the Hermitian of a row vector is a column vector. For example, for an ℓD-vector of sequences, i.e., a vector of ℓ sequences:

$$
\begin{bmatrix} x_1[k] & x_2[k] & \cdots & x_\ell[k] \end{bmatrix}^{H} = \begin{bmatrix} x_1[k]^{*} \\ x_2[k]^{*} \\ \vdots \\ x_\ell[k]^{*} \end{bmatrix}.
$$

We conclude that the Hermitian of a column vector of dimension ℓ is a row vector of dimension ℓ; and, similarly, the Hermitian of a row vector of dimension ℓ is a column vector of dimension ℓ.
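All three operations are available directly on numpy arrays; a minimal sketch (the vector is stored as a 3 × 1 array so that transposition is visible):

```python
import numpy as np

v = np.array([[12 - 1j], [-23 + 1j], [37*np.exp(1j*np.pi/5)]])   # 3 x 1 column vector

v_conj = np.conj(v)       # conjugation: entrywise, still 3 x 1
v_T    = v.T              # transposition: 1 x 3 row vector, entries unchanged
v_H    = v.conj().T       # Hermitian: conjugate transpose, 1 x 3

# The order of conjugation and transposition is immaterial.
print(np.array_equal(v_H, np.conj(v.T)))   # True
```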

6.3.5 Linear combination


We can use the operations of addition of vectors and product of a vector by a scalar to
generate new vectors. In particular, given n vectors v1 , · · · , vn , we can form a new vector v
that is the linear combination (l.c.) of these vectors as:

v = α1 v1 + · · · + αn vn , (6.3.1)

where α1 , · · · , αn are scalars, e.g., integers, real valued, or complex valued. These coeffi-
cients can be zero, in which case the l.c. is zero. The number of vectors can also be zero,
in which case, again, the vector v is zero.

It is straightforward to verify that the linear combination of vectors is in fact linear, since the
linear combination of two vectors, where each vector is given by a linear combination of
vectors {v1 , · · · , vn } and {w1 , · · · , wm }, respectively, is itself a linear combination of the
vectors {v1 , · · · , vn , w1 , · · · , wm }. Consider the two vectors:

v = α1 v1 + · · · + αn vn (6.3.2)
w = β1 w1 + · · · + βm wm , (6.3.3)

and their linear combination:

γ1 v + γ2 w = γ1 (α1 v1 + · · · + αn vn ) + γ2 (β1 w1 + · · · + βm wm )
= γ1 α1 v1 + · · · + γ1 αn vn + γ2 β1 w1 + · · · + γ2 βm wm .

We go from the first equality to the second equality by distributing the product of the
scalars γ1 and γ2 with respect to the sum of vectors in parenthesis. The resulting vector is
clearly a linear combination of the vectors {v1 , · · · , vn , w1 , · · · , wm }.

We will give a different interpretation of a linear combination of vectors in Example 6.5.13,


after we introduce the product of a matrix by a vector.

Convex linear combination A particular linear combination of interest in many applications is the convex linear combination. A linear combination like (6.3.1) is convex if the constants of the linear combination are all in the interval [0, 1] and they add up to 1:

$$
\sum_{i=1}^{n} \alpha_i = 1.
$$

Example 6.3.5 I Linear combination: standard coordinate vectors


Consider nD-vectors and define the nD-coordinate vectors ei:

$$
\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 0 \end{bmatrix}, \quad
\mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \\ 0 \end{bmatrix}, \quad \cdots, \quad
\mathbf{e}_{n-1} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \\ 0 \end{bmatrix}, \quad
\mathbf{e}_{n} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}.
$$

We consider the following linear combination of the 5D coordinate vectors ei:

$$
31\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
- j\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ 42\begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}
+ 52\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}
- e^{j\frac{\pi}{3}}\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}
= \begin{bmatrix} 31 \\ -j \\ 42 \\ 52 \\ -e^{j\frac{\pi}{3}} \end{bmatrix}.
$$

This illustrates two facts: we can express a more complicated vector like

$$
\mathbf{v} = \begin{bmatrix} 31 \\ -j \\ 42 \\ 52 \\ -e^{j\frac{\pi}{3}} \end{bmatrix}
$$

as a linear combination of other vectors that may be simpler to describe, like the coordinate vectors ei; or, conversely, we may generate more complicated vectors like v from simpler vectors like the coordinate vectors ei. This divide and conquer strategy (express complicated objects using simpler components, or use simple components to build more complicated objects) is the crux of many engineering problems (build a computer out of a billion transistors, or build a house from bricks). Linear Algebra has much to do with this as we will see.
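A numerical counterpart of this example (Python with numpy, illustrative only) builds v as a linear combination of the 5D coordinate vectors, taken as the columns of the identity matrix, and also checks a convex linear combination of three arbitrary points:

```python
import numpy as np

E = np.eye(5)                                    # columns are e_1, ..., e_5
coeffs = np.array([31, -1j, 42, 52, -np.exp(1j*np.pi/3)])

v = sum(coeffs[i] * E[:, i] for i in range(5))   # 31 e1 - j e2 + 42 e3 + 52 e4 - e^{j pi/3} e5
print(v)

# A convex linear combination: coefficients in [0, 1] that sum to 1.
alphas = np.array([0.2, 0.5, 0.3])
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # three points in R^2
print(alphas.sum(), alphas @ pts)                      # 1.0 [0.5 0.3]
```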

6.3.6 Limits, derivatives, integration, delay, and Taylor series


We can perform more sophisticated operations with vectors. We consider several of these
here, illustrating that these operations are, again, defined entrywise. This holds for both
column vectors and row vectors. We provide examples to illustrate these concepts.

6.3.6.1 Limit

Consider the nD-vector (of functions) x(t), t ∈ T ⊂ R. Let t0 ∈ T . The limit of the vector
of functions is given by

$$
\lim_{t\to t_0} \mathbf{x}(t) = \lim_{t\to t_0} \begin{bmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_n(t) \end{bmatrix} = \begin{bmatrix} \lim_{t\to t_0} x_1(t) \\ \lim_{t\to t_0} x_2(t) \\ \vdots \\ \lim_{t\to t_0} x_n(t) \end{bmatrix}.
$$

The limit of the vector of functions is the vector of the limits of the functions in each entry
of the vector. To prove this result, we need to introduce concepts that we do not have yet
available like distance between vectors that formalize the notion of vectors being close
to each other. Instead, we take a pragmatic view and simply take this as the definition
of limit with vectors. On the other hand, this is intuitive, since we can expect that, if
the limit of the sequence of vectors is defined entrywise, as the entries of the vectors in
the sequence get closer and closer, we expect the vectors themselves (say, as points in a
Cartesian space) to get closer and closer.

Example 6.3.6 I Vector limits


We provide an example with the row vector x(t):

$$
\lim_{t\to 1} \mathbf{x}(t) = \lim_{t\to 1} \begin{bmatrix} \sin(2\pi t) & t-1 & 2t^2 & e^{-t} \end{bmatrix}
= \begin{bmatrix} \lim_{t\to 1}\sin(2\pi t) & \lim_{t\to 1}(t-1) & \lim_{t\to 1}2t^2 & \lim_{t\to 1}e^{-t} \end{bmatrix}
= \begin{bmatrix} 0 & 0 & 2 & e^{-1} \end{bmatrix}.
$$
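Entrywise limits can also be computed symbolically; a minimal sketch using Python with sympy (an illustrative tool choice, not part of the text):

```python
import sympy as sp

t = sp.symbols('t')

# Row vector of functions from the example above; limits are taken entrywise.
x = sp.Matrix([[sp.sin(2*sp.pi*t), t - 1, 2*t**2, sp.exp(-t)]])
print(x.applyfunc(lambda f: sp.limit(f, t, 1)))   # Matrix([[0, 0, 2, exp(-1)]])
```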



6.3.6.2 Derivative of a vector of functions


Let x(t) be a vector whose entries are differentiable functions. The derivative of the vector
x(t) is the vector of the derivatives of the entries of the vector:

$$
\frac{d\mathbf{x}(t)}{dt} = \frac{d}{dt} \begin{bmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_n(t) \end{bmatrix} = \begin{bmatrix} \frac{dx_1(t)}{dt} \\[4pt] \frac{dx_2(t)}{dt} \\[4pt] \vdots \\[2pt] \frac{dx_n(t)}{dt} \end{bmatrix}.
$$

Example 6.3.7 I Vector derivatives


We work out an example:

$$
\frac{d\mathbf{x}(t)}{dt} = \frac{d}{dt} \begin{bmatrix} \cos(\omega_0 t) \\ 3t^2 \\ e^{j2\pi t} \end{bmatrix} = \begin{bmatrix} \frac{d\cos(\omega_0 t)}{dt} \\[4pt] \frac{d(3t^2)}{dt} \\[4pt] \frac{de^{j2\pi t}}{dt} \end{bmatrix} = \begin{bmatrix} -\omega_0 \sin(\omega_0 t) \\ 6t \\ j2\pi e^{j2\pi t} \end{bmatrix}.
$$
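The entrywise rule is exactly what a symbolic differentiation of the vector performs; a small sympy sketch reproducing this example (omega_0 is kept as a free symbol):

```python
import sympy as sp

t, w0 = sp.symbols('t omega_0')
x = sp.Matrix([sp.cos(w0*t), 3*t**2, sp.exp(2*sp.pi*sp.I*t)])

print(x.diff(t))   # Matrix([[-omega_0*sin(omega_0*t)], [6*t], [2*I*pi*exp(2*I*pi*t)]])
```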


6.3.6.3 Integral of a vector of functions


Let x(t) be a vector whose entries are integrable functions. The integral of the vector x(t)
is the vector of integrals of the entries of the vector:

$$
\int_{t_i}^{t_f} \mathbf{x}(t)\,dt = \int_{t_i}^{t_f} \begin{bmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_n(t) \end{bmatrix} dt = \begin{bmatrix} \int_{t_i}^{t_f} x_1(t)\,dt \\[4pt] \int_{t_i}^{t_f} x_2(t)\,dt \\[4pt] \vdots \\[2pt] \int_{t_i}^{t_f} x_n(t)\,dt \end{bmatrix}.
$$

Example 6.3.8 I Vector integrals


We consider the example:
 
    ∫_0^1 x(t) dt = ∫_0^1 [cos(2πt)   −3t²   1/(1 + t²)]^T dt

                  = [∫_0^1 cos(2πt) dt   ∫_0^1 (−3t²) dt   ∫_0^1 1/(1 + t²) dt]^T

                  = [(1/(2π)) sin(2πt)|_0^1   −t³|_0^1   arctan t|_0^1]^T

                  = [0   −1   π/4]^T.
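As an illustrative aside (not from the text), the entrywise integration above can be reproduced with Python and SymPy:

    # Illustrative sketch: entrywise definite integral of the vector of Example 6.3.8.
    import sympy as sp

    t = sp.symbols('t')
    x = [sp.cos(2*sp.pi*t), -3*t**2, 1/(1 + t**2)]

    result = [sp.integrate(entry, (t, 0, 1)) for entry in x]  # integrate each entry over [0, 1]
    print(result)                                             # [0, -1, pi/4]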

6.3.6.4 Advance and delay


The advance and the delay of a vector of sequences is defined entrywise.

Example 6.3.9 I Vector advance


For example, let:
 
    x[k] = [ρ^k   cos((2π/N) k)]^T.

Then, the advance x[k + 1] is computed entrywise as


 
    x[k + 1] = [ρ^{k+1}   cos((2π/N)(k + 1))]^T.

The delayed x[k − 1] is computed similarly. 
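As an illustrative aside (not from the text), the advance and the delay can be computed entrywise over a finite index range with Python and NumPy; the values of ρ and N below are illustrative assumptions:

    # Illustrative sketch: entrywise advance and delay of the vector of sequences x[k].
    import numpy as np

    rho, N = 0.9, 8                     # illustrative values (not from the text)
    k = np.arange(16)                   # a finite range of indices k

    x     = np.vstack([rho**k,       np.cos(2*np.pi*k/N)])         # x[k]
    x_adv = np.vstack([rho**(k + 1), np.cos(2*np.pi*(k + 1)/N)])   # advance x[k+1]
    x_del = np.vstack([rho**(k - 1), np.cos(2*np.pi*(k - 1)/N)])   # delay   x[k-1]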



6.3.6.5 Taylor series


We can define the Taylor series of a vector of functions, again, entrywise. But there is a
caveat as we will see. Before doing that, we introduce the summation vector.

Remark 6.3.3 (Word on notation: Summation). The symbol Σ denotes summation. It
abbreviates the sum of several elements. For vectors, it stands for:

    Σ_{n=0}^{N} v_n = v_0 + v_1 + ··· + v_N.                    (6.3.4)

The limits of the summation in (6.3.4) can be arbitrary and define the range over which the
terms are summed; the example sums the vectors vn from n = 0 to some arbitrary value N . 

We now consider Taylor series of a vector of functions. The concept is best illustrated by
an example.

Example 6.3.10 I Taylor series


Consider the vector of two functions:
    v = [1/(1 − ρ)   e^ρ]^T.

The entries are functions of the complex number ρ. The Taylor series is defined
entrywise. The Taylor series of each entry is:

    1/(1 − ρ) = Σ_{n=0}^{∞} ρ^n,        |ρ| < 1,

    e^ρ = Σ_{n=0}^{∞} ρ^n / n!,         for all ρ ∈ C.

The Taylor series of 1/(1 − ρ) is valid for |ρ| < 1, while the Taylor series of e^ρ is
valid for any value of ρ in the complex plane since e^ρ is an entire function, see
Definition 3.5.3 in Chapter 3. When we consider the two functions as entries
of the same vector v, we restrict the domain of validity to the intersection of
the domains in which the Taylor series of each entry is valid. For the above
example, the two domains are |ρ| < 1 and C. This intersection is

    {ρ ∈ C : |ρ| < 1} ∩ C = {ρ ∈ C : |ρ| < 1},

abbreviated by |ρ| < 1. So, the Taylor series of the vector v is given by:
    v(ρ) = [Σ_{n=0}^{∞} ρ^n     Σ_{n=0}^{∞} ρ^n / n!]^T,   |ρ| < 1

         = Σ_{n=0}^{∞} [ρ^n     ρ^n / n!]^T,               |ρ| < 1

         = Σ_{n=0}^{∞} v_n(ρ),                              |ρ| < 1.

Remark 6.3.4 (Product of vectors). Note that we did not define the product of two vectors.
This is very different from what we are used to with the other operations we considered in this
Section. We will take it up when we address the product of matrices in the next Section. 

6.4 Matrices
In Section 6.2, we introduced vectors. We now study matrices, building on what we
learned about vectors. We start by motivating our study of matrices by revisiting the
system of three linear algebraic equations in three unknowns given at the beginning of
the Chapter by (6.1.1). We repeat it herein for easy reference.

4x1 + 6x2 + 9x3 = 6


6x1 − 2x3 = 20
5x1 − 8x2 + x3 = 10

As we mentioned in the introduction to the Chapter, a group of equations like these is referred
to as a linear system of algebraic equations, or simply a linear system of equations, where
the unknown quantities are the variables x1 , x2 , and x3 .

To write this system of three equations compactly, we organize the coefficients of the
unknown variables x1 , x2 , x3 in an array or rectangle form as follows:

4 6 9
6 −2
5 −8 1

The missing entry in this array can be taken to be 0, i.e., in the second equation we can
introduce

0 · x2

So our array of numbers is now

4 6 9
6 0 −2
5 −8 1

To make sure no entries are lost, we customarily use square brackets to delineate the
rectangular form as follows
 
4 6 9
 6 0 −2  (6.4.1)
5 −8 1

Sometimes, parentheses are used instead:


 
4 6 9
 6 0 −2 
5 −8 1

We will usually stick with square brackets. The array enclosed by square brackets in
Equation (6.4.1) is an example of a matrix. Matrices will be represented by capital boldface
letters:
 
4 6 9
A =  6 0 −2 . (6.4.2)
5 −8 1

Returning to the system of three equations in three unknowns, the three unknowns and
the three known terms are collected also in vectors
   
    x = [x1   x2   x3]^T    and    b = [6   20   10]^T.

The compact notation for the system of three equations in three unknowns is then:

Ax = b.

To make sense of this compact notation, we need to understand what A and Ax stand for.
The equality is the equality of vectors, since the Right-Hand-Side is a vector. This Section
considers the matrix A.

We identify important quantities related to matrices by working with the matrix A
in (6.4.2). The nine elements arranged neatly in the matrix A are the
elements or entries of A. If we scan the entries of A in lexicographic order, starting at
the top left corner, and from left to right, the first three elements 4, 6, and 9 make up
the first horizontal line and are the first row of the matrix A. The three elements below 6,
0, −2 are the second row, and, finally, the three last elements 5, −8, and 1 are the third row.

We can also scan the matrix A in a different order: going down from the top left element, we
read the elements 4, 6, 5. These three elements organized vertically are the first column
of the matrix A. The three elements to the right of this first column, 6, 0, and −8 are the
second column. Finally, the last three elements to the right of the second column, 9, −2,
and 1 are the third column of the matrix.

We now formally introduce matrices.

6.4.1 Representations
In this Subsection, we will introduce the dimensions of a matrix and discuss several dif-
ferent ways of defining and describing matrices. They are useful in different settings; it
is important to realize that the same object can have several different equivalent descrip-
tions.

6.4.1.1 Dimension
Matrices can be large or small. The smallest one (in terms of dimensions) is the scalar.
Sometimes it pays to look at scalars as matrices. As a matrix, the scalar α could be written
as:

[α]. (6.4.3)

We seldom write a scalar as in (6.4.3), but here it is useful just to note that the scalar is a
matrix with a single row and a single column. We say a scalar is a 1 × 1 matrix, or has
dimensions 1 × 1, or has dimension one.

Beyond scalars, Section 6.2 introduced vectors. We saw in Section 6.2 that nD-vectors can
be of two types – row vectors and column vectors. As matrices, nD-row vectors have
dimension 1 × n, i.e., they are a matrix with a single row and n columns. Likewise, mD-
column vectors are matrices with dimensions m × 1, i.e., matrices with m rows and a
single column.

So, matrices of dimensions 1 × 1, 1 × n, or m × 1 are of course objects that we are already
familiar with. We want to deal with more general objects like the matrix A in (6.4.2).

More generally, if the matrix A has m rows and n columns, the dimensions of A are:

A:m×n

The dimensions of A are read “m times n.” We emphasize that the first number m indi-
cates the number of rows in the matrix and the second number n indicates the number of
columns of the matrix. The matrix A in (6.4.2) has dimensions 3 × 3 with m = 3 rows and
n = 3 columns.

The next three Subsections consider three alternative representations of matrices. We start
with the scalar representation, the most common one and the easiest one since it explicitly
shows the array format with which we introduced matrices.

6.4.1.2 Scalar representation


We now consider a generic matrix with m rows and n columns, i.e., a m × n matrix A. It
can be explicitly written as follows:

          [ a11   a12   ···   ···   a1n ]
          [ a21   a22   ···   ···   a2n ]
          [  .     .     .     .     .  ]
    A  =  [ ···   ···   aij   ···   ··· ]                       (6.4.4)
          [  .     .     .     .     .  ]
          [ am1   am2   ···   ···   amn ]

The entries of the matrix are scalars. We will refer to (6.4.4), when needed, as the scalar
representation of the matrix A.

In the m × n matrix A in (6.4.4) there are mn entries. The vertical and horizontal dots
represent unspecified elements, rows, and columns of the matrix A.

The elements of the matrix are indexed by two indices. For example, the entry a21,
immediately below the top left entry, is indexed by the two indices 2 and 1. The first index,
2, indicates the index of the row, in this case, the second row. Unless otherwise stated,
the rows are labeled increasingly from top to bottom. The second index, 1, indicates the
index of the column, in this case, the first column; unless otherwise stated, the columns
are labeled in increasing order from left to right. Usually, we start the indexing of rows
and columns with the number 1. On occasion, it is more suggestive to start the indexing
of both rows and columns from 0.

The generic entry aij is the entry at the crossing of row i and column j; again, the sub-
scripts i and j in the generic element aij of the matrix A denote, respectively, the row
and column position of the entry aij –they are often referred to as the row index and the

column index of the element aij .

In (6.4.4), the matrix A is written explicitly by listing exhaustively all the elements of the
matrix. We can write it more compactly once we recognize the generic element aij of the
matrix as:
A = [aij ] 1≤i≤m, 1≤j≤n
or simply
A = [aij ],
where the dimensions m × n of the matrix A are assumed known.

Matrix values As for vectors, the entries aij of a matrix A may be in Z, Q, R, C, or may
be generic variables, known or unknown, functions, or sequences.
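As an illustrative aside (not from the text), a matrix, its dimensions m × n, and its entries aij map directly onto a NumPy array in Python; note that NumPy indexes rows and columns starting from 0:

    # Illustrative sketch: the matrix A of (6.4.2) as a NumPy array.
    import numpy as np

    A = np.array([[4,  6,  9],
                  [6,  0, -2],
                  [5, -8,  1]])

    m, n = A.shape          # number of rows and columns: (3, 3)
    a21  = A[1, 0]          # entry a_21 (row 2, column 1): 6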

6.4.1.3 Column vector representation


We consider the column vector representation for a matrix. We illustrate first with an
example.

Example 6.4.1 I Column vector representation: 3 × 3 matrix


Consider the matrix A in (6.4.2). We start by noting the columns of A, see
Figure 6.3.

Figure 6.3 Identify the columns of a matrix.

Each of the columns can of course be identified as a column vector. For our
example, we have the three column vectors:
     
    a1 = [4   6   5]^T,    a2 = [6   0   −8]^T,    a3 = [9   −2   1]^T.

We can use these vectors to write the matrix A more compactly as:
 
A = a1 a2 a3
The column vectors in A, namely, a1 , a2 , and a3 have dimension 3 × 1 in this
example, or simply 3. 

General case: Column vector representation.

We now consider the general column vector representation of a matrix. Let the matrix A
with (scalar) representation and dimensions be as follows,
A = [aij ] : mA × nA (6.4.5)
It is important to realize what the dimensions of the matrix tell about its structure, as
we now discuss. For example, from the dimensions of the matrix we can realize that the
matrix has nA column vectors and each column vector is of dimension mA . The column
vectors of the matrix A are:
     
    a1 = [a11   a21   ···   amA1]^T,   ···,   aj = [a1j   a2j   ···   amAj]^T,   ···,   anA = [a1nA   a2nA   ···   amAnA]^T.
We can then get the column representation of the matrix A as:
A = [a1 · · · aj · · · anA ] (6.4.6)

6.4.1.4 Row vector representation


There is nothing special about writing the matrix A in terms of its columns. We discuss
here the row representation of the matrix A. We start with an example.

Example 6.4.2 I Row representation: 3 × 3 matrix


We consider the 3 × 3 matrix in Figure 6.4. The Figure identifies the rows by
circling them.

Figure 6.4 Identify the rows of a matrix.



The three rows are:

    b1^T = [4   6   9]
    b2^T = [6   0   −2]
    b3^T = [5   −8   1]

The row vectors in A have the same dimensions, namely, 1 × 3.

We can now write the matrix A more compactly using its rows. We get for our
example:
 T
          [ b1^T ]
    A  =  [ b2^T ]
          [ b3^T ]

Note that the row vectors are stacked one below the other, not side by side like
in the column representation. We also know that each row vector is dimen-
sion 3 since the matrix has 3 columns. 

General case: Row vector representation.

We now consider the general case. The matrix A is given in Equation (6.4.5) and we
assume it is real valued. It is of dimensions mA × nA. From this we can conclude it has
mA rows, each row of dimension nA. Let the rows of A be f1^T, ···, fmA^T:

    f1^T   = [a11   ···   a1nA]                                 (6.4.7)
             ...                                                (6.4.8)
    fi^T   = [ai1   ···   ainA]                                 (6.4.9)
             ...                                                (6.4.10)
    fmA^T  = [amA1   ···   amAnA]                               (6.4.11)

We can then write the row representation of A as:

          [ f1^T  ]
          [  ...  ]
    A  =  [ fi^T  ]                                             (6.4.12)
          [  ...  ]
          [ fmA^T ]

Remark 6.4.1 (Row vector representation of complex matrices). In (6.4.7) the row vec-
tors of A are represented as the transpose of column vectors fi , so that the row vector repre-
sentation of A is expressed in terms of fiT . When the matrix is complex valued, it is more
common to define the rows of A as the Hermitian of column vectors. Then, the row represen-
tation of A is usually written as:

          [ f1^H  ]
          [  ...  ]
    A  =  [ fi^H  ].                                            (6.4.13)
          [  ...  ]
          [ fmA^H ]

Unless otherwise specified, we will consider that the matrices are real valued and work with
the row representation in (6.4.12) rather than in (6.4.13). However, whenever the matrix is
complex valued, we should represent the row representation by (6.4.13). 
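As an illustrative aside (not from the text), the column and row vectors of a matrix are obtained in Python/NumPy by slicing:

    # Illustrative sketch: reading off columns and rows of the matrix A of (6.4.2).
    import numpy as np

    A = np.array([[4,  6,  9],
                  [6,  0, -2],
                  [5, -8,  1]])

    a2  = A[:, 1]     # second column vector a2: [6, 0, -8]
    f1T = A[0, :]     # first row vector f1^T:   [4, 6, 9]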

6.4.1.5 Block matrix representation


In many applications, matrices are given in more compact form through blocks; this is the
block representation of the matrix. We look at an example.

Example 6.4.3 I Block matrix


We consider the 3 × 3 matrix A given by
   
          [ [ 1   2 ]   [ 5 ] ]
    A  =  [ [ 3   4 ]   [ 6 ] ]
          [ [ 7   8 ]   [ 9 ] ]

We recognize in the matrix A four submatrices identified by square brackets


in A:
   
    A1 = [ 1   2 ]        a2 = [ 5 ]        a3^T = [ 7   8 ]        a33 = 9.
         [ 3   4 ]             [ 6 ]

Note that we refer to the blocks using our common notation: matrices are cap-
ital bold faced roman letters; column vectors are lower case bold faced roman
letters; and scalars are lower case letters. We also chose to represent the third
block, a row vector, as the transpose of a column vector. These notations are a
matter of choice; we use them simply for consistency.

The matrix A can be expressed in terms of these blocks as:

          [ A1     a2  ]
    A  =  [ a3^T   a33 ]
We note that, to write the matrix A in block form, the matrix blocks (or sub-
blocks as we sometimes also refer to them) need to have consistent dimensions:
block A1 is 2 × 2; because A1 has two rows, the contiguous block to the right,
a2 , which is a column vector, has to be a column vector of dimension 2, i.e.,
has dimension 2 × 1; likewise, A1 has two columns, so the block below A1 , aT3 ,
which is a row vector, has to be a row vector of dimension 2, i.e., has dimension
1 × 2; finally, the last block a33 has to be a scalar since the block above is a
column vector, hence a single column, and the block to the left is a row vector,
hence a single row, i.e., has dimension 1 × 1. 

General case: block matrix representation We consider a general block matrix form by ex-
tending in a straightforward way Example 6.4.3. The matrix A is in block form:
 
          [ A11   A12   ···   A1ℓ ]
    A  =  [  .     .     .     .  ]                             (6.4.14)
          [ Ak1   Ak2   ···   Akℓ ]

The matrix has kℓ blocks and these blocks have to have consistent dimensions: the blocks
in block row i all have to have the same number of rows, say mi; and the blocks in block
column j all have to have the same number of columns, say nj. So, we have that the block
entry Aij has the following dimensions:

    Aij : mi × nj.
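As an illustrative aside (not from the text), a block matrix with consistent block dimensions can be assembled in Python with numpy.block, reproducing Example 6.4.3:

    # Illustrative sketch: assembling the matrix of Example 6.4.3 from its blocks.
    import numpy as np

    A1  = np.array([[1, 2], [3, 4]])    # 2 x 2 block
    a2  = np.array([[5], [6]])          # 2 x 1 block
    a3T = np.array([[7, 8]])            # 1 x 2 block
    a33 = np.array([[9]])               # 1 x 1 block

    A = np.block([[A1,  a2],
                  [a3T, a33]])          # the 3 x 3 matrix A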

6.4.2 Examples
We consider a few examples of matrices. The zero and identity matrices are two examples
of very important matrices in applications. We will also discuss the Fourier matrix as
an example of a matrix with complex entries, rotation matrices, matrices of functions
whose entries are functions of a variable, for example, time, and polynomial and rational
matrices.

Example 6.4.4 I Zero matrix


The m × n zero matrix 0 is a m × n matrix whose entries are all zero.
 
          [ 0   0   ···   0 ]
    0  =  [ .   .    .    . ].
          [ 0   0   ···   0 ]

Example 6.4.5 I Identity matrix


The identity matrix is a matrix with the same number of rows and columns,
i.e., has dimensions n × n, or simply said, it is of dimension n; it is usually
represented by I, or, to make the dimension explicit it is represented by In . It is
a matrix with generic element defined by:

Iij = δij , 1 ≤ i, j ≤ n,

where the symbol δij is the Kronecker symbol that is equal to one if i = j and
is equal to zero if i ≠ j. In words, the elements of the identity matrix I are zero,
except the n elements with the same row and column indices, in which case
the element is one.

For example, the identity matrix with 4 rows and 4 columns, i.e., the 4 × 4
identity matrix I4 , is given by:
 
          [ 1   0   0   0 ]
    I4 =  [ 0   1   0   0 ].
          [ 0   0   1   0 ]
          [ 0   0   0   1 ]

Example 6.4.6 I Discrete Fourier matrix


The discrete Fourier matrix is a matrix with the same number of rows and
columns, i.e., it is n × n; it is represented by F, or, to indicate explicitly its
dimension n, by Fn . Its entries are complex valued. The Fourier matrix is a
very important matrix in Mathematics, Physics, and in many application areas,
namely, Signal Processing, more specifically, Digital Signal Processing (DSP).
One could say without much hyperbole that the Fourier matrix and its associated
“fast algorithm,” the Fast Fourier Transform, the FFT for short, are the
bread and butter of DSP and, again without hyperbole, one of the, if not ‘the,’
algorithms behind the digital revolution–behind every smart phone or digital
display lurks an FFT.

The discrete Fourier matrix of dimension n is defined by:

    Fn = (1/√n) [ e^{−j(2π/n) kℓ} ],        k, ℓ = 0, ···, n − 1.        (6.4.15)

This expression shows that the generic element kℓ of Fn is the complex exponential

    Fkℓ = (1/√n) e^{−j(2π/n) kℓ},

where 0 ≤ k, ℓ ≤ n − 1.

The quantities

    Ωℓ = ± (2π/n) ℓ,        ℓ = 0, ···, n − 1,

are called the discrete frequencies.

Note that the factor 1/√n is for normalization purposes; not all authors include it.

When the dimension is n = 2, we obtain the so-called base case F2:

    F2 = (1/√2) [ e^{−j(2π/2) kℓ} ],        k, ℓ = 0, 1

       = (1/√2) [ 1      1     ]
                [ 1   e^{−jπ}  ]

       = (1/√2) [ 1    1 ]
                [ 1   −1 ].

We write explicitly the 4-dimensional discrete Fourier matrix F4:

    F4 = (1/√4) [ e^{−j(2π/4) kℓ} ],        k, ℓ = 0, ···, 3

               [ 1       1            1             1         ]
       = (1/2) [ 1   e^{−j2π/4}   e^{−j4π/4}    e^{−j6π/4}    ]
               [ 1   e^{−j4π/4}   e^{−j8π/4}    e^{−j12π/4}   ]
               [ 1   e^{−j6π/4}   e^{−j12π/4}   e^{−j18π/4}   ]

               [ 1    1    1    1 ]
       = (1/2) [ 1   −j   −1    j ]
               [ 1   −1    1   −1 ].
               [ 1    j   −1   −j ]
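As an illustrative aside (not from the text), the n × n discrete Fourier matrix of (6.4.15) can be generated in a few lines of Python/NumPy; the helper name fourier_matrix is an assumption of this sketch:

    # Illustrative sketch: F_n = (1/sqrt(n)) [e^{-j 2*pi*k*l/n}].
    import numpy as np

    def fourier_matrix(n):
        k = np.arange(n)
        return np.exp(-2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)

    F4 = fourier_matrix(4)
    print(np.round(2 * F4))     # matches the F_4 above: rows [1,1,1,1], [1,-j,-1,j], ...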

Example 6.4.7 I Rotation matrix



An important parametric matrix, i.e., where the entries are in terms of a pa-
rameter θ, is the rotation matrix on the plane. This is a 2 × 2 matrix R:
 
    R(θ) = [ cos θ   −sin θ ]
           [ sin θ    cos θ ].

This matrix is important in applications because, as we will see later, it rotates


vectors on the plane by the angle θ.

For example, the matrix that rotates a vector counterclockwise by π/3 is:

    R(π/3) = [ 1/2     −√3/2 ]
             [ √3/2     1/2  ].
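As an illustrative aside (not from the text), the rotation matrix is easy to build and check numerically; the helper name rotation is an assumption of this sketch:

    # Illustrative sketch: the 2 x 2 rotation matrix R(theta) and the case theta = pi/3.
    import numpy as np

    def rotation(theta):
        return np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

    R = rotation(np.pi / 3)     # approximately [[0.5, -0.866], [0.866, 0.5]]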

Example 6.4.8 I Reflection matrix


Another important parametric matrix is the reflection matrix on the plane. This
is a 2 × 2 matrix S, very similar to the rotation matrix:
 
    S(θ) = [ cos θ    sin θ ]
           [ sin θ   −cos θ ].

This matrix reflects vectors on the plane about the line at angle θ. 

Example 6.4.9 I Function matrix


We will have occasion to work with matrices whose entries are functions or
sequences when we study higher order differential and difference equations.
Just as an illustration, we give an example of a 2 × 2 matrix whose entries are
functions.

    Φ(t) = (1/2) [ e^{−2t} + e^{−t}        −2e^{−2t} + 3e^{−t} ]
                 [ 4e^{−2t} + 3e^{−t}       e^{−2t} + 5e^{−t}  ],        t ≥ 0.


Example 6.4.10 I Polynomial and rational matrices


In digital signal processing, or in controls, one often encounters matrices that
represent in the z-domain, i.e., in the complex domain, the behavior of a linear
discrete time system. These are the so called system functions or system transfer
functions, see also Chapter 5, Section 5.6, on system functions and transfer func-
tions. These are matrices whose entries are polynomial or rational functions,
see Chapter 2 on rational functions. An example, of a polynomial matrix in the
complex variable z is:
 
    H(z) = [ z − 1      z³ − 2z² + z − 1     z² − 2z + 3  ]
           [ z² − 1     (z² − 5)(z³ − 1)     z² − 2z + 11 ],        z ∈ C.
An example of a rational matrix in the complex variable z is the following 2 × 3
matrix:
" 2 2
#
z z +1 z −2z+3
H(z) = z−1 z 3 −2z 2 +z−1 z 3 +3z−2 , z ∈ D ⊂ C, (6.4.16)
2 2
z −5 z+3
z 2 −1 z 3 −1 z 2 −2z+11

where D is a domain, see Definition 3.2.8 in Chapter 3 for the definition of a


domain. 

Remark 6.4.2 (Region of convergence). In matrix (6.4.16), the entries are functions in the
complex variable z. In fact, and on purpose, we chose these functions to be rational functions
of z. Note that to define the matrix H(z) we need not only the expression as given above but
to define a domain D ⊂ C where it converges. For matrices of rational functions, these domains
are usually the insides of circles, the outsides of circles, or annuli (regions between two circles).
Alternatively, they may be a left half-plane (the region to the left of a vertical line in the complex
plane), a right half-plane (the region to the right of a vertical line), or the region of the plane
between two vertical lines. This we will not address here since it would take us too far
afield. We just state the warning. This topic is discussed in detail in areas like Controls or
Signal Processing. 

We consider matrices that have special structure.

Example 6.4.11 I Square matrices


When the number of rows m of a matrix A equals the number of columns n,
m = n, the matrix is a square matrix as, for example:
 
          [ 1   2   3 ]
    A  =  [ 4   5   8 ].
          [ 7   8   9 ]
A general square matrix in scalar representation looks then like:
 
          [ a11   ···   a1n ]
    A  =  [  .     .     .  ],
          [ an1   ···   ann ]

where A is n × n.

When a square matrix A is of dimension n × n, we say simply that A is of di-


mension n or is n-dimensional.

Square matrix structure. When working with a n-dimension square matrix it is


important to recognize certain structural elements of the square matrix. The
diagonal entries of a matrix are the elements aii with equal row- and column-
indices. The collection of the diagonal elements is the matrix diagonal, main
diagonal, or principal diagonal of the matrix. Often, we simply say the diagonal.
The collection of elements ai(i+1) as i = 1 to i = n − 1 is the first upper diagonal.
Likewise, the collection of elements a(i−1)i as i = 2 to i = n is the first lower di-
agonal. We can define similarly the second and higher ordinal upper or lower
diagonals. In a square matrix the counter diagonal is the collection of elements
ai(n−(i−1)) .

The upper triangular part of a n-dimension square matrix are the elements
above the main diagonal of the matrix. The lower triangular part of a n-
dimension square matrix are the elements below the main diagonal.

In previous examples, we saw examples of square matrices like the diagonal


matrix, the Fourier matrix, and the rotation matrix, among others. 

Example 6.4.12 I Rectangular matrices


When m ≠ n, the matrix is a rectangular matrix, as in the following examples:

          [ 1   2   3    7     8   ]
    A  =  [ 4   5   8   10    15   ],
          [ 7   8   9   −3   −6.5  ]
or
 
          [  1     2      3   ]
          [  4     5      8   ]
    B  =  [  7     8      9   ],
          [ −1     0      2   ]
          [ 23    2.5   −5.3  ]
          [ −6     7     253  ]
where the dimensions of A are 3 × 5 and of B are 6 × 3.

In many applications, matrices are not only rectangular, but one of the dimen-
sions is much larger than the other dimension. The terminology for such ma-
trices is suggestive. If m ≫ n, the matrix A is called a tall rectangular matrix:
drawn as an array it is much taller than it is wide, with many more rows than
columns. If m ≪ n, the matrix A is a fat rectangular matrix: drawn as an array it
is much wider than it is tall, with many more columns than rows.

Example 6.4.13 I Diagonal matrices


A diagonal matrix is a square matrix D : n × n such that

    dij = dii   if i = j,        dij = 0   if i ≠ j.
In other words, all elements are zero except possibly the diagonal elements dii:

          [ d11    0    ···   ···    0  ]
          [  0    d22    0    ···    0  ]
    D  =  [  .     .     .     .     .  ]                       (6.4.17)
          [  .     .     .     .     .  ]
          [  0     0    ···    0    dnn ]
We can write it in more compact notation as:
D = diag[d11 · · · dnn ]. (6.4.18)
The writing in (6.4.18) is shorthand for (6.4.17).

Given a vector d:
 
    d = [d1   ···   dn]^T,

the diagonal matrix of the vector is the diagonal matrix whose diagonal entries
are the elements of the vector d; it is represented as
                     [ d1                            ]
                     [      d2               0       ]
    D  =  diag(d) =  [           .                   ]
                     [               .               ]
                     [      0           dn−1         ]
                     [                          dn   ]
The symbol 0 above the diagonal means that all entries above the diagonal are
zero. The same applies for the symbol 0 below the diagonal. 

Example 6.4.14 I Identity matrix as diagonal matrix


The identity matrix, see Section 6.4.2, is a square matrix with a very special
structure. It is a special case of a diagonal matrix. We can write it using the
diagonal matrix notation as:
I = diag(1),
where
 
    1 = [1   1   ···   1]^T
is the vector of ones. 
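As an illustrative aside (not from the text), diagonal matrices and the identity written as diag of the all-ones vector correspond directly to NumPy calls in Python:

    # Illustrative sketch: diag(d) and the identity as diag(1).
    import numpy as np

    d = np.array([1.0, 2.0, 3.0])
    D = np.diag(d)              # diagonal matrix with d on the main diagonal
    I = np.diag(np.ones(4))     # the 4 x 4 identity, same as np.eye(4)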

Example 6.4.15 I Triangular matrices


There are two types of triangular matrices: upper triangular U and lower trian-
gular L. An upper triangular matrix has all elements below the main diagonal
zero; and a lower triangular matrix has all elements above the main diagonal
as zero.

An upper triangular matrix then looks like:

          [ u11   u12   ···   ···   u1n ]
          [       u22   ···   ···    .  ]
    U  =  [              .     ∗     .  ]
          [   0           .     .    .  ]
          [                         unn ]

The ∗ in the upper triangular part of the matrix means that we do not care
about the actual values of the entries, they may be arbitrary. The 0 in the lower
triangular part means that all entries below the diagonal are zero.

A lower triangular matrix is given by:

          [ ℓ11                              ]
          [ ℓ21   ℓ22             0          ]
    L  =  [  .     .     .                   ]
          [  .     .     .     .             ]
          [ ℓn1   ···   ···   ℓn(n−1)   ℓnn  ]
A few elementary examples. Of course a zero square matrix or a diagonal
matrix are both lower and upper triangular matrices. Non-trivial examples are
given here:
   
          [ 1   0   0 ]                  [ 1   2   3 ]
    L  =  [ 2   3   0 ]     and    U  =  [ 0   4   5 ].
          [ 4   5   6 ]                  [ 0   0   6 ]


6.5 Calculus with Matrices


We now consider how to operate with matrices. Matrices extend vectors, so, we consider
equality, addition, multiplication of matrices by scalars. But we also consider multiplica-
tion of matrices. We will see that, for matrices to be equal, or to add matrices, or multiply
matrices we need to pay special attention to the dimensions of the matrices.

To fix notation, we consider the matrices A and B with the scalar representations and
dimensions as follows,
A = [aij ] : mA × nA (6.5.1)
B = [bij ] : mB × nB (6.5.2)
We also write their decompositions in columns, rows, and blocks. The column represen-
tations of A and B are:
A = [a1 · · · anA ] (6.5.3)
B = [b1 · · · bnB ] (6.5.4)
The nA columns a1 , · · · , anA of the matrix A in its column representation (6.5.3) are of
course column vectors of dimension mA . Similarly for B.

The row representations of A and B are:


 
          [ f1^T  ]
    A  =  [  ...  ]                                             (6.5.5)
          [ fmA^T ]

          [ g1^T  ]
    B  =  [  ...  ]                                             (6.5.6)
          [ gmB^T ]

The mA rows of matrix A are f1^T, ···, fmA^T and they are of course row vectors of dimension
nA. We have a similar comment regarding matrix B.

You should get used to figuring out from the context whether a vector is a column vector or a
row vector.

Finally, the block representations of A and B are assumed to be:


 
          [ A11   A12   ···   A1ℓ ]
    A  =  [  .     .     .     .  ]                             (6.5.7)
          [ Ak1   Ak2   ···   Akℓ ]

          [ B11   B12   ···   B1n ]
    B  =  [  .     .     .     .  ]                             (6.5.8)
          [ Bm1   Bm2   ···   Bmn ]
We assume that the blocks in each matrix A and B are consistent and the corresponding
blocks in A and B have the same dimensions.

We now consider several operations we can perform with matrices. We will express them
using the scalar entries representation of a matrix, as well as the column and the row
representations of the matrix.

6.5.1 Equality
We investigate what equality of matrices means:
A=B
Matrix equality is interpreted as entrywise equality. Therefore, matrix equality requires
the same number of columns and number of rows, as well as the same number of entries
in each matrix:
mA = mB = m
nA = nB = n.

Given that the matrices have the same dimensions, the number of entries is the same
mA nA = mB nB = mn.

For matrices with the same dimensions m × n, equality means:

aij = bij for 1 ≤ i ≤ m, 1 ≤ j ≤ n.

The equality of matrices is expressed in terms of the scalar entries of the matrices.

With the column representation, equality means

aj = bj , 1 ≤ j ≤ n.

With the row representation, equality requires

fiT = giT , 1 ≤ i ≤ m.

Finally, with the block representation, if both matrices A and B have the same number of
blocks and (consistent) block decompositions, i.e., if we take m = k and n = ℓ in (6.5.7)
and (6.5.8), equality of the matrices requires:

    Aij = Bij,        1 ≤ i ≤ k,  1 ≤ j ≤ ℓ.

6.5.2 Addition
We consider the addition of two matrices A and B. Addition of matrices is defined entry-
wise as we see in this Section.

Let C be the sum of the two matrices A and B. We investigate the conditions when it is
possible to add the two matrices and how to compute their sum.

We consider the scalar, column, and row representations of the two matrices A and B
given by (6.5.1)–(6.5.6).

To add A and B, we need the dimensions of the two matrices A and B to
be the same, i.e., to satisfy the following relations:

mA = mB = m
nA = nB = n,

i.e., A and B have the same number of rows m and the same number of columns n.

Assuming they have the same dimensions, let the matrix C be the addition of A and B:

C = A + B : mC × nC

with scalar, column, and row representations of C given by:


    C = [cij] : mC × nC                                         (6.5.9)

      = [c1   ···   cnC],                                       (6.5.10)

          [ d1^T  ]
      =   [  ...  ].                                            (6.5.11)
          [ dmC^T ]

The elements of the matrix C, the sum of the two matrices A and B, are given by the
entrywise sum:
cij = aij + bij , 1 ≤ i ≤ m, 1 ≤ j ≤ n.
From here we confirm that the number of rows and number of columns in A, B, and C
must be:
mC = mA = mB = m
nC = nA = nB = n.
In other words, to add two matrices, the matrices need to be of the same dimensions
m × n, and the resulting matrix is of the same dimension m × n.

The general case of the sum in scalar representation is


     
    [ c11   ···   c1n ]     [ a11   ···   a1n ]     [ b11   ···   b1n ]
    [  .     .     .  ]  =  [  .     .     .  ]  +  [  .     .     .  ]
    [ cm1   ···   cmn ]     [ am1   ···   amn ]     [ bm1   ···   bmn ]

                            [ a11 + b11   ···   a1n + b1n ]
                         =  [     .        .        .     ]
                            [ am1 + bm1   ···   amn + bmn ]

Example 6.5.1 I Addition of matrices: Scalar representation


We illustrate with a numerical example adding two 3 × 4 matrices to obtain a
3 × 4 matrix:
   
        [  2   −3    0    1 ]     [ 1    2   −3    1 ]
    C = [  1    5   −2    4 ]  +  [ 0   −3    4   −1 ]
        [ −1    1    2   −4 ]     [ 2    0   −1    5 ]

        [ 3   −1   −3   2 ]
      = [ 1    2    2   3 ]
        [ 1    1    1   1 ]


In terms of the column representation, the columns of the sum matrix C are:

cj = aj + bj , 1 ≤ j ≤ nC = n.

Since nA = nB = n, the matrix C is then:

C = [c1 · · · cnC ],
 
= a1 + b1 · · · an + bn

where of course nC = nA = nB = n.

Example 6.5.2 I Addition of matrices: column representation

With the numerical example above:

    C = [ [2  1  −1]^T + [1  0  2]^T     [−3  5  1]^T + [2  −3  0]^T     [0  −2  2]^T + [−3  4  −1]^T     [1  4  −4]^T + [1  −1  5]^T ]

which of course leads to the same result as before:

        [ 3   −1   −3   2 ]
    C = [ 1    2    2   3 ].
        [ 1    1    1   1 ]

In terms of the row representation, the rows of C are:

dTi = fiT + giT , 1 ≤ i ≤ mC = m.

Recalling that mA = mB = m, then mC = m and the matrix C is:


 
          [ d1^T ]     [ f1^T + g1^T ]
    C  =  [  ... ]  =  [     ...     ].
          [ dm^T ]     [ fm^T + gm^T ]

Example 6.5.3 I Addition of matrices: row representation

Again, to illustrate with the numerical example above:

        [ [ 2   −3    0    1 ] + [ 1    2   −3    1 ] ]
    C = [ [ 1    5   −2    4 ] + [ 0   −3    4   −1 ] ]
        [ [−1    1    2   −4 ] + [ 2    0   −1    5 ] ]

and, once more, this leads to

        [ 3   −1   −3   2 ]
    C = [ 1    2    2   3 ].
        [ 1    1    1   1 ]


Finally, in terms of block representation, the addition of the two matrices A and B with
the decompositions (6.5.7)–(6.5.8) is the matrix C with block representation consistent
with the block representations of A and B, whose ij block Cij is given by:

Cij = Aij + Bij .

Example 6.5.4 I Addition of matrices: block representation

A quick example. Consider the matrices

          [ [ 1   2 ]   [ 5 ] ]
    A  =  [ [ 3   4 ]   [ 6 ] ]
          [ [ 7   8 ]   [ 9 ] ]

and

          [ [ 10   11 ]   [ 14 ] ]
    B  =  [ [ 12   13 ]   [ 15 ] ]
          [ [ 16   17 ]   [ 18 ] ]

Then C is given by

    C = A + B

          [ [ 1   2 ] + [ 10   11 ]     [ 5 ] + [ 14 ] ]
      =   [ [ 3   4 ]   [ 12   13 ]     [ 6 ]   [ 15 ] ],
          [ [ 7   8 ] + [ 16   17 ]     [ 9 ] + [ 18 ] ]

which of course is given in block form by:

    C = A + B

          [ [ 11   13 ]   [ 19 ] ]
      =   [ [ 15   17 ]   [ 21 ] ].
          [ [ 23   25 ]   [ 27 ] ]

Before leaving this Subsection, we remark again that, while adding scalars is always pos-
sible, addition of matrices is not always possible; we can add square or rectangular ma-
trices, but the matrices need to have the same dimensions.

The next Subsection considers the properties of the addition of matrices.

6.5.2.1 Addition: Properties.


The addition of matrices when it is well defined, i.e., the matrices have the same dimen-
sions, has the following properties:

Associative Addition of matrices is associative:

A + B + C = A + (B + C) = (A + B) + C.

Commutativity Addition is commutative:

A + B = B + A.

Zero matrix: Unit with respect to addition There is a matrix, the zero matrix 0, that is
the identity element of the addition of matrices:

A + 0 = 0 + A = A.

Inverse with respect to addition For every matrix A, there is a matrix B such that the
sum with the original matrix A is the zero matrix. The matrix B is the negative
of A, so that:

A + (−A) = 0.

Example 6.5.5 I Subtraction: Addition inverse



For example, if

    A = [ 1   2 ]
        [ 3   4 ].

Then the negative of A is given by:

    −A = [ −1   −2 ]
         [ −3   −4 ].
Clearly the addition of these two matrices is the 2 × 2 zero matrix. 

The properties of addition of matrices follow trivially from the corresponding properties
of the addition of scalars, since addition of matrices when defined is obtained from the
addition of their entries.

6.5.3 Product
We consider multiplication of two matrices, including the special case of multiplication
of a matrix by a scalar, the product of two vectors that we did not introduce in Section 6.2,
and then the product of general matrices. We will learn several important things. The
first is that, like with addition of matrices, multiplication of matrices only exists in very
special circumstances that we will need to examine carefully. The second is that given
two matrices A and B we may be able to multiply A by B on the right, i.e., to compute
the product AB, but we may not be able to multiply A by B on the left, i.e., to compute
the product BA. So, now order matters (more on this later), and the product of matrices
may not be commutative, in general. Finally, when we can multiply the two matrices A
and B, the dimensions of the resulting matrix C have to be carefully determined from the
dimensions of A and B.

We start with multiplication of a matrix by a scalar and then vector multiplication (row
vector by column vector and column vector by row vector). Only then, we will address
the general case.

6.5.3.1 Matrix-scalar product


The multiplication of a matrix by a scalar is the matrix whose entries are the entries of
the original matrix multiplied by the scalar. There are no restrictions with respect to the
dimensions of the matrix; in other words, multiplication of a matrix by a scalar is always
possible.

It is defined entrywise. Let A be an mA × nA matrix and α a scalar in the field of interest,


for example, α ∈ R or α ∈ C. The multiplication of the matrix by the scalar is:

    C = αA = [α aij],        1 ≤ i ≤ mA,  1 ≤ j ≤ nA

             [ α a11     ···   α a1nA  ]
           = [ α a21     ···   α a2nA  ]
             [   .        .       .    ]
             [ α amA1    ···   α amAnA ]

In terms of the row and column representations, the rows and columns of A are simply
multiplied by the scalar α. For example, the row representation is:
 
          [ α f1^T  ]
    αA =  [   ...   ]                                           (6.5.12)
          [ α fmA^T ]

and the column representation leads to a similar result:

    αA = [α a1   ···   α anA].                                  (6.5.13)

Finally, for a matrix A in block form (6.5.7), the multiplication by a scalar α simply multi-
plies each block Aij by the scalar to obtain the block αAij .
 
           [ A11    A12    ···   A1ℓ  ]
    αA = α [  .      .      .     .   ]                         (6.5.14)
           [ Ak1    Ak2    ···   Akℓ  ]

           [ αA11   αA12   ···   αA1ℓ ]
         = [   .      .      .     .  ].                        (6.5.15)
           [ αAk1   αAk2   ···   αAkℓ ]
6.5.3.2 Row vector by column vector: Scalar product
We begin with the simplest case: the multiplication of a row vector a^T by a column vector b.

Remark 6.5.1 (Scalar product). In this Subsection we work with the row vector aT that is
the transpose of the column vector a. The reason for writing the row vector as the transpose
of a column vector is because the operation of multiplying the row vector aT with the column
vector b is also known as the scalar product of the two column vectors a and b. The scalar
product is the product of two vectors and not the product of a vector by a scalar. The scalar
product of two vectors is also known as the inner product, the dot product, or the internal
product of the vectors. We will come back to the scalar product in Chapter 8. 

The next Example illustrates how to multiply a row vector by a column vector with a 3-
dimensional example.

Example 6.5.6 I Scalar product: A 3D example


We multiply a row vector aT by a column vector b; in particular, we consider
the two following vectors:
 
aT = 1 2 3 row vector
4
b =  5  column vector
6
Let the multiplication be:
c = aT · b.
The row vector aT is 1 × 3 and the column vector b is 3 × 1. The multiplication
of the row vector aT and the column vector b is possible as we see now.

We first multiply pointwise the entries of the row vector aT with the corre-
sponding entries of the column vector b, and then we accumulate the elemen-
twise products so obtained. This is illustrated below:
 
    a^T · b = [1   2   3] [4   5   6]^T
            = 1·4 + 2·5 + 3·6
            = 4 + 10 + 18
            = 32.
The result of the scalar product of a 3D row vector by a 3D column vector is a
scalar. This justifies the name of the product.

We were able to multiply the row vector aT by the column vector b because the
number of columns of aT , which is 3, is the same as the number of rows of the
column vector b, which is 3. The result is the scalar c = 32. 

General case–Scalar product: Row vector by column vector of same dimensions.

We consider the general case of multiplication of a row vector aT by a column vector b of


the same dimension n:
    a^T = [a1   a2   ···   an]

    b   = [b1   b2   ···   bn]^T.
The multiplication of a^T by b is the scalar c:

    c = a^T b = [a1   a2   ···   an] [b1   b2   ···   bn]^T
              = a1 b1 + ··· + an bn
              = Σ_{i=1}^{n} ai bi.                              (6.5.16)
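As an illustrative aside (not from the text), the scalar product of Example 6.5.6 in Python/NumPy:

    # Illustrative sketch: scalar (inner) product sum_i a_i * b_i.
    import numpy as np

    a = np.array([1, 2, 3])
    b = np.array([4, 5, 6])

    c = a @ b                   # or np.dot(a, b); 1*4 + 2*5 + 3*6 = 32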

6.5.3.3 Column vector by row vector: Outer product


We now consider the multiplication of a column vector b by a row vector aT .

Remark 6.5.2 (Outer product). It is important to note again that in this Subsection we also
work with the row vector aT ; like before, we emphasize that this row vector is the transpose of
the column vector a. The reason to use this notation is because the operation of multiplying
the column vector b by the row vector aT is also known as the outer product of the two column
vectors b and a. We will come back to the outer product at a later Chapter. 

The next Example illustrates how to multiply a column vector by a row vector with a 3-
dimensional example.

Example 6.5.7 I Outer product: Column vector by row vector: a 3D ex-


ample
We multiply now the column vector b by the row vector aT . It turns out it is
always possible to multiply these two vectors, regardless of the dimension of
each vector. This result will also be used below when we study the product of
two matrices. We get:
    C = b · a^T                                                 (6.5.17)

      = [4   5   6]^T [1   2   3]                               (6.5.18)

        [ 4×1   4×2   4×3 ]
      = [ 5×1   5×2   5×3 ]
        [ 6×1   6×2   6×3 ]

        [ 4    8   12 ]
      = [ 5   10   15 ]
        [ 6   12   18 ]
The result of the outer product of a 3D column vector by a 3D row vector is a
3 × 3 matrix.

The second equation tells us how we computed it. We multiplied the first ele-
ment, i.e., the first row, of b, which is 4, by the first element, i.e., the first col-
umn, of the vector aT , which is 1, and placed the result as the element c11 = 4
of the resulting matrix C. Note the indices (1, 1) of c11 , the first 1 goes with
the first row of the vector b that we are using in computing it, and the second
index 1 goes with the first column of the entry of vector aT used to compute c11 .

We proceed to compute the element c12 = 8 of C, and this is obtained by mul-


tiplying the first element, i.e., the first row, of b, which is still 4, by the second
element, i.e., the second column, of the vector aT , which is now 2. Notice again
the relation between the positions of the elements of b and aT used in comput-
ing c12 .

The element cij is the multiplication of ‘row’ bi of b by ‘column’ aj of aT . 

General case: Multiplication of column vector by row vector (outer product).

The general case of multiplication of a column vector b by a row vector a^T is:

    b a^T = [b1   ···   bmb]^T [a1   ···   ana]

            [ b1 a1     ···   b1 ana  ]
          = [   .        .       .    ]
            [ bmb a1    ···   bmb ana ]

This is a mb × na matrix C.

As just seen, multiplying the column vector b by the row vector aT (column vector times
row vector) is ALWAYS possible because the number of columns of b is 1 and equals the
number of rows of aT , which is also 1. The result is a matrix with dimensions mb × na .
This contrasts with the multiplication of a row vector aT by a column vector b (row vector
times column vector) that is possible only when the number of columns of aT is equal to
the number of rows of b, leading to a scalar.

In neither case is the result a vector! In the first example it is a scalar. In the second
example, it is a matrix – no vector to be seen.
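As an illustrative aside (not from the text), the outer product of Example 6.5.7 in Python/NumPy:

    # Illustrative sketch: outer product of a column vector and a row vector.
    import numpy as np

    b = np.array([4, 5, 6])
    a = np.array([1, 2, 3])

    C = np.outer(b, a)          # 3 x 3 matrix [[4, 8, 12], [5, 10, 15], [6, 12, 18]]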

6.5.3.4 Matrix-matrix product


We address the multiplication of two generic matrices. Before attempting to multiply two
matrices A and B, we need first to determine if they can be multiplied. If they can be
multiplied, we say that the matrix C that is their product is well defined.

We will consider the matrix-matrix product in scalar, row, column, and block representa-
tions.

Matrix product: scalar representation Like in the previous Subsections, we start with
an example.

Example 6.5.8 I Matrix – matrix multiplication


We multiply the two matrices given as follows:
 
 7 10 13

123 
C=A·B= 8 11 14  (6.5.19)
456
9 12 15
The rule is to multiply each of the rows of A by each of the columns of B. Each of
these is the product of a row vector by a column vector, which we already learned
how to do. For example, the element c11 of C is the multiplication of the first row
[1 2 3] of A with the first column [7 8 9]T of B. Carrying this out for all the entries
of C leads to:
 
        [ 1·7 + 2·8 + 3·9      1·10 + 2·11 + 3·12      1·13 + 2·14 + 3·15 ]
    C = [ 4·7 + 5·8 + 6·9      4·10 + 5·11 + 6·12      4·13 + 5·14 + 6·15 ]

        [  50    68    86 ]
      = [ 122   167   212 ].                                    (6.5.20)
The product of the two matrices involved a number of products of row vectors by
column vectors; these are possible if the row vectors and the column vectors have
the same dimensions, which are the number of columns of the first matrix factor A
and the number of rows of the second matrix factor B, respectively. So, before at-
tempting to perform the multiplication of two matrices, we need to check if these
dimensions are the same. Figure 6.5 illustrates these facts, as well as the dimensions
of the resulting matrix C, with the two matrices displayed.

Figure 6.5 Matrix – matrix multiplication

For completeness and as a second example, we show the detailed computations of


the multiplication of the matrices in Figure 6.5:
 
        [ 1·1 + 1·(−1) + 0·2          1·0 + 1·1 + 0·(−1)          1·1 + 1·0 + 0·3    ]
    C = [ (−1)·1 + 2·(−1) + 1·2       (−1)·0 + 2·1 + 1·(−1)       (−1)·1 + 2·0 + 1·3 ]

        [  0   1   1 ]
      = [ −1   1   2 ]


We collect from these two numerical examples and for the record the conditions under
which we can multiply two matrices:
    C = A · B,        where A : mA × nA  and  B : mB × nB.

We check that
(no. of columns of A) nA = mB (no. of rows of B)
If this is true then we can multiply the two matrices and the dimensions of the resulting
matrix C are:
C : mA × nB
i.e., the number of rows mC of C is the number of rows mA of A, and the number of
columns nC of C is the number of columns nB of B.
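As an illustrative aside (not from the text), the dimension check nA = mB and the product of Example 6.5.8 in Python/NumPy:

    # Illustrative sketch: checking dimensions and multiplying the matrices of (6.5.19).
    import numpy as np

    A = np.array([[1, 2, 3],
                  [4, 5, 6]])                   # 2 x 3
    B = np.array([[7, 10, 13],
                  [8, 11, 14],
                  [9, 12, 15]])                 # 3 x 3

    assert A.shape[1] == B.shape[0]             # n_A = m_B, so the product is defined
    C = A @ B                                   # 2 x 3: [[50, 68, 86], [122, 167, 212]]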

Remark 6.5.3 (Matrix–matrix multiplication not always allowed). If we exchange the
order of the matrices in Figure 6.5, we cannot multiply them: the dimensions are no longer
compatible, since the number of columns of the new left factor is 3, which is different from the
number of rows of the new right factor, which is 2. 

We now state the general rule to multiply two matrices in scalar representation.

General rule: Multiplication of a matrix by a matrix in scalar representations.

We consider the two matrices A : mA × nA and B : mB × nB in scalar representation. If


nA = mB = n, their product C : mC × nC in scalar representation is given by:

    C = A · B                                                   (6.5.21)

        [ Σ_{ℓ=1}^{n} a1ℓ bℓ1     ···   Σ_{ℓ=1}^{n} a1ℓ bℓnB  ]
      = [         .                .              .           ]  : mA × nB        (6.5.22)
        [ Σ_{ℓ=1}^{n} amAℓ bℓ1    ···   Σ_{ℓ=1}^{n} amAℓ bℓnB ]

Matrix product: row-column representation We now consider the multiplication of


matrices using row-column representations as we explain next.

Matrix (in row representation) times matrix (in column representation).

We first consider A in row representation and B in column representation.

Example 6.5.9 I Matrix product: rows by columns


We use the numerical example in (6.5.19). We have:
    A = [ a1^T ],        B = [b1   b2   b3].                    (6.5.23)
        [ a2^T ]

The rows of A are:

aT1 = [1 2 3]
aT2 = [4 5 6]

The columns of matrix B are


     
    b1 = [7   8   9]^T,    b2 = [10   11   12]^T,    and    b3 = [13   14   15]^T.

The multiplication of A in row representation by B in column representation is:

    AB = [ a1^T ] [b1   b2   b3].                               (6.5.24)
         [ a2^T ]
Each entry cij of the product C is the multiplication of the row vector aTi by the
column vector bj . From Example 6.5.6, we know how to compute the product
of a row vector by a column vector. We get:
cij = aTi · bj .
The entry cij is well defined, since the dimension of the row vector aTi is the
same as the dimension of the column vector bj :
na = mb = 3. (6.5.25)
Also, since aTi · bj is the multiplication of a row vector by a column vector, the
result of this multiplication is a scalar.

The dimensions of the resulting matrix C are:


mA × nB .
The result of the product using the row-column representation of A and B is:
 T 
a1 · b1 aT1 · b2 aT1 · b3
C= .
aT2 · b1 aT2 · b2 aT2 · b3


General case: Matrix (in row representation) times matrix (in column representation).

We indicate the general case. Consider the multiplication of the two matrices:

    C = AB                                                      (6.5.26)

        [ a1^T   ]
      = [ a2^T   ] [b1   b2   ···   bnB].                       (6.5.27)
        [  ...   ]
        [ amA^T  ]

Performing the calculations, we get:

        [ a1^T b1       a1^T b2       ···     a1^T bnB   ]
        [ a2^T b1       a2^T b2       ···     a2^T bnB   ]
    C = [    .             .           .          .      ].     (6.5.28)
        [ amA^T b1      amA^T b2      ···     amA^T bnB  ]

Matrix product: column-row representation We now consider the multiplication of


matrices using column-row representations as we explain next.

Matrix (in column representation) times matrix (in row representation).

We now consider the matrix A in column representation and the matrix B in row repre-
sentation.

Example 6.5.10 I Matrix product: column by row


We continue with the numerical example in (6.5.19):

    A = [f1   f2   f3]                                          (6.5.29)

        [ g1^T ]
    B = [ g2^T ].                                               (6.5.30)
        [ g3^T ]

The columns of matrix A are


     
    f1 = [1   4]^T,    f2 = [2   5]^T,    and    f3 = [3   6]^T.

The rows of B are:

g1T = [7 10 13]
g2T = [8 11 14]
g3T = [9 12 15]

We now write the multiplication of the matrices A and B using the column
representation of A and the row representation of B:

                        [ g1^T ]
    AB = [f1   f2   f3] [ g2^T ]                                (6.5.31)
                        [ g3^T ]

       = f1 g1^T + f2 g2^T + f3 g3^T                            (6.5.32)

       = [1   4]^T [7   10   13] + [2   5]^T [8   11   14] + [3   6]^T [9   12   15]

         [  7   10   13 ]   [ 16   22   28 ]   [ 27   36   45 ]
       = [ 28   40   52 ] + [ 40   55   70 ] + [ 54   72   90 ]          (6.5.33)

         [  7 + 16 + 27     10 + 22 + 36     13 + 28 + 45 ]
       = [ 28 + 40 + 54     40 + 55 + 72     52 + 70 + 90 ]

         [  50    68    86 ]
       = [ 122   167   212 ]                                    (6.5.34)

Of course the result in (6.5.34) is the same as the result we obtained in (6.5.20).

Each of the terms in (6.5.33) is a matrix and the product of the matrices using
the column representation of A and the row representation of B leads to the
multiplication being given as the sum of three matrices. The number three is
exactly the number of columns of A that equals the number of rows of B, i.e.,
we can multiply the matrices because:
nA = mB = 3. (6.5.35)
This condition (6.5.35) is the same as we obtained before in (6.5.25), as we
should expect. 

General rule: Matrix (in column representation) times matrix (in row representation).

The general rule to multiply two matrices, the first in column representation and the
second in row representation, is:

                                   [ b1^T  ]
    C = AB = [a1   a2   ···   anA] [ b2^T  ]
                                   [  ...  ]
                                   [ bmB^T ]

           = a1 b1^T + a2 b2^T + ··· + anA bmB^T                (6.5.36)
with nA = mB . Each of the terms in the sum is a matrix with dimensions mA × nB .

This result (6.5.36) follows easily from (6.5.21) as we show now. Repeat Equation (6.5.21)
to obtain successively:
    C = A · B                                                   (6.5.37)

        [ a11     ···   a1ℓ     ···   a1nA   ]   [ b11    ···   b1nB  ]
      = [  .       .     .       .     .     ] · [  .      .     .    ]        (6.5.38)
        [ amA1    ···   amAℓ    ···   amAnA  ]   [ bℓ1    ···   bℓnB  ]
                                                 [  .      .     .    ]
                                                 [ bmB1   ···   bmBnB ]

        [ Σ_{ℓ=1}^{n} a1ℓ bℓ1      ···   Σ_{ℓ=1}^{n} a1ℓ bℓnB  ]
      = [          .                .              .           ]               (6.5.39)
        [ Σ_{ℓ=1}^{n} amAℓ bℓ1     ···   Σ_{ℓ=1}^{n} amAℓ bℓnB ]

                     [ a1ℓ bℓ1     ···   a1ℓ bℓnB  ]
      = Σ_{ℓ=1}^{n}  [    .         .        .     ]                            (6.5.40)
                     [ amAℓ bℓ1    ···   amAℓ bℓnB ]

                     [ a1ℓ  ]
      = Σ_{ℓ=1}^{n}  [  .   ] [bℓ1   ···   bℓnB]                                 (6.5.41)
                     [ amAℓ ]

where n = nA = mB . Equation (6.5.41) is Equation (6.5.36) proving the result we wanted


to prove.
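As an illustrative aside (not from the text), the column–row form (6.5.36) of the product can be verified numerically in Python/NumPy:

    # Illustrative sketch: AB as the sum of outer products of columns of A with rows of B.
    import numpy as np

    A = np.array([[1, 2, 3], [4, 5, 6]])
    B = np.array([[7, 10, 13], [8, 11, 14], [9, 12, 15]])

    C = sum(np.outer(A[:, l], B[l, :]) for l in range(A.shape[1]))
    assert np.array_equal(C, A @ B)     # same result as the ordinary product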

Matrix product: block representation We now consider the product of two matrices A
and B in block form. This is trickier than the previous three cases and care should be
taken to make sure that the block decompositions of A and B allow the product of the
indicated subblocks. This is best illustrated working with specific cases.

Example 6.5.11 I Matrix product: block form


Consider the product:

            [ A11   A12 ]   [ B11   B12 ]
    A · B = [ A21   A22 ] · [ B21   B22 ].

This product, if defined, is given by:

            [ A11 · B11 + A12 · B21      A11 · B12 + A12 · B22 ]
    A · B = [ A21 · B11 + A22 · B21      A21 · B12 + A22 · B22 ].
We have a number of products of blocks and then the resulting matrices are
added up. Careful analysis shows that the following relations between the
dimensions of the subblocks of each matrix need to hold for the block product
to be well defined:
nA11 = mB11 ; nA12 = mB21 ; nA11 = mB12 ; nA12 = mB22
nA21 = mB11 ; nA22 = mB21 ; nA21 = mB12 ; nA22 = mB22

Note that, by consistency of the block decompositions, we already know that


the dimensions of the blocks of A satisfy:
mA11 = mA12 ; mA21 = mA22
nA11 = nA21 ; nA12 = nA22 ;
and, likewise, the dimensions of the blocks of B satisfy:
mB11 = mB12 ; mB21 = mB22
nB11 = nB21 ; nB12 = nB22 ;


Example 6.5.12 I Product of diagonal block square matrices


We consider an example where the product of two block square matrices is
particularly straightforward. Consider that the two square matrices A and B
are both in block diagonal form, i.e., only the diagonal blocks Aii and Bii are
nonzero. We assume that these blocks are square. Using the diagonal notation
introduced before, where now the diagonal entries are block square matrices
we can write succinctly:

A = diag[A11 · · · Ann ]
B = diag[B11 · · · Bnn ]

It is straightforward to verify that their product is:

    C = A · B = diag[ A11 · B11   ···   Ann · Bnn ].

The dimensions of the square blocks Aii are not constrained by the dimen-
sions of other diagonal square blocks of A, and, likewise, the dimensions of
the square blocks Bii are not constrained by the dimensions of other diago-
nal square blocks of B. However, the dimensions of corresponding blocks Aii
and Bii are the same. The dimensions of the square blocks Cii are the dimen-
sions of the blocks Aii and Bii .

The product of block diagonal matrices is a block diagonal matrix. Of course if


the blocks are scalars, i.e., the matrices are diagonal, this shows that the prod-
uct of two diagonal matrices is diagonal. 

6.5.3.5 Matrix-matrix products where one is either in row or column form


We consider the product of a matrix by another matrix where one of the factors is ei-
ther in row form (the first factor) or in column form (the second factor). This includes
the product of a matrix by a vector. For these products to be possible, we know that the
dimensions of the factors need to be appropriately matched. We consider these cases now.

The question here is not how to multiply the two matrices. This we have learned in the
previous Subsections. The issue is to perform this product in such a way to exhibit a new
structure of the product matrix. This may be useful in applications.

We first consider the product of a matrix by a vector. Let A be a mA × nA matrix, and b


be a column vector, mb × 1, and c be a row vector, 1 × nc . If mA > 1 and nA > 1, i.e.,
the matrix A is not a scalar, row vector, or column vector, then only the two following

products can be defined: the product of the matrix multiplying a column vector on the left,

    A · b,

or the product of the matrix multiplying a row vector on the right,

    c · A.
In other words, as long as the dimensions match, a matrix A can multiply a column
vector b from or on the left, nA = mb , or can multiply a row vector c from or on the right,
nc = m A .

Remark 6.5.4 (Matrix-vector products). Because we usually assume the vectors to be col-
umn vectors, when we refer to matrix vector products, or a matrix multiplying a vector,
we assume implicitly that the matrix multiplies the vector on the left. Unless the context
states otherwise, or we mention it explicitly, matrix-vector product will assume the matrix
multiplies the vector on the left. 

We give next three examples that are important in applications that illustrate the use of
these products.
Example 6.5.13 I Linear combination revisited: Matrix-vector product
The first example of the product of a matrix A in column representation by a
matrix B in row representation is when B is a column vector b.

Let matrix A be given in vector form and the (column) vector b, where we
assume that the number of columns of A, nA , and the number of entries of b,
mb , are equal, nA = mb :
    C = Ab
      = [f1   f2   ···   fnA] [b1   b2   ···   bmb]^T           (6.5.42)
      = b1 f1 + b2 f2 + ··· + bmb fnA.                           (6.5.43)
If we recall the linear combination of vectors given by (6.3.1) in Section 6.3.5, we
recognize that (6.5.43) is the linear combination of the columns f1 , f2 , · · · , fnA ,
of A. We can state this in a different way, by interpreting a linear combination
of vectors as the product of a matrix, whose columns are the vectors, by a (col-
umn) vector whose entries are the coefficients of the linear combination.

Once again, we encounter the case where the same object is interpreted in dif-
ferent ways. We should get used to this–looking at similar objects from differ-
ent perspectives. 
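As an illustrative aside (not from the text), the interpretation of Ab as a linear combination of the columns of A can be checked numerically in Python/NumPy; the numbers below are illustrative:

    # Illustrative sketch: Ab equals the combination of the columns of A with weights b_i.
    import numpy as np

    A = np.array([[1, 2, 3], [4, 5, 6]])
    b = np.array([10, 20, 30])

    combo = b[0]*A[:, 0] + b[1]*A[:, 1] + b[2]*A[:, 2]
    assert np.array_equal(combo, A @ b)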

We now consider the product of the matrices A and B, where B is in column vector form.

Example 6.5.14 I Product of a matrix by a matrix in column format


Consider the product of two matrices A and B where B is in column format.
Assume that the number of columns of A equals the number of rows of B,
nA = mB , so their product is well defined. Then, we can show:
    C = AB = A [b1   ···   bnB]                                 (6.5.44)
           = [Ab1   ···   AbnB].                                (6.5.45)
This shows that the matrix A multiplies the individual columns of the ma-
trix B. This is very useful and will be used later. 

We now consider the same product of the matrices A and B, but now A is in row vector
form.

Example 6.5.15 I Product of a matrix in row form by a matrix


Consider the product of two matrices A and B where A is in row format.
Assume that the number of columns of A equals the number of rows of B,
nA = mB , so their product is well defined. Then, we can show:
    C = AB

        [ a1^T   ]
      = [  ...   ] B                                            (6.5.46)
        [ amA^T  ]

        [ a1^T B  ]
      = [   ...   ].                                            (6.5.47)
        [ amA^T B ]

This shows that B multiplies (from the right) the individual rows of A. 

6.5.3.6 Powers: square matrices


The power of square matrices is defined as usual: if we multiply a square matrix A by
itself, we get the power 2 of the matrix or the square of the matrix, A2 . To be precise, if A
is mA × nA , to multiply A by itself, we need mA = nA . This emphasizes that powers of
matrices can only be defined for square matrices. So, if A : nA × nA :
A2 = A · A.

More generally, the nth-power of a square matrix is defined recursively:


An = A · An−1
= An−1 · A.

Remark 6.5.5 (Computational effort: Product of matrices). To compute A2 , we carry


out the scalar product of each of the nA rows of A with each of its nA columns. These are nA²
scalar products. Each of these scalar products requires nA products and nA − 1 additions.
Then, A² takes nA³ multiplications and nA²(nA − 1) additions. We usually say simply that
to compute A² takes, or is of the order of, nA³ floating point operations (flops). Current off-the-shelf
computing technology, laptop or smartphone, is capable of several Gflops, i.e., 10⁹ flops,
per second. Then, a back-of-the-envelope calculation indicates that computing the square of a
10³ × 10³ matrix takes roughly on the order of a second on these platforms.

Computing powers of square matrices from the definition is then, in general, computationally
heavy. We will see in due time speedier ways to compute successive powers of a matrix. 
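As a rough illustration of this flop count, the sketch below (sizes chosen arbitrarily) squares an n × n matrix and reports the nominal n^3 multiplication count; np.linalg.matrix_power computes higher powers by repeated multiplication.

```python
import numpy as np

n = 1000
A = np.random.default_rng(1).standard_normal((n, n))

# Squaring A by the definition: one matrix-matrix product,
# roughly n**3 multiplications and n**2 * (n - 1) additions.
A2 = A @ A
print("nominal multiplication count:", n**3)   # 1_000_000_000

# Higher powers, computed by repeated multiplication.
A5 = np.linalg.matrix_power(A, 5)
```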

We now consider the properties of products of matrices.

6.5.3.7 Product of Matrices: Properties


The product of two matrices when it is well defined is associative but in general not com-
mutative. For example, a matrix A that is 2 × 3 can be multiplied by a matrix B that is
3 × 3, but the multiplication BA is not well defined, i.e., can not be performed.

If both matrices are square and have the same dimensions, the product AB and the prod-
uct BA are both well defined, but in general the resulting matrices are different.

Example 6.5.16 I Matrix product: Non commutative


For example,
\[
\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix}
=
\begin{bmatrix} 0 & 3 \\ -2 & -1 \end{bmatrix}.
\]
On the other hand, if we interchange the order of the two factors on the Left-Hand-Side, we get a different result:
\[
\begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}
=
\begin{bmatrix} -1 & 3 \\ -2 & 0 \end{bmatrix}.
\]
However, there are cases when interchanging the order does lead to the same result. For example:
\[
\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}
\begin{bmatrix} 1 & -2 \\ 2 & 1 \end{bmatrix}
=
\begin{bmatrix} 3 & -1 \\ 1 & 3 \end{bmatrix}.
\]
Interchanging the order:
\[
\begin{bmatrix} 1 & -2 \\ 2 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}
=
\begin{bmatrix} 3 & -1 \\ 1 & 3 \end{bmatrix},
\]
leading to the same result. 
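A quick NumPy check of the two pairs above (the same matrices as in the example):

```python
import numpy as np

A = np.array([[1, 1], [-1, 1]])
B = np.array([[1, 2], [-1, 1]])
C = np.array([[1, -2], [2, 1]])

print(np.array_equal(A @ B, B @ A))   # False: this pair does not commute
print(np.array_equal(A @ C, C @ A))   # True: this pair happens to commute
```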

These examples illustrate that matrix multiplication does not enjoy all the properties of
multiplication of numbers (scalars). We collect here the main properties of matrix
multiplication.

Associativity It is associative:

A · B · C = (A · B) · C = A · (B · C)

Non commutativity It is not commutative (in general):

1. Either because AB or BA or both are not defined; or


2. when both are defined, it may be that AB ≠ BA.

Multiplication by zero matrix 0 . A · 0 = 0, but the product of two matrices can be zero
without either factor being a zero matrix, as for example in the following:
\[
\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}
=
\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.
\]

Distributivity of product by scalar and addition of matrices . We have, for α a scalar:

α(A + B) = αA + αB.

Distributivity of sum of scalars and product by matrix We have, for α and β scalars:

(α + β)A = αA + βA.

Distributivity of product of a matrix by addition of matrices . The product of a matrix


by a sum of matrices is distributive:

A · (B + C) = A · B + A · C
(A + B) · C = A · C + B · C

as long as the dimensions are appropriate so that all indicated products are valid.

Identity If A is a square matrix with dimensions n × n and In is the n-dimensional identity matrix,


then one can check that

A · In = A
In · A = A

Identity The identity matrix I is the identity element of the multiplication of square ma-
trices.

Inverse For square matrices A : n × n, there may be a matrix B : n × n such that

A·B=I (6.5.48)
B·A=I (6.5.49)

If such B exists (and it might not), then B is the inverse of A. It is represented by A−1 .

Result 6.5.1 (Matrix inverse is unique). The inverse A−1 of a matrix A when it
exists is unique.

Proof I The proof is simple. Let B and C be two inverses of the matrix A and I the
identity matrix. Then

B = IB
= (CA)B
= C(AB)
= CI
= C.


In Section 6.6.3, we consider when the inverse of a square matrix exists and, if it
does exist, how to compute it.

The next result shows that the inverse of the inverse is the original matrix.

Result 6.5.2 (Inverse of inverse). The inverse of the inverse A−1 of a matrix A is the
original matrix A.

Proof I This follows immediately from the definition:
\[
A^{-1} A = I, \qquad A A^{-1} = I.
\]
It immediately follows that A is the inverse of A^{-1}, i.e.:
\[
\left(A^{-1}\right)^{-1} = A.
\]

Except for the existence of the inverse of a square matrix, all the other properties listed
above are straightforward to verify and follow from the corresponding properties for
products of scalars.

6.5.4 Matrix Conjugation, Transposition, and Hermitian


We study three additional important operations with matrices.

6.5.4.1 Conjugation
Given A, possibly with complex valued entries, its conjugate, represented by A∗ , is the
matrix whose entries are the complex conjugates of the entries of A, i.e.:

\[
A^* = \left[a_{ij}^*\right],
\]
where A = [aij ]. The dimensions of A∗ and A are the same. If the entries of a matrix are
real valued, then conjugation does not affect the matrix, leaving it invariant

A = A∗

whenever A is real valued (this is short hand to say that the entries of A are real valued).

6.5.4.2 Transposition and symmetric matrices


Given A, obtain matrix B—transpose of A, by interchanging the rows and columns of A:
the entry bij of B in row i and column j equals entry aji of A in row j and column i

B = [bij ] = [aji ].

The transpose B of a matrix A is represented by B = AT where the superscript T stands


for transposition.

If the matrix A : mA × nA and the matrix B : mB × nB then:

mB = nA and nB = mA .

As we saw in Section 6.2, the transpose of a row vector becomes a column vector:
\[
a^T = \begin{bmatrix} a_1 & a_2 & \cdots & a_{n_a} \end{bmatrix}^T
    = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_{n_a} \end{bmatrix}.
\]
Likewise, the transpose of a column vector becomes a row vector:
\[
a^T = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_{m_a} \end{bmatrix}^T
    = \begin{bmatrix} a_1 & a_2 & \cdots & a_{m_a} \end{bmatrix}.
\]
Transposition of a matrix in row representation. We transpose A : m_A × n_A given in row format:
\[
A = \begin{bmatrix} f_1^T \\ \vdots \\ f_{m_A}^T \end{bmatrix}.
\]
Then
\[
A^T = \begin{bmatrix} f_1 & \cdots & f_{m_A} \end{bmatrix}.
\]
This is an n_A × m_A matrix given in column format.

Transposition of a matrix in column representation. We consider the transpose of a matrix
A : m_A × n_A given in column format. The result is similar to the previous one. We have:
\[
A = \begin{bmatrix} g_1 & \cdots & g_{n_A} \end{bmatrix}.
\]
Then
\[
A^T = \begin{bmatrix} g_1^T \\ \vdots \\ g_{n_A}^T \end{bmatrix}.
\]
This is an n_A × m_A matrix given in row format.

Transposition of a matrix in block form representation. We consider the transpose of a matrix


A : mA × nA given in block format. Transposing A given in (6.4.14):
\[
A^T = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1\ell} \\ \vdots & \vdots & \ddots & \vdots \\ A_{k1} & A_{k2} & \cdots & A_{k\ell} \end{bmatrix}^T \tag{6.5.50}
\]
\[
\phantom{A^T} = \begin{bmatrix} A_{11}^T & A_{21}^T & \cdots & A_{k1}^T \\ \vdots & \vdots & \ddots & \vdots \\ A_{1\ell}^T & A_{2\ell}^T & \cdots & A_{k\ell}^T \end{bmatrix}. \tag{6.5.51}
\]
Next we state the important result of transposition of the product of two matrices.

Result 6.5.3 (Transpose of product of matrices). Given matrices A1 , A2 , · · · , An then:


\[
\left(\prod_{i=1}^{n} A_i\right)^T = \left(A_n \, A_{n-1} \cdots A_1\right)^T
= A_1^T \, A_2^T \cdots A_n^T
= \prod_{i=1}^{n} A_{n-i+1}^T.
\]

We assume that the matrices have compatible dimensions, so all products are well defined.

Note that the first line of the equation defines the Π notation.

Proof I The proof is by induction.

Step 1: We prove first for the transpose of the product of two matrices.

For ease of notation, we consider the two matrices A and B, which are mA × nA and
mB × nB . We assume that nA = mB . Write the row and column representations of A
and B:
\[
A = \begin{bmatrix} a_1^T \\ \vdots \\ a_{m_A}^T \end{bmatrix},
\qquad
B = \begin{bmatrix} b_1 & \cdots & b_{n_B} \end{bmatrix}.
\]

Then, the generic element cij of

C = AB

is:

cij = aTi bj .

The generic element dji of

D = CT

is

dji = cij
= aTi bj
= bTj ai .

The last equation follows from the second because the transpose of a scalar is the same
scalar. But, the last equation is simply:

\[
D = C^T = \begin{bmatrix} b_1^T \\ \vdots \\ b_{n_B}^T \end{bmatrix}
\begin{bmatrix} a_1 & \cdots & a_{m_A} \end{bmatrix}
= B^T A^T,
\]

as desired. This proves that the transpose of the product of two matrices is the product of
the transposed matrices in reverse order.

Step 2: The induction step assumes that the Result is true for the product of n−1 matrices:

(An−1 · · · A1 )T = AT1 · AT2 · · · ATn−1 .

Step 3: We now prove for the product of n matrices. From associativity of the product of
matrices and from the transpose of the product of two matrices:

\[
(A_n A_{n-1} \cdots A_1)^T = \left(A_n \left[A_{n-1} \cdots A_1\right]\right)^T
= (A_{n-1} \cdots A_1)^T \cdot A_n^T
= A_1^T \cdot A_2^T \cdots A_{n-1}^T \cdot A_n^T,
\]

as we needed to prove. 
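A quick numerical sanity check of the two-matrix case (random matrices, sizes arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

# Transpose of a product equals the product of the transposes in reverse order.
print(np.allclose((A @ B).T, B.T @ A.T))   # True
```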

Symmetric matrices. A matrix A is said to be symmetric if it equals its transpose:

A = AT .

A symmetric matrix is square, since if A : mA × nA , then AT : nA × mA , and equality


implies mA = nA .

Trivially, a scalar is symmetric

aT = a.

Skew symmetric matrices. A matrix A is said to be skew symmetric if it equals the negative
of its transpose:
A = −AT .
A skew symmetric matrix is square; this is proven by an argument similar to showing
that a symmetric matrix is square. It follows also that the diagonal elements of a skew
symmetric matrix are zero.

6.5.4.3 Hermitian of a matrix and Hermitian matrices


The Hermitian of a given matrix A, represented by A^H, is the transpose conjugate
of the original matrix:
\[
A^H = (A^*)^T = \left(A^T\right)^*.
\]
Clearly, if A is real valued,
\[
A^H = A^T.
\]
Hermitian matrices. A matrix A is said to be Hermitian if it equals its Hermitian, i.e., its
conjugate transpose:
\[
A^H = A.
\]
For a Hermitian matrix the diagonal entries are real valued: writing a_ii = x + jy, the condition a_ii = a_ii^* gives x + jy = x − jy,
hence y = 0.

Hermitian matrices are square.

Trivially, for a scalar


aH = a∗ .
Skew Hermitian matrices. A matrix is skew Hermitian if
\[
A^H = -A,
\]
i.e., a_ij = −a_ji^*. Note that for a skew Hermitian matrix the diagonal entries a_ii have to be
purely imaginary or zero: writing a_ii = x + jy, the condition a_ii = −a_ii^* gives x + jy = −(x − jy), implying x = 0.
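The three operations and the associated symmetry checks are one-liners in NumPy; the sketch below uses a small, made-up complex matrix.

```python
import numpy as np

A = np.array([[1 + 2j, 3 - 1j],
              [0 + 1j, 2 + 0j]])

A_conj = A.conj()          # conjugate A*
A_T    = A.T               # transpose A^T
A_H    = A.conj().T        # Hermitian (conjugate transpose) A^H

# Symmetry checks.
is_symmetric = np.array_equal(A, A.T)
is_hermitian = np.array_equal(A, A.conj().T)
print(is_symmetric, is_hermitian)   # False False for this particular A
```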

6.5.5 Matrices with special structure


We have seen already several examples of matrices that exhibit particular structure; for
example, the zero matrix, the identity matrix, diagonal matrices, symmetric matrices, and
Hermitian matrices. We now introduce a number of additional matrices with special struc-
ture. These matrices are special not only because of their structure but also because they
play special roles, as we will have occasion to verify. These are just a sample, and there
are many other matrices that have significant structure or significant properties or arise
or play significant roles in important contexts.

Orthogonal matrix A square matrix A is an orthogonal matrix if its inverse is its transpose,
i.e.:
A^T · A = A · A^T = I
where I is the identity matrix.

Example 6.5.17 I Orthogonal


Two examples of orthogonal matrices are the base case of the Fourier ma-
trix F2 and the rotation matrix R(θ), as we can verify immediately.
For the Fourier base case, we have:
\[
F_2 \cdot F_2^T
= \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
  \cdot \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}^T
= I_2.
\]
Note that F_2 is a symmetric matrix. The factor 1/√2 is now justified because
it makes the matrix F_2 orthogonal.

Recall from Example 6.4.7 the rotation matrix R(θ). For this matrix, we
get:
\[
R(\theta) \cdot R(\theta)^T
= \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}
  \cdot \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
= I_2,
\]
as per direct verification. We used the standard trigonometric identity
sin²θ + cos²θ = 1. 

Unitary matrix A square matrix A is a unitary matrix if its inverse is its Hermitian, i.e.:

A^H · A = A · A^H = I

where I is the identity matrix.

Example 6.5.18 I Unitary


The general Fourier matrix provides an example of a unitary matrix, as
we can verify directly. We carry this out explicitly for the 4-dimensional
discrete Fourier matrix F_4:
\[
F_4 \cdot F_4^H
= \frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -j & -1 & j \\ 1 & -1 & 1 & -1 \\ 1 & j & -1 & -j \end{bmatrix}
  \cdot \frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -j & -1 & j \\ 1 & -1 & 1 & -1 \\ 1 & j & -1 & -j \end{bmatrix}^H
\]
\[
= \frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -j & -1 & j \\ 1 & -1 & 1 & -1 \\ 1 & j & -1 & -j \end{bmatrix}
  \cdot \frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & j & -1 & -j \\ 1 & -1 & 1 & -1 \\ 1 & -j & -1 & j \end{bmatrix}
= I_4.
\]
It is worth noting that the Fourier matrix is unitary, as shown, and symmetric, but NOT Hermitian. 
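Both checks are immediate in NumPy; the sketch below builds R(θ) for an arbitrary angle and the 4-point DFT matrix (normalized so that it is unitary) and verifies the defining identities.

```python
import numpy as np

theta = 0.7   # arbitrary angle
R = np.array([[np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])
print(np.allclose(R.T @ R, np.eye(2)))           # True: R(theta) is orthogonal

# 4-point DFT matrix, normalized by 1/sqrt(4) = 1/2 so that it is unitary.
n = 4
k, l = np.meshgrid(np.arange(n), np.arange(n))
F4 = np.exp(-2j * np.pi * k * l / n) / np.sqrt(n)
print(np.allclose(F4 @ F4.conj().T, np.eye(n)))  # True: F4 is unitary
```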

6.5.6 Limits, derivatives, integration, delay, and Taylor series with matrices
Just like we did with vectors, we can consider sophisticated operations with matrices like
limits, differentiation, integration, delay, and Taylor series. These are all defined entry-
wise. We consider these very briefly here, since it is a straightforward extension of the
concepts for vectors in Section 6.3.6.

6.5.6.1 Limit
Consider the m × n matrix (of functions) A(t), t ∈ T ⊂ R. Let t0 ∈ T . The limit of the
matrix of functions is defined entrywise and given by
\[
\lim_{t \to t_0} A(t)
= \lim_{t \to t_0} \begin{bmatrix} A_{11}(t) & \cdots & A_{1n}(t) \\ \vdots & \ddots & \vdots \\ A_{m1}(t) & \cdots & A_{mn}(t) \end{bmatrix}
= \begin{bmatrix} \lim_{t \to t_0} A_{11}(t) & \cdots & \lim_{t \to t_0} A_{1n}(t) \\ \vdots & \ddots & \vdots \\ \lim_{t \to t_0} A_{m1}(t) & \cdots & \lim_{t \to t_0} A_{mn}(t) \end{bmatrix}.
\]

The limit of the matrix of functions is the matrix of the limits of the functions in each entry
of the matrix. To prove this result, we would need to introduce concepts not yet available,
such as a distance between matrices, which formalizes the notion of two matrices being close
to each other.

Example 6.5.19 I Matrix limits


We provide an example. Let
\[
\Phi(t) = \frac{1}{2}
\begin{bmatrix}
e^{-2t} + e^{-t} & -2e^{-2t} + 3e^{-t} \\
4e^{-2t} + 3e^{-t} & e^{-2t} + 5e^{-t}
\end{bmatrix}. \tag{6.5.52}
\]
Then,
\[
\lim_{t \to 0} \Phi(t)
= \frac{1}{2}
\begin{bmatrix}
\lim_{t \to 0}\left[e^{-2t} + e^{-t}\right] & \lim_{t \to 0}\left[-2e^{-2t} + 3e^{-t}\right] \\
\lim_{t \to 0}\left[4e^{-2t} + 3e^{-t}\right] & \lim_{t \to 0}\left[e^{-2t} + 5e^{-t}\right]
\end{bmatrix}
= \frac{1}{2}
\begin{bmatrix} 2 & 1 \\ 7 & 6 \end{bmatrix}.
\]

6.5.6.2 Derivative of a matrix of functions


Let A(t) be a matrix whose entries are differentiable functions. The derivative of the
matrix A(t) is defined entrywise, i.e., it is the matrix of the derivatives of the entries of
the matrix:
\[
\frac{dA(t)}{dt} = \frac{d}{dt}\begin{bmatrix} A_{11}(t) & \cdots & A_{1n}(t) \\ \vdots & \ddots & \vdots \\ A_{m1}(t) & \cdots & A_{mn}(t) \end{bmatrix}
= \begin{bmatrix} \frac{dA_{11}(t)}{dt} & \cdots & \frac{dA_{1n}(t)}{dt} \\ \vdots & \ddots & \vdots \\ \frac{dA_{m1}(t)}{dt} & \cdots & \frac{dA_{mn}(t)}{dt} \end{bmatrix}.
\]

Example 6.5.20 I Matrix derivatives


We work out an example. Consider Φ(t) in (6.5.52). Then
\[
\frac{d\Phi(t)}{dt}
= \frac{d}{dt}\left(\frac{1}{2}
\begin{bmatrix}
e^{-2t} + e^{-t} & -2e^{-2t} + 3e^{-t} \\
4e^{-2t} + 3e^{-t} & e^{-2t} + 5e^{-t}
\end{bmatrix}\right)
= \frac{1}{2}
\begin{bmatrix}
\frac{d}{dt}\!\left[e^{-2t} + e^{-t}\right] & \frac{d}{dt}\!\left[-2e^{-2t} + 3e^{-t}\right] \\
\frac{d}{dt}\!\left[4e^{-2t} + 3e^{-t}\right] & \frac{d}{dt}\!\left[e^{-2t} + 5e^{-t}\right]
\end{bmatrix}
\]
\[
= \frac{1}{2}
\begin{bmatrix}
-2e^{-2t} - e^{-t} & 4e^{-2t} - 3e^{-t} \\
-8e^{-2t} - 3e^{-t} & -2e^{-2t} - 5e^{-t}
\end{bmatrix}.
\]

6.5.6.3 Integral of a matrix of functions


Let A(t) be a matrix whose entries are integrable functions. The integral of the matrix
A(t) is defined entrywise and it is given by the matrix of integrals of the entries of the
matrix:
\[
\int_{t_i}^{t_f} A(t)\,dt
= \int_{t_i}^{t_f} \begin{bmatrix} A_{11}(t) & \cdots & A_{1n}(t) \\ \vdots & \ddots & \vdots \\ A_{m1}(t) & \cdots & A_{mn}(t) \end{bmatrix} dt
= \begin{bmatrix} \int_{t_i}^{t_f} A_{11}(t)\,dt & \cdots & \int_{t_i}^{t_f} A_{1n}(t)\,dt \\ \vdots & \ddots & \vdots \\ \int_{t_i}^{t_f} A_{m1}(t)\,dt & \cdots & \int_{t_i}^{t_f} A_{mn}(t)\,dt \end{bmatrix}.
\]

Example 6.5.21 I Matrix integrals


We consider the integral of the matrix Φ(t) in (6.5.52):
\[
\int_0^t \Phi(\tau)\,d\tau
= \frac{1}{2}
\begin{bmatrix}
\int_0^t \left[e^{-2\tau} + e^{-\tau}\right] d\tau & \int_0^t \left[-2e^{-2\tau} + 3e^{-\tau}\right] d\tau \\
\int_0^t \left[4e^{-2\tau} + 3e^{-\tau}\right] d\tau & \int_0^t \left[e^{-2\tau} + 5e^{-\tau}\right] d\tau
\end{bmatrix}
\]
\[
= \frac{1}{2}
\begin{bmatrix}
-\frac{1}{2}e^{-2t} - e^{-t} + \frac{3}{2} & e^{-2t} - 3e^{-t} + 2 \\
-2e^{-2t} - 3e^{-t} + 5 & -\frac{1}{2}e^{-2t} - 5e^{-t} + \frac{11}{2}
\end{bmatrix}.
\]
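These entrywise operations can be reproduced symbolically; here is a minimal SymPy sketch for the matrix Φ(t) of (6.5.52).

```python
import sympy as sp

t, tau = sp.symbols('t tau')
Phi = sp.Rational(1, 2) * sp.Matrix([
    [sp.exp(-2*t) + sp.exp(-t), -2*sp.exp(-2*t) + 3*sp.exp(-t)],
    [4*sp.exp(-2*t) + 3*sp.exp(-t), sp.exp(-2*t) + 5*sp.exp(-t)]])

print(Phi.limit(t, 0))                                # entrywise limit as t -> 0
print(Phi.diff(t))                                    # entrywise derivative
print(Phi.subs(t, tau).integrate((tau, 0, t)))        # entrywise integral from 0 to t
```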

6.5.6.4 Advance and delay

The advance or delay of a matrix of sequences is also an operation that is applied entry-
wise. For example, let:
\[
A[k+1] = \begin{bmatrix} A_{11}[k+1] & \cdots & A_{1n}[k+1] \\ \vdots & \ddots & \vdots \\ A_{m1}[k+1] & \cdots & A_{mn}[k+1] \end{bmatrix}.
\]

The delayed A[k − 1] is computed similarly.

6.5.6.5 Taylor series

We can define the Taylor series of a matrix of functions, again, entrywise. Since it is a
straightforward extension of the Taylor series of vectors, we refer to Section 6.3.6.5 for
details.

6.6 Functions of Matrices


We consider the determinant, the trace, and the inverse of a matrix. All these three func-
tions are defined for square matrices only. The inverse of a matrix was defined in Sec-
tion 6.5; here we learn how to actually compute it, when a matrix is invertible, and intro-
duce several properties of the inverse of a matrix.

6.6.1 Determinant of a Square Matrix


The determinant of a matrix is an important scalar associated with the matrix; although rarely
computed explicitly in practice, it plays a very important role. It is defined for square matrices only.

It is defined recursively. Before attempting to do it, we consider some preliminaries that


will be useful. Consider the 3 × 3 matrix A:
 
123
A =  4 5 6 .
789

From this matrix we can obtain other matrices by discarding rows or columns of both.
These so obtained matrices are called submatrices of the original matrix. For example, if
we eliminate row 2 and column 1, we get the 2 × 2 submatrix:
 
23
A= .
89

We introduce notation. The determinant of a square matrix A is indicated variously by:

|A| or det A.

Note that the notation | · | for the determinant of a matrix is the same notation that we
used to indicate the magnitude or absolute value of a scalar. Even though the notation
is the same, the two concepts are very different and should not be confused. The context
should disambiguate which one is meant.

We now define the determinant of a square matrix recursively.

Definition 6.6.1 (Determinant of a square matrix: Laplace (cofactor) expansion). If A is a scalar a, its
determinant is the scalar itself,
\[
|A| = a.
\]

For n ≥ 2, the determinant of a square n × n matrix A = [a_ij] is defined by:
\[
|A| = a_{i1} A_{i1} + \cdots + a_{in} A_{in} = \sum_{j=1}^{n} a_{ij} A_{ij}, \tag{6.6.1}
\]
where
\[
A_{ij} = (-1)^{i+j} M_{ij}, \tag{6.6.2}
\]
and M_ij is the determinant of the (n − 1) × (n − 1) submatrix of A obtained by eliminating
row i and column j, i.e., the row and column associated with the element a_ij of the matrix A.

Definition 6.6.1 is recursive, because the determinant of order n, i.e., of a square matrix
of dimension n, is expressed in terms of determinants of order n − 1. We also see that
eventually the determinant of order 2 is expressed in terms of determinants of order 1,
i.e., in terms of scalars.

Minor and cofactor. The determinant Mij in (6.6.2) is called the minor associated with the
element aij in the matrix A. The quantity Aij given in (6.6.2) is the cofactor of the element
aij in the matrix A.

In Equation (6.6.1), we expanded the determinant in terms of row i. We could have used
any other row. We could also have defined the determinant by an expansion in terms of a
column of the matrix. We then have any of the following 2n possible expressions for the
determinant:
\[
|A| = \sum_{j=1}^{n} a_{ij} A_{ij}, \quad 1 \le i \le n \tag{6.6.3}
\]
\[
\phantom{|A|} = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} M_{ij}, \quad 1 \le i \le n \tag{6.6.4}
\]
or
\[
|A| = \sum_{i=1}^{n} a_{ij} A_{ij}, \quad 1 \le j \le n \tag{6.6.5}
\]
\[
\phantom{|A|} = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} M_{ij}, \quad 1 \le j \le n. \tag{6.6.6}
\]
The important point is that all these 2n expressions are equivalent and lead to the same
value for the determinant. This gives us the opportunity to choose the expansion that is the
simplest to compute. We will not prove this.

Example 6.6.1 I Determinant of a 2 × 2 matrix


We compute the determinant of the 2 × 2 matrix:
\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}.
\]
We first compute the minors and cofactors associated with the first row:
\[
M_{11} = a_{22}, \qquad A_{11} = (-1)^{1+1} a_{22} = a_{22},
\]
\[
M_{12} = a_{21}, \qquad A_{12} = (-1)^{1+2} a_{21} = -a_{21}.
\]
Then the determinant is:
\[
|A| = a_{11} a_{22} - a_{12} a_{21}.
\]

This is highly mnemonic. The determinant of a 2 × 2 matrix is the product of


the diagonal elements minus the product of the counter diagonal elements.

It is left as an exercise to verify that we get the same expression if we expand the
determinant by the second row, or by the first column, or by the second column.


Example 6.6.2 I Determinant of a 3 × 3 matrix


We compute the determinant of the 3 × 3 matrix:
\[
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.
\]
We first compute the minors and cofactors associated with the first row:
\[
M_{11} = \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} = a_{22} a_{33} - a_{23} a_{32},
\qquad
A_{11} = (-1)^{1+1} M_{11} = a_{22} a_{33} - a_{23} a_{32},
\]
\[
M_{12} = \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} = a_{21} a_{33} - a_{23} a_{31},
\qquad
A_{12} = (-1)^{1+2} M_{12} = -(a_{21} a_{33} - a_{23} a_{31}),
\]
\[
M_{13} = \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} = a_{21} a_{32} - a_{22} a_{31},
\qquad
A_{13} = (-1)^{1+3} M_{13} = a_{21} a_{32} - a_{22} a_{31}.
\]
Then the determinant is:
\[
|A| = a_{11}(a_{22} a_{33} - a_{23} a_{32}) - a_{12}(a_{21} a_{33} - a_{23} a_{31}) + a_{13}(a_{21} a_{32} - a_{22} a_{31})
\]
\[
\phantom{|A|} = a_{11} a_{22} a_{33} + a_{12} a_{23} a_{31} + a_{21} a_{32} a_{13} - a_{13} a_{22} a_{31} - a_{21} a_{12} a_{33} - a_{11} a_{23} a_{32}.
\]

This is also highly mnemonic. The determinant of a 3 × 3 has six terms: the
product of the diagonal elements a11 a22 a33 , plus the product of the elements of
the first upper diagonal (the diagonal immediately above the main diagonal)
times the element on the left lower corner a12 a23 a31 , plus the product of the
elements in the first lower diagonal times the element in the top right corner
a21 a32 a13 , minus the product of the elements on the counter diagonal a13 a22 a31 ,
minus the product of the elements in the first upper counter diagonal times the
element in the lower right corner a21 a12 a33 , minus the product of the elements
in the first lower counter diagonal times the first element a11 a23 a32 .

It is left as an exercise to show that one gets the same expression if we expand
the determinant using the second or third row, or the first, second, or third
column. 
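A small numerical check of the cofactor expansion against a library routine (the matrix values below are made up for illustration):

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor (Laplace) expansion along the first row."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)  # drop row 0, column j
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0]])
print(det_cofactor(A), np.linalg.det(A))   # both approximately -3.0
```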

We introduce the concepts of singular and non singular matrix.

Definition 6.6.2 (Singular and non singular matrices). A square matrix A is singular if
its determinant is zero, |A| = 0. A matrix A is non-singular if its determinant is non-zero,
|A| ≠ 0.

We now study some important properties of determinants. We start with the determinant
of a few structured matrices. These are left as exercises.

Matrix with a row multiplied by a scalar The determinant of the matrix C that is the
matrix A but with row i multiplied by a scalar α is the determinant of A multiplied
by the scalar α:

|C| = α|A|

This is easily seen by expanding the determinant of A by row i, realizing that all
elements of this row are multiplied by α, while the cofactors are left invariant.

Product of a scalar by a matrix The determinant of multiplication of a matrix A by a


scalar α is:

|C| = |αA|
= αn |A|,

where n is the dimension of the square matrix A. This can be seen by induction and
using the previous result.

Matrix with zero row or column The determinant of matrix A with a row (or column)
of zeros is zero. This is easily seen by expanding the determinant by the row (or
column) of zeros.
Matrix with 2 rows (2 columns) interchanged Interchange 2 rows or columns, then the
determinant of the matrix is multiplied by −1. The result can be proved by induc-
tion. We will not prove this result.
Matrix with repeated row (or column) The determinant of a matrix A with repeated row
(or column) is zero. This follows from the previous result: since the matrix has a
repeated row, interchanging these two rows leaves the matrix, and hence its determinant,
unaltered; on the other hand, by the previous result the determinant changes sign. The
only number equal to its own negative is zero; hence, the determinant is zero.
Matrix where one row multiplied by scalar is added to another If we replace a row by
the row obtained by adding to it another row multiplied by a scalar, the determi-
nant is invariant. A similar result holds if instead we multiply a column by a scalar
and add the product to another column.

To prove this property, consider that we multiply row i by α and add it to row j.
Expand the determinant by row j. Then, in the expansion, the elements of row j are
now a_jk + α a_ik, while the cofactors do not change. Hence the determinant is now:
\[
\sum_{k=1}^{n} (a_{jk} + \alpha a_{ik}) A_{jk} = \det(A) + \alpha \det(A_1),
\]
where det(A_1) is the determinant of a matrix with a repeated row, so it is zero.

A similar result is proved by operating with columns.


Transpose of a matrix The determinant of the transpose AT of a square matrix A equals
the determinant of the original matrix:
\[
\left|A^T\right| = |A|.
\]

We sketch the proof. The proof follows by realizing that the column i of AT is row i
of A and similarly row j of AT is column j of A. So, expanding the determinant of
AT by column i is the same as expanding the determinant of A by row i. Note that
the cofactors of each expansion are the same.
Hermitian of a matrix The determinant of the Hermitian A^H of a square matrix A is the
conjugate of the determinant of the original matrix:
\[
\left|A^H\right| = (\det A)^*.
\]

The proof follows because Hermitian is the transpose conjugate. Transposition does
not alter the determinant by the previous property. So, all that is left is conjugation.
Since the determinant is a sum of products of entries of A, the determinant of the
Hermitian of A is the sum of products of the conjugate entries of A; but this sum is
the conjugate of the sum of the same product of entries of A, and so the determinant
of the Hermitian matrix is the conjugate of the original matrix.

Diagonal matrix The determinant of a diagonal matrix is the product of its diagonal en-
tries. Let D = Diag([d11 · · · dnn ]). Then

\[
|D| = \prod_{i=1}^{n} d_{ii}.
\]

This can be easily proved by induction.

Triangular matrix Example 6.4.15 introduced upper and lower triangular matrices. It
is easy to show that the determinant of a triangular matrix is the product of its
diagonal entries. We get then that if L is lower triangular its determinant is
\[
|L| = \prod_{i=1}^{n} \ell_{ii},
\]
where ℓ_ii are the diagonal entries of L.

A similar result holds for upper triangular matrices U:
\[
|U| = \prod_{i=1}^{n} u_{ii},
\]
where u_ii are the diagonal entries of U.

These results can be easily proved by induction. We sketch the proof for the lower
triangular matrix L. Expanding the determinant in terms of the last column using
(6.6.5) or (6.6.6), all terms are zero except the term ℓ_nn L_nn, where L_nn is the
cofactor of the (n, n) entry of the matrix L. But this cofactor is of order n − 1 and the
corresponding minor is the determinant of a triangular matrix of dimension n − 1.
So, the result follows by the induction step.

Product of 2 triangular matrices The determinant of the product of two triangular ma-
trices of the same type (both lower triangular or both upper triangular) is given by
the product of the determinants of each matrix.

|L1 L2 | = |L1 ||L2 |.

We sketch the main steps of the proof for lower triangular matrices. For upper tri-
angular matrices the proof follows similarly.

The product of two lower triangular matrices is lower triangular. The diagonal el-
ements of the product matrix are the product of the corresponding entries of each
lower triangular factor. Then the result follows from the previous result on the de-
terminant of a lower triangular matrix.

Product of two square matrices This result generalizes the previous result to arbitrary
square matrices.

The determinant of the product of two square matrices of the same dimension is
given by the product of the determinants of each matrix.

|AB| = |A||B|.

The proof is easy if we know that we can reduce an arbitrary square matrix to a
triangular matrix by a method that does not change the determinant. This method
is Gauss elimination and will be studied in Chapter 7. So, the proof of the statement
on the determinant of the product of two matrices follows from the result for the
determinant of the product of two triangular matrices, once we reduce each matrix
to a triangular matrix by Gauss elimination.

Inverse of a square matrix Let A be invertible. Then
\[
\left|A^{-1}\right| = |A|^{-1}. \tag{6.6.7}
\]
This follows by realizing that:
\[
A^{-1} A = I.
\]
Computing the determinant of the product on the left-hand-side gives the product
of the determinants. The determinant of the right-hand-side is the determinant of
the identity matrix. This determinant is the product of the diagonal entries, which
are all one, so it is one. Then:
\[
\left|A^{-1} A\right| = \left|A^{-1}\right| |A| = 1,
\]
from which:
\[
\left|A^{-1}\right| = |A|^{-1}.
\]
Since (6.6.7) expresses the determinant of the inverse matrix A^{-1} in terms of the
inverse of the determinant of the original matrix A, the existence of the inverse A^{-1}
requires that |A| ≠ 0, i.e., that A is non singular. Result 6.6.2 shows that this is
actually a necessary and sufficient condition for the inverse of a matrix to exist.

Orthogonal matrix If A is orthogonal

|A| = ±1.

This follows from the definition of orthogonal matrices:
\[
A A^T = A^T A = I.
\]
Then
\[
|A|^2 = 1,
\]
and so
\[
|A| = \pm 1.
\]

Unitary matrix If A is unitary

\[
|\det A| = 1.
\]
In other words, its magnitude¹ is one. This follows from the definition of unitary
matrices:
\[
A A^H = A^H A = I.
\]
Then
\[
\det(A)\, \det\!\left(A^H\right) = |\det(A)|^2 = 1,
\]
and so
\[
|\det A| = 1.
\]
Again, note that | · | represents here the magnitude of the complex number det A and not
the determinant.
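Several of the properties above are easy to spot-check numerically; the sketch below (random matrices) verifies the product, transpose, and inverse rules.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # True
print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))                        # True
print(np.isclose(np.linalg.det(np.linalg.inv(A)), 1 / np.linalg.det(A)))       # True
```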

6.6.2 Trace
The trace of an n-dimensional square matrix A is:
\[
\operatorname{tr} A = \sum_{i=1}^{n} a_{ii}.
\]

¹ Here, | · | refers to the absolute value of the scalar det A. This is why we often prefer the alternative
notation for the determinant, namely, det(·).

A number of facts about the trace can be proven easily from the definition.

The trace of the n-dimensional identity is:

tr In = n,

and, of course, the trace of the zero matrix is:

tr 0 = 0.

The trace of the matrix A and the trace of its transpose A^T are equal,
\[
\operatorname{tr} A = \operatorname{tr} A^T,
\]
since both matrices have the same diagonal entries.

Likewise, the trace of the matrix A and the trace of its Hermitian A^H are
\[
\operatorname{tr} A^H = \left(\operatorname{tr} A^T\right)^* \tag{6.6.8}
\]
\[
\phantom{\operatorname{tr} A^H} = (\operatorname{tr} A)^*, \tag{6.6.9}
\]
the conjugates of each other, since their diagonal entries are conjugates of each other.
The trace is linear:

tr (αA + βB) = αtr A + βtr B.

This can be proven by direct verification.

Of course the trace of a scalar is the scalar itself:

tr (α) = α.

It is easy to prove from the definition of the trace that:
\[
\operatorname{tr}\!\left(b\, a^T\right) = \operatorname{tr}\!\left(a^T b\right) = a^T b, \tag{6.6.10}
\]
since a^T b is a scalar, and assuming the vectors a and b have the same dimension. In
words, (6.6.10) states that the trace of the outer product of two vectors with the same
dimension (the LHS of (6.6.10)) is the scalar product of the vectors (the RHS of (6.6.10)).

Property (6.6.10) for vectors extends to matrices. We get the very interesting property of
the trace:

tr (AB) = tr (BA). (6.6.11)



We assume that A and B have compatible dimensions so both products make sense. This
property can be proved by direct evaluation of the LHS and the RHS. We provide an al-
ternative proof, based on the fact that the trace of the outer product of two vectors with
the same dimensions is the scalar product of the vectors.

Let A and B be given in column format and row format, respectively:
\[
A \cdot B = \begin{bmatrix} f_1 & \cdots & f_n \end{bmatrix} \cdot \begin{bmatrix} g_1^T \\ \vdots \\ g_n^T \end{bmatrix}
= \sum_{i=1}^{n} f_i\, g_i^T.
\]

By linearity of the trace and the property of the trace of the outer product of vectors,
\[
\operatorname{tr}(A \cdot B) = \operatorname{tr}\!\left(\sum_{i=1}^{n} f_i\, g_i^T\right)
= \sum_{i=1}^{n} \operatorname{tr}\!\left(f_i\, g_i^T\right)
= \sum_{i=1}^{n} \operatorname{tr}\!\left(g_i^T f_i\right)
= \operatorname{tr}(B A).
\]
The last step follows directly by computing the diagonal elements of BA with B given in
row format and A given in column format. This proves the result.
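The cyclic property (6.6.11) and the outer-product identity (6.6.10) can be checked directly; random data below is used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 5))
B = rng.standard_normal((5, 3))
a = rng.standard_normal(4)
b = rng.standard_normal(4)

print(np.isclose(np.trace(A @ B), np.trace(B @ A)))   # tr(AB) = tr(BA)
print(np.isclose(np.trace(np.outer(b, a)), a @ b))    # tr(b a^T) = a^T b
```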

6.6.3 Inverse of a Matrix


We recall from Section 6.5.3.7 that the inverse of a n-dimensional square matrix A, when
it exists, is defined as the matrix B such that:
AB = In = BA,
where In is the identity matrix of dimension n. Both equalities need to hold for the matrix
B to be defined as the inverse of the matrix A.

The inverse of a n-dimensional square matrix A may not exist; when it exists, as we saw
below (6.5.48), it is commonly represented by the symbol A−1 . We then have, if the inverse
of A exists,
AA−1 = In = A−1 A.

Remark 6.6.1 (Inverse and unit element of matrix multiplication). As we observed, the
inverse A^{-1} of the n-dimensional square matrix A, when it exists, plays the role for the
product of square matrices that the inverse of a scalar plays with respect to the product of scalars.
Their product is the unit element I of multiplication of square matrices.

There are two main differences between the inverse of a matrix and the matrix product on the
one hand, and the inverse of a scalar and scalar multiplication on the other hand: 1) the scalar
inverse is defined for every nonzero scalar, while the inverse A^{-1} is defined only for square
matrices A with nonzero determinant; and 2) with the scalar multiplicative inverse, we need
to check a single condition, namely, that the product of the nonzero scalar and its inverse is
one, while with the matrix multiplicative inverse, to make sure that a matrix B is the inverse
of another matrix A, we need to check two conditions, the left product and the right product. 

There are several ways to compute the inverse of a matrix. We will discuss one here and
consider another method in Chapter 7.

Result 6.6.1 (Inverse of matrix A). Let the n-dimensional square matrix A,
\[
A = [a_{ij}], \quad n \times n,
\]
be invertible.ᵃ The inverse of A is unique and given by
\[
A^{-1} = \frac{1}{\det A}\, \operatorname{Adj}(A), \tag{6.6.12}
\]
where Adj(A) is the adjoint of A (sometimes called the adjugate matrix of A, because the
term adjoint of a matrix is also used for another purpose). This matrix is given by:
\[
\operatorname{Adj}(A) = [C_{ij}]^T,
\]
where [C_ij] is the cofactor matrix, i.e., the matrix of cofactors:
\[
C_{ij} = (-1)^{i+j} M_{ij}.
\]
Here M_ij is the minor of the entry a_ij of the matrix A.


ᵃ Hence, its determinant det A ≠ 0.

Proof I The proof of this result follows by computing the diagonal elements and the
off-diagonal elements of the left and right multiplication of the matrix A and the inverse
given by (6.6.12). We consider the right multiplication only; the left multiplication follows
similarly.

Consider the element (i, j) of the product A A^{-1}, where A^{-1} is given by (6.6.12):
\[
\left[A A^{-1}\right]_{ij} = \sum_{\ell=1}^{n} a_{i\ell}\, \frac{1}{\det A}\, [\operatorname{Adj}(A)]_{\ell j}
= \frac{1}{\det A} \sum_{\ell=1}^{n} a_{i\ell}\, (-1)^{j+\ell} M_{j\ell}
= \delta_{ij}.
\]
In the last equation δ_ij is the Kronecker symbol
\[
\delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise.} \end{cases}
\]
In the second equation M_jℓ is the minor associated with element (j, ℓ). The last equation
follows from the definition of the determinant and corresponding properties. In fact, if
j = i, the sum on the right-hand-side is det A. If j ≠ i, then the sum is the expansion
of the determinant of a matrix with a repeated row, namely, row i, which is zero. In both
cases we are assuming that det A ≠ 0. Clearly, A^{-1} given by (6.6.12) is unique. 
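The adjugate formula (6.6.12) can be implemented directly for small matrices and compared with a library inverse; a minimal sketch, with a made-up invertible matrix:

```python
import numpy as np

def inverse_by_adjugate(A):
    """Inverse of a small square matrix via A^{-1} = Adj(A) / det(A)."""
    n = A.shape[0]
    C = np.empty_like(A, dtype=float)          # cofactor matrix
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T / np.linalg.det(A)              # adjugate is the transposed cofactor matrix

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
print(np.allclose(inverse_by_adjugate(A), np.linalg.inv(A)))   # True
```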

Result 6.6.2 (Inverse of the matrix A and its determinant). The inverse A−1 of the n-
dimensional square matrix A exists if and only if

det A ≠ 0.

Proof I The proof of Result 6.6.1 shows that the nonzero condition on the determinant
of the n-dimensional square matrix A is both a necessary and sufficient condition for the
inverse given by (6.6.12) to exist.


Remark 6.6.2 (Inverse and determinant). The nonzero determinant condition for matrices
takes the place of the nonzero condition on a scalar for its inverse to exist. 

It is usually computationally expensive to compute the inverse of a matrix directly through
Result 6.6.1; for generic n × n matrices, this is an O(n^3) operation, i.e., it requires on the
order of n^3 floating point operations (flops). For example, inverting a 1000 × 1000 matrix
would take on the order of 10^9 flops. While any respectable laptop nowadays can deliver
this in about a second or a fraction of a second, inverting much larger matrices this way
quickly becomes impractical.

We recall a few facts already known about the determinant of structured matrices and
add a few additional facts about their inverses.

Result 6.6.3 (Inverses of diagonal and triangular matrices). If D is diagonal, U is upper
triangular, and L is lower triangular, then, when the inverses exist:
\[
D^{-1} \text{ is diagonal, with } \left[D^{-1}\right]_{ii} = d_{ii}^{-1};
\]
\[
U^{-1} \text{ is upper triangular, with } \left[U^{-1}\right]_{ii} = u_{ii}^{-1};
\]
\[
L^{-1} \text{ is lower triangular, with } \left[L^{-1}\right]_{ii} = l_{ii}^{-1}.
\]

In words: The inverses of diagonal, upper triangular, or lower triangular matrices when
they exist, keep their characteristic, i.e., the inverse of a diagonal matrix, upper triangu-
lar matrix, or lower triangular matrix is diagonal, upper triangular, or lower triangular,
respectively, and the diagonal elements of the inverses are the inverses of the correspond-
ing diagonal entries of the original matrix.

Result 6.6.4 (Inverse of transpose of a matrix). If the inverse A^{-1} exists, then the inverse
of the transpose A^T exists and is given by:
\[
\left(A^T\right)^{-1} = \left(A^{-1}\right)^T.
\]

Proof I The proof follows because, since the determinant of the transpose A^T equals
the determinant of the original matrix, the inverse of A^T exists if the inverse of A exists. Now:
\[
\left(A A^{-1}\right)^T = I^T = I
\]
\[
\left(A A^{-1}\right)^T = \left(A^{-1}\right)^T A^T.
\]
The second equation follows from transposing the product of the two matrices on the LHS
of the first equation. Equality of the RHS of the two equations leads to the desired result.

We should also show a similar result for A^{-1} A. We leave it as an exercise.

This shows that (A^{-1})^T is the inverse of A^T, i.e.:
\[
\left(A^T\right)^{-1} = \left(A^{-1}\right)^T.
\]
In words, the inverse of the transpose is the transpose of the inverse. 

Result 6.6.5 (Inverse of powers of a matrix). If the inverse A^{-1} exists, then:
\[
\left(A^n\right)^{-1} = \left(A^{-1}\right)^n.
\]

Proof I The proof follows by induction. We just show it for A^2:
\[
A^2 \left(A^{-1}\right)^2 = A A A^{-1} A^{-1} = I.
\]
It follows similarly that (A^{-1})^2 A^2 = I. 

Result 6.6.6 (Inverse of product of two square matrices). If A^{-1} and B^{-1} exist, then:
\[
(AB)^{-1} = B^{-1} A^{-1}.
\]

Proof I First note that, if the inverse of each matrix A and B exists, then the inverse of
their product exists, since the determinant of the product is the product of the determinants
and this product is nonzero because each factor is nonzero.

The proof then follows directly:
\[
A B\, B^{-1} A^{-1} = I.
\]
In words, the inverse of the product of two matrices is the product, in reverse order, of the
inverses of each matrix, assuming that both inverses exist. Again, we should show that
multiplying AB on the left by B^{-1} A^{-1} also leads to the identity I.

6.7 Problems
1. Assume the dimensions of the matrices A, B, C, and D are:

A : 19 × 13, B : 19 × 13, C : 13 × 19, D : 19 × 7, and E : m × n,

Give the dimensions of the following. For each, provide a brief justification.

(a) What are the dimensions of F = B + C^T?

(b) What are the dimensions of G = −B^T?
(c) What are the dimensions of H = A^T D?
(d) Let the matrix K be given by K = (A^T E)^T. Specify the values of m and n for
K to be well defined.

2. Determine a choice of matrices A and C such that the following is an identity:
\[
A \cdot \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ 10 & 11 & 12 \end{bmatrix} \cdot C
= \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 5 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]

3. Consider the matrix A in block format:


    
1 −1 −1 −1 −1 −1 −1 −1
 2 −1   −2 1   2 −1 
 2 1 


 0 −1 −1 −1 −1 1 −1 
 2 −1 −2 1 −2 1 
A=    

 0 1 −1 1 −1 
0 

 −2 1  2 1 

 −1 −1 
0 0 0
−2 1

(a) Determine the third column and the fourth row of A.


(b) What are the dimensions of the zero’s in A.
(c) Is this matrix upper triangular?

4. Consider the NA × MA matrix A = [aij ] and the NB × MB matrix B = [bk` ]. The


dimensions NA , MA , NB , and MB are not necessarily equal and they are all greater
than one. Do the following:

(a) Let:

C = AT + B.

For C to be well defined, what constraints, if any, must be placed on NA , MA ,


NB , and MB .
(b) Compute the elements c_ij of C in terms of a_kℓ and b_mn. Write now A and B as
\[
A = \begin{bmatrix} a_1 & \cdots & a_{M_A} \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} b_1 \\ \vdots \\ b_{N_B} \end{bmatrix}
\]

Answer the following:


i. Specify the dimensions of ai and bj in terms of the dimensions of A and B.

ii. Define the matrix

D = A · B.

What are the constraints on NA , MA , NB , and MB , if any, so D is well


defined.
iii. Express D in terms of ai and bj .

5. Consider diagonal matrices D1 and D2 and lower triangular matrices L1 and L2 .


Show that:

(a) The product D1 D2 is a diagonal matrix.


(b) The product L1 L2 is a triangular matrix.
(c) The product D1 L1 and the product L1 D1 are triangular matrices.

6. Consider the upper and lower triangular matrices U and L, respectively:
\[
U = \begin{bmatrix} 3 & 1 & -2 & -1 \\ 0 & 1 & -2 & 1 \\ 0 & 0 & -2 & 1 \\ 0 & 0 & 0 & -1 \end{bmatrix},
\qquad
L = \begin{bmatrix} -1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 1 & 0 & -2 & 0 \\ 2 & 0 & 1 & -3 \end{bmatrix}.
\]

Determine which, if any, of the following matrices is upper triangular, lower triangular,
or a full matrix. A full matrix is a matrix with no structure, i.e., an arbitrary matrix.

(a) A = LU
(b) B = UL
(c) C = L2
(d) D = U2

7. Let:
\[
A = [a_{ij}] = \begin{bmatrix} a_1^H \\ \vdots \\ a_n^H \end{bmatrix} : n \times m \tag{6.7.1}
\]
\[
B = [b_{kl}] = \begin{bmatrix} b_1 & \cdots & b_n \end{bmatrix} : m \times n \tag{6.7.2}
\]
\[
C = AB = [c_{pq}] \tag{6.7.3}
\]

(a) Verify that C = AB is well-defined and determine the dimensions of C.



(b) Explain:
i. The entries c_pq of C in terms of the entries a_ij of A and b_kl of B.
ii. The entries c_pq of C in terms of the products of the vectors a_i^H and b_j.
iii. The columns c_q of C in terms of the matrix A and the columns b_j of B.
iv. The rows d_p of C in terms of the rows a_i^H of A and the matrix B.
v. Verify (i) through (iv) with the matrices
\[
A = \begin{bmatrix} 1 & 2 \\ 1 & -1 \\ 1 & 3 \end{bmatrix},
\qquad
B = \begin{bmatrix} 1 & 2 & -1 \\ -1 & 1 & 2 \end{bmatrix}. \tag{6.7.4}
\]

8. Consider the matrices
\[
3 \times 3: \ A = \begin{bmatrix} a_1^H \\ a_2^H \\ a_3^H \end{bmatrix}
\quad \text{and} \quad
3 \times 2: \ B = \begin{bmatrix} b_1 & b_2 \end{bmatrix}.
\]
(a) Define the matrix C as
\[
C = AB.
\]
What are the dimensions of C? Compute the element C_22 of C.
(b) Define the matrix D as
\[
D = B^T A^T.
\]
What are the dimensions of D? Compute the element D_21 of D.

9. Now consider the Hermitians A^H, B^H, C^H of the matrices given in (6.7.1), (6.7.2),
and (6.7.3), respectively.

(a) Write explicitly A^H in terms of {a_ij} and {a_i}; B^H in terms of {b_kl} and {b_l};
and C^H in terms of {c_pq}, {c_p}, and {d_q}.
(b) Show that
\[
C^H = B^H A^H. \tag{6.7.5}
\]
(c) Generalize (6.7.5) to the product of K pairs of matrices
\[
C = \prod_{k=1}^{K} A_k B_k = A_K B_K \cdots A_1 B_1, \tag{6.7.6}
\]
i.e., show
\[
C^H = \prod_{k=0}^{K-1} B_{K-k}^H A_{K-k}^H = B_1^H A_1^H \cdots B_K^H A_K^H, \tag{6.7.7}
\]
where the dimensions of all the A_k and B_k are such that all matrix products
are well-defined.

(d) Simplify (6.7.7) when the matrices A_k and B_k are all real-valued.
10. Let A and B be n × n matrices. Determine if the following is true or false. Justify
your answers.
(a) (A + B)2 = A2 + 2AB + B2
(b) (A + B)(A − B) = A2 − B2
11. Consider the matrices:
\[
A = \begin{bmatrix} 2 & 1 & 4 \\ 3 & 2 & 1 \\ 1 & 3 & 2 \end{bmatrix},
\quad
B = \begin{bmatrix} 5 & 1 & 6 \\ 9 & 2 & -3 \\ -1 & 3 & 7 \end{bmatrix},
\quad
C = \begin{bmatrix} 0 & 0 & 0 \\ 2 & 3 & 4 \\ 0 & 0 & 0 \end{bmatrix}.
\]
(a) Verify that AC = BC, with C ≠ 0.
(b) Can you cancel out matrix C? If yes, prove it; if not, justify.
12. Let
\[
X = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
\quad \text{and} \quad
Y = \begin{bmatrix} -j & 1 & -j \end{bmatrix}.
\]
(a) Compute and find the dimensions of
Z = XY.

(b) Compute and find the dimensions of


W = YX.

(c) Are Z and W symmetric? If yes, justify; if not, say why not.
(d) Are Z and W Hermitian? If yes, justify; if not, say why not.
(e) Compute tr Z and tr W.
13. Let
\[
x = \begin{bmatrix} x \\ y \end{bmatrix},
\qquad
A = \begin{bmatrix} a & \tfrac{1}{2} b \\ \tfrac{1}{2} b & c \end{bmatrix}.
\]
Compute the quadratic form
\[
Q(x, y) = x^T A x.
\]
Express Q(x, y) using the matrix
\[
X = x\, x^T.
\]

14. In Signal Processing, data, possibly complex valued, is usually collected as N vec-
tors xn of dimension M and then grouped in the matrix

X = [x1 · · · xN ].

Two products of this data matrix arise in many Signal Processing applications. The
Grammian of the data matrix

G = XH X,

and the outer product of the data matrix:

H = XXH .

(a) What are the dimensions of X, G, and H.


(b) Are X, G, and H symmetric?
(c) Are X, G, and H Hermitian?
(d) Calculate the order of the number of floating point multiplications that are
needed to compute G and H. Are these orders the same? If they are, jus-
tify why they are the same. If not, which one is more onerous to compute G or
H? Justify.
(e) If you can, compute the trace of G and the trace of H. If you can compute both,
what is the relation between them.
(f) If you can compute the trace of G and the trace of H, calculate the order of
the number of floating point multiplications that are needed to compute each
trace. Are these orders the same? If they are, justify why they are the same. If
not, which one is more onerous to compute, the trace of G or the trace of H?
Justify.

15. Given a square matrix A, consider the following matrix:

B = PAP−1 .

The matrix B is said to be similar or conjugate to matrix A, or said to be obtained by


a similarity transformation from matrix A. Do the following:

(a) Relate the determinant of A and the determinant of B.


(b) Relate the trace of A and the trace of B.

Remark 6.7.1. These are significant properties and show that the determinant and the
trace are invariant to conjugation or similarity transformations. 

16. Consider the matrix
\[
R(\theta) = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}.
\]

(a) Compute det R(θ).


(b) Is the matrix invertible? If so, determine R(θ)−1 . If not, say why not.
17. Compute the following:
(a) Determinant of A given by
\[
A = \begin{bmatrix} -2 & 0 & 0 \\ -1 & 3 & 0 \\ 1 & 2 & 2 \end{bmatrix}^T
\begin{bmatrix} -3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{bmatrix}
\]
(b) Determinant of B given by
\[
B = \begin{bmatrix} 1 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}
\]
18. Find the entry (2, 3) of adj B where B is
\[
B = \begin{bmatrix} 1 & -1 & 2 & 1 \\ 0 & 6 & 1 & 2 \\ 2 & 3 & -3 & 5 \\ -1 & 5 & 0 & 3 \end{bmatrix}.
\]

19. Consider the matrix C:
\[
C = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & -1 \\ 1 & 0 & -1 \end{bmatrix}.
\]
(a) Find the determinant of D given by
\[
D = \left(C^T\right)^2.
\]
(b) If matrix C is invertible, find the inverse C^{-1} of the matrix C, i.e., find
\[
G = C^{-1}.
\]

20. Consider the matrix A:
\[
A = \begin{bmatrix} 3 & -1 & -3 \\ 0 & 2 & 1 \\ -3 & 0 & 1 \end{bmatrix}.
\]

(a) Find det A.


(b) Find the cofactors C2,3 and C3,2 of A.
(c) Compute [A−1 ]3,2 .

21. Consider the diagonal matrix D:
\[
D = \operatorname{Diag}[d_1 \ \cdots \ d_N]
= \begin{bmatrix}
d_1 & 0 & \cdots & 0 \\
0 & d_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & d_N
\end{bmatrix}.
\]

Determine by induction:

(a) The determinant of the diagonal matrix D.


(b) The inverse of D if it exists.

22. Consider the lower triangular matrix L:
\[
L = \begin{bmatrix}
a_{11} & 0 & \cdots & \cdots & 0 \\
a_{21} & a_{22} & \cdots & \cdots & 0 \\
\vdots & \vdots & \ddots & & \vdots \\
\vdots & \vdots & & \ddots & 0 \\
a_{N1} & \cdots & \cdots & a_{N(N-1)} & a_{NN}
\end{bmatrix}.
\]

Determine by induction:

(a) The determinant of L.


(b) The structure of the inverse of L and the elements of the diagonal of this in-
verse, if the inverse of L exists.

23. Determine:

(a) The determinant and the inverse, if it exists, of the matrix A:
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ -2 & 1 & 4 \\ -1 & 1 & 1 \end{bmatrix}
\]

(b) The determinant of A, the (3, 2) minor of A, the (4, 2) cofactor of A, and the
element [A^{-1}]_{24} of the inverse, if the inverse exists, of the matrix:
\[
A = \begin{bmatrix} 1 & -1 & 0 & 2 \\ -2 & 1 & 2 & -1 \\ 0 & 2 & 0 & 1 \\ 1 & 0 & 2 & 2 \end{bmatrix}
\]
24. Given a generic, possibly complex valued, matrix A, consider the product
\[
B = A A^H.
\]
Determine which of the following statements is true, false, or cannot be decided from
the given information. Provide a short justification for your answer.

(a) |B| = |A| |A^T|
(b) |B| = |A|^2
(c) |B| cannot be determined in terms of |A| and |A^T|.
(d) There is not enough information to determine |B| in terms of |A| and |A^T|.
25. Consider the square matrices A and B. Determine:
\[
|C| = \left|A^2 B^H\right|.
\]
26. Determine the determinant and the inverse of the rotation matrix about the y-axis:
\[
R_y(\theta) = \begin{bmatrix} \cos\theta & 0 & \sin\theta & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\theta & 0 & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.
\]
Is this matrix orthogonal? Is this a unitary matrix?
27. Consider the matrix
\[
A = \begin{bmatrix} 1 & -5 & 62 & 2 \\ 0 & -2 & 13 & -1 \\ 0 & 0 & -1 & 18 \\ 0 & 0 & 0 & 1 \end{bmatrix}.
\]
Is this matrix invertible? If not, explain why not; if yes, compute the elements [A^{-1}]_{33}
and [A^{-1}]_{34} of the inverse.
28. Consider the matrices A and B as well as their transposes A^T and B^T:
\[
A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix},
\qquad
B = \begin{bmatrix} -1 & 1 & 1 & -1 \\ 1 & 1 & 1 & -1 \\ 1 & 1 & -1 & 1 \end{bmatrix}.
\]
Determine

(a) The determinant of A and AT .


(b) Is the matrix A invertible? Is the matrix AT invertible? If either is, or both are,
determine their inverses.
(c) The determinant of BBT and BT B. Is any or both of these matrices invertible?
If not, why not, if yes determine their inverses.

29. Consider the matrix
\[
R(\theta) = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}.
\]

(a) Compute det R(θ).


(b) Is the matrix invertible? If so, determine R(θ)−1 . If not, say why not.

30. Let A be a square matrix. Prove that |A| ≠ 0 given that
\[
A^2 - A = 2I.
\]

31. Recall that the square matrix A is orthogonal if and only if

AT A = AAT = I.

Let X and Y be orthogonal matrices. Prove or disprove that

Z = XY

is orthogonal.

32. The discrete Fourier matrix F of dimension n is a square n-dimensional matrix with
wide application. It is introduced in Example 6.4.6 and Equation 6.4.15. Multiply-
ing an n-dimensional vector v by F computes the so called discrete Fourier trans-
form (DFT) of v, which is usually represented by the corresponding capital letter V,
although it is a vector. Hence:

V = Fv.

The vector v of dimension n is assumed to collect the n time samples x[k], k =


0, 1, · · · , (n − 1) of a discrete time signal. The corresponding DFT V of v collects the
DFT coefficients at the discrete frequencies Ω_ℓ = (2π/n)ℓ, ℓ = 0, 1, · · · , (n − 1). Usually, when
`, ` = 0, 1, · · · , (n−1). Usually, when
dealing with the DFT of v, we number the components of both, the signal v and its
DFT, V, from 0 rather than 1. The first component V0 of the DFT vector V represents
the Fourier coefficient of v at the frequency Ω0 . Similarly, the `th component V` of
the DFT vector V represents the Fourier coefficient of v at the frequency Ω` .

Below, when you are asked to plot the DFT vector V of a vector v, plot its real part
and its imaginary part as a function of Ω` , i.e., draw two plots, one that plots the
real part of the components of V and the other that plots the imaginary part of the
components of V. Label the horizontal axis of both plots by the values of the fre-
quencies Ω` .

Consider the discrete Fourier matrix of dimension four F4 . Do the following:

(a) Compute V0 , the DFT of the vector v0 = [1 1 1 1]T , i.e., of the constant signal
with amplitude 1. A constant signal is often referred to as a dc-signal. Plot V0 ,
the DFT of v0 . Interpret your solution and your plot.
(b) Compute V_c, the DFT of the vector v_c = [1  cos(2π/4)  cos((2π/4)·2)  cos((2π/4)·3)]^T, and
plot V_c, the DFT of v_c. Interpret your solution and your plot.
(c) Compute V_s, the DFT of the vector v_s = [0  sin(2π/4)  sin((2π/4)·2)  sin((2π/4)·3)]^T, and
plot V_s. Interpret your solution and your plot.
(d) Compute V1 , the DFT of the vector v1 = [1 1 − 1 − 1]T . Plot V1 , the DFT V1
of v1 . Interpret your solution and your plot.
(e) Compute the conjugate, the transpose, and the Hermitian of F4 and compare
each with F4 .
(f) Compute the inverse of F4 . Interpret your result.
Chapter 7

Gauss and Gauss-Jordan Elimination

THIS Chapter introduces Gauss elimination and Gauss-Jordan elimination, simple but
fundamental methods to solve linear algebraic systems of equations. Gauss elim-
ination operates with so called elementary operations to reduce, in a first step,
the forward step, the system of equations to a canonical form (upper triangular). When
coupled with a second step, back substitution, it then solves explicitly for the unknowns or
variables in the linear system of equations.

In Gauss-Jordan elimination, the first step is the forward step of Gauss elimination. The
second step further operates with elementary operations to reduce the linear system to a
diagonal canonical form for which the solution of the linear system of algebraic equations
is trivial.

Gauss elimination is very useful. With matrices in triangular form we know that the de-
terminant is the product of the diagonal elements. So, by reducing the original matrix
to a triangular form, the determinant of this triangular matrix is computed simply by
multiplying its scalar diagonal entries. What is interesting is that the determinant of the
original matrix relates easily to the determinant of its reduced triangular form.

Besides the determinant, Gauss and Gauss-Jordan elimination provide a speedy method
to compute the inverse of a matrix and an efficient method to solve a linear system of al-
gebraic equations. But, they have broader applications and are very useful in computing
the rank of a matrix, determining if a set of vectors are linearly dependent or independent,
finding a basis for certain vector subspaces associated with a matrix, among many other
applications. We do not know as yet what some of these concepts are or why they are
important; we will learn about them in subsequent Chapters. But we should have an ap-
preciation for Gauss and Gauss-Jordan elimination–they are simple, effective, and useful.

In the sequel, often, we refer only to Gauss elimination, but similar comments may apply
to Gauss-Jordan, even if not made explicitly.

7.1 Introduction
We start by motivating Gauss elimination in the context of solving a set of simultaneous
linear algebraic equations. Consider the equations:
\[
\begin{aligned}
4x_1 + 6x_2 + 9x_3 &= 6 \\
6x_1 \phantom{{}+ 6x_2} - 2x_3 &= 20 \\
5x_1 - 8x_2 + x_3 &= 10.
\end{aligned} \tag{7.1.1}
\]
These are three equations; there are also three unknowns, the variables x1 , x2 , and x3 .
These unknowns are fixed parameters, i.e., they are not functions of time. Hence, there
are no delays or advances or derivatives, so, the equations in (7.1.1) are algebraic. In this
course, unless otherwise stated, we assume that the unknowns x1 , x2 , and x3 are com-
bined linearly by coefficients drawn from the reals. For example, in the first equation
in (7.1.1) the variable x1 is multiplied by the coefficient 4, the second variable x2 by the co-
efficient 6, and, finally, the third variable x3 is multiplied by the coefficient 9. Because only
powers of degree one of the variables appear, the equations in (7.1.1) are linear. There are
also known quantities, the 6, 20, and 10 on the right-hand-side (RHS) of the equations–
these are referred to as independent terms. The independent terms are also drawn from
the reals. A set of such simultaneous equations like (7.1.1) is referred to as a system of lin-
ear algebraic equations, linear systems for short, when their algebraic nature is implicit from
the context.

These linear systems of algebraic equations arise in many applications. In fact, one can
speculate that a good fraction of engineering and science problems involve in some form
or another solving a linear system of algebraic equations. This system may not be ap-
parent in the original formulation, but after appropriate manipulations, the solution may
reduce to solving such a linear system.

In complex engineering applications the number of equations and the number of un-
knowns may be very large. So, the field of scientific computing develops efficient meth-
ods to solve large systems of linear equations. By large, we may mean hundreds of thou-
sands or even more. Of course, in this course, we will focus on concepts, and we will
usually be concerned with much smaller systems, systems with only a few equations
and a few unknowns. But do not be fooled by the apparent simplicity of these systems.
The concepts and methods we learn while studying these simple small systems are very
powerful, and with the help of a computer we can successfully address the much more
realistic Big Data problems of today.

In (7.1.1), we observe that the number of equations and the number of unknowns (or vari-
ables) is the same. This is not necessarily always the case, and we will consider problems
where the number of equations is larger than the number of unknowns, as well as prob-
lems where the number of equations is smaller than the number of unknowns.

When studying ordinary differential and difference equations in Chapters 4 and 5, we


saw that it is important as a preliminary step to rearrange the equations so they are in a
canonical form. A canonical form for a system of equations is not a necessary form to suc-
cessfully solve the system of equations, but it helps in simplifying and systematizing the
solution. So, we will also assume that the system of linear equations is first reorganized
to be in canonical form.

The canonical form of a linear system of algebraic equations has the linear combination of
the unknowns on the left-hand-side (LHS) of the equations and the known independent
terms on the RHS. On the LHS, the unknowns are displayed in all the equations of the
system in the same order, as we move from left to right in each equation. For example,
if in (7.1.1), we order the unknowns as x1 , x2 , and x3 , each equation will display first the
term in x1 , then the term in x2 , and, finally, the term in x3 . If in an equation a term is miss-
ing, like in the second equation the term in x2 is missing, we interpret its coefficient as
being zero and simply move to the next term. With longhand writing of these equations
(possible only when you have a few equations as in this case), it is mnemonic to display
the equations so their structure is preserved, which means we align vertically the terms
by the unknowns. So, if a term is missing, we leave enough blank space so the previous
and next terms are still aligned.

A solution to the system of equations is a set of numerical values such that when we sub-
stitute the unknowns by these values we obtain an identity, i.e., the LHS of each equation
results in exactly the corresponding independent term. There are basically three issues
regarding solving a system of linear equations: 1) is there a solution or solutions to the
system of equations–this is the existence question; 2) if existence is answered affirmatively,
is the solution unique–this is the unicity question; and 3) if there are one or several solu-
tions, can we find them–this is the how to solve question. We will see that Gauss elimina-
tion helps with addressing all three questions.

We now resort to Chapter 6 and rewrite with vectors and matrices a generic system of
linear equations. We illustrate first with the simple example in (7.1.1). The coefficients of
the system are grouped into a matrix A, the unknowns in a vector x, and the independent
terms in a vector b:
\[
A = \begin{bmatrix} 4 & 6 & 9 \\ 6 & 0 & -2 \\ 5 & -8 & 1 \end{bmatrix} \tag{7.1.2}
\]
\[
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix},
\qquad
b = \begin{bmatrix} 6 \\ 20 \\ 10 \end{bmatrix}.
\]

With matrices and vectors, the system (7.1.1) is rewritten as:


Ax = b. (7.1.3)
The coefficient matrix A, the system matrix, has dimensions mA × nA where mA = 3
and nA = 3 in (7.1.1). The unknown vector x has dimension mx = 3, the number of un-
knowns in the system. Finally, the independent vector or term b has dimension mb = 3,
the number of known or independent terms in the system. We emphasize that by the na-
ture of the problem the number of equations equals the number of independent terms, so
mA = mb . On the other hand, the number of columns nA of A equals the number mx of
unknowns, so nA = mx . In general, the number of equations mA is not necessarily equal
to the number of unknowns mx .

From the matrix format (7.1.2) for (7.1.1), it is clear that, if the matrix A is square, i.e., the
number of equations mA equals the number of unknowns mx , and if A is invertible, the
solution to (7.1.1) is conceptually simple and given by:
x = A−1 b. (7.1.4)
The solution (7.1.4) is deceptively simple, because determining whether the matrix A is invertible,
using for example Result 6.6.2 in Chapter 6, requires checking that A is nonsingular,
i.e., det A ≠ 0, a nontrivial task with the direct methods of Chapter 6. After establishing
that A is invertible, solving as in (7.1.4) still requires inverting the matrix A. Finding the
determinant and inverting a (square) matrix of dimension n is in general a problem of order n^3,
i.e., it requires on the order of n^3 floating point operations. To get a feeling for what this
entails, consider the case where A is 1000 × 1000, i.e., mA = nA = 1000, so we are solving a
system of one thousand equations in one thousand unknowns. Inverting a 1000 × 1000 matrix
requires on the order of 10^9 floating point operations. With a 1 GHz computer that can perform
one addition and one multiplication per clock cycle in parallel, solving such a system takes on
the order of one second. Granted, a computer may be much faster (not in clock rate but because
of multiple cores), so instead of one second it may need only a fraction of a second to invert
the matrix and solve the linear system (7.1.3) through (7.1.4). But applications commonly involve
far more than 10^3 equations, and solving a very large general linear system through (7.1.4) is
expensive. Fortunately, in very large applications the equations are sparse, i.e., most entries
of A are zero. By exploiting this sparse structure it is possible to speed up the solution of the
linear system very considerably, and it becomes practical to handle systems with many thousands,
if not millions, of equations and unknowns.
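As a concrete illustration, the small system (7.1.1)-(7.1.2) can be solved with a numerical linear algebra library. The sketch below assumes Python with NumPy; np.linalg.solve solves the system directly, without forming the inverse of A explicitly, which is also how one would proceed in practice:

import numpy as np

# Coefficient matrix and independent term of the example system (7.1.1)-(7.1.2).
A = np.array([[4.0, 6.0, 9.0],
              [6.0, 0.0, -2.0],
              [5.0, -8.0, 1.0]])
b = np.array([6.0, 20.0, 10.0])

x = np.linalg.solve(A, b)   # solves Ax = b; avoids explicitly inverting A
print(x)                    # approximately [ 3.   0.5  -1. ]

# For very large sparse systems, sparse solvers (for example scipy.sparse.linalg.spsolve)
# exploit the zero structure of A instead of operating on the full dense matrix.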

Here, we consider Gauss elimination, which is a method that is systematic and that can
be easily programmed in a computer. We motivate the method by first working directly
with solving a linear system of equations and then reformulating this example in terms
of Gauss elimination. This we do in the next Section.

7.2 Gauss and Gauss-Jordan Elimination: Examples


In this Section, we apply Gauss and Gauss-Jordan elimination to solving an example of a
linear system of algebraic equations in Subsection 7.2.1 and then reducing a matrix to ech-
elon form in Subsection 7.2.2–row echelon form by Gauss elimination and reduced row
echelon form by Gauss-Jordan elimination. Gauss elimination applies row operations (or
operations on the equations) to reduce a system of linear equations or a matrix to an up-
per triangular form (row echelon form). Gauss-Jordan starts from the row echelon form
of the system of equations or of the matrix and applies again row (or equations) opera-
tions to reduce it to a diagonal form (reduced echelon form).

In Subsection 7.2.3, we show how these row operations on matrices can be interpreted as
premultiplication by certain simple matrices, the elementary matrices.

This section illustrates by working with examples the technique and concepts in Gauss
and Gauss-Jordan elimination. The method is simple and systematic. In subsequent Sec-
tions, we consider the general methodology of Gauss and Gauss-Jordan elimination and
also discuss shortcuts often taken in practice.

7.2.1 Solving Linear Systems


As discussed in Section 7.1, we introduce Gauss elimination in the context of solving a
linear system of equations. Consider the system of three equations in three unknowns:

x1 + x2 + x3 = 1
2x1 + x2 − x3 = −1 (7.2.1)
x1 + 2x2 − 2x3 = 2

This system is in canonical form since the independent known terms are all on the RHS,
the unknowns are all on the LHS and, in each equation, the unknowns are written in the
same order from left to right.

A solution is a set of three known values, say α1 , α2 , and α3 , such that when we replace x1 = α1 ,
x2 = α2 , and x3 = α3 in (7.2.1) we obtain an identity, i.e., the LHS of each equation is
identical to the RHS of the equation:

α1 + α2 + α3 = 1
2α1 + α2 − α3 = −1
α1 + 2α2 − 2α3 = 2

Linear systems: Elementary operations. To solve (7.2.1), we operate with the equations in one
of three ways: 1) multiply one equation by a nonzero scalar, for example, multiply the second
equation by 1/2; 2) interchange equations, for example, the first equation with the third
equation; 3) replace one equation by the equation obtained by multiplying another equation
by a nonzero scalar and adding the resulting equation to the original equation, for
example, replace the third equation by the result of multiplying the second equation by 1/2
and subtracting it from the third equation.

We can easily verify that the solution of the original system of equations is not modified
by substituting the original system by the system of equations that results by application
of one or more of these operations. These operations are called elementary operations. By
judiciously applying the elementary operations, we can reduce the system to another sys-
tem in a form that is much easier to solve. Each of these operations is called elimination,
and the systematic process that accomplishes it is called Gaussian elimination.

We illustrate the method by solving (7.2.1).

The goal is to reduce the system of equations to a triangular form. We explain this with an
example and break the solution into successive steps.

1. First pivot. We start by choosing the equation in (7.2.1) for which the coefficient of
the first variable x1 is the largest. In the example we are working on, this is the
second equation. The coefficient of the first variable in the second equation is two,
larger than the coefficients of x1 in the first and third equations. This largest coeffi-
cient becomes the first pivot we choose.

We next interchange the first and second equation, so that the equation with the first
pivot becomes now the first equation. This equation with the first pivot is the first
pivot equation. This interchange of equations is an elementary operation and does
not change the solution of the system of three equations. We get:

2x1 + x2 − x3 = −1
x1 + x2 + x3 = 1 (7.2.2)
x1 + 2x2 − 2x3 = 2

Finally, we divide the first equation by the pivot. Again, this is an elementary oper-
ation and does not change the solution.

This normalizes the coefficient of the first variable, variable x1 , in the now first equa-
tion to be one. The system becomes:

\[
\begin{array}{r@{\quad}l}
(\tfrac{1}{2}) & x_1 + \tfrac{1}{2}x_2 - \tfrac{1}{2}x_3 = -\tfrac{1}{2} \\
 & x_1 + x_2 + x_3 = 1 \\
 & x_1 + 2x_2 - 2x_3 = 2
\end{array} \tag{7.2.3}
\]

The number in parentheses at the leftmost position of the first equation is to remind
us that the first equation is the result of normalizing this equation by the pivot,
i.e., of multiplying the equation by 1/2, the inverse of the pivot, which is 2, the first
coefficient of the first equation in system (7.2.2).

2. Elimination. We now use the first equation to zero out the coefficients of the first
unknown x1 in the equations below the first equation, i.e., in the second and third
equations. Because the coefficients of the first unknown x1 are one in both equations
two and three, we zero out the coefficient of x1 in the second equation by subtracting
the first equation from the second equation and writing the result as the second
equation. The same is done with the third equation: we subtract the first equation
from the third equation and write the result in the third equation. Subtracting two
equations is an elementary operation, so the net result is that the resulting system
still has the same solution as the original system. We obtain the system below:

\[
\begin{array}{l}
x_1 + \tfrac{1}{2}x_2 - \tfrac{1}{2}x_3 = -\tfrac{1}{2} \\
\tfrac{1}{2}x_2 + \tfrac{3}{2}x_3 = \tfrac{3}{2} \\
\tfrac{3}{2}x_2 - \tfrac{3}{2}x_3 = \tfrac{5}{2}
\end{array} \tag{7.2.4}
\]

At the end of this step, the coefficient of x1 in the first equation is one, and the
coefficients of x1 in all equations below the first are zero.

3. Recursion. Note that the second and third equations are now a subsystem of two
equations in two unknowns. The unknown x1 has been eliminated from the second
and third equations. We now ignore the first equation and work with the subsystem
formed by the two equations, the second and third, in two unknowns and repeat the
previous step:

(a) Choose the second pivot by inspecting the coefficients of the second variable x2
in both the second and third equations and choosing the largest. Clearly this is
the coefficient 3/2 in the third equation. This becomes pivot number two and the
equation the second pivot equation.
(b) Interchange the second and third equations, so that the equation with the sec-
ond pivot is now the second equation.
(c) Normalize the second equation by the second pivot, so that the coefficient of
the second variable in the second equation is 1.
(d) Zero out the coefficients of the second variable in all equations below the sec-
ond (there is only one equation, which is the third equation) by multiplying
the second equation by the coefficient of the second variable x2 of the third
equation and subtracting the resulting equation from the third. The resulting
equation replaces the third equation.

We perform these steps to obtain successively the systems below. First we inter-
change the order of the second and third equations:

\[
\begin{array}{l}
x_1 + \tfrac{1}{2}x_2 - \tfrac{1}{2}x_3 = -\tfrac{1}{2} \\
\tfrac{3}{2}x_2 - \tfrac{3}{2}x_3 = \tfrac{5}{2} \\
\tfrac{1}{2}x_2 + \tfrac{3}{2}x_3 = \tfrac{3}{2}
\end{array} \tag{7.2.5}
\]

Next, normalize the second equation by the second pivot:


\[
\begin{array}{r@{\quad}l}
 & x_1 + \tfrac{1}{2}x_2 - \tfrac{1}{2}x_3 = -\tfrac{1}{2} \\
(\tfrac{2}{3}) & x_2 - x_3 = \tfrac{5}{3} \\
 & \tfrac{1}{2}x_2 + \tfrac{3}{2}x_3 = \tfrac{3}{2}
\end{array} \tag{7.2.6}
\]

Now zero out the coefficient of the second unknown x2 in the third equation:
\[
\begin{array}{r@{\quad}l}
 & x_1 + \tfrac{1}{2}x_2 - \tfrac{1}{2}x_3 = -\tfrac{1}{2} \\
(\tfrac{1}{2}) & x_2 - x_3 = \tfrac{5}{3} \\
 & 2x_3 = \tfrac{2}{3}
\end{array} \tag{7.2.7}
\]

4. We now pick the third pivot. Since we have only one equation left, the third, we
look in this equation for the first unknown whose coefficient is nonzero. Clearly,
this is unknown x3 , whose coefficient is 2. This is the third pivot, and we normalize
this pivot to one, by multiplying the third equation by the inverse of the pivot. The
third equation is the third pivot equation.
\[
\begin{array}{r@{\quad}l}
 & x_1 + \tfrac{1}{2}x_2 - \tfrac{1}{2}x_3 = -\tfrac{1}{2} \\
 & x_2 - x_3 = \tfrac{5}{3} \\
(\tfrac{1}{2}) & x_3 = \tfrac{1}{3}
\end{array} \tag{7.2.8}
\]
This step 4 completes Gauss elimination.

5. Leading coefficients, leading equations, and triangular form. At the end of step 2, the coef-
ficients of x1 below the first equation have been zeroed out. At the end of step 3, the
coefficients of x2 below the second equation have been zeroed out (the coefficient
of x2 in the third equation). The resulting system at the end of step 4 is in (7.2.8)
and has a very distinct form: it is in triangular form, actually upper triangular, and
the leading coefficients in each equation, i.e., the first nonzero coefficient in each
equation as we move from left to right, are all normalized to one. The unknowns
corresponding to the leading coefficient in each equation are referred to as leading
variables or leading unknowns.

We take note of the three pivots found in steps 1, 3a, and this step 4: 2, 3/2, and 2,
respectively.

There are now two ways we can proceed. The first is by back substitution, which
is explained next. The second method is Gauss-Jordan elimination that continues
reducing the system to a canonical form that is simple to solve. We consider this
after back substitution.

6. Back substitution. With the system in triangular form it is easy now to find the solu-
tion. We work backwards from the last equation, the third equation. The method is
called back substitution.

We start from the bottom equation, the third equation, whose LHS has only one
variable x3 with coefficient one. So, this third equation solves for the variable x3 . In
the second equation, we move the term in x3 to the RHS. Finally, in the first equation,
we move both the term in x3 and the term in x2 to the RHS. We obtain successively:

\[
\begin{array}{l}
x_1 = -\tfrac{1}{2} - \tfrac{1}{2}x_2 + \tfrac{1}{2}x_3 \\
x_2 = \tfrac{5}{3} + x_3 \\
x_3 = \tfrac{1}{3}
\end{array} \tag{7.2.9}
\]

We now can back substitute the value of x3 from the last equation in the second and
first equations, and the value of x2 determined from the second equation in the first
equation. This determines the values of the three unknowns and solves the system
of three linear equations in (7.2.1). The solution is:

\[
\begin{array}{l}
x_1 = -\tfrac{4}{3} \\
x_2 = 2 \\
x_3 = \tfrac{1}{3}
\end{array} \tag{7.2.10}
\]

We now consider an alternate method.

7. Gauss-Jordan elimination. Gauss-Jordan elimination continues from the upper trian-


gular form in (7.2.8). It attempts to reduce the LHS to a diagonal form. This is not
always possible as we will see below. But when it is possible, then in each equation
there is only one unknown. Assume for now it is possible. To reduce the LHS to di-
agonal form, start from the bottom equation whose only nonzero coefficient on the
LHS is the coefficient for the unknown x3 and eliminate by elementary operations
the coefficients for x3 in the equations above. The process is repeated then with the
unknown x2 in the second equation. The second equation is used to eliminate the
unknown x2 in the equations above, which in this case is only the first equation.

8. We get started by repeating (7.2.8):


\[
\begin{array}{l}
x_1 + \tfrac{1}{2}x_2 - \tfrac{1}{2}x_3 = -\tfrac{1}{2} \\
x_2 - x_3 = \tfrac{5}{3} \\
x_3 = \tfrac{1}{3}
\end{array} \tag{7.2.11}
\]
We replace the second equation by the result of multiplying the third equation by −1
and subtracting it from the second equation. We also replace the first equation by
the result of multiplying the third equation by −1/2 and subtracting it from the first
equation. We get:
\[
\begin{array}{r@{\quad}l}
 & x_1 + \tfrac{1}{2}x_2 = -\tfrac{1}{3} \\
 & x_2 = 2 \\
(-\tfrac{1}{2})\,(-1) & x_3 = \tfrac{1}{3}
\end{array} \tag{7.2.12}
\]


The numbers in parentheses on the left of the third equation represent the scalars by
which we multiply the third equation before subtracting it from the second equation
and then the first equation.

The result of this step is to zero out the coefficients of the unknown x3 in both the
second and first equations. Note that (7.2.12) is the result of elementary operations
applied to the original system.
9. We now operate with the second equation to zero out the coefficient of the unknown x2
in the first equation. We achieve this by replacing the first equation with the result of
multiplying the second equation by 1/2 and subtracting it from the first equation. We get:
\[
\begin{array}{r@{\quad}l}
 & x_1 = -\tfrac{4}{3} \\
(\tfrac{1}{2}) & x_2 = 2 \\
 & x_3 = \tfrac{1}{3}
\end{array} \tag{7.2.13}
\]
Comparing this with (7.2.10), we confirm that we reached the same solution for the
system of equations (7.2.1). Again this is obtained through elementary operations.

Diagonal system. The system in (7.2.13) has a very special, diagonal structure: the LHS
of the first equation has only the first unknown x1 , the LHS
of the second equation has only the second unknown x2 , and the LHS of the third
equation has only the third unknown x3 . Further, the coefficients of the unknowns
on the LHS are all normalized to 1.

We note that in both the Gauss elimination step (the forward step) and in the backward
step of Gauss-Jordan elimination we have used only elementary operations and the
same (normalized) pivot coefficients. A short numerical check of this example is sketched below.
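The sketch below, in Python with NumPy, reproduces the forward (Gauss elimination) step with the largest-coefficient pivot choice and normalized pivots, followed by back substitution, for the system (7.2.1). It is only an illustration of the steps above, assuming a square system with a unique solution:

import numpy as np

def solve_by_gauss(A, b):
    """Gauss elimination with pivoting and normalized pivots, then back substitution.
    A sketch of steps 1-6 above; assumes A is square with a unique solution."""
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    n = len(b)
    # Forward step: reduce the system to unit upper triangular form.
    for k in range(n):
        p = k + np.argmax(np.abs(A[k:, k]))   # pivot: largest coefficient in column k
        A[[k, p]] = A[[p, k]]                 # interchange equations (rows)
        b[[k, p]] = b[[p, k]]
        piv = A[k, k]
        A[k] /= piv                           # normalize the pivot equation
        b[k] /= piv
        for i in range(k + 1, n):             # zero out x_k in the equations below
            factor = A[i, k]
            A[i] -= factor * A[k]
            b[i] -= factor * b[k]
    # Back substitution: solve from the last equation upwards.
    x = np.zeros(n)
    for k in range(n - 1, -1, -1):
        x[k] = b[k] - A[k, k + 1:] @ x[k + 1:]
    return x

# System (7.2.1); the hand computation above gives x = (-4/3, 2, 1/3).
print(solve_by_gauss([[1, 1, 1], [2, 1, -1], [1, 2, -2]], [1, -1, 2]))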

7.2.2 Reducing Matrices


We now rework, in matrix-vector notation, the example (7.1.1) introduced in Section 7.1. We rewrite (7.1.1) as:
4x1 + 6x2 + 9x3 = 6
6x1 − 2x3 = 20        (7.2.14)
5x1 − 8x2 + x3 = 10
\[
\underbrace{\begin{bmatrix} 4 & 6 & 9 \\ 6 & 0 & -2 \\ 5 & -8 & 1 \end{bmatrix}}_{A}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{x}
=
\underbrace{\begin{bmatrix} 6 \\ 20 \\ 10 \end{bmatrix}}_{b} \tag{7.2.15}
\]

We introduce needed notation. We call augmented matrix the matrix A concatenated with
the column vector of independent terms b:
\[
A = [\,A \mid b\,] \tag{7.2.16}
\]
\[
= \left[\begin{array}{ccc|c} 4 & 6 & 9 & 6 \\ 6 & 0 & -2 & 20 \\ 5 & -8 & 1 & 10 \end{array}\right] \tag{7.2.17}
\]

We apply to the augmented matrix A the elimination steps in Gauss elimination to reduce
it to an upper triangular form. This follows closely the same steps we illustrated when
solving the linear system (7.2.1). The matrix with upper triangular form obtained at the
end of Gauss elimination is the row echelon form of the matrix we start with.

Once we obtain the row echelon form of A, we can proceed as in the back substitution
step 6, or with the Gauss-Jordan elimination of step 7, see Subsection 7.2.1. If we continue with
the Gauss-Jordan elimination, the resulting matrix is the reduced row echelon form.

The elementary operations introduced in Section 7.2.1 in the context of linear system of
algebraic equations are now reinterpreted as operating on the rows (or columns) of a ma-
trix.

Matrix operations: Elementary operations. We can operate with the rows (or columns) of a
matrix in one of three ways: 1) multiply one row by a nonzero scalar, for example, multiply
the second row of a matrix by 1/2; 2) interchange two rows, for example, the first row
with the third row of the matrix; 3) replace one row by the row obtained by multiplying
another row by a nonzero scalar and adding the resulting row to the original row, for
example, replace the third row of a matrix by the result of multiplying the second row by 1/2
and subtracting it from the third row.

These elementary operations were stated in terms of rows. We can restate them in terms
of columns, for example, the second elementary operation would be interchange two
columns. In these notes we will work with row elementary operations, unless otherwise
stated.

These matrix elementary operations on the rows or columns of a matrix are presented here
with the goal of solving a linear system of equations. It turns out that they preserve
important properties of a matrix, as we will see in due time. Used in the context of
Gauss elimination (reducing the matrix to an upper triangular form) and of Gauss-Jordan
elimination (reducing the matrix to a diagonal form), they also provide, for square matrices,
an alternative way to compute the determinant of the matrix or, when the matrix is invertible,
to invert it. We will revisit these topics and explain them in more detail below. A small
numerical illustration of the three row operations is sketched below.
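For instance, assuming NumPy, the three elementary row operations can be applied directly to the rows of the augmented matrix (7.2.17) (rows are indexed from 0 in the code):

import numpy as np

M = np.array([[4.0, 6.0, 9.0, 6.0],      # augmented matrix (7.2.17)
              [6.0, 0.0, -2.0, 20.0],
              [5.0, -8.0, 1.0, 10.0]])

M[1] *= 0.5             # 1) multiply one row by a nonzero scalar
M[[0, 2]] = M[[2, 0]]   # 2) interchange two rows
M[2] -= 0.5 * M[1]      # 3) subtract 1/2 times one row from another row
print(M)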

We now proceed with explaining Gauss elimination and Gauss-Jordan elimination.


1. First pivot. We repeat the augmented matrix (7.2.16):
\[
A = \left[\begin{array}{ccc|c} 4 & 6 & 9 & 6 \\ 6 & 0 & -2 & 20 \\ 5 & -8 & 1 & 10 \end{array}\right] \tag{7.2.18}
\]
Our steps mimic the Gaussian elimination steps used in solving the linear system of
equations.

We first identify the largest entry in the first column of A. This is the second entry,
a 6, and it becomes the first pivot. We interchange the first and second rows, so that
the pivot is the first entry of the first column. We get:
\[
\left[\begin{array}{ccc|c} 6 & 0 & -2 & 20 \\ 4 & 6 & 9 & 6 \\ 5 & -8 & 1 & 10 \end{array}\right]. \tag{7.2.19}
\]
We now normalize the entries of the first row so that the first entry is one. We
multiply the row by 1/6, the inverse of the pivot:
\[
\begin{array}{r} (\tfrac{1}{6})\\ {}\\ {} \end{array}
\left[\begin{array}{ccc|c} 1 & 0 & -\tfrac{1}{3} & \tfrac{10}{3} \\ 4 & 6 & 9 & 6 \\ 5 & -8 & 1 & 10 \end{array}\right]. \tag{7.2.20}
\]

The factor 1/6 in parentheses on the left of the matrix indicates that we normalized
the first row in (7.2.19) to obtain the first row in (7.2.20).

2. Elimination. Next we use the first row to zero the remaining entries of the first col-
umn. To zero the first entry of the second row, we multiply the first row by 4,
subtract it from the second row, and replace the second row by the result. We also
zero the first entry of the third row by multiplying the first row by 5, subtracting the
result from the third row and replacing this row by the result:
\[
\begin{array}{r} (5)\,(4)\\ {}\\ {} \end{array}
\left[\begin{array}{ccc|c} 1 & 0 & -\tfrac{1}{3} & \tfrac{10}{3} \\ 0 & 6 & \tfrac{31}{3} & -\tfrac{22}{3} \\ 0 & -8 & \tfrac{8}{3} & -\tfrac{20}{3} \end{array}\right]. \tag{7.2.21}
\]

3. Recursion. Now, we repeat step 1 but with the submatrix formed by rows two and
three, and by columns two, three, and four. First, we identify the second pivot
by scanning column two, starting from row two and look for the largest entry (in
absolute value). This is entry (3, 2) of the third row. The pivot is now −8. We
exchange rows two and three to obtain:
\[
\left[\begin{array}{ccc|c} 1 & 0 & -\tfrac{1}{3} & \tfrac{10}{3} \\ 0 & -8 & \tfrac{8}{3} & -\tfrac{20}{3} \\ 0 & 6 & \tfrac{31}{3} & -\tfrac{22}{3} \end{array}\right]. \tag{7.2.22}
\]

4. We normalize row two by multiplying the row by −1/8, the inverse of the second pivot:
\[
\begin{array}{r} {}\\ (-\tfrac{1}{8})\\ {} \end{array}
\left[\begin{array}{ccc|c} 1 & 0 & -\tfrac{1}{3} & \tfrac{10}{3} \\ 0 & 1 & -\tfrac{1}{3} & \tfrac{5}{6} \\ 0 & 6 & \tfrac{31}{3} & -\tfrac{22}{3} \end{array}\right]. \tag{7.2.23}
\]

5. We zero out all entries in the second column, below row two. This is only the entry
(3, 2) in the third row. To achieve this, we multiply row two by 6, subtract it from
row 3, and replace the result in row 3:
\[
\begin{array}{r} {}\\ (6)\\ {} \end{array}
\left[\begin{array}{ccc|c} 1 & 0 & -\tfrac{1}{3} & \tfrac{10}{3} \\ 0 & 1 & -\tfrac{1}{3} & \tfrac{5}{6} \\ 0 & 0 & \tfrac{37}{3} & -\tfrac{37}{3} \end{array}\right]. \tag{7.2.24}
\]

6. Row echelon form. We search for the third pivot. This is 37/3, the element (3, 3). We
normalize the third row by multiplying it by the inverse of the pivot:
\[
\begin{array}{r} {}\\ {}\\ (\tfrac{3}{37}) \end{array}
\left[\begin{array}{ccc|c} 1 & 0 & -\tfrac{1}{3} & \tfrac{10}{3} \\ 0 & 1 & -\tfrac{1}{3} & \tfrac{5}{6} \\ 0 & 0 & 1 & -1 \end{array}\right]. \tag{7.2.25}
\]

The resulting matrix is in upper triangular form. This is the end of Gauss elimination,
and the matrix obtained is the row echelon form of A.

We could proceed with the backward step or with the Gauss-Jordan method. We
proceed with Gauss-Jordan.

7. Gauss-Jordan elimination. We use the third row to zero out the entry (2, 3) in the
second row. We multiply the third row by −1/3, subtract it from row two, and replace
it in row two:
\[
\begin{array}{r} {}\\ {}\\ (-\tfrac{1}{3}) \end{array}
\left[\begin{array}{ccc|c} 1 & 0 & -\tfrac{1}{3} & \tfrac{10}{3} \\ 0 & 1 & 0 & \tfrac{1}{2} \\ 0 & 0 & 1 & -1 \end{array}\right]. \tag{7.2.26}
\]

8. Reduced row echelon form. We repeat the previous step but now to zero out the entry (1, 3).
We multiply again row three by −1/3, subtract it from row one, and replace the result in row one:
\[
\begin{array}{r} {}\\ {}\\ (-\tfrac{1}{3}) \end{array}
\left[\begin{array}{ccc|c} 1 & 0 & 0 & 3 \\ 0 & 1 & 0 & \tfrac{1}{2} \\ 0 & 0 & 1 & -1 \end{array}\right]. \tag{7.2.27}
\]

The resulting matrix to the left of the vertical dashed line (the original matrix A) is
now reduced to a matrix with a very special form: in this example, it is the 3 × 3
identity matrix. The full matrix is the reduced row echelon form of the original augmented
matrix A. We come back to it in Section 7.3.

The final solution can be read directly as: x1 = 3, x2 = 1/2, and x3 = −1.

We also take note of the three pivots: 6, −8, and 37/3.
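As a quick check of this reduction, the reduced row echelon form of the augmented matrix (7.2.17) can be computed symbolically. The sketch below assumes SymPy; its internal pivot choices may differ from the ones above, but the reduced row echelon form is the same:

from sympy import Matrix

# Augmented matrix (7.2.17); exact rational arithmetic avoids rounding in the pivots.
Aug = Matrix([[4, 6, 9, 6],
              [6, 0, -2, 20],
              [5, -8, 1, 10]])

R, pivot_columns = Aug.rref()
print(R)              # Matrix([[1, 0, 0, 3], [0, 1, 0, 1/2], [0, 0, 1, -1]])
print(pivot_columns)  # (0, 1, 2)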

7.2.3 Elementary Matrices


We finally, reinterpret Gauss elimination and Gauss-Jordan by pre-multiplication by ele-
mentary matrices. To explain it, we reinterpret the elementary operations. Again, this is
better understood by working an example.

We work with matrix (7.2.21) and the steps in Section 7.2.2.



We consider first Gauss elimination. A first elementary operation is, in particular, in


step 3 in Section 7.2.2 where it is desired to interchange rows two and three. This can be
obtained by pre-multiplying the matrix by an elementary matrix as shown below:

\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\left[\begin{array}{ccc|c} 1 & 0 & -\tfrac{1}{3} & \tfrac{10}{3} \\ 0 & 6 & \tfrac{31}{3} & -\tfrac{22}{3} \\ 0 & -8 & \tfrac{8}{3} & -\tfrac{20}{3} \end{array}\right]
=
\left[\begin{array}{ccc|c} 1 & 0 & -\tfrac{1}{3} & \tfrac{10}{3} \\ 0 & -8 & \tfrac{8}{3} & -\tfrac{20}{3} \\ 0 & 6 & \tfrac{31}{3} & -\tfrac{22}{3} \end{array}\right]. \tag{7.2.28}
\]

A second elementary operation is multiplying a row by a constant, as when normalizing
a row by the inverse of the pivot. For example, in step 4 in Section 7.2.2, we normalize
the second row by multiplying it by −1/8. This can be accomplished by pre-multiplication
of the matrix by the following elementary matrix, as illustrated below:
\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & -\tfrac{1}{8} & 0 \\ 0 & 0 & 1 \end{bmatrix}
\left[\begin{array}{ccc|c} 1 & 0 & -\tfrac{1}{3} & \tfrac{10}{3} \\ 0 & -8 & \tfrac{8}{3} & -\tfrac{20}{3} \\ 0 & 6 & \tfrac{31}{3} & -\tfrac{22}{3} \end{array}\right]
=
\left[\begin{array}{ccc|c} 1 & 0 & -\tfrac{1}{3} & \tfrac{10}{3} \\ 0 & 1 & -\tfrac{1}{3} & \tfrac{5}{6} \\ 0 & 6 & \tfrac{31}{3} & -\tfrac{22}{3} \end{array}\right]. \tag{7.2.29}
\]

Finally, the third elementary operation is to multiply a row by a scalar, add it to another
row, and replace the last row by the result. An example is in step 5 in Section 7.2.2 when
we zero out the element in entry (3, 2) by multiplying row two by the scalar 6 and sub-
tracting the result from row three and replacing row three by the result. This is interpreted
as pre-multiplying by an elementary matrix as illustrated below:

1 .. 1 ..
   
10 10
1 0 − . 1 0 − .
 
1 00  3 3  3 3 
 0 1 0  0 1 − 1 ... 1 ..

5 = 
 0 1 −3 .
5 . (7.2.30)
 3 6 6 
0 −6 1 . .
0 6 31 .. − 22
3 3
0 0 37 .. − 37
3 3

Putting it all together, we can go from (7.2.21) to (7.2.24) by:

\[
\underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -6 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & -\tfrac{1}{8} & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}}_{E_G}
\left[\begin{array}{ccc|c} 1 & 0 & -\tfrac{1}{3} & \tfrac{10}{3} \\ 0 & 6 & \tfrac{31}{3} & -\tfrac{22}{3} \\ 0 & -8 & \tfrac{8}{3} & -\tfrac{20}{3} \end{array}\right]
=
\left[\begin{array}{ccc|c} 1 & 0 & -\tfrac{1}{3} & \tfrac{10}{3} \\ 0 & 1 & -\tfrac{1}{3} & \tfrac{5}{6} \\ 0 & 0 & \tfrac{37}{3} & -\tfrac{37}{3} \end{array}\right]. \tag{7.2.31}
\]

This is the result of Gauss elimination. The matrix EG is the product of three elementary
matrices and reduced the matrix in (7.2.21) to the matrix in (7.2.31). We now continue

with the second step of Gauss-Jordan elimination applied to
\[
\left[\begin{array}{ccc|c} 1 & 0 & -\tfrac{1}{3} & \tfrac{10}{3} \\ 0 & 1 & -\tfrac{1}{3} & \tfrac{5}{6} \\ 0 & 0 & \tfrac{37}{3} & -\tfrac{37}{3} \end{array}\right].
\]

We first normalize the diagonal entry of the third row. Then we need to zero out the entry
(2, 3) of the resulting matrix and, finally, zero out the entry (1, 3) of the resulting matrix.
We get
\[
\underbrace{\begin{bmatrix} 1 & 0 & \tfrac{1}{3} \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & \tfrac{1}{3} \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \tfrac{3}{37} \end{bmatrix}}_{E_{GJ}}
\left[\begin{array}{ccc|c} 1 & 0 & -\tfrac{1}{3} & \tfrac{10}{3} \\ 0 & 1 & -\tfrac{1}{3} & \tfrac{5}{6} \\ 0 & 0 & \tfrac{37}{3} & -\tfrac{37}{3} \end{array}\right]
=
\left[\begin{array}{ccc|c} 1 & 0 & 0 & 3 \\ 0 & 1 & 0 & \tfrac{1}{2} \\ 0 & 0 & 1 & -1 \end{array}\right]. \tag{7.2.32}
\]

This matrix is the reduced row echelon form of the matrix in (7.2.21). To the left of the
dashed line it is the identity matrix. The matrix EGJ reduced the matrix in (7.2.31), the
result of Gauss elimination on matrix (7.2.21), to the matrix in (7.2.32).
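A small numerical check of (7.2.31), assuming NumPy, builds the three elementary matrices explicitly, forms their product EG, and applies it to the matrix in (7.2.21):

import numpy as np

M = np.array([[1.0, 0.0, -1/3, 10/3],      # matrix (7.2.21)
              [0.0, 6.0, 31/3, -22/3],
              [0.0, -8.0, 8/3, -20/3]])

P23 = np.array([[1.0, 0, 0], [0, 0, 1], [0, 1, 0]])   # O2: interchange rows 2 and 3
S2 = np.diag([1.0, -1/8, 1.0])                        # O1: scale row 2 by -1/8
L32 = np.array([[1.0, 0, 0], [0, 1, 0], [0, -6, 1]])  # O3: row 3 <- row 3 - 6*row 2

EG = L32 @ S2 @ P23            # product of the three elementary matrices
target = np.array([[1.0, 0.0, -1/3, 10/3],
                   [0.0, 1.0, -1/3, 5/6],
                   [0.0, 0.0, 37/3, -37/3]])
print(np.allclose(EG @ M, target))   # True: EG reduces (7.2.21) to (7.2.31)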

Remark 7.2.1 (Operating on columns by elementary operations). These three elemen-


tary operations are obtained by operating on rows and correspond to pre-multiplication by
elementary matrices. If instead, we operated on columns of the matrix, we would be post-
multiplying the matrix by elementary matrices. 

These elementary operations as multiplications by elementary matrices will be formal-


ized for the general case in the next Section.

7.3 Gauss, Gauss-Jordan Elimination: General Case


We presented Gauss elimination and Gauss-Jordan elimination in Sections 7.2.1, 7.2.2,
and 7.2.3 through examples to solve a linear system of equations and as methods to reduce
matrices to row echelon form and reduced row echelon form. In this Section, we consider
the general case. We start by considering the elementary operations, in particular, the
elementary matrices that we saw in the example solved in Section 7.2.3.

7.3.1 Row elementary operations: elementary matrices


Subsection 7.2.3 illustrated how elementary operations on rows of a matrix correspond to
pre-multiplication by elementary matrices. We now define general elementary operations
on rows of the M × N matrix A as the product on the left by elementary matrices.

Elementary operations on the rows of A pre-multiply A by M -dimensional elementary


square matrices E. Elementary operations on the columns of A post-multiply A by N -
dimensional elementary square matrices E that perform what we refer to below as O1 ,
O2 , and O3 elementary operations.

There are three types of elementary matrices. We consider explicitly row operations. The
elementary matrices for column operations are essentially the transpose of the row ele-
mentary matrices. We now list the three elementary operations O1 , O2 , and O3 .
O1 : Product of a row of a matrix by a nonzero scalar. Multiplying row i of a matrix A by
a nonzero scalar αi is obtained by pre-multiplying A by a matrix that is the identity
matrix IM where the ith-row eTi of IM is replaced by αi eTi . The elementary matrix is:
\[
E =
\begin{bmatrix}
1 & 0 & \cdots & \cdots & \cdots & 0\\
0 & 1 & 0 & \cdots & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots\\
0 & \cdots & \cdots & \alpha_i & \cdots & 0\\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\
0 & \cdots & \cdots & \cdots & \cdots & 1
\end{bmatrix}
\quad\text{($\alpha_i$ in row $i$, column $i$)} \tag{7.3.1}
\]

O2 : Interchanging two rows of a matrix. Interchanging rows i and j of a matrix A is ob-


tained by pre-multiplying A by an elementary matrix obtained by interchanging
rows i and j of the identity matrix IM . The elementary matrix is
\[
E =
\begin{bmatrix}
1 & 0 & \cdots & \cdots & \cdots & \cdots & \cdots & 0\\
0 & 1 & 0 & \cdots & \cdots & \cdots & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots & \vdots\\
0 & \cdots & \cdots & 0 & \cdots & 1 & \cdots & 0\\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots\\
0 & \cdots & 1 & \cdots & \cdots & 0 & \cdots & 0\\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\
0 & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & 1
\end{bmatrix}
\quad\text{(row $i$ is now $e_j^T$, row $j$ is now $e_i^T$)} \tag{7.3.4}
\]
This is a permutation matrix–it permutes rows i and j when multiplying on the left.
If it multiplies on the right, the permutation matrix permutes the columns.

O3 : Replacing a row by its linear combination with another row. Replace row j by the
row obtained by multiplying row i by a nonzero scalar αij and adding the resulting
row to row j. The elementary matrix that accomplishes this is:

\[
E =
\begin{bmatrix}
1 & 0 & \cdots & \cdots & \cdots & \cdots & 0\\
0 & 1 & 0 & \cdots & \cdots & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots\\
0 & \cdots & \alpha_{ij} & \cdots & 1 & \cdots & 0\\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\
0 & \cdots & \cdots & \cdots & \cdots & \cdots & 1
\end{bmatrix}
\quad\text{($\alpha_{ij}$ in row $j$, column $i$)} \tag{7.3.5}
\]

If we introduce the matrix

\[
E_{ij} =
\begin{bmatrix}
0 & 0 & \cdots & \cdots & \cdots & 0\\
0 & 0 & 0 & \cdots & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots\\
0 & \cdots & 1 & \cdots & \cdots & 0\\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\
0 & \cdots & \cdots & \cdots & \cdots & 0
\end{bmatrix}
\quad\text{(a single $1$ in row $j$, column $i$; all other entries zero)} \tag{7.3.6}
\]

Then, the elementary matrix in (7.3.5) is:

E = I + αij Eij . (7.3.7)
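Assuming NumPy, the three types of elementary matrices can be constructed as below (indices are 0-based in the code, and the helper names are only illustrative). Pre-multiplying a matrix by each of them performs the corresponding row operation:

import numpy as np

def E_scale(M, i, alpha):
    """O1: identity with the (i, i) entry replaced by a nonzero scalar alpha."""
    E = np.eye(M)
    E[i, i] = alpha
    return E

def E_swap(M, i, j):
    """O2: identity with rows i and j interchanged (a permutation matrix)."""
    E = np.eye(M)
    E[[i, j]] = E[[j, i]]
    return E

def E_add(M, i, j, alpha):
    """O3: identity plus alpha in entry (j, i); premultiplying adds alpha times row i to row j."""
    E = np.eye(M)
    E[j, i] = alpha
    return E

A = np.arange(12.0).reshape(3, 4)
B = A.copy()
B[2] += -5.0 * B[0]                               # the row operation done directly
print(np.allclose(E_add(3, 0, 2, -5.0) @ A, B))   # True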

7.3.1.1 Products of elementary matrices


We note that the matrix for the elementary operation O1 is a diagonal elementary matrix,
while for O3 it is a lower triangular matrix. If we multiply the elementary matrices for
these two row operations O1 and O3 the resulting matrix is then lower triangular. On the
other hand, if the elementary operation O2 is involved, then the associated elementary
matrix is not triangular and so any product of elementary matrices where one of them
corresponds to an interchange of rows will not be lower triangular. Because of this, most
of the time, we will use as much as possible only O1 and O3 , avoiding using O2 .

7.3.1.2 Inverses of elementary matrices


The inverse of an elementary matrix is also an elementary matrix, as we now determine.
1. Inverse of O1 product of a row by a nonzero scalar. The inverse of O1 is another O1
elementary operation where the scalar is now the inverse of the nonzero scalar. The
matrix representing this elementary operation is still of the type (7.3.1), except that
the scalar αi ≠ 0 in (7.3.1) is replaced by the scalar 1/αi:
\[
E_{O_1} =
\begin{bmatrix}
1 & 0 & \cdots & \cdots & \cdots & 0\\
0 & 1 & 0 & \cdots & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots\\
0 & \cdots & \cdots & \alpha_i & \cdots & 0\\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\
0 & \cdots & \cdots & \cdots & \cdots & 1
\end{bmatrix}
\;\Longrightarrow\;
E_{O_1}^{-1} =
\begin{bmatrix}
1 & 0 & \cdots & \cdots & \cdots & 0\\
0 & 1 & 0 & \cdots & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots\\
0 & \cdots & \cdots & \tfrac{1}{\alpha_i} & \cdots & 0\\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\
0 & \cdots & \cdots & \cdots & \cdots & 1
\end{bmatrix}
\quad\text{($\alpha_i$ and $\tfrac{1}{\alpha_i}$ in row $i$, column $i$)} \tag{7.3.8}
\]

2. Inverse of O2 interchange of two rows of a matrix. The inverse of O2 is the same opera-
tion, i.e., the inverse of the interchange of rows i and j is the interchange of rows i
and j. The matrix representing this elementary operation is exactly the same as
in (7.3.4):

\[
E_{O_2}^{-1} = E_{O_2}. \tag{7.3.17}
\]

3. Inverse of O3 replacing a row by the linear combination with another row. The inverse
of O3 is another elementary operation O3 where, instead of using the scalar αij as
in (7.3.5), we use −αij , the negative of the scalar. The matrix representing this in-
verse elementary operation is the same as in (7.3.5) where αij is replaced by −αij .

Using (7.3.7) as the representation for this elemenatry matrix, the inverse is, when
i 6= j:

E−1
O3 = I − αij Eij , (7.3.18)

where we recall Eij is a zero matrix, except entry (i, j) that is 1. The matrix in (7.3.18)
is the inverse of the matrix in (7.3.7) as can easily be verified since E2ij = 0, unless
i = j.
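These three inverse formulas are easy to confirm numerically; a minimal sketch, assuming NumPy, with M = 3, i = 0, j = 2, and an arbitrary scalar:

import numpy as np

alpha, i, j, M = 2.5, 0, 2, 3
I = np.eye(M)

E1 = I.copy(); E1[i, i] = alpha            # O1 elementary matrix
E2 = I.copy(); E2[[i, j]] = E2[[j, i]]     # O2 elementary matrix (row interchange)
E3 = I.copy(); E3[j, i] = alpha            # O3 elementary matrix, I + alpha*Eij

E1_inv = I.copy(); E1_inv[i, i] = 1 / alpha    # scalar replaced by its inverse
E3_inv = I.copy(); E3_inv[j, i] = -alpha       # scalar replaced by its negative

print(np.allclose(np.linalg.inv(E1), E1_inv))  # True
print(np.allclose(np.linalg.inv(E2), E2))      # True: E2 is its own inverse
print(np.allclose(np.linalg.inv(E3), E3_inv))  # True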

7.3.2 Row and reduced row echelon forms


We define the row echelon form of a matrix and the reduced row echelon form of a matrix.
These matrices are obtained by row operations on a matrix.

The leading coefficient of a row is the first nonzero entry of the row, as we scan the row
from left to right.

Definition 7.3.1 (Row echelon form). An M × N matrix A is in row echelon form iff:

1. All nonzero rows, i.e., rows where at least one element is different from zero, precede any
zero row, i.e., any row where all entries are zero.

2. The leading coefficient of a nonzero row is 1.

3. The leading coefficient of a row is to the right of the leading coefficient of the row above
it.

Note that in a matrix in row echelon form all the zero rows are at the bottom below the
nonzero rows, and the entries in a column below an entry that is a leading coefficient are
zero.

We now consider the reduced row echelon form.

Definition 7.3.2 (Reduced row echelon form). An M × N matrix A is in reduced row
echelon form iff:

1. It is in row echelon form.

2. The entries in a column above a leading entry are zero.

While a matrix in row echelon form has zeros below a leading coefficient, a matrix in re-
duced row echelon form has zeros below and above a leading coefficient. Also, if a column

has no leading coefficients, then its entries may be zero or nonzero. They can be nonzero
only if they are on a nonzero row and then to the right of the leading coefficient in this
row. In other words, a column with no leading coefficients may be a column of zeros, i.e.,
a zero column, see the second column of the matrix B1 in Example 7.3.2.

Remark 7.3.1 (Column echelon form). We have introduced in Definition 7.3.1 the row
echelon form. If we operate on columns, we obtain the column echelon form. A matrix is in
column echelon form, if its transpose is in row echelon form.

Example 7.3.1 I Row echelon form matrices


We present examples of matrices in row echelon form and examples of matrices that are
not in row echelon form. The matrices below are in row echelon form:
\[
A_1 = \begin{bmatrix} 1 & 1 & 3 & -1 & 1 \\ 0 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\qquad
A_2 = \begin{bmatrix} 1 & 1 & 1 & 3 & -1 & 1 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}.
\]
Note that A1 and A2 have a zero row at the bottom. The leading coefficient in the nonzero
rows is 1. The leading coefficient in a nonzero row is to the right of the leading coefficients
of the rows above. For example, in A2 the leading coefficient in row 3 is the entry (3, 5),
which is to the right of the leading coefficients (1, 1) and (2, 3) in rows one and two.

A third example of a matrix in row echelon form is given by:
\[
A_3 = \begin{bmatrix} 1 & 0 & 1 & 3 & -1 & 1 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}.
\]
This matrix A3 modifies matrix A2 because entry (1, 2) is zero. The second column of the
matrix is zero. Still all leading rows are above the zero rows and the leading entry of a
nonzero row is to the right of the leading entry of the row above.

Examples of matrices not in row echelon form are given now.
\[
A_4 = \begin{bmatrix} 0 & 1 & 3 & -1 & 1 \\ 0 & 0 & 0 & 1 & 2 \\ 1 & 0 & 0 & 0 & 0 \end{bmatrix}.
\]

This matrix A4 is not in row echelon form since the leading coefficient of row 3 is to the
left of the leading coefficient of the row above it (and actually, also of the leading coef-
ficient of the first row). In this case, a row echelon form could be obtained by circularly
shifting the rows, i.e., by interchanging the rows such that row 3 becomes row 1, row 1
becomes row 2, and, finally, row 2 becomes row 3.

Now consider the matrix:
\[
A_5 = \begin{bmatrix} 1 & 0 & 1 & 3 & -1 & 1 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{bmatrix}.
\]

This matrix fails being in row echelon form since a row of zeros precedes a nonzero row.
This is simple to fix by interchanging rows three and four. 

In the above examples, we did normalize the leading coefficients to 1 and used when
necessary the elementary operation of row interchange.

Example 7.3.2 I Reduced row echelon form matrices


We present examples of matrices in reduced row echelon form and of matrices not in re-
duced row echelon form. The matrix A1 is in row echelon form but not in reduced row
echelon form, since the entry (1, 4) above the leading term (2, 4) in row two is not zero.
Similarly, the matrix A2 is in row echelon form but not in reduced row echelon form, since
the entry (1, 3) above the leading entry (2, 3) in row two is not zero, and the entry (1, 5)
above the leading entry (3, 5) in row three is not zero. Similar comments apply to A3 ,
which is in row echelon form but not in reduced row echelon form. Of course, matri-
ces A4 and A5 are not in reduced row echelon form since they are not in row echelon
form to start with.

The matrices below are in reduced row echelon form:
\[
B_1 = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\qquad
B_2 = \begin{bmatrix} 1 & 0 & 0 & 3 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}.
\]

Both matrices B1 and B2 are in row echelon form, and all the entries of a column above a
leading coefficient of a row are zero.
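The two definitions can also be checked programmatically. The sketch below, assuming NumPy, tests the conditions of Definitions 7.3.1 and 7.3.2 and reproduces the classification of A1 and B1 above (the function names are only illustrative):

import numpy as np

def leading_index(row, tol=1e-12):
    """Index of the first nonzero entry of a row, or None if the row is zero."""
    nz = np.nonzero(np.abs(row) > tol)[0]
    return int(nz[0]) if nz.size else None

def is_row_echelon(M, tol=1e-12):
    """Definition 7.3.1: zero rows last, leading coefficients equal to 1,
    and each leading coefficient strictly to the right of the one above."""
    last_lead = -1
    seen_zero_row = False
    for row in np.asarray(M, dtype=float):
        lead = leading_index(row, tol)
        if lead is None:
            seen_zero_row = True
            continue
        if seen_zero_row:                  # a nonzero row below a zero row
            return False
        if abs(row[lead] - 1.0) > tol:     # leading coefficient must be 1
            return False
        if lead <= last_lead:              # must move strictly to the right
            return False
        last_lead = lead
    return True

def is_reduced_row_echelon(M, tol=1e-12):
    """Definition 7.3.2: row echelon form plus zeros above each leading coefficient."""
    M = np.asarray(M, dtype=float)
    if not is_row_echelon(M, tol):
        return False
    for i, row in enumerate(M):
        lead = leading_index(row, tol)
        if lead is not None and np.any(np.abs(M[:i, lead]) > tol):
            return False
    return True

A1 = [[1, 1, 3, -1, 1], [0, 0, 0, 1, 2], [0, 0, 0, 0, 0]]
B1 = [[1, 0, 0, 1], [0, 0, 1, 2], [0, 0, 0, 0]]
print(is_row_echelon(A1), is_reduced_row_echelon(A1))   # True False
print(is_row_echelon(B1), is_reduced_row_echelon(B1))   # True True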

7.3.2.1 Row echelon form: Structure

We discuss here the general structure of the row echelon form of a matrix. We assume
that, if there is a column of zeros in the original matrix, it is not the first column.

From the definition and the examples worked out, including A1 through A3 , the general
structure of the row echelon form of a matrix is given by:

\[
A \;\Longrightarrow\;
\left[\begin{array}{ccccc}
1 & * & \cdots & \cdots & *\\
0 & \ddots & & & \vdots\\
\vdots & & \ddots & & \vdots\\
0 & \cdots & 0 & * & \cdots\\
0 & \cdots & \cdots & \cdots & 0\\
\vdots & & & & \vdots\\
0 & \cdots & \cdots & \cdots & 0
\end{array}\right]
\quad\text{($r$ leading rows on top, $n-r$ zero rows at the bottom)} \tag{7.3.19}
\]
We comment on the structure of the row echelon form (7.3.19) of A.

First note that in the row echelon form, if there are rows of zeros they are at the bottom.
These are indicated by the n − r bottom rows in (7.3.19) that are zero.

On the top r rows, we distinguish a trapezoidal structure where ∗ indicates don’t care en-
tries (they can take any value as long as in each leading row the leading entry is nonzero)
to the left of which there is a triangular shaped zero block. The diagonal bounds and
belongs to the trapezoidal block. It starts from the entry (1, 1). Unless A = 0, entry (1, 1)
is the first pivot. This pivot is assumed to have been normalized to 1. This diagonal is a
45° diagonal, extending to the entry (r, r).

The top r rows are the leading rows; each of these rows has a leading entry that is a 1
(assuming we normalized all the pivots to one), and each leading entry is to the right of
the leading entry of the row above. So, the entries of the 45° diagonal may be zero, except
the first entry (1, 1) that has to be nonzero. The structure of the row echelon form of A
can then be simply represented as:
\[
A \;\Longrightarrow\; \begin{bmatrix} U \\ 0 \end{bmatrix}. \tag{7.3.20}
\]

Although the block U is trapezoidal with r rows, we still refer to it as upper triangular.
Its r rows are the r leading rows of the row echelon form. We will refer to U as an upper
triangular block.

7.3.2.2 Reduced row echelon form: Structure


We discuss here the general structure of the reduced row echelon form of a matrix. We
assume that, if there is a column of zeros in the original matrix, it is not the first column.

The structure follows from the structure of the row echelon form in (7.3.19) or (7.3.20).
The block matrix U in (7.3.20) is upper triangular, but its diagonal, except for the first
entry, may have zeros (if normalized, the leading entry of the first row, i.e., (1, 1) is a 1),
and each leading entry in a row of U is to the right of the leading entry of the row above
it. After Gauss-Jordan, all entries above a leading entry of U are zeroed out. The block U
is reduced to a block that has a “jagged” diagonal. We represent this block by D.

We discuss further the structure of a reduced row echelon form with respect to the fol-
lowing example.
\[
A \;\Longrightarrow\; U \;\Longrightarrow\; D =
\begin{bmatrix} 1 & 2 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}.
\]
The D block in this case is the 3 × 5 matrix, since there are no zero rows. Note that D is
not a diagonal matrix. Not all elements in the upper block of D are zero as illustrated in
the example above. There may be entries to the right of the leading element of a leading
row that are not zero, see rows one (elements (1, 2) and (1, 4)) and two (entry (2, 4)). If a
diagonal entry is zero, then all subsequent diagonal entries are zero (diagonal entry (2, 2)
is zero, so diagonal entry (3, 3) is also zero), and then the leading elements in the leading
rows are to the right of the diagonal element of that row (for example, in row two, the
leading element is (2, 3), to the right of the diagonal entry (2, 2), and, similarly, in row
three, the leading element is (3, 5), to the right of the diagonal entry (3, 3)). With this in
mind, the general structure of a reduced row echelon form is:
\[
A \;\Longrightarrow\;
\left[\begin{array}{ccccc}
\cdot & \cdot & \cdots & \cdots & \cdot\\
0 & \ddots & & & \vdots\\
\vdots & & \ddots & & \vdots\\
0 & \cdots & \cdots & \cdot & \cdot\\
0 & \cdots & \cdots & \cdots & 0\\
\vdots & & & & \vdots\\
0 & \cdots & \cdots & \cdots & 0
\end{array}\right] \tag{7.3.21}
\]
\[
= \begin{bmatrix} D \\ 0 \end{bmatrix}. \tag{7.3.22}
\]

7.3.3 Gauss, Gauss-Jordan elimination: Pseudo-code


We present pseudocode that describes Gauss elimination and Gauss-Jordan elimination.
For the purpose of Algorithms 1 and 2 below, we introduce needed notation. This is not
universally accepted notation; as we said, it serves only the purpose of describing the two
Algorithms 1 and 2. Let the matrix A be M × N.

Let Ai,j be the subblock of A with all entries Akℓ such that i ≤ k ≤ M and j ≤ ℓ ≤ N. For
example, A1,1 = A, and AM,N is the single entry in position (M, N).

Algorithm 1 Gauss elimination


1. Let l = 0.
2. Scan the rows to find zero rows. If any are found, move them all below the nonzero rows.
3. Let l = l + 1. Pivot l: Scan column 1 of the matrix A from entry (1, 1) to entry (M, 1)
to identify the entry with the largest absolute value. Fix this largest, nonzero entry as
the pivot. Let the pivot be at (i, 1).
4. If all entries in the first column are zero, relabel columns 2 to N as 1 to N − 1 and
let N ← N − 1.
5. Interchange, by an elementary operation, row 1 with row i, where the pivot is located.
6. Normalize row 1 by the inverse of the pivot.
7. Zero out, by elementary operations, all elements (k, 1), 2 ≤ k ≤ M, in column 1 below
the entry (1, 1).
8. Repeat steps 2 through 7 with the subblock matrix A2,2, until you reach the subblock AM,j
for some j ≤ N, or you reach a row of zeros.

Algorithm 2 Gauss-Jordan elimination


1. Perform Gauss elimination by Algorithm 1.
2. Identify the bottom nonzero row and its leading coefficient.
3. Zero out, by elementary operations, all nonzero entries above this leading coefficient.
4. Move to the next row above and locate its leading coefficient.
5. Repeat steps 3 and 4.
6. When the first row is reached, stop.
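A compact sketch of Algorithms 1 and 2, assuming NumPy, is given below. It follows the same logic as the pseudocode, pivoting on the largest entry in the current column and normalizing each pivot to 1:

import numpy as np

def row_echelon(A, tol=1e-12):
    """Gauss elimination with partial pivoting (a sketch of Algorithm 1).
    Returns a row echelon form with leading coefficients normalized to 1."""
    R = np.array(A, dtype=float)
    M, N = R.shape
    r = 0                                    # index of the current pivot row
    for c in range(N):                       # scan columns left to right
        p = r + np.argmax(np.abs(R[r:, c]))  # largest entry in column c, rows r..M-1
        if abs(R[p, c]) < tol:
            continue                         # no pivot in this column, move right
        R[[r, p]] = R[[p, r]]                # O2: interchange rows
        R[r] = R[r] / R[r, c]                # O1: normalize the pivot row
        R[r + 1:] -= np.outer(R[r + 1:, c], R[r])   # O3: zero out below the pivot
        r += 1
        if r == M:
            break
    return R

def reduced_row_echelon(A, tol=1e-12):
    """Gauss-Jordan: continue from the row echelon form and also zero out
    the entries above each leading coefficient (a sketch of Algorithm 2)."""
    R = row_echelon(A, tol)
    M, N = R.shape
    for i in range(M - 1, -1, -1):
        lead = np.nonzero(np.abs(R[i]) > tol)[0]
        if lead.size == 0:
            continue
        c = lead[0]
        R[:i] -= np.outer(R[:i, c], R[i])    # zero out the column above the leading 1
    return R

# Augmented matrix of (7.2.14); the reduced form should be [I | (3, 1/2, -1)].
Aug = [[4, 6, 9, 6], [6, 0, -2, 20], [5, -8, 1, 10]]
print(reduced_row_echelon(Aug))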

7.3.4 Shortcuts to Gauss, Gauss-Jordan elimination


Notwithstanding Definition 7.3.1 of the row echelon form in Subsection 7.3.2, the row
echelon form is sometimes taken to be slightly different from the form in Definition 7.3.1,
and Gauss elimination and, consequently, Gauss-Jordan elimination then proceed in slightly
different ways from how we presented them in Sections 7.2.1 and 7.3, and in Algorithms 1
and 2. These liberties with Definition 7.3.1 and with Gauss elimination are taken to simplify
the arithmetic or to address issues that we will consider in subsequent Chapters.

We consider now these shortcuts to Gauss and Gauss-Jordan elimination. They avoid two
of the steps introduced in Gauss elimination: exchanging rows and normalizing pivots to

one. We discuss them now.

7.3.4.1 No row exchange

We indicated that when scanning the entries of a column to search for the pivot, i.e., the
entry of the column that is largest in absolute value, we may need to interchange rows of
the matrix, i.e., perform elementary operation O2. Say the largest entry in absolute value
in the first column is at (3, 1); we then interchange rows 1 and 3 so that this pivot moves
to the top of the column.

By Subsection 7.3.1.1, when exchanging rows, the matrix representing this elementary op-
eration is not triangular, and so the product of elementary matrices involved in Gauss or
Gauss-Jordan elimination is not a triangular matrix. For reasons that will become clearer
later, we may want to preserve the triangular nature of the product of these elementary
matrices, and so Gauss elimination is sometimes simplified to not include this step.

7.3.4.2 No normalization by pivots

The second shortcut has to do with the normalization of a leading row (i.e., a row that
has a leading coefficient or pivot) by the inverse of the pivot. Often, this leads to
fractions and unnecessarily complicates the arithmetic of the successive steps of Gauss
elimination. To avoid this, the normalization is skipped, and we keep the pivots as they
are, or at least do not normalize them to 1. The resulting row echelon form then does not
have leading coefficients equal to 1.

When this is the case, when performing Gauss-Jordan, we do need then to normalize the
leading coefficients of the row echelon form to 1 before proceeding.

The net result is that the row echelon form of a matrix may not be unique–the final form
obtained may depend on the exact steps taken and the order in which they are taken. The
reduced row echelon form, except for normalization by the pivots, is always unique.

7.4 Gauss Elimination: Applications


We consider in this Section two applications of Gauss elimination. In Subsection 7.4.1, we
determine conditions for the solution of a linear system of algebraic linear equations and
in Subsection 7.4.2 we present the LU decomposition, also called the LU factorization, of
a square matrix. In subsequent Chapters, we will consider several other very relevant
uses of Gauss elimination and Gauss-Jordan elimination.

7.4.1 Linear systems: Conditions for solution


We first illustrate these conditions through examples and then consider general condi-
tions.

7.4.1.1 Conditions for solution: Examples

We consider several examples that illustrate the different possibilities that may arise when
solving a linear system of algebraic linear equations: 1) a solution exists and is unique;
2) a solution exists but is not unique; and 3) a solution does not exist.

Given a linear system, we illustrate with examples how Gauss elimination provides a
simple way to determine which of these three cases actually occurs. When the solution
exists, cases 1) and 2), then either back substitution or Gauss-Jordan will then lead to the
unique solution, case 1), or to the family of solutions, case 2). When the solution does not
exist, case 3), then we stop the computation at the end of Gauss elimination, and there is
no back substitution, nor Gauss-Jordan step.

Example 7.4.1 I Linear system: unique solution


The first example we consider is the following linear system of four equations in two
unknowns:

x1 − 2x2 = −2
5x1 − 2x2 = 4
(7.4.1)
3x1 + 2x2 = 8
2x1 − 4x2 = −4.

We write the system in matrix form:
\[
\underbrace{\begin{bmatrix} 1 & -2 \\ 5 & -2 \\ 3 & 2 \\ 2 & -4 \end{bmatrix}}_{A}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}}_{x}
=
\underbrace{\begin{bmatrix} -2 \\ 4 \\ 8 \\ -4 \end{bmatrix}}_{b}. \tag{7.4.2}
\]

Equation (7.4.2) identifies the system matrix A, the vector of unknowns x, and the vector
of independent terms b.

To apply Gauss elimination, we need the augmented matrix:

\[
A = [\,A \mid b\,] \tag{7.4.3}
\]
\[
= \left[\begin{array}{rr|r} 1 & -2 & -2 \\ 5 & -2 & 4 \\ 3 & 2 & 8 \\ 2 & -4 & -4 \end{array}\right]. \tag{7.4.4}
\]

We now apply Gauss elimination to reduce A to triangular form. We will apply one of
the shortcuts described in Subsection 7.3.4, avoiding the elementary operation O2 , i.e., we
will not interchange rows.

We obtain successively the following.


\[
\left[\begin{array}{rr|r} 1 & -2 & -2 \\ 5 & -2 & 4 \\ 3 & 2 & 8 \\ 2 & -4 & -4 \end{array}\right]
\underset{\left[\begin{smallmatrix} 1&0&0&0\\ -5&1&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{smallmatrix}\right]}{\Longrightarrow}
\left[\begin{array}{rr|r} 1 & -2 & -2 \\ 0 & 8 & 14 \\ 3 & 2 & 8 \\ 2 & -4 & -4 \end{array}\right]
\underset{\left[\begin{smallmatrix} 1&0&0&0\\ 0&1&0&0\\ -3&0&1&0\\ 0&0&0&1 \end{smallmatrix}\right]}{\Longrightarrow}
\left[\begin{array}{rr|r} 1 & -2 & -2 \\ 0 & 8 & 14 \\ 0 & 8 & 14 \\ 2 & -4 & -4 \end{array}\right]
\]
\[
\underset{\left[\begin{smallmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ -2&0&0&1 \end{smallmatrix}\right]}{\Longrightarrow}
\left[\begin{array}{rr|r} 1 & -2 & -2 \\ 0 & 8 & 14 \\ 0 & 8 & 14 \\ 0 & 0 & 0 \end{array}\right]
\underset{\left[\begin{smallmatrix} 1&0&0&0\\ 0&\frac18&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{smallmatrix}\right]}{\Longrightarrow}
\left[\begin{array}{rr|r} 1 & -2 & -2 \\ 0 & 1 & \tfrac{7}{4} \\ 0 & 8 & 14 \\ 0 & 0 & 0 \end{array}\right]
\underset{\left[\begin{smallmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&-8&1&0\\ 0&0&0&1 \end{smallmatrix}\right]}{\Longrightarrow}
\left[\begin{array}{rr|r} 1 & -2 & -2 \\ 0 & 1 & \tfrac{7}{4} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array}\right].
\]

Gauss elimination proceeds from left to right and top to bottom. Under each arrow, we
indicate the elementary matrix that multiplies (on the left) the matrix to the left of the
arrow to obtain the matrix on the right of the arrow.

We applied Gauss elimination in a simplified way. In the first column, entry (2, 1) is the
largest in absolute value, so we should have exchanged rows one and two, but, as dis-
cussed in Section 7.3.4, we can choose not to apply this step; here, we choose not to apply
it. This means the first pivot is the entry (1, 1), which is a one.

The first elementary matrix is an O3 elementary matrix and subtracts five times the first
row from the second row. The second elementary matrix is also an O3 elementary matrix and
subtracts three times the first row from the third row. The third elementary matrix is also
an O3 elementary matrix and subtracts two times the first row from the fourth row. These
three operations have zeroed out all entries in the first column below the leading coefficient
of the first equation. It is interesting to note that after the third elementary operation
the fourth row is identically zero.

We now repeat these steps but with the subblock A2,2. Next, we pick as pivot the entry (2, 2).
The pivot is 8. The fourth elementary matrix is an O1 elementary matrix and normalizes
the second row by one over this pivot. We now use this second row to zero out all entries
in the second column below the leading entry of the second row. Since entry (4, 2) is
already zero, the only entry left is (3, 2).

The fifth elementary matrix is again an O3 elementary matrix and subtracts eight times the
second row from the third row.

We should now repeat the previous steps with the subblock A3,3. Scanning the third row,
there is no nonzero entry. Since the fourth row is also a zero row, Gauss elimination has
terminated.

The last matrix is in row echelon form, as can be verified: there are two leading rows,
the leading entries are normalized to one, and the zero rows are at the bottom of the matrix.

The solution to the original system of equations is x2 = 7/4 and x1 = −2 + 2 × 7/4 = 3/2, as
can be obtained by back substitution or by performing one step of Gauss-Jordan elimination.
The solution exists and is unique.

If we inspect briefly the row echelon form of the augmented matrix:
\[
\left[\begin{array}{rr|r} 1 & -2 & -2 \\ 0 & 1 & \tfrac{7}{4} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array}\right],
\]

we see that from the original four equations two are zero, and the two remaining equa-
tions could be solved for the two unknown variables x1 and x2 .

We note that, while finding the row echelon form of the augmented matrix A, we also
found the row echelon form of the system matrix A. The row echelon form of the system
matrix A is, by inspection, the matrix to the left of the dashed column of the row echelon

form of the augmented matrix, i.e.:
\[
\begin{bmatrix} 1 & -2 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}.
\]
With respect to the structure of the row echelon form as given by (7.3.19), the block U in (7.3.19)
is, for the original system matrix and for the augmented matrix, given by:
\[
U_A = \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix} : 2 \times 2,
\qquad
U_{[A \mid b]} = \begin{bmatrix} 1 & -2 & -2 \\ 0 & 1 & \tfrac{7}{4} \end{bmatrix} : 2 \times 3.
\]
If we compare the number of leading rows rA in the row echelon form of the system
matrix A with the number r[A|b] of leading rows in the row echelon form of the augmented
matrix [A | b], we conclude that they are the same:
\[
r_A = r_{[A \mid b]} = 2. \tag{7.4.5}
\]
This is the condition for the existence of the solution, as the next example will also show.

Also, we see that the number of leading rows rA in the row echelon form of the system
matrix A equals the number of unknowns rx in the system equations:
\[
r_A = r_x = 2. \tag{7.4.6}
\]
This is the condition for the solution, when it exists, to be unique.


Example 7.4.2 I Linear system: no solution


The second example considers a slight modification of (7.4.1), namely, the following linear
system of four equations in two unknowns:
x1 − 2x2 = −2
5x1 − 2x2 = 4
(7.4.7)
3x1 + 2x2 = 8
2x1 − 4x2 = −3.
Again, we write the system in matrix form:
\[
\underbrace{\begin{bmatrix} 1 & -2 \\ 5 & -2 \\ 3 & 2 \\ 2 & -4 \end{bmatrix}}_{A}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}}_{x}
=
\underbrace{\begin{bmatrix} -2 \\ 4 \\ 8 \\ -3 \end{bmatrix}}_{b}. \tag{7.4.8}
\]

Equation (7.4.8) identifies the system matrix A, the vector of unknowns x, and the vector
of independent terms b.

The augmented matrix is now:

\[
A = [\,A \mid b\,] \tag{7.4.9}
\]
\[
= \left[\begin{array}{rr|r} 1 & -2 & -2 \\ 5 & -2 & 4 \\ 3 & 2 & 8 \\ 2 & -4 & -3 \end{array}\right]. \tag{7.4.10}
\]

We apply Gauss elimination to reduce A to triangular form. We avoid the elementary


operation O2 of row interchange as much as we can. We will see that we do need to apply
it at some point.

We obtain successively the following.

\[
\left[\begin{array}{rr|r} 1 & -2 & -2 \\ 5 & -2 & 4 \\ 3 & 2 & 8 \\ 2 & -4 & -3 \end{array}\right]
\underset{\left[\begin{smallmatrix} 1&0&0&0\\ -5&1&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{smallmatrix}\right]}{\Longrightarrow}
\left[\begin{array}{rr|r} 1 & -2 & -2 \\ 0 & 8 & 14 \\ 3 & 2 & 8 \\ 2 & -4 & -3 \end{array}\right]
\underset{\left[\begin{smallmatrix} 1&0&0&0\\ 0&1&0&0\\ -3&0&1&0\\ 0&0&0&1 \end{smallmatrix}\right]}{\Longrightarrow}
\left[\begin{array}{rr|r} 1 & -2 & -2 \\ 0 & 8 & 14 \\ 0 & 8 & 14 \\ 2 & -4 & -3 \end{array}\right]
\]
\[
\underset{\left[\begin{smallmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ -2&0&0&1 \end{smallmatrix}\right]}{\Longrightarrow}
\left[\begin{array}{rr|r} 1 & -2 & -2 \\ 0 & 8 & 14 \\ 0 & 8 & 14 \\ 0 & 0 & 1 \end{array}\right]
\underset{\left[\begin{smallmatrix} 1&0&0&0\\ 0&\frac18&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{smallmatrix}\right]}{\Longrightarrow}
\left[\begin{array}{rr|r} 1 & -2 & -2 \\ 0 & 1 & \tfrac{7}{4} \\ 0 & 8 & 14 \\ 0 & 0 & 1 \end{array}\right]
\]
\[
\underset{\left[\begin{smallmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&-8&1&0\\ 0&0&0&1 \end{smallmatrix}\right]}{\Longrightarrow}
\left[\begin{array}{rr|r} 1 & -2 & -2 \\ 0 & 1 & \tfrac{7}{4} \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{array}\right]
\underset{\left[\begin{smallmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&0&1\\ 0&0&1&0 \end{smallmatrix}\right]}{\Longrightarrow}
\left[\begin{array}{rr|r} 1 & -2 & -2 \\ 0 & 1 & \tfrac{7}{4} \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{array}\right].
\]

The first five steps repeat the steps in Example 7.4.1, but note that after step 3 the
resulting (fourth) matrix no longer has an identically zero fourth row: entry (4, 3) is now
a one. After the fifth step, in the resulting (sixth) matrix, the matrix before last, there
is a zero row that is not below all leading rows. So the sixth step, the last step in this
case, exchanges rows three and four, so that the zero row is below all leading rows.

We inspect the row echelon form, the last matrix obtained at the end of Gauss elimination:
\[
\left[\begin{array}{rr|r} 1 & -2 & -2 \\ 0 & 1 & \tfrac{7}{4} \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{array}\right].
\]

If we write explicitly the equation corresponding to the third row of the row echelon form,
it is

0x1 + 0x2 = 1.

Clearly there are no possible values of x1 and x2 that can satisfy this equation. The origi-
nal system has no solution.

With respect to the structure of the row echelon form as given by (7.3.19), the block U in (7.3.19)
is, for the original system matrix and for the augmented matrix, given by:
\[
U_A = \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix} : 2 \times 2,
\qquad
U_{[A \mid b]} = \begin{bmatrix} 1 & -2 & -2 \\ 0 & 1 & \tfrac{7}{4} \\ 0 & 0 & 1 \end{bmatrix} : 3 \times 3.
\]
In this example the number of leading rows rA in the row echelon form of the system matrix A
and the number r[A|b] of leading rows in the row echelon form of the augmented matrix [A | b]
are not the same; in fact:
\[
2 = r_A < r_{[A \mid b]} = 3. \tag{7.4.11}
\]

We consider a third example that, again, is a slight modification of Example 7.4.1.

Example 7.4.3 I Linear system: family of solutions


The third example solves the following linear system of four equations in two unknowns:

x1 − 2x2 = − 2
5x1 − 10x2 = −10
(7.4.12)
3x1 − 6x2 = − 6
2x1 − 4x2 = − 4.

We write the system in matrix form:
\[
\underbrace{\begin{bmatrix} 1 & -2 \\ 5 & -10 \\ 3 & -6 \\ 2 & -4 \end{bmatrix}}_{A}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}}_{x}
=
\underbrace{\begin{bmatrix} -2 \\ -10 \\ -6 \\ -4 \end{bmatrix}}_{b}. \tag{7.4.13}
\]

Equation (7.4.13) identifies the system matrix A, the vector of unknowns x, and the vector
of independent terms b.

To apply Gauss elimination, we again need the augmented matrix:

\[
A = [\,A \mid b\,] \tag{7.4.14}
\]
\[
= \left[\begin{array}{rr|r} 1 & -2 & -2 \\ 5 & -10 & -10 \\ 3 & -6 & -6 \\ 2 & -4 & -4 \end{array}\right]. \tag{7.4.15}
\]

We now apply Gauss elimination to reduce A to triangular form.

We obtain successively:
\[
\left[\begin{array}{rr|r} 1 & -2 & -2 \\ 5 & -10 & -10 \\ 3 & -6 & -6 \\ 2 & -4 & -4 \end{array}\right]
\underset{\left[\begin{smallmatrix} 1&0&0&0\\ -5&1&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{smallmatrix}\right]}{\Longrightarrow}
\left[\begin{array}{rr|r} 1 & -2 & -2 \\ 0 & 0 & 0 \\ 3 & -6 & -6 \\ 2 & -4 & -4 \end{array}\right]
\underset{\left[\begin{smallmatrix} 1&0&0&0\\ 0&1&0&0\\ -3&0&1&0\\ 0&0&0&1 \end{smallmatrix}\right]}{\Longrightarrow}
\left[\begin{array}{rr|r} 1 & -2 & -2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 2 & -4 & -4 \end{array}\right]
\]
\[
\underset{\left[\begin{smallmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ -2&0&0&1 \end{smallmatrix}\right]}{\Longrightarrow}
\left[\begin{array}{rr|r} 1 & -2 & -2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array}\right].
\]

The row echelon form of the augmented matrix is


\[
\bar A \Longrightarrow \left[\begin{array}{rr|r} 1 & -2 & -2\\ 0 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{array}\right].
\]

The system of linear equations (7.4.12) is now reduced to

x1 − 2x2 = −2
0x1 − 0x2 = 0
0x1 − 0x2 = 0
0x1 − 0x2 = 0.

The last three equations are trivial; they are satisfied by arbitrary values of x1 and x2.
The system (7.4.12) is reduced to the first row, since there is only one leading row in Ā.
So, the solution to the original system (7.4.12) of four equations in two unknowns is now

x1 = −2 + 2x2 .

This equation represents a family of solutions: a line in the two-dimensional plane (x1, x2).
The variable x2 is referred to as a free variable because it can take any value. The family
of solutions is parameterized by the free variable; each value of the free variable then
determines the value of x1.

With respect to the structure of the row echelon form as given by (7.3.19), the block U in (7.3.19) is, for the original system matrix and for the augmented matrix respectively, given by:
\[ U_A = \begin{bmatrix} 1 & -2 \end{bmatrix} : 1\times 2 \]
\[ U_{\bar A} = \begin{bmatrix} 1 & -2 & -2 \end{bmatrix} : 1\times 3. \]

In this example, we see that the number of leading rows r_A in the row echelon form of A
and the number of leading rows r_Ā in the row echelon form of Ā are equal, so the solution
exists, but this number is strictly smaller than the number of unknowns r_x:
\[ r_A < r_x = 2. \tag{7.4.16} \]
This is the condition for the solution to exist but not be unique.

The number of free variables parameterizing the family of solutions is rx − rA . 
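A small numerical check of this family of solutions can again be written in a few lines of Python; the sketch below (variable names are ours) verifies that r_A = r_Ā = 1 < r_x = 2 for Example 7.4.3 and that every choice of the free variable x2 produces a vector satisfying Ax = b:

import numpy as np

A = np.array([[1., -2.],
              [5., -10.],
              [3., -6.],
              [2., -4.]])
b = np.array([-2., -10., -6., -4.])
A_bar = np.hstack([A, b[:, None]])

# r_A = r_Abar = 1 < 2 unknowns: the solution exists but is a one-parameter family.
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A_bar))

for x2 in (-1.0, 0.0, 2.5):               # arbitrary values of the free variable
    x = np.array([-2.0 + 2.0 * x2, x2])   # x1 = -2 + 2 x2
    assert np.allclose(A @ x, b)          # every member of the family solves the system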

7.4.1.2 General conditions for solution


With linear systems of algebraic equations,

Ax = b, (7.4.17)

as with ordinary differential and difference equations, there are structural questions regarding
solving the linear system that we would like to address even before we attempt to solve it.
These structural questions are:

Existence of solution: Even before we solve the system, it is important to know whether or
not there is a solution. If the system has no solution, it is inconsistent. If the
existence question is answered positively, the system is consistent.

Unicity of solution: For a consistent system, the solution may be unique or there may
be more than one solution. If the solution exists, but is not unique, the system is
said to be degenerate. With degenerate systems, when the solution exists but is
not unique, the examples showed that there is an uncountable set of solutions, a
family of solutions. We will see in Chapter 10 that the solution set has nice algebraic
properties and nice structure.

When b ≠ 0, the generic system (7.4.17) is called an inhomogeneous linear system of
algebraic equations. When b = 0, i.e., when the right-hand side of (7.4.17) is zero, the system becomes
\[ A x = 0. \tag{7.4.18} \]
This is called a homogeneous linear system of algebraic equations, or homogeneous system of equations for short. Clearly, the existence-of-solution question for (7.4.18) is trivial,
since x = 0 is always a solution of (7.4.18). However, unicity of the solution is still a
relevant, nontrivial question.

Gauss elimination is a means to answer the existence and unicity structural questions for
generic inhomogeneous systems of algebraic equations (7.4.17) and the question of unic-
ity of solutions for the homogeneous system of equations (7.4.18).

By Gauss elimination, the augmented matrix
\[ \bar A = [\,A \,|\, b\,] \]
and the system matrix A associated with the linear system are reduced to a triangular
form, the row echelon form. The row echelon form has the generic structure given in Section 7.3.2.1 by (7.3.19) or by (7.3.20):
\[ \bar A \Longrightarrow \begin{bmatrix} U_{\bar A}\\ 0 \end{bmatrix} \tag{7.4.19} \]
\[ A \Longrightarrow \begin{bmatrix} U_A\\ 0 \end{bmatrix}. \tag{7.4.20} \]
These structures identify the two upper triangular blocks
\[ U_A : r_A \times n, \qquad U_{\bar A} : r_{\bar A} \times (n+1). \]

Let r_x be the number of variables or unknowns.

The response to the structural questions can now be given in terms of the row dimensions r_A and r_Ā of U_A and U_Ā, respectively, and of the number of unknowns r_x. These
conditions are summarized by:
No solution: r_A < r_Ā.
Existence of solution: r_A = r_Ā.
Unique solution: r_A = r_x.
Family of solutions: r_A < r_x. In this case, the family of solutions is parameterized
by the free variables (determined, for example, by solving the backward step).
The number of free variables is r_x − r_A.
We will have occasion to come back to these conditions and cast them in terms of other
relevant parameters of the augmented and system matrices.
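These conditions translate directly into a short procedure: reduce the augmented matrix Ā to row echelon form, count the leading rows of A and of Ā, and compare them with the number of unknowns. The Python sketch below is one possible implementation, not the only one; the function name, the use of partial pivoting, and the zero tolerance are our own illustrative choices:

import numpy as np

def classify_system(A, b, tol=1e-12):
    """Classify Ax = b as 'no solution', 'unique solution', or 'family of solutions'."""
    A_bar = np.hstack([np.asarray(A, float), np.asarray(b, float).reshape(-1, 1)])
    m, n = A_bar.shape[0], A_bar.shape[1] - 1       # m equations, n unknowns (r_x = n)
    row = 0
    for col in range(n):                            # forward (Gauss) elimination on A_bar
        p = row + np.argmax(np.abs(A_bar[row:, col]))
        if abs(A_bar[p, col]) < tol:
            continue                                # no pivot available in this column
        A_bar[[row, p]] = A_bar[[p, row]]           # O2: row exchange (partial pivoting)
        A_bar[row] /= A_bar[row, col]               # O1: normalize the pivot to one
        for r in range(row + 1, m):                 # O3: zero out the entries below the pivot
            A_bar[r] -= A_bar[r, col] * A_bar[row]
        row += 1
        if row == m:
            break
    r_A = int(sum(np.any(np.abs(A_bar[i, :n]) > tol) for i in range(m)))    # leading rows of A
    r_Abar = int(sum(np.any(np.abs(A_bar[i, :]) > tol) for i in range(m)))  # leading rows of A_bar
    if r_A < r_Abar:
        return "no solution"
    return "unique solution" if r_A == n else "family of solutions"

# Examples 7.4.2 and 7.4.3:
print(classify_system([[1, -2], [5, -2], [3, 2], [2, -4]], [-2, 4, 8, -3]))       # no solution
print(classify_system([[1, -2], [5, -10], [3, -6], [2, -4]], [-2, -10, -6, -4]))  # family of solutions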

7.4.2 LU decomposition of square matrices


If A is square of dimension n, its diagonal can be extended all the way to the entry (n, n)
of A and the echelon form is upper triangular. As an alternative to (7.3.20), we can then
represent the row echelon form as an upper triangular matrix:
\[ A \Longrightarrow U. \tag{7.4.21} \]
The context will clarify when we want to distinguish the echelon form with the block U
of the r leading rows and the n − r bottom rows of zeros as in (7.3.20), or simply refer to A
as reduced to an upper triangular form U as in (7.4.21).

We assume now that A is square and that its first entry is not zero. We saw in Subsection 7.3.1.1 that the reduction of a matrix A to the row echelon form is achieved by left
multiplication by elementary matrices. If there are ℓ such elementary steps, then:
\[ \underbrace{E_{\ell} \cdots E_{1}}_{E\;=\;\text{product of the elementary matrices}} A = U \tag{7.4.22} \]
\[ E A = U. \tag{7.4.23} \]
We saw in Subsection 7.3.1.2 that the elementary matrices are invertible, so:
\[ A = E^{-1} U. \tag{7.4.24} \]
If there are no row exchanges, i.e., no elementary operations of type O2, the elementary
matrices E_i, i = 1, …, ℓ, are all lower triangular, their product is lower triangular,
and the inverse of their product is lower triangular. Represent this inverse by L, i.e.:
\[ L = E^{-1}. \]

Then Gauss elimination gives a decomposition of A as the product of a lower triangular


matrix and an upper triangular matrix:

A = LU. (7.4.25)

Example 7.4.4 I LU factorization


We compute the LU factorization of the matrix A indicated below, getting successively:
\[
\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & -\frac{1}{2} & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ -\frac{1}{3} & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0\\ -\frac{2}{3} & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}
\underbrace{\begin{bmatrix} 3 & 5 & 5\\ 2 & -2 & 6\\ 1 & -1 & 2 \end{bmatrix}}_{A}
=
\begin{bmatrix} 3 & 5 & 5\\ 0 & -\frac{16}{3} & \frac{8}{3}\\ 0 & 0 & -1 \end{bmatrix}.
\]
The inverses of the three elementary matrix factors on the left, taken in reverse order, are:
\[
\begin{bmatrix} 1 & 0 & 0\\ \frac{2}{3} & 1 & 0\\ 0 & 0 & 1 \end{bmatrix},\qquad
\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ \frac{1}{3} & 0 & 1 \end{bmatrix},\qquad
\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & \frac{1}{2} & 1 \end{bmatrix}.
\]
Multiplying these three matrices in the order listed and writing the product on the right-hand side, we get the desired LU factorization:
\[
\begin{bmatrix} 3 & 5 & 5\\ 2 & -2 & 6\\ 1 & -1 & 2 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0\\ \frac{2}{3} & 1 & 0\\ \frac{1}{3} & \frac{1}{2} & 1 \end{bmatrix}
\begin{bmatrix} 3 & 5 & 5\\ 0 & -\frac{16}{3} & \frac{8}{3}\\ 0 & 0 & -1 \end{bmatrix}.
\]
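The same computation can be scripted. The sketch below is a bare-bones elimination without row exchanges or pivot normalization, so it applies only when Gauss elimination needs nothing but O3 operations, as in this example; the function name is an assumption made for illustration:

import numpy as np

def lu_no_pivot(A):
    """LU factorization A = L U with unit-diagonal L; assumes no row exchanges are needed."""
    U = np.asarray(A, dtype=float).copy()
    n = U.shape[0]
    L = np.eye(n)
    for j in range(n - 1):               # eliminate below the pivot U[j, j]
        for i in range(j + 1, n):
            m = U[i, j] / U[j, j]        # multiplier used to zero out entry (i, j)
            L[i, j] = m                  # L stores the multiplier, cf. (7.4.26) below
            U[i] -= m * U[j]             # O3: row_i <- row_i - m * row_j
    return L, U

A = np.array([[3., 5., 5.], [2., -2., 6.], [1., -1., 2.]])
L, U = lu_no_pivot(A)
assert np.allclose(L @ U, A)
# L has 2/3, 1/3 and 1/2 below the diagonal, and U = [[3, 5, 5], [0, -16/3, 8/3], [0, 0, -1]],
# matching the factors computed by hand above.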


This decomposition is not unique; for a given square matrix A, there are many possible decompositions like (7.4.25), because neither L nor U is unique. However, the decomposition
is made unique by normalizing the diagonal entries of L to one. This entails not
normalizing the leading rows of the row echelon form by the pivots, as we now explain.

Assume the steps in Gauss elimination involve neither row exchange elementary operations O2 nor normalization by the inverses of the pivots (elementary operations O1). Then E
is the product of O3 elementary matrices given by (7.3.7), with inverses given by (7.3.18).

Before stating the result, we look at the example of multiplying two such inverses:
\[
\begin{bmatrix} 1 & 0 & 0\\ 2 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 3 & 0 & 1 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0\\ 2 & 1 & 0\\ 3 & 0 & 1 \end{bmatrix}.
\]

This is very intuitive. First recall that the LHS of this equality is the product of the inverses
of two O3 elementary operations, which means the original O3 operations are applied in
reverse order. The first matrix factor in the LHS is the inverse of the O3 operation where
we had subtracted from row two the first row multiplied by 2; this Gauss elimination step

zeros out the entry (2, 1) of the original matrix. Likewise, the second matrix factor is the
inverse of the O3 operation where we had subtracted from row three the first row multi-
plied by 3; this Gauss elimination step zeros out the entry (3, 1) of the original matrix.

Now, looking at the result of the product of these two inverse matrices, the matrix on the
RHS, we see that, besides the diagonals that are ones, the nonzero entries are the entries
(2, 1) and (3, 1) that save the negative of the scalars (−2 and −3) used to zero out these
entries (2, 1) and (3, 1) in the original matrix.

We now see this more generally. Consider two successive O3 elementary operations
\[ (I + \alpha_{i_2 j_2} E_{i_2 j_2})(I + \alpha_{i_1 j_1} E_{i_1 j_1}). \]
We assume that i1 ≠ i2 ≠ j1 ≠ j2. The inverse of this product is:
\[
\left[(I + \alpha_{i_2 j_2} E_{i_2 j_2})(I + \alpha_{i_1 j_1} E_{i_1 j_1})\right]^{-1}
= (I - \alpha_{i_1 j_1} E_{i_1 j_1})(I - \alpha_{i_2 j_2} E_{i_2 j_2})
= I - \alpha_{i_1 j_1} E_{i_1 j_1} - \alpha_{i_2 j_2} E_{i_2 j_2},
\]
since for i1 ≠ i2 ≠ j1 ≠ j2
\[ E_{i_1 j_1} E_{i_2 j_2} = 0. \]
In summary, when only O3 elementary operations are used to reduce a square matrix A
to row echelon form in Gauss elimination (no row exchange nor normalization by the
pivots), the structure of the lower triangular matrix in the LU decomposition (7.4.25) is
readily available:
\[ L = I - \sum_{i,j:\; i>j} \alpha_{ij} E_{ij}. \tag{7.4.26} \]

The scalars αij are the scalars used in the O3 elementary operations used in Gauss elim-
ination to zero out the entry (i, j) below the diagonal of the square matrix A. If a given
entry (i, j) below the diagonal of the square matrix A was zero to start with, then the
corresponding αij = 0.

The diagonal entries of L in (7.4.26) are all ones. With this normalization, the LU decom-
position1 in (7.4.25) of matrix A is unique. It is referred to as the LU decomposition of the
square matrix A.

We have shown this decomposition to hold for square matrices, where the row echelon
form is obtained by Gauss elimination with no O1 nor O2 elementary operations. The
result holds more generally for arbitrary rectangular matrices by allowing for a permuta-
tion matrix. We will not comment further on this issue.
¹ In the expression "LU decomposition," LU is an adjective to the noun decomposition, and so it is not
boldfaced. In other words, it does not stand for the matrix factors L and U; rather, it stands for the first
letters of "Lower" and "Upper" triangular factors.

Remark 7.4.1 (Mnemonic on LU decomposition). With the insight provided on the ma-
trix L, Gauss elimination can be slightly modified to easily obtain the entries of L.

We illustrate this with the first three steps in Example 7.4.2, where we use Gauss elimination
to reduce to row echelon form the augmented matrix Ā of the linear system in that example.
We reproduce these three steps here for easy recall:
\[
\bar A = \left[\begin{array}{rr|r} 1 & -2 & -2\\ 5 & -2 & 4\\ 3 & 2 & 8\\ 2 & -4 & -3 \end{array}\right]
\;\underset{\left[\begin{smallmatrix}1&0&0&0\\-5&1&0&0\\0&0&1&0\\0&0&0&1\end{smallmatrix}\right]}{\Longrightarrow}\;
\left[\begin{array}{rr|r} 1 & -2 & -2\\ 0 & 8 & 14\\ 3 & 2 & 8\\ 2 & -4 & -3 \end{array}\right]
\;\underset{\left[\begin{smallmatrix}1&0&0&0\\0&1&0&0\\-3&0&1&0\\0&0&0&1\end{smallmatrix}\right]}{\Longrightarrow}\;
\left[\begin{array}{rr|r} 1 & -2 & -2\\ 0 & 8 & 14\\ 0 & 8 & 14\\ 2 & -4 & -3 \end{array}\right]
\]
\[
\underset{\left[\begin{smallmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\-2&0&0&1\end{smallmatrix}\right]}{\Longrightarrow}\;
\left[\begin{array}{rr|r} 1 & -2 & -2\\ 0 & 8 & 14\\ 0 & 8 & 14\\ 0 & 0 & 1 \end{array}\right].
\]

These three steps only use the elementary operation O3, and they zero out the three entries in the
first column below entry (1, 1).

The three factors used can be read off from the elementary matrices below the arrows: −5,
−3, and −2. The first column of L is then [1 5 3 2]^T. The mnemonic is to register these
entries of L, as Gauss elimination progresses, by writing these scalars, in parentheses, in place
of the zeroed-out entries. For this example:
\[
\bar A = \left[\begin{array}{rr|r} 1 & -2 & -2\\ 5 & -2 & 4\\ 3 & 2 & 8\\ 2 & -4 & -3 \end{array}\right]
\Longrightarrow
\left[\begin{array}{rr|r} 1 & -2 & -2\\ (5) & 8 & 14\\ 3 & 2 & 8\\ 2 & -4 & -3 \end{array}\right]
\Longrightarrow
\left[\begin{array}{rr|r} 1 & -2 & -2\\ (5) & 8 & 14\\ (3) & 8 & 14\\ 2 & -4 & -3 \end{array}\right]
\Longrightarrow
\left[\begin{array}{rr|r} 1 & -2 & -2\\ (5) & 8 & 14\\ (3) & 8 & 14\\ (2) & 0 & 1 \end{array}\right].
\]
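In code, this mnemonic is the classic trick of storing each multiplier in the very entry it annihilates, so a single array ends up holding U on and above the diagonal and the strictly lower part of L below it. A minimal sketch under the same no-row-exchange assumption (the function name is ours):

import numpy as np

def lu_in_place(A):
    """Return a copy of A whose upper triangle holds U and whose strict lower triangle
    holds the multipliers of L (the parenthesized entries of the mnemonic)."""
    M = np.asarray(A, dtype=float).copy()
    m, n = M.shape
    for j in range(min(m - 1, n)):
        for i in range(j + 1, m):
            M[i, j] = M[i, j] / M[j, j]              # store the multiplier in the zeroed-out slot
            M[i, j + 1:] -= M[i, j] * M[j, j + 1:]   # eliminate the rest of row i
    return M

M = lu_in_place(np.array([[3., 5., 5.], [2., -2., 6.], [1., -1., 2.]]))
# np.tril(M, -1) + np.eye(3) recovers L, and np.triu(M) recovers U of Example 7.4.4.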



7.4.3 Determinant of square matrices


Assume that the LU decomposition of a square matrix A is known:
\[ A = LU, \]
where L has been normalized to unit diagonal entries. Since the determinant of a product is
the product of the determinants and |L| = 1, the determinant of A is
\[ |A| = |L|\,|U| = |U|. \]
Because A is square, U is also square. Since U is upper triangular, its determinant is the
product of its diagonal entries. Hence:
\[ |A| = \prod_{i=1}^{n} U_{ii}. \]
If any Uii is zero, then |A| = 0. But if this is the case, we know that there will be zero
rows at the bottom of the row echelon form of A. This is the case because: 1) there are n
diagonal entries of U (since U is square); 2) if a diagonal entry is zero, say U(n−1)(n−1) , and
this row, say row n − 1, is not zero, the leading entry is to the right of this diagonal entry,
say entry U(n−1)n ; and, finally, 3) row n cannot be a leading row, since its leading entry
would be to the right of entry (n − 1, n) and this is not possible.

This says that if the determinant of the square matrix A is nonzero, |A| 6= 0, the diag-
onal entries of U are all nonzero, so they are the leading entries of the row echelon form
of A, and so they are the pivots used in the Gauss elimination of A.

So, if A is nonsingular, we have:
\[ |A| = \prod_{i=1}^{n} \underbrace{U_{ii}}_{\text{pivots}}. \]
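Numerically, this is just the product of the diagonal entries of U from the LU factorization. A short sketch, reusing the illustrative lu_no_pivot helper from the LU example above (so it carries the same no-row-exchange assumption):

import numpy as np

A = np.array([[3., 5., 5.], [2., -2., 6.], [1., -1., 2.]])   # matrix of Example 7.4.4
L, U = lu_no_pivot(A)                    # helper sketched after Example 7.4.4
det_from_pivots = np.prod(np.diag(U))    # product of the pivots U_ii
assert np.isclose(det_from_pivots, np.linalg.det(A))
print(det_from_pivots)                   # 3 * (-16/3) * (-1) = 16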

7.4.4 Inverse of square matrices


Let A be square of dimension n and be invertible. We know from Result 6.6.2 in Chapter 6
that the square matrix A is invertible if and only if its determinant is nonzero, i.e.,
\[ A^{-1} \text{ exists} \;\Longleftrightarrow\; |A| \neq 0. \]
We now find the inverse of this square matrix A, which is assumed to be invertible, by
Gauss-Jordan elimination. We illustrate the ideas first with an example.

Consider the augmented matrix in (7.2.22), herein repeated,


\[
\left[\begin{array}{ccc|c} 1 & 0 & -\frac{1}{3} & \frac{10}{3}\\ 0 & 6 & \frac{31}{3} & -\frac{22}{3}\\ 0 & -8 & \frac{8}{3} & -\frac{20}{3} \end{array}\right].
\]

We work only with the matrix of the first three columns

\[
A = \begin{bmatrix} 1 & 0 & -\frac{1}{3}\\ 0 & 6 & \frac{31}{3}\\ 0 & -8 & \frac{8}{3} \end{bmatrix}. \tag{7.4.27}
\]

By (7.2.31) and (7.2.32), this matrix A is reduced to the identity matrix, i.e.,

\[
\underbrace{\begin{bmatrix} 1 & 0 & \frac{1}{3}\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & \frac{1}{3}\\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & \frac{3}{37} \end{bmatrix}}_{E_{GJ}}
\underbrace{\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & -6 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0\\ 0 & -\frac{1}{8} & 0\\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0\\ 0 & 0 & 1\\ 0 & 1 & 0 \end{bmatrix}}_{E_{G}}
\begin{bmatrix} 1 & 0 & -\frac{1}{3}\\ 0 & 6 & \frac{31}{3}\\ 0 & -8 & \frac{8}{3} \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}.
\]

In other words,

EGJ EG A = I.

Since the inverse of a square matrix, when it exists, is unique (see Result 6.6.1 in Chapter 6), we have
\[
\begin{bmatrix} 1 & 0 & -\frac{1}{3}\\ 0 & 6 & \frac{31}{3}\\ 0 & -8 & \frac{8}{3} \end{bmatrix}^{-1}
= E_{GJ}\, E_{G}
= \begin{bmatrix} 1 & 0 & \frac{1}{3}\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & \frac{1}{3}\\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & \frac{3}{37} \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & -6 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0\\ 0 & -\frac{1}{8} & 0\\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0\\ 0 & 0 & 1\\ 0 & 1 & 0 \end{bmatrix}.
\]

Since each factor on the right hand side has been computed as we apply Gauss-Jordan
elimination, this procedure provides a method to compute the inverse of the invertible
matrix A.

Rather than performing the multiplication of the factors EGJ EG , we can compute the
inverse of the matrix by performing Gauss-Jordan elimination on the matrix

[A|I],

i.e., on the matrix that is the concatenation of the matrix A, whose inverse we want to
compute and is assumed to exist, and the identity matrix I. We can readily conclude that

\[ E_{GJ} E_{G}\,[\,A \,|\, I\,] = [\,E_{GJ} E_{G} A \;|\; E_{GJ} E_{G}\,]. \]

The right block is the inverse of A.

If the matrix A is not invertible, then Gauss-Jordan elimination leads at some step
to a row of zeros; this is an indicator that the matrix is not invertible. You can conclude
this because
\[ |E_{GJ} E_{G} A| = |E_{GJ} E_{G}|\,|A| = 0, \]
since the matrix on the left has a row of zeros, so its determinant is zero, while
|E_{GJ} E_{G}| ≠ 0 since it is the product of invertible matrices; hence |A| = 0.
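A sketch of this [A | I] procedure in code, with the function name and the zero tolerance as our own illustrative choices: Gauss-Jordan elimination is run on the concatenated matrix, a zero pivot signals that A is not invertible, and otherwise the right block is read off as the inverse. The example matrix is the one inverted above.

import numpy as np

def inverse_gauss_jordan(A, tol=1e-12):
    """Invert A by Gauss-Jordan elimination on [A | I]; raises if A is not invertible."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])                  # the concatenated matrix [A | I]
    for col in range(n):
        p = col + np.argmax(np.abs(M[col:, col]))  # O2: bring the largest pivot into place
        if abs(M[p, col]) < tol:
            raise ValueError("zero pivot: the matrix is not invertible")
        M[[col, p]] = M[[p, col]]
        M[col] /= M[col, col]                      # O1: normalize the pivot row
        for r in range(n):                         # O3: zero out the rest of the column
            if r != col:
                M[r] -= M[r, col] * M[col]
    return M[:, n:]                                # the right block is A^{-1}

A = np.array([[1., 0., -1./3.], [0., 6., 31./3.], [0., -8., 8./3.]])
A_inv = inverse_gauss_jordan(A)
assert np.allclose(A @ A_inv, np.eye(3))
assert np.allclose(A_inv, np.linalg.inv(A))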

7.5 Problems
1. Consider the matrix Ā
\[
\bar A = \begin{bmatrix} 1 & -1 & 1 & 2\\ 3 & 1 & 1 & 0\\ 2 & -1 & 1 & -1 \end{bmatrix}
\]
(a) Write the linear system of algebraic equations for which Ā is the augmented
matrix. Identify explicitly the system matrix A, the vector of dependent variables x, and the vector of independent terms b.
(b) Apply Gauss elimination to Ā, representing each step by a corresponding elementary matrix Ei.
(c) Determine the row echelon forms of A and Ā.
(d) Determine if the solution of the system exists, and if it exists if it is unique or
not.
(e) If S is the set of solutions for the linear system in Part 1a, determine its cardi-
nality.
(f) If the cardinality of S is one, determine the solution of the linear system in
Part 1a; if the cardinality of S is greater than one, determine the family of solu-
tions of the linear system in Part 1a, i.e., the generic xs ∈ S.
(g) Determine the LU-factorization of Ā.
Hint: Although in the Lecture we considered the LU-factorization of square matrices, this factorization is valid for arbitrary matrices and can be determined
by Gauss elimination, as seen in this example.

(h) Determine |A|.


(i) Consider the system matrix A for this problem. If you can, find its inverse by
Gauss-Jordan elimination; if you cannot, say why you cannot.
2. Except for Part 1i, repeat Problem 1 with the matrix Ā
\[
\bar A = \begin{bmatrix} 1 & -1 & 1 & -1 & 2\\ 2 & 2 & 1 & 1 & 1\\ 3 & 1 & 2 & 0 & 3\\ 5 & 3 & 3 & 1 & 4 \end{bmatrix}
\]

3. Repeat Problem 1 with the matrix Ā
\[
\bar A = \begin{bmatrix} 2 & -4 & -4 & 2\\ 2 & 1 & 1 & 3\\ 2 & 1 & 1 & 4 \end{bmatrix}
\]
Let the system matrix for this problem be A. If you can, find its inverse by Gauss-Jordan elimination; if you cannot, say why you cannot.
4. Consider the system of linear algebraic equations:
x1 − x2 + x3 = 2
3x1 + x2 + x3 = 0 (7.5.1)
2x1 − 2x2 + x3 = −1.

(a) Write the augmented matrix A of the linear system (7.5.1) and identify the sys-
tem matrix, the vector of dependent variables, and the vector of independent
terms.
(b) Reduce the augmented matrix to its row echelon form. Identify the row echelon
form of the system matrix.
(c) Determine from your answer to Part 4b if the linear system (7.5.1) has a solu-
tion, a unique solution, or a family of solutions. Determine the unique solution
or the family of solutions if either exists. Justify your answer.
(d) Perform back substitution on the row echelon form of the augmented matrix
determined in Part 4b.
(e) Perform Gauss-Jordan on the row echelon form determined in Part 4b.
(f) Adjoin to (7.5.1) a fourth equation
\[ x_1 + 2x_2 + 3x_3 = -1. \tag{7.5.2} \]
Let à be the system matrix of the new system obtained by augmenting the
system (7.5.1) with this equation (7.5.2), and form the corresponding augmented
matrix as before. Repeat Parts 4b and 4c.

(g) Represent the elementary operations in Part 4f by elementary matrices.


(h) Determine the LU -factorization for the square matrix A in Part 4f.
(i) Determine the determinant of the square matrix A in Part 4f.
(j) If you can, invert the square matrix A in Part 4f. If you cannot, say why you
cannot.
5. Consider the matrix:
 
7 2 2 1
8 24 9 30 
 
 78
A= 1 9 2 (7.5.3)

3
2 8 5
7 1 7 7

(a) Perform Gauss elimination on A to get its row echelon form.


(b) Perform Gauss-Jordan on the row echelon form determined in Part 5a.
(c) Write the elementary matrices representing each of the elementary operations
used in Part 5a.
(d) Write the linear system for which the matrix A is the augmented matrix. What
are the free variables, if any, in this system? Justify your answer.
6. Consider the matrix:
\[
A = \begin{bmatrix} 1 & -1 & 1 & -1 & 2\\ 2 & 2 & 1 & 1 & 1\\ 3 & 1 & 2 & 0 & 3\\ 5 & 3 & 3 & 1 & 4 \end{bmatrix} \tag{7.5.4}
\]

(a) Perform Gauss elimination on A to get its row echelon form.


(b) Perform Gauss-Jordan on the row echelon form determined in Part 6a.
(c) Write the elementary matrices representing each of the elementary operations
used in Part 6a.
(d) Write the linear system for which the matrix A is the augmented matrix. Determine from your answer to Part 6a if this linear system has a solution, a unique
solution, or a family of solutions. Determine the unique solution or the family
of solutions if either exists. What are the free variables, if any, in this system?
Justify your answer.
7. Consider the system of linear algebraic equations:

x1 −2x2 −2x3 = 1
2x1 + x2 + x3 = 3 (7.5.5)
2x1 + x2 +2x3 = 3

(a) Determine if there exists a solution to (7.5.5).


(b) If there exists a solution say if it is unique or a family of solutions, and deter-
mine this or these solutions. If there is no solution, explain why not.

8. Consider the system of linear algebraic equations with real valued coefficients:

x1 + x2 +λx3 +x4 = 1
x1 +λx2 + x3 −x4 = 2 (7.5.6)
λx1 + x2 + x3 =3

The parameter λ ≠ 1, −2.

The row echelon form of the augmented matrix is
\[
\left[\begin{array}{cccc|c}
1 & 1 & \lambda & 1 & 1\\
0 & 1 & -1 & -\dfrac{2}{\lambda-1} & \dfrac{1}{\lambda-1}\\
0 & 0 & 1 & \dfrac{1}{\lambda-1} & \dfrac{\lambda-4}{\lambda^{2}+\lambda-2}
\end{array}\right] \tag{7.5.7}
\]

(a) Determine, if any, value or values of λ so that (7.5.6) has a single solution.
Justify your answer.
(b) Determine, if any, value or values of λ so that (7.5.6) has no solution. Justify
your answer.
(c) Determine, if any, value or values of λ so that (7.5.6) has a family of solutions.
Justify your answer.
