Péter Gács
Spring 09
Vectors
Vector spaces
Linear dependence
Examples
{(1, 2), (3, 6)}. Two vectors are dependent when one is a scalar
multiple of the other.
{(1, 0, 1), (0, 1, 0), (1, 1, 1)}: dependent, since (1, 1, 1) = (1, 0, 1) + (0, 1, 0).
Theorem
A set is a basis iff it is a minimal generating set.
Examples
A basis of { (x, y, z) : x + y + z = 0 } is {(0, 1, −1), (1, 0, −1)}.
A basis of { (2t + u, u, t − u) : t, u ∈ R } is {(2, 0, 1), (1, 1, −1)}.
Theorem
All bases have the same number of elements.
Example
The set of all n-tuples of real numbers with the property that the
sum of their elements is 0 has dimension n − 1.
Coordinates with respect to a basis b1, . . . , bn:
x = x1 b1 + · · · + xn bn.
Example
If M is the set Rⁿ of all n-tuples of real numbers then the
n-tuples of the form ei = (0, . . . , 1, . . . , 0) (only position i has a 1) form a
basis. Then (x1, . . . , xn) = x1 e1 + · · · + xn en.
Example
If A is the set of all n-tuples whose sum is 0 then the n − 1 vectors ei − en (i = 1, . . . , n − 1) form a basis.
Matrices
A = (aij). Dimensions: m × n.
Diagonal matrix diag(a11, . . . , ann)
Identity matrix.
Triangular (unit triangular) matrices.
Permutation matrix.
Transpose A T . Symmetric matrix.
x1 = a11 y1 + · · · + a1q yq
⋮
xp = ap1 y1 + · · · + apq yq
In matrix form: x = Ay.
Matrix multiplication
y1 = b11 z1 + · · · + b1r zr
⋮
yq = bq1 z1 + · · · + bqr zr
Then x = Cz where C = (cik),
cik = ai1 b1k + · · · + aiq bqk (i = 1, . . . , p; k = 1, . . . , r).
AB = C
x = Ay = A(Bz) = Cz = (AB)z.
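A quick numeric check of this rule (a sketch in Python with NumPy; the matrices are made-up examples): substituting twice must agree with multiplying by C = AB once.

import numpy as np

# x = A y and y = B z, so x should equal (A B) z.
A = np.array([[1., 2.], [3., 4.], [5., 6.]])   # p x q = 3 x 2
B = np.array([[1., 0., 2.], [-1., 1., 0.]])    # q x r = 2 x 3
z = np.array([1., 2., 3.])

x_composed = A @ (B @ z)     # substitute twice
x_product = (A @ B) @ z      # multiply by C = AB once
assert np.allclose(x_composed, x_product)

# c_ik = a_i1 b_1k + ... + a_iq b_qk, computed explicitly:
C = np.array([[sum(A[i, l] * B[l, k] for l in range(A.shape[1]))
               for k in range(B.shape[1])]
              for i in range(A.shape[0])])
assert np.allclose(C, A @ B)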
Transpose of product
Easy to check: (AB)ᵀ = BᵀAᵀ.
Inner product
If a = (ai), b = (bi) are vectors of the same dimension n taken as
column vectors then
aᵀb = a1 b1 + · · · + an bn.
The (less frequently used) outer product makes sense for any two
column vectors of dimensions p, q, and is the p × q matrix
abᵀ = (ai bj).
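In code (a small Python illustration with NumPy; the vectors are arbitrary):

import numpy as np

a = np.array([1., 2., 3.])
b = np.array([4., 5., 6.])

inner = a @ b                 # a^T b = 1*4 + 2*5 + 3*6 = 32, a scalar
outer = np.outer(a, b)        # a b^T, the 3 x 3 matrix (a_i b_j)

assert inner == 32.0
assert outer.shape == (3, 3) and outer[0, 2] == a[0] * b[2]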
Inverse, rank
Example
( 1  1 )⁻¹   ( 0   1 )
( 1  0 )   = ( 1  −1 ).
(AB)⁻¹ = B⁻¹A⁻¹.
(Aᵀ)⁻¹ = (A⁻¹)ᵀ.
A square matrix with no inverse is called singular. Nonsingular
matrices are also called regular.
Example
The matrix ( 1 0 ; 1 0 ) is singular.
If a1, . . . , an are the columns of A then
Ax = x1 a1 + · · · + xn an.
Theorem
A square matrix A is singular iff Ker A ≠ {0}.
Theorem
The two ranks (row rank and column rank) are the same (see proof later). Also, rank(A) is the
smallest r such that there is an m × r matrix B and an r × n
matrix C with A = BC.
Proposition
A triangular matrix with only r nonzero rows (or only r nonzero columns), all of whose diagonal elements in those rows are nonzero, has row rank and column rank r.
Example
The outer product A = bcᵀ of two vectors has rank 1, and the product bcᵀ is itself a decomposition A = BC with r = 1.
Proposition
A square matrix is nonsingular iff it has full rank.
Minors.
Determinant
Definition
A permutation: an invertible map σ : {1, . . . , n} → {1, . . . , n}.
The product of two permutations σ, τ is their consecutive
application: (στ)(x) = σ(τ(x)).
A transposition is a permutation that interchanges just two
elements.
An inversion in a permutation: a pair of numbers i < j with
σ(i) > σ( j). We denote by Inv(σ) the number of inversions in σ.
A permutation σ is even or odd depending on whether Inv(σ)
is even or odd.
Proposition
(a) A transposition is always an odd permutation.
(b) Inv(στ) ≡ Inv(σ) + Inv(τ) (mod 2).
Definition
Let A = (aij) be an n × n matrix, and let Aij denote the submatrix obtained by deleting row i and column j. Expanding along row i:
det(A) = Σ_j (−1)^(i+j) aij det(Aij).
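A direct transcription of the cofactor expansion (a Python sketch; exponential time, so only for small matrices; fast determinants come from elimination later):

def det(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # minor: delete row 0 and column j
        minor = [row[:j] + row[j+1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

assert det([[1, 1], [1, 0]]) == -1
assert det([[2, 0, 0], [0, 3, 0], [0, 0, 4]]) == 24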
Properties
det A = det(A T ).
det(v1, v2, . . . , vn) is multilinear, that is, linear in each
argument separately. For example, in the first argument:
det(αu + βu′, v2, . . . , vn) = α det(u, v2, . . . , vn) + β det(u′, v2, . . . , vn).
Hence det(0, v2, . . . , vn) = 0.
Antisymmetric: the determinant changes sign when any two
arguments are swapped. For example, for the first two arguments:
det(v2, v1, . . . , vn) = − det(v1, v2, . . . , vn).
Hence det(u, u, v3, . . . , vn) = 0.
It follows that any multiple of one row (or column) can be added to
another without changing the determinant. From this it follows:
Theorem
A square matrix is singular iff its determinant is 0.
Theorem
det(AB) = det(A) det(B).
Quadratic forms:
x ↦ xᵀAx = Σ_{ij} aij xi xj.
xᵀBᵀBx = (Bx)ᵀ(Bx) ≥ 0,
Theorem
A is positive definite iff A = B T B for some nonsingular B.
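A numeric illustration of the "if" direction (a Python sketch; B is an arbitrary nonsingular example): with A = BᵀB, every value xᵀAx is the squared length of Bx.

import numpy as np

B = np.array([[2., 1.], [0., 1.]])   # nonsingular (det = 2)
A = B.T @ B                          # symmetric, positive definite

rng = np.random.default_rng(0)
for _ in range(100):
    x = rng.standard_normal(2)
    quad = x @ A @ x                              # x^T A x
    assert np.isclose(quad, np.dot(B @ x, B @ x)) # = (Bx)^T (Bx) >= 0
    assert quad > 0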
f = f(x) = Σ_{i=0}^{n−1} ai x^i,
g = g(x) = Σ_{i=0}^{n−1} bi x^i,
f(x)g(x) = h(x) = Σ_{k=0}^{2n−2} ck x^k,
where ck = a0 bk + a1 bk−1 + · · · + ak b0.
M(n) ≤ n².
Can we do better? Split f = f0 + x^m f1 and g = g0 + x^m g1 with m = n/2. Then
fg = f0 g0 + x^m (f0 g1 + f1 g0) + x^{2m} f1 g1,
giving M(2m) ≤ 4M(m). But
f0 g1 + f1 g0 = (f0 + f1)(g0 + g1) − f0 g0 − f1 g1,      (2)
so three half-size multiplications suffice: M(2m) ≤ 3M(m), and
M(2^k) ≤ 3^k M(1) = 3^k.
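The three-multiplication recursion as code (a Python sketch; assumes the two coefficient lists have equal power-of-2 length):

def poly_mul(f, g):
    """Multiply polynomials given as coefficient lists, using identity (2)."""
    n = len(f)
    if n == 1:
        return [f[0] * g[0]]
    m = n // 2
    f0, f1 = f[:m], f[m:]          # f = f0 + x^m f1
    g0, g1 = g[:m], g[m:]
    p0 = poly_mul(f0, g0)          # f0 g0
    p1 = poly_mul(f1, g1)          # f1 g1
    ps = poly_mul([a + b for a, b in zip(f0, f1)],
                  [a + b for a, b in zip(g0, g1)])
    mid = [s - a - b for s, a, b in zip(ps, p0, p1)]   # f0 g1 + f1 g0
    h = [0] * (2 * n - 1)
    for i, c in enumerate(p0): h[i] += c
    for i, c in enumerate(mid): h[i + m] += c
    for i, c in enumerate(p1): h[i + 2 * m] += c
    return h

# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
assert poly_mul([1, 2], [3, 4]) == [3, 10, 8]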
A = ( a  b ),   B = ( e  f ),
    ( c  d )        ( g  h )

C = AB = ( r  s ).
         ( t  u )
Let
P1 = a( f − h),
P2 = (a + b)h,
P3 = (c + d)e,
P4 = d(g − e),
P5 = (a + d)(e + h),
P6 = (b − d)(g + h),
P7 = (a − c)(e + f ).
Then
r = −P2 + P4 + P5 + P6,
s = P1 + P2,                  (3)
t = P3 + P4,
u = P1 − P3 + P5 − P7.
M(n) ≤ n^{log 7} = n^{log₂ 7} ≈ n^{2.81}.
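A check of Strassen's identities (3) on one 2 × 2 numeric example (a Python sketch; the same formulas apply when a, . . . , h are matrix blocks):

def strassen_2x2(A, B):
    """One level of Strassen: 7 multiplications instead of 8."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    P1 = a * (f - h)
    P2 = (a + b) * h
    P3 = (c + d) * e
    P4 = d * (g - e)
    P5 = (a + d) * (e + h)
    P6 = (b - d) * (g + h)
    P7 = (a - c) * (e + f)
    return [[-P2 + P4 + P5 + P6, P1 + P2],
            [P3 + P4, P1 - P3 + P5 - P7]]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
# ordinary multiplication gives [[19, 22], [43, 50]]
assert strassen_2x2(A, B) == [[19, 22], [43, 50]]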
Linear equations
Informal treatment first
a11 x1 + · · · + a1n xn = b1,
⋮
am1 x1 + · · · + amn xn = bm.
Cost of elimination: n · (n + (n − 1) + · · ·) ≈ n³/2 operations.
Back substitution: 1 + 2 + · · · + (n − 1) ≈ n²/2 operations.
Example (Chvatal)
A sparse system that fills in.
x1 + x2 + x3 + x4 + x5 + x6 = 4,
x1 + 6x2 = 5,
x1 + 6x3 = 5,
x1 + 6x4 = 5,
x1 + 6x5 = 5,
x1 + 6x6 = 5.
Duality
a11 y1 + · · · + am1 ym = 0,
⋮
a1n y1 + · · · + amn ym = 0,
b1 y1 + · · · + bm ym = 1.
LUP decomposition
Example
The following matrix represents the permutation (2, 3, 1) since its
rows are obtained by this permutation from the unit matrix:
( 0  0  1 )
( 1  0  0 )
( 0  1  0 )
PA = LU
Pb = PAx = LUx.
Repeating:
B3 = L2⁻¹ L1⁻¹ A,

A = L1 L2 B3 =
( 1    0    0  0  · · · ) ( a11  a12      a13      · · · )
( λ2   1    0  0  · · · ) ( 0    a^(1)22  a^(1)23  · · · )
( λ3   μ3   1  0  · · · ) ( 0    0        a^(2)33  · · · )
( λ4   μ4   0  1  · · · ) ( 0    0        a^(2)43  · · · )
( ⋮    ⋮    ⋮  ⋮  ⋱     ) ( ⋮    ⋮        ⋮        ⋱     )
Example: If
A = ( a11  wᵀ )
    ( v    A′ )
then setting
L1 = ( 1       0     ),   L1⁻¹ = ( 1        0     ),
     ( v/a11   In−1  )           ( −v/a11   In−1  )
we get
B2 = L1⁻¹ A = ( a11  wᵀ            )
              ( 0    A′ − vwᵀ/a11  ).
With a permutation (pivoting) step P:
P L⁻¹ A = L3 B4,
P A = P L P⁻¹ · L3 B4 = L̂ L3 B4, where
L̂ = P L P⁻¹ = ( 1        0        0  0  · · ·  0 )
              ( λ2       1        0  0  · · ·  0 )
              ( λπ(3)    μπ(3)    1  0  · · ·  0 )
              ( λπ(4)    μπ(4)    0  1  · · ·  0 )
              ( ⋮        ⋮        ⋮  ⋮  ⋱        )
PA = L Bk+1,
for i = 1 to n do π[i] ← i
for k = 1 to n do
    p ← 0
    // find the largest pivot candidate in column k (partial pivoting)
    for i = k to n do
        if |a[i, k]| > p then
            p ← |a[i, k]|
            k′ ← i
    if p = 0 then error “singular matrix”
    exchange π[k] ↔ π[k′]
    for i = 1 to n do exchange a[k, i] ↔ a[k′, i]    // swap rows k and k′
    for i = k + 1 to n do
        a[i, k] ← a[i, k]/a[k, k]                    // multiplier: an entry of L
        for j = k + 1 to n do a[i, j] ← a[i, j] − a[i, k] · a[k, j]
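The same algorithm in Python (a sketch using NumPy, 0-indexed; as in the pseudocode, a copy of A is overwritten with L below the diagonal and U on and above it):

import numpy as np

def lup_decompose(A):
    """LUP decomposition with partial pivoting: returns (pi, LU) with
    P A = L U, L unit lower triangular, U upper triangular."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    pi = list(range(n))
    for k in range(n):
        kp = max(range(k, n), key=lambda i: abs(A[i, k]))  # pivot row
        if A[kp, k] == 0:
            raise ValueError("singular matrix")
        pi[k], pi[kp] = pi[kp], pi[k]
        A[[k, kp], :] = A[[kp, k], :]           # swap rows k and kp
        for i in range(k + 1, n):
            A[i, k] /= A[k, k]                  # multiplier, entry of L
            A[i, k+1:] -= A[i, k] * A[k, k+1:]  # eliminate
    return pi, A

M = [[2., 0., 2.], [1., 1., 1.], [2., 1., 0.]]
pi, LU = lup_decompose(M)
L = np.tril(LU, -1) + np.eye(3)
U = np.triu(LU)
P = np.eye(3)[pi]                               # row-permutation matrix
assert np.allclose(P @ np.array(M), L @ U)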
PAQ = LU, PA = LUQ⁻¹.
Pb = PAx = LUQ⁻¹x.
Proposition
For an n × n matrix A, the row rank is the same as the column
rank.
Hadamard inequality: if a1, . . . , an are the rows of A, then
det A ≤ |a1| · · · |an| = Π_{i=1}^{n} ( Σ_{j=1}^{n} aij² )^{1/2}.
a/b + c/d = (ad + bc)/(bd).
If we are lucky, we can simplify the fraction.
It turns out that with Gaussian elimination, we will be lucky
enough.
Theorem
Assume that Gaussian elimination on an integer matrix A
succeeds without pivoting. Every intermediate term in the
Gaussian elimination is a fraction whose numerator and
denominator are some subdeterminants of the original matrix.
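A sketch that runs the elimination in exact rational arithmetic (Python's fractions module), so the intermediate values promised by the theorem can be inspected; the matrix is a made-up integer example:

from fractions import Fraction

def eliminate(A):
    """Gaussian elimination without pivoting, in exact arithmetic.
    Returns the triangularized matrix; assumes pivots are nonzero."""
    A = [[Fraction(x) for x in row] for row in A]
    n = len(A)
    for k in range(n):
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
    return A

A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]
for row in eliminate(A):
    print([str(x) for x in row])
# The pivots 2, 1, 2 are ratios of leading principal minors of A (2, 2, 4).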
Example
0.0001x + y = 1,
0.5x + 0.5y = 1.      (4)
Example
a11 x + a12 y = 10,000
a21 x + a22 y = 1      (5)
The problem is that our system is not well scaled. Row scaling
and column scaling:
Σ_j ri aij sj x̃j = ri bi, where xj = sj x̃j.
Example
In (5), let r 1 = 10−4 , all other coeffs are 1: We get back (4), which
we solve by partial pivoting as before.
Sometimes, like here, there are several ways to scale, and not all
are good.
Example
Choose s2 = 10⁻⁴, all other coefficients 1:
x + y′ = 10,000
0.5x + 0.00005y′ = 1
Elimination gives −0.49995y′ = −4999, so
y′ = 10000 after rounding,
x = 0.
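The pivoting phenomenon of example (4), simulated in Python by rounding every intermediate result to 3 significant digits (the precision and the helper rnd are illustrative assumptions):

def rnd(x, digits=3):
    """Round x to `digits` significant digits (simulating short floats)."""
    return float(f"%.{digits - 1}e" % x)

def solve2(a11, a12, b1, a21, a22, b2):
    """Eliminate x from the second equation, rounding every intermediate."""
    m = rnd(a21 / a11)                      # multiplier
    a22p = rnd(a22 - rnd(m * a12))
    b2p = rnd(b2 - rnd(m * b1))
    y = rnd(b2p / a22p)
    x = rnd(rnd(b1 - rnd(a12 * y)) / a11)
    return x, y

# system (4); the exact solution is x = 1.0001..., y = 0.9999...
print(solve2(0.0001, 1, 1, 0.5, 0.5, 1))    # tiny pivot: returns (0.0, 1.0)
print(solve2(0.5, 0.5, 1, 0.0001, 1, 1))    # rows swapped: returns (1.0, 1.0)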
Inverting matrices
Column i of A⁻¹ is obtained by solving
AXi = ei, i = 1, . . . , n.
For a lower block-triangular matrix L = ( B 0 ; C D ):
L⁻¹ = ( I       0 ) ( B⁻¹  0   )   ( B⁻¹        0   )
      ( −D⁻¹C   I ) ( 0    D⁻¹ ) = ( −D⁻¹CB⁻¹   D⁻¹ ).
For an upper triangular matrix U = ( B C ; 0 D ) we get similarly
U⁻¹ = ( B⁻¹  −B⁻¹CD⁻¹ )
      ( 0    D⁻¹      ).
Theorem
Multiplication is no harder than inversion.
Proof. Let
D = L1 L2 = ( I  0  0 )   ( I  0  0 ) ( I  0  0 )
            ( A  I  0 ) = ( A  I  0 ) ( 0  I  0 ).
            ( 0  B  I )   ( 0  0  I ) ( 0  B  I )
Its inverse is
D⁻¹ = L2⁻¹ L1⁻¹ = ( I  0   0 ) ( I   0  0 )   ( I   0   0 )
                  ( 0  I   0 ) ( −A  I  0 ) = ( −A  I   0 ),
                  ( 0  −B  I ) ( 0   0  I )   ( BA  −B  I )
so the product BA can be read off from the inverse (for AB, exchange the roles of A and B).
Theorem
Inversion is no harder than multiplication.
Let n be a power of 2. Assume first that A is symmetric and positive definite,
A = ( B   Cᵀ )
    ( C   D  ).
Trying a block version of the LU decomposition:
A = ( I     0 ) ( B   Cᵀ          )
    ( CB⁻¹  I ) ( 0   D − CB⁻¹Cᵀ  ).
Let Q = B⁻¹Cᵀ and S = D − CB⁻¹Cᵀ (the Schur complement); since B is symmetric, CB⁻¹ = Qᵀ. We have
A = ( I   0 ) ( B  Cᵀ )
    ( Qᵀ  I ) ( 0  S  ).
By the inversion of triangular matrices learned before:
( B  Cᵀ )⁻¹   ( B⁻¹  −B⁻¹CᵀS⁻¹ )   ( B⁻¹  −QS⁻¹ )
( 0  S  )   = ( 0    S⁻¹       ) = ( 0    S⁻¹   ).
A⁻¹ = ( B⁻¹  −QS⁻¹ ) ( I    0 )   ( B⁻¹ + QS⁻¹Qᵀ   −QS⁻¹ )
      ( 0    S⁻¹   ) ( −Qᵀ  I ) = ( −S⁻¹Qᵀ          S⁻¹   ).
The multiplications needed are Q = B⁻¹Cᵀ, QᵀCᵀ (for S), S⁻¹Qᵀ and Q(S⁻¹Qᵀ).
So, besides two inversions of half the size, a constant number of multiplications suffices: I(n) ≤ 2I(n/2) + O(M(n)), hence I(n) = O(M(n)).
Proposition
If the symmetric matrix ( A Bᵀ ; B C ) is positive definite then the Schur complement C − BA⁻¹Bᵀ is also positive definite.
Proof.
(yᵀ, zᵀ) ( A  Bᵀ ) ( y )
         ( B  C  ) ( z ) = yᵀAy + yᵀBᵀz + zᵀBy + zᵀCz
                         = (y + A⁻¹Bᵀz)ᵀ A (y + A⁻¹Bᵀz) + zᵀ(C − BA⁻¹Bᵀ)z.
Choosing y = −A⁻¹Bᵀz shows zᵀ(C − BA⁻¹Bᵀ)z > 0 for z ≠ 0.
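A sketch of recursive inversion along these block formulas (Python with NumPy; assumes A symmetric positive definite with size a power of 2; in practice one would call a library inverse):

import numpy as np

def spd_inverse(A):
    """Invert a symmetric positive definite matrix of size 2^k
    via the Schur complement S = D - C B^{-1} C^T."""
    n = A.shape[0]
    if n == 1:
        return np.array([[1.0 / A[0, 0]]])
    m = n // 2
    B, C, D = A[:m, :m], A[m:, :m], A[m:, m:]
    Binv = spd_inverse(B)
    Q = Binv @ C.T                        # Q = B^{-1} C^T
    S = D - C @ Q                         # Schur complement
    Sinv = spd_inverse(S)
    SQ = Sinv @ Q.T                       # S^{-1} Q^T
    top_left = Binv + Q @ SQ              # B^{-1} + Q S^{-1} Q^T
    return np.block([[top_left, -Q @ Sinv],
                     [-SQ, Sinv]])        # assemble the four blocks

M = np.array([[4., 1., 0., 0.],
              [1., 3., 1., 0.],
              [0., 1., 2., 1.],
              [0., 0., 1., 2.]])
assert np.allclose(spd_inverse(M) @ M, np.eye(4))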
Linear programming
Ax ≤ b.
maximize cᵀx
subject to Ax ≤ b.
Example
Three voting districts: urban, suburban, rural.
Votes needed: 50,000, 100,000, 25,000.
Issues: build roads, gun control, farm subsidies, gasoline tax.
Votes gained, if you spend $ 1000 on advertising on any of these
issues:
minimize x1 + x2 + x3 + x4
subject to −2x1 + 8x2 + 10x4 ≥ 50,000
           5x1 + 2x2 ≥ 100,000
           3x1 − 5x2 + 10x3 − 2x4 ≥ 25,000
Implicit inequalities: xi ≥ 0.
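For concreteness, the advertising LP fed to an off-the-shelf solver (a sketch; assumes SciPy is available, and the ≥ rows are negated because linprog minimizes subject to A_ub x ≤ b_ub):

import numpy as np
from scipy.optimize import linprog

c = [1, 1, 1, 1]                       # minimize total spending x1+...+x4
A_ge = np.array([[-2, 8, 0, 10],       # votes gained per $1000 per issue
                 [5, 2, 0, 0],
                 [3, -5, 10, -2]])
b_ge = [50_000, 100_000, 25_000]

# linprog wants A_ub @ x <= b_ub, so negate the >= constraints.
res = linprog(c, A_ub=-A_ge, b_ub=-np.array(b_ge), bounds=[(0, None)] * 4)
print(res.x, res.fun)                  # optimal advertising plan and its cost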
Two-dimensional example
maximize x1 + x2
subject to 4x1 − x2 ≤ 8
           2x1 + x2 ≤ 10
           5x1 − 2x2 ≥ −2
           x1, x2 ≥ 0
Worry: there may be too many extremal points. For example, the set of 2n inequalities
0 ≤ xi ≤ 1, i = 1, . . . , n
defines an n-dimensional cube, which has 2ⁿ vertices.
Standard form
maximize cᵀx
subject to Ax ≤ b
           x ≥ 0
Slack form
In the slack form, the only inequality constraints are
nonnegativity constraints. For this, we introduce slack variables
on the left:
xn+i = bi − Σ_{j=1}^{n} aij xj.
In this form, they are also called basic variables. The objective
function does not depend on the basic variables. We denote its
value by z.
Shortest paths as a linear program:
maximize d[t]
subject to d[v] ≤ d[u] + w(u, v) for each edge (u, v)
           d[s] = 0
Maximum flow
Capacity c(u, v) ≥ 0.
maximize Σ_v f(s, v)
subject to f(u, v) ≤ c(u, v)
           f(u, v) = −f(v, u)
           Σ_v f(u, v) = 0 for u ∈ V − {s, t}
Minimum-cost flow
Edge cost a(u, v). Send d units of flow from s to t and minimize
the total cost
Σ_{u,v} a(u, v) f(u, v).
Multicommodity flow
k different commodities K i = (s i , t i , d i ), where d i is the demand.
The capacities constrain the aggregate flow. There is nothing to
optimize: just determine the feasibility.
Games
Example
m = n = 2, pure strategies {1, 2} are called “attack left”, “attack
right” for player 1 and “defend left”, “defend right” for player 2.
The matrix is
A = ( −1   1 )
    (  1  −1 ).
minimize t
subject to t ≥ Σ_j aij qj, i = 1, . . . , m
           qj ≥ 0, j = 1, . . . , n
           Σ_j qj = 1.
z  = 3x1 + x2 + 2x3
x4 = 30 − x1 − x2 − 3x3
x5 = 24 − 2x1 − 2x2 − 5x3
x6 = 36 − 4x1 − x2 − 2x3
Pivot: x1 enters the basis, x6 leaves:
x1 = 9 − x2/4 − x3/2 − x6/4
In general:
Lemma
The slack form is uniquely determined by the set of basic variables.
z = 27 + x2 /4 + x3 /2 − 3x6 /4
x1 = 9 − x2 /4 − x3 /2 − x6 /4
x4 = 21 − 3x2 /4 − 5x3 /2 + x6 /4
x5 = 6 − 3x2 /2 − 4x3 + x6 /2
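The pivot step that produced this dictionary, as code (a sketch; the dictionary-of-rows representation is an illustrative choice, not from the notes; the objective row z is carried along like any other row):

def pivot(rows, enter, leave):
    """One simplex pivot. `rows` maps each basic variable to a dict
    {'1': const, var: coeff, ...} meaning basic = const + sum(coeff * var).
    Solves the `leave` row for `enter` and substitutes everywhere."""
    row = rows.pop(leave)
    a = row.pop(enter)
    # leave = const + a*enter + rest  ==>  enter = (leave - const - rest)/a
    new = {v: -c / a for v, c in row.items()}
    new[leave] = 1.0 / a
    out = {enter: new}
    for var, r in rows.items():
        c = r.pop(enter, 0.0)
        merged = dict(r)
        for v, cv in new.items():
            merged[v] = merged.get(v, 0.0) + c * cv
        out[var] = merged
    return out

rows = {'z':  {'1': 0,  'x1': 3,  'x2': 1,  'x3': 2},
        'x4': {'1': 30, 'x1': -1, 'x2': -1, 'x3': -3},
        'x5': {'1': 24, 'x1': -2, 'x2': -2, 'x3': -5},
        'x6': {'1': 36, 'x1': -4, 'x2': -1, 'x3': -2}}
rows = pivot(rows, 'x1', 'x6')
print(rows['z'])   # {'1': 27.0, 'x2': 0.25, 'x3': 0.5, 'x6': -0.75}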
minimize x0
subject to aᵢᵀx − x0 ≤ bᵢ, i = 1, . . . , m,
           x, x0 ≥ 0
Duality
The dual of the standard-form program (maximize cᵀx subject to Ax ≤ b, x ≥ 0) is:
minimize bᵀy subject to Aᵀy ≥ c, y ≥ 0,
equivalently: minimize yᵀb subject to yᵀA ≥ cᵀ, yᵀ ≥ 0.
Weak duality: for any pair of feasible solutions x, y,
cᵀx ≤ yᵀAx ≤ yᵀb = bᵀy.      (6)
Interpretation:
bi = the total amount of resource i that you have (kinds of workers, land, machines).
aij = the amount of resource i needed for activity j.
cj = the income from a unit of activity j.
xj = amount of activity j.
Ax É b says that you can use only the resources you have.
Primal problem: maximize the income c T x achievable with the
given resources.
Dual problem: Suppose that you can buy lacking resources and
sell unused resources.
Let
sup_x f(x) = cᵀx* = z*.
yT (b − Ax) = 0, (yT A − c T )x = 0.
Proposition
Equality of the primal and dual optima implies complementary
slackness.
Interpretation:
Inactive constraints have shadow price yi = 0.
Activities that do not yield the income required by shadow
prices have level x j = 0.
z∗ = max c T x = min yT b = t∗ .
Theorem
If there is an optimum v then there is a basis B ⊂ {1, . . . , m + n}
belonging to a basic feasible solution, and coefficients c̃i ≤ 0 such that
cᵀx = v + c̃ᵀx,
where c̃i = 0 for i ∈ B.
ỹi = −c̃n+i, i = 1, . . . , m.
Ax ≤ b
Aᵀy ≥ c
cᵀx − bᵀy = 0
x, y ≥ 0
Theory of alternatives
Ax ≤ b is unsolvable iff there is a y ≥ 0 with
yᵀA = 0,
yᵀb < 0.
maximize z
subject to Ax + z · e ≤ b      (7)

minimize yᵀb
subject to yᵀA = 0
           yᵀe = 1             (8)
           yᵀ ≥ 0
Separating hyperplane
Vectors u1, . . . , um in an n-dimensional space. Let L be the set of
convex linear combinations of these points: v is in L if
Σ_i yi ui = v,   Σ_i yi = 1,   y ≥ 0.
In matrix form, with U the matrix whose rows are the uᵢᵀ:
yᵀU = vᵀ,   Σ_i yi = 1,   y ≥ 0.      (9)
A separating hyperplane for v ∉ L:
uᵢᵀx ≤ z (i = 1, . . . , m),   vᵀx > z.
Application to games
minimize t
subject to t − Σ_j aij qj ≥ 0, i = 1, . . . , m      (pi)
           Σ_j qj = 1,                               (z)
           qj ≥ 0, j = 1, . . . , n
Dual:
maximize z
subject to Σ_i pi = 1,
           −Σ_i aij pi + z ≤ 0, j = 1, . . . , n
           pi ≥ 0, i = 1, . . . , m.
maximize Σ_{v∈V} f(s, v)
subject to f(u, v) ≤ c(u, v), u, v ∈ V,
           f(u, v) = −f(v, u), u, v ∈ V,
           Σ_{v∈V} f(u, v) = 0, u ∈ V \ {s, t}.
Two variables associated with each edge, f (u, v) and f (v, u).
Simplify. Order the points arbitrarily, but starting with s and
ending with t. Leave f (u, v) when u < v: whenever f (v, u) appears
with u < v, replace with − f (u, v).
maximize Σ_{v>s} f(s, v)
subject to f(u, v) ≤ c(u, v), u < v,
           −f(u, v) ≤ c(v, u), u < v,
           Σ_{v>u} f(u, v) − Σ_{v<u} f(v, u) = 0, u ∈ V \ {s, t}.
Dual variable for the flow-conservation constraint of node u: call it y(u).
There is a dual constraint for each primal variable f(u, v), u < v. Since
f(u, v) is not restricted by sign, the dual constraint is an
equation. If u, v ≠ s then f(u, v) has coefficient 0 in the objective
function. With the conventions y(t) = 0 and y(s) = −1, the dual becomes:
minimize Σ_{u<v} c(u, v) |y(v) − y(u)|
subject to y(s) = −1, y(t) = 0.
Claim
There is an optimal solution in which each y(u) is −1 or 0.
Take S = { u : y(u) = −1 } and T = V \ S; the objective is then the value
of the “cut” (S, T). So the dual problem is about finding a minimum cut,
and the duality theorem implies the max-flow/min-cut theorem.
aᵢᵀx ≤ bᵢ, i = 1, . . . , m.
Size of the input: L = m · n · k.
Ellipsoids
B(c, r) = { x : (x − c)ᵀ(x − c) ≤ r² } (a ball).
E = { Lx : xᵀx ≤ r² } = { y : yᵀA⁻¹y ≤ r² }, where A = LLᵀ.
An ellipsoid with center c:
{ x : (x − c)ᵀA⁻¹(x − c) ≤ r² }.
With E = { x : xᵀA⁻²x ≤ 1 } and A = diag(a, b) in the plane, the boundary is
x²/a² + y²/b² = 1.
The numbers a, b are the lengths of the principal axes of the
ellipse, measured from the center. When they are all equal, we
get the equation of a circle (a sphere in n dimensions).
Volume of an ellipsoid
E = { x : x1²/a1² + · · · + xn²/an² ≤ 1 }.
Vol(E) = a1 · · · an Vn, where Vn is the volume of the n-dimensional unit ball.
N = n^{n/2} 10^{2kn},   δ = 1/(2mN),   ε = δ/(10^k n),
b′i = bi + δ.
Theorem
(a) There is a ball E1 of radius ≤ N√n and center 0 with the
property that if there is a solution then there is a solution in E1.
(b) Ax ≤ b is solvable if and only if Ax ≤ b′ is solvable and its set
of solutions contains a ball of radius ε.
Lemma
If there is a solution then there is one with |xj| ≤ N for all j.
This implies (a).
Now for the lower bound. The coming homework has a problem showing the following lemma, with δ as defined above.
Lemma
If Ax ≤ b has no solution then, defining b′i = bi + δ, the system
Ax ≤ b′ has no solution either.
Corollary
If Ax ≤ b′ is solvable then its set of solutions contains a cube of
size 2ε.
The algorithm
Shrinking rate
Lemma
Let H1 be a half-ball of E1. There is an ellipsoid E2 containing H1 with
Vol(E2) ≤ e^{−1/(4n)} Vol(E1). This is true even if E1 was also an ellipsoid.
Note e^{−1/(4n)} ≈ 1 − 1/(4n).
Proof
Let E1 be the unit ball and H1 = E1 ∩ { x : x1 ≥ 0 }. Take
E2 = { x : (x1 − d)²/(1 − d)² + b⁻² Σ_{j≥2} xj² ≤ 1 }.
It touches the ball E1 at the circle x1 = 0, Σ_{j≥2} xj² = 1:
d²/(1 − d)² + b⁻² = 1.
Hence
b⁻² = 1 − d²/(1 − d)² = (1 − 2d)/(1 − 2d + d²),
b² = 1 + d²/(1 − 2d) ≤ 1 + 2d² if d ≤ 1/4.
Using 1 + z ≤ e^z:
Vol(E2) = Vn (1 − d) b^{n−1} ≤ Vn (1 − d)(1 + 2d²)^{n/2} ≤ Vn e^{nd²−d}.
Choose d = 1/(2n); then this is Vn e^{−1/(4n)}.
This proves the Lemma for the case when E 1 is a ball. When E 1
is an ellipsoid, transform it linearly into a ball, apply the lemma
and then transform back. The transformation takes ellipsoids
into ellipsoids and does not change the ratio of volumes.
If r is large enough then Vol(E_{r+1}) is smaller than the volume of this small ball, so
there is no solution.
It is easy to see from here that r can be chosen to be polynomial
in m, n, k.
NP-completeness
NP problems
Examples
Shortest vs. longest simple paths
Euler tour vs. Hamiltonian cycle
2-SAT vs. 3-SAT. Satisfiability for circuits and for conjunctive
normal form (SAT). Reducing satisfiability for circuits to
3-SAT.
Use of reduction in this course: proving hardness.
Ultrasound test of sex of fetus.
Example
Given a graph G.
Decision Given k, does G have an independent subset of size ≥ k?
Optimization What is the size of the largest independent set?
Search Given k, give an independent set of size k (if there is one).
Optimization+search Give a maximum size independent set.
Polynomial time
Abstract problems
Instance. Solution.
Encodings
Concrete problems: encoded into strings.
Polynomial-time computable functions, polynomial-time
decidable sets.
Polynomially related encodings.
Language: a set of strings. Deciding a language.
Polynomial-time verification
Example
Hamiltonian cycles.
A polynomial-time computable function V(x, w)
with yes/no values that verifies, for a given input x and witness
(certificate) w, whether w is indeed a witness for x.
Example (Compositeness)
Let the decision problem be the question whether a number x is
composite (nonprime). The obvious verifiable witness property: w is a proper divisor of x, 1 < w < x.
Reducibility, completeness
Example
Reducing linear programming to solving a set of linear
inequalities.
NP-hardness.
NP-completeness.
Theorem
Satisfiability is NP-complete.
Theorem
INDEPENDENT SET is NP-complete.
Example
Integer linear programming. In particular, the subset sum
problem.
Approximations
The setting
Greedy algorithms
Try local improvements as long as you can.
Example (Maximum cut)
Graph G = (V, E); a cut is given by a set S ⊆ V with complement S̄ = V \ S. Find a cut S that maximizes
the number of edges crossing it:
|{ {u, v} ∈ E : u ∈ S, v ∈ S̄ }|.
Greedy algorithm:
Repeat: find a point on one side of the cut whose
moving to the other side increases the cutsize.
Theorem
If you cannot improve anymore with this algorithm then you are
within a factor 2 of the optimum.
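The local-improvement algorithm as code (a Python sketch; the graph representation by frozenset edges is an illustrative choice):

def greedy_max_cut(vertices, edges):
    """Local search for maximum cut. `edges` is a set of frozensets {u, v}.
    Moves single vertices across the cut while the cut size grows."""
    S = set()                                   # start from an arbitrary cut
    def crossing(cut):
        return sum(1 for e in edges if len(e & cut) == 1)
    improved = True
    while improved:
        improved = False
        for v in vertices:
            T = S ^ {v}                         # move v to the other side
            if crossing(T) > crossing(S):
                S, improved = T, True
    return S

V = {1, 2, 3, 4}
E = {frozenset(e) for e in [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]}
S = greedy_max_cut(V, E)
print(S, sum(1 for e in E if len(e & S) == 1))  # a cut with >= |E|/2 edges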
Randomized algorithms
Generalize maximum cut for the case where edges e have weights
we, that is maximize
Σ_{u∈S, v∈S̄} wuv.
Theorem
Approx_Vertex_Cover has a ratio bound of 2.
minimize wᵀx
subject to xi + xj ≥ 1, (i, j) ∈ E,
           x ≥ 0.
Analysis
Theorem
Greedy_Set_Cover has a ratio bound max_{S∈F} H(|S|).
Lemma
For all S in F we have
Σ_{e∈S} price(e) ≤ w(S) H(|S|).
price(e) ≤ w(S)/|Vi|.
price(ek) ≤ w(S)/(|S| − k + 1).
Proof of the theorem. Let C* be the optimal set cover and C the
cover returned by the algorithm. Then
w(C) = Σ_e price(e) ≤ Σ_{S∈C*} Σ_{e∈S} price(e) ≤ Σ_{S∈C*} w(S) H(|S|) ≤ H(|S*|) Σ_{S∈C*} w(S),
where S* is a largest set of F.
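The algorithm being analyzed, as code (a Python sketch of the greedy rule: repeatedly take the set minimizing weight per newly covered element, recording price(e)):

def greedy_set_cover(universe, sets, w):
    """Greedy weighted set cover. `sets` maps a name to a set of elements,
    `w` maps a name to its weight; assumes the sets cover the universe.
    Returns the chosen names and price(e) for each element e."""
    uncovered = set(universe)
    cover, price = [], {}
    while uncovered:
        # pick the set minimizing weight per newly covered element
        s = min((t for t in sets if sets[t] & uncovered),
                key=lambda t: w[t] / len(sets[t] & uncovered))
        new = sets[s] & uncovered
        for e in new:
            price[e] = w[s] / len(new)
        uncovered -= new
        cover.append(s)
    return cover, price

U = {1, 2, 3, 4, 5}
F = {'a': {1, 2, 3}, 'b': {2, 4}, 'c': {3, 4, 5}, 'd': {5}}
print(greedy_set_cover(U, F, {'a': 1.0, 'b': 1.0, 'c': 1.0, 'd': 1.0}))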
Question
Is this the best possible factor for set cover?
Approximation scheme
maximize wᵀx
subject to aᵀx ≤ b,
           xi ∈ {0, 1}, i = 1, . . . , n.
With rounded weights w′:
maximize (w′)ᵀx
subject to aᵀx ≤ b,
           xi ∈ {0, 1}, i = 1, . . . , n.
If x′ is optimal for the rounded weights then
wᵀx′/OPT ≥ 1 − ε w1/OPT ≥ 1 − ε,
where w1 is the largest weight (so w1 ≤ OPT).
With w̄ = Σ_i wi, the amount of time is of the order of n w̄: dynamic
programming over the achievable values of the total weight, keeping for
each value a subset maximizing the left-over capacity
b − aᵀx.
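The dynamic programming core, as a sketch (Python; for each achievable total weight it keeps the smallest total size, so the best weight that fits in b can be read off; the rounding of the wi is done before calling it):

def knapsack_by_weight(w, a, b):
    """w: integer item weights (values), a: item sizes, b: capacity.
    best[W] = minimal total size of a subset of total weight exactly W.
    Time O(n * sum(w))."""
    INF = float('inf')
    best = [0] + [INF] * sum(w)
    for wi, ai in zip(w, a):
        for W in range(len(best) - 1, wi - 1, -1):   # each item used once
            if best[W - wi] + ai < best[W]:
                best[W] = best[W - wi] + ai
    return max(W for W, size in enumerate(best) if size <= b)

# items: weights 6, 10, 12 with sizes 1, 2, 3; capacity 5 -> take 10 and 12
assert knapsack_by_weight([6, 10, 12], [1, 2, 3], 5) == 22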
Convex programming
Convexity
Equivalently, f is convex if
f(λx + (1 − λ)y) ≤ λ f(x) + (1 − λ) f(y) for all x, y and all 0 ≤ λ ≤ 1.
Examples
Each linear function aᵀx + b is convex.
If a matrix A is positive semidefinite then the quadratic
function xᵀAx is convex.
If f(x), g(x) are convex and α, β ≥ 0 then αf(x) + βg(x) is also
convex.
Definition
A convex program is an optimization problem of the form
min f0(x)
subject to fi(x) ≤ 0 for i = 1, . . . , m,
           x ∈ H
Separation oracle
aᵢᵀx ≤ bᵢ, i = 1, . . . , n.
Definition
Let a : Qⁿ → Qⁿ, b : Qⁿ → Q be functions computable in
polynomial time and H ⊆ Rⁿ a (convex) set. These are a
separating (hyperplane) oracle for H if for all x ∈ Rⁿ, with
a = a(x), b = b(x), we have:
If x ∈ H then a = 0.
If x ∉ H then aᵀy ≤ b for all y ∈ H and aᵀx ≥ b.
Example
For the unit ball H = { x : xᵀx ≤ 1 }, the functions a = x · |xᵀx − 1|₊
and b = xᵀx − 1 give a separation oracle.
To find a separation oracle for an ellipsoid, transform it into a ball
first.
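The unit-ball oracle in code (a Python sketch; this uses the slightly different but easily verified choice a = x/‖x‖, b = 1 outside the ball):

import math

def ball_oracle(x):
    """Separation oracle for H = { x : x^T x <= 1 }.
    Returns (a, b) with a = 0 inside H; otherwise a^T y <= b on H
    and a^T x >= b."""
    norm2 = sum(t * t for t in x)
    if norm2 <= 1:
        return [0.0] * len(x), 0.0
    norm = math.sqrt(norm2)
    return [t / norm for t in x], 1.0

print(ball_oracle([0.3, 0.4]))   # inside: a = 0
print(ball_oracle([3.0, 4.0]))   # outside: a = (0.6, 0.8), b = 1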
Semidefinite programs
The set { X : X ⪰ 0 } of positive semidefinite matrices is convex: for each fixed vector a, the condition
aᵀXa ≥ 0, that is, Σ_{ij} (ai aj) xij ≥ 0,
is a linear inequality in the entries xij.
Rounding: for a vector z, put S = { i : zᵀui ≤ 0 }. The relaxed objective involves
Σ_{i≠j} wij uᵢᵀuj.
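The rounding step in code (a Python sketch with NumPy; the unit vectors ui are made-up stand-ins for an SDP solution, and z is a random Gaussian direction as in Goemans–Williamson rounding):

import numpy as np

rng = np.random.default_rng(1)
u = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])  # unit vectors u_i

z = rng.standard_normal(u.shape[1])     # random hyperplane normal
S = {i for i in range(len(u)) if z @ u[i] <= 0}
print(S)                                # one side of the random hyperplane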
Proposition
If A is positive definite then A² is also.