LECTURE 1: INTRODUCTION
HONOUR SCHOOL OF MATHEMATICS, OXFORD UNIVERSITY
HILARY TERM 2005, DR RAPHAEL HAUSER
1. The Central Subject of this Course. The engineer who designs an aircraft
with minimal drag given the required lift force, the manager who maximises profit
within constraints imposed by the available resources, the bicycle courier who seeks
the shortest path between two points in a city, and your cup of tea that cools down
to maximise the entropy in the universe all solve optimisation problems! The world
is full of them.
Mathematically, we can formulate an important class of such problems as follows:
\[
(P)\qquad \min_{x \in \mathbb{R}^n} f(x) \quad \text{s.t.}\quad
g_i(x) \ge 0 \ (i = 1, \dots, p), \qquad
h_j(x) = 0 \ (j = 1, \dots, q),
\]
where $f$, $g_i$ and $h_j$ are sufficiently smooth functions: typically we require them to be
twice continuously differentiable.
The function $f$ represents an objective (such as energy, cost, etc.) that has to be
minimised under side constraints defined by the functions $g_i$ and $h_j$. We therefore
call $f$ the objective function, the functions $g_i$ the inequality constraint functions, and
the functions $h_j$ the equality constraint functions of (P). Note that by replacing $f$
by $-f$ we can of course treat maximisation problems in the same framework.
Example 1.1 (Linear Programming). The transshipment problem occurs when
the cheapest way of shipping prescribed amounts of a commodity across a transportation
network has to be determined. This can be a network of oil pipelines, a computer
network, a network of shipping lanes, a road network, etc.
A network of gas pipelines is given in Figure 1.1.

[Fig. 1.1. Gas pipeline network (nodes 1-6)]

An arrow from node $i$ to node $j$ represents a pipe with transport capacity $c_{ij}$
in the given direction. Transporting one unit of gas along the edge $(ij)$ costs $d_{ij}$.
The amount of gas produced at node $i$ is $p_i$, and the amount of gas consumed is $q_i$.
We assume that the total amount consumed equals the total amount of gas produced
(if this assumption were not true, we could construct an equivalent transshipment
problem that has this property). How should the quantities $x_{ij}$ of gas shipped
along the edges $(ij)$ be chosen so as to satisfy all the demands and to minimise costs?
We set $c_{ij} = 0$ (and $d_{ij}$ arbitrary numbers) for all edges $(ij)$ that do not exist.
Doing so, we can assume that the network is a complete graph. The problem we have
to solve is the following:
\[
\min_{x} \; \sum_{i,j=1}^{6} d_{ij} x_{ij}
\quad \text{s.t.}\quad
\sum_{k=1}^{6} x_{ki} + p_i = \sum_{j=1}^{6} x_{ij} + q_i \quad (i = 1, \dots, 6),
\qquad
0 \le x_{ij} \le c_{ij} \quad (i, j = 1, \dots, 6).
\]
This is an example of a linear programming problem, as the objective function
and all the constraint functions are linear.
Note that it is not a priori clear that this problem has feasible solutions. One is
therefore interested in algorithms that not only find optimal LP solutions when these
exist but also detect when a problem instance is infeasible!
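As a small illustration, a toy instance of such a transshipment LP can be solved with SciPy's `linprog`. The three-node network below, with its costs and capacities, is an invented example, not the network of Figure 1.1:

```python
import numpy as np
from scipy.optimize import linprog

# Toy network: node 1 produces 1 unit, node 3 consumes 1 unit.
# Decision variables: flows x12, x13, x23 along the three existing edges.
cost = np.array([1.0, 3.0, 1.0])   # d_12, d_13, d_23
cap = np.array([1.0, 1.0, 1.0])    # c_12, c_13, c_23

# Flow conservation (outflow - inflow = production - consumption) at
# nodes 1 and 2; the balance at node 3 then holds automatically.
A_eq = np.array([
    [1.0, 1.0, 0.0],    # node 1: x12 + x13 = 1
    [-1.0, 0.0, 1.0],   # node 2: x23 - x12 = 0
])
b_eq = np.array([1.0, 0.0])

res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=list(zip([0.0] * 3, cap)))
```

Routing through node 2 costs $1 + 1 = 2$, which is cheaper than the direct edge of cost $3$, so the optimal plan ships everything via node 2. `linprog` also reports infeasibility through `res.status` when no flow satisfies the constraints.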
Example 1.2 (Quadratic Programming). In the portfolio optimisation problem,
an investor considers a fixed time interval and wishes to decide which fraction of the
capital he/she wants to invest in each of $n$ different given assets when the expected
return of asset $i$ is $\mu_i$ and the covariance between assets $i$ and $j$ is $\sigma_{ij}$. The vector
$\mu = [\mu_i]$ and the matrix $\Sigma = [\sigma_{ij}]$ are assumed to be known and the investor aims at a
total return of at least $b$. Subject to this constraint, he/she aims to minimise the risk
as quantified by the variance of the overall portfolio.
This problem can be modelled as
\[
\min_{x \in \mathbb{R}^n} \; \sum_{i=1}^{n} \sum_{j=1}^{n} \sigma_{ij} x_i x_j
\quad \text{s.t.}\quad
\sum_{i=1}^{n} \mu_i x_i \ge b, \qquad
\sum_{i=1}^{n} x_i = 1, \qquad
x_i \ge 0 \quad (i = 1, \dots, n).
\]
The constraint $\sum_{i=1}^{n} x_i = 1$ expresses the requirement that 100% of the initial capital
has to be invested.
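A minimal numerical sketch of this quadratic programme, using SciPy's SLSQP solver (the two-asset data $\mu$, $\Sigma$ and the target return $b$ below are invented for illustration):

```python
import numpy as np
from scipy.optimize import minimize

# Invented two-asset data: expected returns, covariances, target return
mu = np.array([0.10, 0.05])
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.02]])
b = 0.07

risk = lambda x: x @ Sigma @ x                       # portfolio variance x' Sigma x
cons = [
    {"type": "ineq", "fun": lambda x: mu @ x - b},   # expected return >= b
    {"type": "eq",   "fun": lambda x: x.sum() - 1},  # invest all capital
]
res = minimize(risk, x0=np.array([0.5, 0.5]), method="SLSQP",
               bounds=[(0.0, None)] * 2, constraints=cons)
```

For this data the return constraint is active at the optimum: the minimum-variance weights with return at least $0.07$ are $x \approx (0.4, 0.6)$.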
Example 1.3 (Semidefinite Programming). In optimal control, variables $y_1, \dots, y_m$
have to be chosen so as to design a system that is driven by the linear ODE
\[
\dot{u} = M(y)u,
\]
where $M(y) = \sum_{i=1}^{m} y_i A_i + A_0$ is an affine combination of given symmetric matrices
$A_i$ $(i = 0, \dots, m)$. To stabilise the system, one would like to choose $y$ so as to min-
imise the largest eigenvalue of $M(y)$.
Note that $\lambda_1(M) \le \lambda$ if and only if $\lambda I - M \succeq 0$ (is positive semidefinite), where
$\lambda_1$ denotes the largest eigenvalue. Therefore, the problem we need to solve is
\[
\min_{\lambda, y} \; \lambda \quad \text{s.t.}\quad
\lambda I - A_0 - \sum_{i=1}^{m} y_i A_i \succeq 0.
\]
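For a single parameter ($m = 1$) the largest eigenvalue of $M(y)$ can simply be minimised as a function of $y$, since it is convex in $y$. The matrices $A_0$, $A_1$ below are invented for illustration; with this choice $\lambda_1(M(y)) = \sqrt{1 + y^2}$, minimised at $y = 0$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Invented symmetric matrices defining M(y) = A0 + y * A1
A0 = np.array([[1.0, 0.0],
               [0.0, -1.0]])
A1 = np.array([[0.0, 1.0],
               [1.0, 0.0]])

# Largest eigenvalue of M(y); eigvalsh returns eigenvalues in ascending order
lam_max = lambda y: np.linalg.eigvalsh(A0 + y * A1)[-1]

res = minimize_scalar(lam_max, bounds=(-5.0, 5.0), method="bounded")
```

A general-purpose scalar minimiser suffices here only because $m = 1$; for larger $m$ one would use a genuine semidefinite programming solver.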
Example 1.4. An engineer designs a system determined by two design variables
$x$ and $y$ which are dependent on each other via the relation $xy = 1$. The energy
consumed by the system is given by $E(x, y) = x^2 + y^2 - 4$. Furthermore, the physical
properties of materials used impose the constraint $x \in [0.5, 3]$. The engineer wishes
to design a system that consumes the smallest amount of energy among all admissible
systems.
This problem can be formulated as
\[
(P)\qquad \min_{x, y} \; x^2 + y^2 - 4
\quad \text{s.t.}\quad
x - 0.5 \ge 0, \qquad
-x + 3 \ge 0, \qquad
x^{-1} - y = 0.
\]
The objective function is $f(x, y) = x^2 + y^2 - 4$, the inequality constraint functions
are $g_1(x, y) = x - 0.5$ and $g_2(x, y) = -x + 3$, and the equality constraint function is
$h(x, y) = x^{-1} - y$.
1.1. Learning Goals. The aim of this course is to teach you how to solve such
problems numerically on a computer. But rather than programming efficient code
(which is challenging in its own right), we concentrate on the theoretical properties
of prototype algorithms. Moreover, in order to derive the mathematical building
blocks of algorithms, we will have to derive mathematical conditions that characterise
the solutions of (P).
2.1. Local and Global Minimisers. Let $\mathcal{F}$ denote the feasible set of (P). A point
$x^* \in \mathcal{F}$ is a local minimiser of (P) if there exists $\varepsilon > 0$ such
that
\[
f(x^*) \le f(x) \quad \forall x \in B_\varepsilon(x^*) \cap \mathcal{F},
\]
that is, $x^*$ minimises $f$ amongst all feasible points in the ball $B_\varepsilon(x^*)$ around $x^*$,
but there might be feasible points further away from $x^*$ with a smaller objective
value. $x^*$ is a global minimiser if
\[
f(x^*) \le f(x) \quad \forall x \in \mathcal{F},
\]
that is, $x^*$ minimises the objective function amongst all feasible points of the problem,
although there might exist several of these points.
Example 2.1. The problem
\[
(P)\qquad \min_{x \in \mathbb{R}} f(x) = x^3 + 9x^2
\quad \text{s.t.}\quad -10 \le x \le 2
\]
has a local minimiser at $x = 0$, and a global minimiser at $x^* = -10$.
We say that an algorithm converges locally to a minimiser $x^*$ if it converges to $x^*$ for
all feasible starting points $x_0$ close enough to $x^*$, that is, for all $x_0 \in B_r(x^*) \cap \mathcal{F}$ for
some $r > 0$. It is globally convergent if it converges for every feasible starting point $x_0$.
Example 2.2. Let us go back to the problem of Example 2.1 and consider the
following algorithm:
S0 Choose $x_0$. Set $\alpha = 1$, $k = 0$.
S1 $x = x_k - \alpha f'(x_k)$.
S2 If $x$ is feasible then goto S3, else $\alpha \leftarrow \alpha/2$ and goto S1.
S3 Set $x_{k+1} = x$, $k \leftarrow k + 1$, $\alpha = 1$, and goto S1.
This algorithm converges to the local minimiser $x^* = -10$ for $x_0 \in [-10, -6)$. For $x_0 = -6$
it remains stuck. If we exclude $x_0 = -6$ as a starting point, then this algorithm is globally
convergent, even though it only converges to local minimisers! The point here is that
the algorithm converges no matter what the starting point is.
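The step-halving scheme above can be sketched in Python for the problem of Example 2.1. The stopping test, the iteration cap and the minimum step-size threshold are additions of this sketch, not part of the algorithm as stated:

```python
# f(x) = x^3 + 9x^2 on the feasible interval [-10, 2], as in Example 2.1
fprime = lambda x: 3 * x**2 + 18 * x
feasible = lambda x: -10.0 <= x <= 2.0

def descend(x0, tol=1e-10, max_iter=500):
    x = x0
    for _ in range(max_iter):
        alpha = 1.0
        trial = x - alpha * fprime(x)           # S1: full steepest-descent step
        while not feasible(trial) and alpha > 1e-16:
            alpha /= 2                          # S2: halve alpha until feasible
            trial = x - alpha * fprime(x)
        if abs(trial - x) < tol:                # stop once progress stalls
            return trial
        x = trial                               # S3: accept step, reset alpha
    return x
```

Starting from $x_0 = -8$ the iterates run down to the global minimiser $-10$, while the stationary starting point $x_0 = -6$ produces no movement at all, matching the discussion above.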
On the other hand, if we omit the judicious choice of $\alpha$, we obtain the following
algorithm:
S0 Choose $x_0$. Set $k = 0$.
S1 Set $x_{k+1} = x_k - f'(x_k)$, $k \leftarrow k + 1$, and goto S1.
This algorithm converges locally to the local minimiser $x^* = -10$.
2.2. Convergence Rates. Since much of numerical analysis is devoted to the
construction of algorithms that converge quickly, we should be able to quantify con-
vergence speed.
A converging sequence $(x_k)_{k \in \mathbb{N}} \to x^*$ is said to converge with Q-rate $r \ge 1$ if there
exist a constant $\gamma > 0$ and an index $k_0$ such that
\[
|x_{k+1} - x^*| \le \gamma \, |x_k - x^*|^r, \qquad k \ge k_0.
\]
Note that when $r = 1$, only bounds with $\gamma < 1$ are useful. If $r = 1$ or $r = 2$ we
speak of Q-linear and Q-quadratic convergence respectively. Finally, $(x_k)_{k \in \mathbb{N}}$ converges
Q-superlinearly if
\[
\lim_{k \to \infty} \frac{|x_{k+1} - x^*|}{|x_k - x^*|} = 0.
\]
Example 2.3. Let $z \in (0, 1)$ and consider the sequence $(x_k)_{k \in \mathbb{N}}$ defined by
\[
x_k := \sum_{n=0}^{k} z^n.
\]
Then $(x_k)_{k \in \mathbb{N}}$ converges Q-linearly but not Q-superlinearly to $x^* = (1 - z)^{-1}$. Indeed,
we have $|x_k - x^*| = \sum_{n=k+1}^{\infty} z^n$. Therefore,
\[
\frac{|x_{k+1} - x^*|}{|x_k - x^*|} = z < 1,
\]
which shows the claim.
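The constant error ratio of Example 2.3 is easy to observe numerically; the value $z = 0.5$ below is an arbitrary choice within $(0, 1)$:

```python
z = 0.5
x_star = 1 / (1 - z)            # limit of the geometric series

# Partial sums x_k = sum_{n=0}^{k} z^n
xs, s = [], 0.0
for n in range(20):
    s += z**n
    xs.append(s)

# Successive error ratios |x_{k+1} - x*| / |x_k - x*|
ratios = [abs(xs[k + 1] - x_star) / abs(xs[k] - x_star) for k in range(15)]
```

Every ratio equals $z$, confirming Q-linear (and ruling out Q-superlinear) convergence.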
We say that an iterative algorithm has Q-convergence rate $r$ if every output
sequence produced by it converges at least at the Q-convergence rate $r$.
The practical significance of Q-linear convergence with constant $\gamma$ is that asymptotically
the point $x_{k+1}$ approximates $x^*$ with $\log_{10}(1/\gamma)$ more correct digits than $x_k$. This means
that the number of correct digits grows linearly in the number of iterations taken. For Q-
convergence of order $r > 1$ on the other hand, the number of additional correct digits
asymptotically grows by a factor of $r$ per iteration, that is, the number of correct digits is
exponential in the number of iterations taken and the convergence is super fast.
In practice, Q-convergence of any order $r > 1$ is qualitatively similar to conver-
gence of any other order $r' > 1$, because applying an order-$r$ algorithm $j$ steps at a time
yields an order-$r^j$ algorithm.
2.3. Convex Sets. The notion of convexity plays a central role in optimisation.
A set $C \subseteq \mathbb{R}^n$ is convex if
\[
x, y \in C \implies \lambda x + (1 - \lambda) y \in C \quad \forall \lambda \in [0, 1],
\]
that is, if the straight line segment joining any two elements of $C$ lies in $C$. The
empty set, half spaces $\{x : a^T x \le 0\}$, polyhedra $\{x : Ax \le b\}$, open balls $B_\varepsilon(\bar{x}) =
\{x : \|x - \bar{x}\| < \varepsilon\}$, ellipsoids $\{x : x^T B x \le r\}$ (with $B$ a positive definite matrix) and
affine subspaces $\{x : a^T x = b\}$ are all examples of convex sets.
If $C, D \subseteq \mathbb{R}^n$ are convex sets, $\lambda \in \mathbb{R}$ and $\ell : \mathbb{R}^n \to \mathbb{R}^m$ is a linear map, then
$C + D := \{x + y : x \in C, y \in D\}$, $\lambda C := \{\lambda x : x \in C\}$, $\ell(C) := \{\ell(x) : x \in C\}$ and
$C \cap D$ are convex sets.
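As a sample argument (not spelled out in the notes), convexity of $C + D$ follows directly from the definition:

```latex
% Convexity of C + D: take two points of C + D and form a convex combination.
\begin{align*}
\lambda (x_1 + y_1) + (1-\lambda)(x_2 + y_2)
  &= \underbrace{\bigl(\lambda x_1 + (1-\lambda) x_2\bigr)}_{\in C}
   + \underbrace{\bigl(\lambda y_1 + (1-\lambda) y_2\bigr)}_{\in D}
   \;\in\; C + D
\end{align*}
```

for all $x_1, x_2 \in C$, $y_1, y_2 \in D$ and $\lambda \in [0, 1]$, by convexity of $C$ and $D$. The remaining cases are proved in the same one-line fashion.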
2.4. Convex Functions. Functions $f : \mathbb{R}^n \to (-\infty, +\infty]$ into the real line
extended by $+\infty$ are called proper. A proper function is convex if its epigraph
\[
\mathrm{epi}(f) := \bigl\{(x, z) \in \mathbb{R}^{n+1} : f(x) \le z\bigr\}
\]
is a convex set in $\mathbb{R}^{n+1}$.
A proper function $f$ (assumed to be defined on all of $\mathbb{R}^n$) is convex if and only if
\[
f\bigl(\lambda x + (1 - \lambda) y\bigr) \le \lambda f(x) + (1 - \lambda) f(y)
\]
for all $x, y \in \mathrm{dom}(f)$ and all $\lambda \in [0, 1]$.
Theorem 2.4 (First order differential properties of convex functions).
Let $f : D \to \mathbb{R}$ be a differentiable function defined on a convex open domain $D \subseteq \mathbb{R}^n$.
(i) $f$ is convex if and only if $f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$ for all
$x, y \in D$ and $\lambda \in [0, 1]$.
(ii) If $f$ is convex then
\[
f(y) \ge f(x) + \nabla f(x) \cdot (y - x) \quad \text{for all } x, y \in D. \tag{2.2}
\]
(iii) If $f$ is convex and $\nabla f(x^*) = 0$ then $x^*$ is a global minimiser of $f$. If $D = \mathbb{R}^n$
then this condition is both sufficient and necessary.
(iv) $f$ is both convex and concave if and only if $f$ is an affine function.
Proof. To prove (ii), suppose $f$ is convex, and let $x, y \in D$ and $\lambda \in (0, 1]$. Then
\[
f\bigl(x + \lambda(y - x)\bigr) = f\bigl((1 - \lambda)x + \lambda y\bigr) \le f(x) + \lambda\bigl(f(y) - f(x)\bigr),
\]
and hence
\[
\frac{f\bigl(x + \lambda(y - x)\bigr) - f(x)}{\lambda} \le f(y) - f(x).
\]
Taking limits as $\lambda \to 0$ we get (2.2). This proves (ii). (iii) is a trivial consequence
of (i) and (ii). If $f$ is affine, then it is clearly both convex and concave. On the
other hand, if $f$ is both convex and concave, and if $f$ is differentiable at least at
one point $x^*$, then applying (2.2) to $f$ and to $-f$ yields
\[
f(y) \ge f(x^*) + \nabla f(x^*) \cdot (y - x^*) \quad \text{and} \quad
f(y) \le f(x^*) + \nabla f(x^*) \cdot (y - x^*),
\]
so that $f(y) = f(x^*) + \nabla f(x^*) \cdot (y - x^*)$ for all $y \in D$, that is, $f$ is affine.
The general case can be proved in a similar way using the notion of subdifferential.
One can also prove that there are always points where $f$ is differentiable, but this is
technically more difficult.
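The gradient inequality (2.2) is easy to sanity-check numerically. The convex test function below (a sum of an exponential and a quadratic) is an invented example, not one from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)

# A convex test function f(x) = exp(x_1) + ||x||^2 and its gradient
def f(x):
    return np.exp(x[0]) + x @ x

def grad_f(x):
    g = 2 * x.copy()
    g[0] += np.exp(x[0])
    return g

# Check f(y) >= f(x) + grad f(x) . (y - x) at many random point pairs
violations = 0
for _ in range(1000):
    x, y = rng.uniform(-2, 2, 2), rng.uniform(-2, 2, 2)
    if f(y) < f(x) + grad_f(x) @ (y - x) - 1e-12:
        violations += 1
```

No pair of sample points violates the inequality, as (ii) of the theorem predicts for a convex function.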
Theorem 2.5 (Second order differential properties of convex functions).
Let $f : D \to \mathbb{R}$ be a function defined on a convex open domain $D \subseteq \mathbb{R}^n$.
(i) If $f$ is convex, $x \in D$ and the Hessian $H(x) = \nabla^2 f$