Anda di halaman 1dari 158

UNIVERSITY OF WATERLOO

Nonlinear Optimization
E. de Klerk, C. Roos, and T. Terlaky

Waterloo, February 22, 2006

Contents
6 Self-concordant functions 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9 6.10 7 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Epigraphs and closed convex functions . . . . . . . . . . . . . . Denition of the self-concordance property . . . . . . . . . . . . Equivalent formulations of the self-concordance property . . . . Positive deniteness of the Hessian matrix . . . . . . . . . . . . Some basic inequalities . . . . . . . . . . . . . . . . . . . . . . . Quadratic convergence of Newtons method . . . . . . . . . . . . Algorithm with full Newton steps . . . . . . . . . . . . . . . . . Linear convergence of the damped Newton method . . . . . . . Further estimates . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1 2 3 8 12 14 16 18 20 25 31 31 32 40 42 43 45 47 50 53 53 54 57

Minimization of a linear function over a closed convex domain 7.1 7.2 7.3 7.4 7.5 7.6 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Eect of a -update . . . . . . . . . . . . . . . . . . . . . . . . . Estimate of c x c x 7.4.1 7.5.1
T T

. . . . . . . . . . . . . . . . . . . . . .

Algorithm with full Newton steps . . . . . . . . . . . . . . . . . Analysis of the algorithm with full Newton steps . . . . . . . . . . . . . . . . . . Analysis of the algorithm with damped Newton steps . . . . . . . . . . . . . . . . . . . . . . Algorithm with damped Newton steps Adding equality constraints

Solving convex optimization problems 8.1 8.2 8.3 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Getting a self-concordant barrier for F . . . . . . . . . . . . . . Tools for proving self-concordancy . . . . . . . . . . . . . . . . . i

8.4 8.5

Application to the functions in the table of Figure 8.1 . . . . . . Application to other convex problems . . . . . . . . . . . . . . . 8.5.1 8.5.2 8.5.3 8.5.4 Entropy minimization . . . . . . . . . . . . . . . . . . . Extended entropy minimization . . . . . . . . . . . . . . p -norm optimization . . . . . . . . . . . . . . . . . . . . Geometric optimization . . . . . . . . . . . . . . . . . . .

63 66 67 67 68 69 73 73 74 77 78 79 82 84 89 90 90 91 95

Conic optimization 9.1 9.2 9.3 9.4 9.5 9.6 9.7 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Every optimization problem can be modelled as a conic problem Solution method . . . . . . . . . . . . . . . . . . . . . . . . . . . Reduction to inequality system . . . . . . . . . . . . . . . . . . . Interior-point condition . . . . . . . . . . . . . . . . . . . . . . . Embedding into a self-dual problem . . . . . . . . . . . . . . . . Self-concordant barrier function for (SP ) . . . . . . . . . . . . .

10

Symmetric Optimization 10.1 Self-dual problems over the standard cones . . . . . . . . . . . . 10.1.1 On the structure of the matrix M . . . . . . . . . . . . . 10.1.2 Linear cone . . . . . . . . . . . . . . . . . . . . . . . . . 10.1.3 Second-order cone . . . . . . . . . . . . . . . . . . . . . . 10.2 10.3 10.4 10.5

10.1.4 Semidenite cone . . . . . . . . . . . . . . . . . . . . . . 102 The general self-dual symmetric case . . . . . . . . . . . . . . . 114 Back to the general symmetric case . . . . . . . . . . . . . . . . 121 What if = 0? . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 10.5.1 The linear case . . . . . . . . . . . . . . . . . . . . . . . 133 10.5.2 The quadratic case . . . . . . . . . . . . . . . . . . . . . 135 10.5.3 The semidenite case . . . . . . . . . . . . . . . . . . . . 142 A Some technical lemmas 147 153

Bibliography

ii

iii

Chapter 6

Self-concordant functions

Thanks
Thanks are due to Gouyong Gu, Bib Silalahi, Maryam Zangibadi, Stefan Zwam (students 3TU course Optimization), Gamal Elabwabi (proof of Lemma 10.31) for some textual and conceptual remarks on earlier versions of the following chapters.

6.1

Introduction

In this chapter we introduce the notion of a self-concordant function and we derive some properties of such functions. We consider a strictly convex function : D R, where the domain D is an open convex subset of Rn . Our rst aim is to nd the minimal value on its domain D (if it exists).

The classical convergence analysis of Newtons method for minimizing has some major shortcomings. The rst shortcoming is that the analysis uses quantities that are not a priori known, for example uniform lower and upper bounds for the eigenvalues of the Hessian matrix of on D. The second shortcoming is that while Newtons method is ane invariant, these quantities are not ane invariant. As a result, if we change coordinates by an ane transformation this has in essence no eect on the behavior of Newtons method but these quantities all change, and as a result also the iteration bound changes.

A simple and elegant way to avoid these shortcomings was proposed by Nesterov and Nemirovski [8]. They posed an ane invariant condition on the function , named self-concordancy. The well known logarithmic barrier functions, that play an important role in interior-point methods for linear and convex optimization, are self-concordant (abbreviated below as self-concordant). The analysis of Newtons method for self-concordant functions does not depend on any unknown constants. As a consequence the iteration bound resulting form this analysis is invariant under (ane) changes of coordinates. 1

The aim of this section to provide a brief introduction to the notion of selfconcordancy, and to recall some results on the behavior of Newtons method when minimizing a self-concordant function. Having dealt with this we will consider in the next chapter the problem of minimizing a linear function over the closure of D, while assuming that a self-concordant function on D is given. In two subsequent chapters we apply these results to general convex optimization problems and conic optimization problems, respectively. Although we deviate from it at several places, the treatment below is based mainly on [1], [3], [5] and [7].

6.2

Epigraphs and closed convex functions

We did not deal with the notion of a closed convex function so far. We therefore start with some denitions. In this section and further on, always denotes a function whose domain D is an open convex subset of Rn . Recall from Denition ?? that the epigraph of is the set epi := {(x, t) : x D, (x) t} , and that by Exercise ?? the function : D R is convex if and only if epi is a convex set. Denition 6.1. A function : D R is called closed if its epigraph is closed. If, moreover, is convex then is called a closed convex function. We denote the closure of D as D. Then the boundary of D is dened as the set \ D. D Lemma 6.2. Let : D R be a closed convex function and let x belong to the boundary of D. If a sequence {xk }k=0 in the domain converges to x then (xk ) . Proof. Consider the sequence { (xk )}k=0 . Assume that it is bounded above. Then it has a limit point . Of course, we can think that this is the unique limit point of the sequence. Therefore, zk := (xk , (xk )) x, . Note that zk belongs to the epigraph of . Since is a closed function, then also x, belongs to the epigraph. But this is a contradiction since x does not belong to the domain of . 2

We conclude that if the function is closed convex, then it has the property that (x) approaches innity when x approaches the boundary of the domain D. This 2

is also expressed by saying that is a barrier function on D. In fact, the following exercise makes clear that the barrier property is equivalent to the closedness property. Exercise 6.1. Let the function : D R have the property that it becomes unbounded (+) when approaching the boundary of its open domain D. Then is closed. Prove this.
Solution: 2

6.3

Denition of the self-concordance property

We want to minimize : D R by using Newtons method. Recall that Newtons method is exact if is a quadratic function. As we will see the self-concordancy property guarantees good behavior of Newtons method. To start with we consider the case where is a univariate function. So we assume for the moment that n = 1, and that the domain D of the convex function : D R is just an open interval in R. The third order Taylor polynomial of around x D is given by 1 1 T3 () = (x) + (x) + 2 (x) + 3 (x). 2 6 The self-concordance property bounds the third order term in terms of the second order term, by requiring that (x)3
2

3 ( (x)2 )

( (x)) ( (x))

x D,

is bounded above by some (uniform) constant. According to the following denition this constant is given by 42 . Denition 6.3. Let 0. The univariate function is called -self-concordant if | (x)| 2 ( (x)) 2 ,
3

x D.

(6.1)

Note that this denition assumes that (x) is nonnegative, whence is convex, and moreover that is three times dierentiable. It is easy to verify that the property (6.1) is ane invariant. Because, let be -self-concordant and let be dened by (y) = (ay + b), where a = 0. Then one has (y) = a (x), (y) = a2 (x), (y) = a3 (x), 3

where x = ay + b. Hence it follows, due to the exponent is -self-concordant as well.

3 2

in the denition, that

Now suppose that n > 1, so is a multivariate function. Then is called a -self-concordant function if its restriction to an arbitrary line in its domain is -self-concordant. In other words, we have the following denition. Denition 6.4. Let 0. The function is called -self-concordant if and only if x,h () := (x + h) is a -self-concordant function of , for all x D and for all h Rn . Here the domain of x,h () is dened in the natural way: for given x D and h Rn it consists of all such that x + h D. Note that since D is an open convex subset of Rn , the domain of () is an open interval in R. We proceed by presenting some simple examples of self-concordant functions. In what follows, when it gives no rise to confusion, we will denote the function x,h simply as .
Example 6.5 [Linear function] Let (x) = + aT x, with R and a Rm . Then, for any x Rm and h Rm we have () = (x + h) = + aT (x + h) = (x) + aT h. So () is linear in , and hence () = () = 0.

This implies that is 0-self-concordant. Since this holds for any x Rm and h Rm it follows that is 0-self-concordant. Example 6.6 [Convex quadratic function] Let (x) = + aT x + 1 T x Ax, 2

with and a as before and A = AT positive semidenite. Then, for any x Rm and h Rm , () = (x + h) = + aT (x + h) + = (x) + aT x + hT Ax + Hence () = hT Ah 0, 1 (x + h)T A (x + h) 2

1 2 T h Ah. 2 () = 0.

Thus it follows that is is 0-self-concordant, and since this holds for any x Rm and h Rm , also is 0-self-concordant.

We may conclude from the above two examples that linear and convex quadratic functions are 0-self-concordant. This is an obvious consequence of the fact that in these cases the third derivatives are zero.
Example 6.7 Consider the convex function (x) = x4 , with x R. Then (x) = 4x3 , (x) = 12x2 , (x) = 24x

Now we have

( (x))2 ( (x))
3

(24x)2 (12x2 )3

1 . 3x4

Clearly the right hand side expression is not bounded if x 0, hence in this case (x) is not self-concordant.

Exercise 6.2. Let k be an integer and k > 1. Prove that (x) = xk , where x R, is -self-concordant for some only if k 2.
Solution: 2

Example 6.8 Now consider the (univariate) convex function (x) = x4 log x, Then (x) = 4x3 Therefore, ( (x))2 ( (x))
3

x > 0. 1 , x2

1 , x
2

(x) = 12x2 +

(x) = 24x
2

2 . x3

24x 12x2 +

2 x3 1 x2

(12x4 + 1)

2 24x4 2
3

24x4 + 2

(12x4 + 1)3

4 4. 12x4 + 1

This proves that (x) is a 1-self-concordant function.

Exercise 6.3. Let k be an integer and k 2. Verify for which values of k the function (x) = xk log x, with x > 0, is self-concordant, and in any such case nd the best value for .
Solution: We have (x) = kxk1 1 x 1 x2 2 . x3
2

(x) = k(k 1)xk2 +

(x) = k(k 1)(k 2)xk3 Hence


2 2 k(k 1)(k 2)xk3 x3 ( (x)) 3 3 = 1 ( (x)) k(k 1)xk2 + x2

k(k 1)(k 2)xk 2 (k(k 1)xk + 1)3

and we need to nd the maximal value of this expression for x > 0. We have

42 = max
x>0

k(k 1)(k 2)xk 2 (k(k 1)xk + 1)


3

= max
y>0

(k(k 1)(k 2)y 2)2 (k(k 1)y + 1)3

If k = 0 or k = 1 we get 42 = 4, whence = 1. If k = 2 we also get = 1. Thus we further assume that k > 2. Putting z = k(k 1)y and p = k 2, we get 42 = max g(z),
z>0

g(z) =

(p z 2)2 . (z + 1)3

Now g (z) = and g (z) = 0 if z g 2(p + 3) p

2 2(p+3) , p p

12 p2 (z 2)z + p(8z 4) (1 + z)4


2 p

. One has g =

= 0 and = (2k)2
3k k2 3

(2p + 4)2 1+
2(3+p) p 3

(2k)2 1+
2(k+1) k2 3

4 (k 2)3 . 27 k

Thus we obtain that if k > 2 then

= this value occurs for y =


2(k+1) , k(k1)(k2)

(k 2)3 ; 27k

which means that

x=

2(k + 1) . k(k 1)(k 2)

Example 6.9 Let with 0 < x R. Then (x) = and ( (x))2 ( (x))3 Hence, is 1-self-concordant. Example 6.10 Let with 1 < x R. Then (x) = (x) = x log(1 + x), x , 1+x (x) = 1 (1 + x)2 , (x) = 2 , (6.2) =
2 x3 1 x2

(x) = log x, 1 , x (x) = 1 , x2


2 3

(x) =

2 , x3

= 4.

(1 + x)3

and it easily follows that is 1-self-concordant.

Exercise 6.4. Prove that the functions x log x and x log xx are not self-concordant on their domain. 6

Solution: Dening (x) = x log x x one has (x) = log x, If follows that (x) =
2

1 , x
1

(x) =

1 . x2

( (x)) 1 4 = x = , 1 x ( (x))3 x3 which is not bounded above for x > 0. Hence (x) is not self-concordant. Since x log x has the same second and third derivative as (x) it is also not self-concordant.

Exercise 6.5. Consider the so-called logarithmic barrier function of the entropy function x log x, which is given by (x) := x log x log x = (x 1) log x, Show that (x) is 1-self-concordant.
Solution: One has (x) = x1 + log x, x
2

0 < x R.

(x) =

x+1 , x2

(x) =

x+2 . x3

Hence, using also x > 0 we may write,


2 x+2 (2x + 2)2 (x + 2)2 ( (x)) 4 x3 = 4, 3 = x+1 3 = 3 x+1 ( (x)) (x + 1) (x + 1)3 x2

showing that is 1-self-concordant.

Exercise 6.6. If is -self-concordant function with > 0, then can be rescaled by a positive scalar so that it becomes 1-self-concordant. This follows because if is some positive constant then is -self-concordant. Prove this.
Solution: Due to Denition 6.4 it suces to deal with the case that is a univariate function. Supposing that is -self-concordant we have
3 (x) 2 (x) 2 ,

x D.

Now let be a positive scaler and (x) = (x) for each x D. Then (x) = (x), (x) = (x).
3
2

Hence
3 (x) = (x) 2 (x) 2 = 2

(x)

=2

(x)

3
2

for each x D, proving that is

-self-concordant.

6.4

Equivalent formulations of the self-concordance property

The aim of this section is to present some other characterizations of the property of self-concordance. We start by introducing some new notations. As before, we assume that : D R, where D is an open convex subset of Rn , and, for any x D and h Rn we use the univariate function x,h () := (x + h) where runs through all values such that x + h D. The rst three derivatives of x,h () with respect to are given by
n

() = x,h
i=1 n

hi
n

(x + h) xi hi hj 2 (x + h) xi xj 3 (x + h) . xi xj xk

(6.3) (6.4) (6.5)

() = x,h () = x,h

i=1 j=1 n n n

hi hj hk
i=1 j=1 k=1

The formulas (6.3) and (6.4) are immediately clear from Exercise ??. The third expression is left as an exercise. Exercise 6.7. Prove formula (6.5).
Solution: By (6.4) (see also Exercise ??) we have
n n

() = hT 2 (x + h)h = x,h Since d 2 (x + h) = d xi xj we obtain


n n n

hi hj
i=1 j=1

2 (x + h) . xi xj

hk
k=1

2 (x + h) = xk xi xj
n

hk
k=1

3 (x + h) xi xj xk

() = x,h
i=1 j=1

hi hj
k=1

hk

3 (x + h) . xi xj xk 2

which is in agreement with (6.5).

It will become clear soon that to verify if is self-concordant we need to know the values of the rst three derivatives of x,h () = (x + h) at = 0. These immediately follow from (6.3)(6.5). To simplify notation we use sort-hand notations for these values, and denote them respectively as (x)[h], 2 (x)[h, h] and 8

3 (x)[h, h, h] respectively. Thus we may write (0) x,h (0) x,h (0) x,h = 2 (x)[h, h] = (x)[h] = hT (x) = hT 2 (x)h

(6.6)

= 3 (x)[h, h, h]

= hT 3 (x)[h]h.

We now come to the main result of this section [3, Theorem 2.1]. Theorem 6.11. Let be three times continuously dierentiable and 0. Then the following three conditions are equivalent. 3 (x)[h, h, h] 2 2 (x)[h, h] () x,h (0) 2 (0) , x,h x,h
3 2 () 2 , x,h 3 2 3 2

x D, h Rn
n

(6.7) (6.8) (6.9)

x D, h R , dom x,h x D, h Rn .

Proof. The equivalence of (6.7) and (6.9) is a direct consequence of (6.6). Obviously (6.8) implies (6.9), because 0 dom x,h . On the other hand, by replacing x by x + h in (6.9) we obtain (6.8). Hence the proof is complete. 2

Remark: In the literature most authors use instead of (6.7) the apparently stronger condition 3 3 (x)[h, h, h] 2 2 (x)[h, h] 2 , x D, h Rn . (6.10) But this is not needed since the left-hand side in (6.7) changes sign when replacing h by h, whereas the right-hand side does not change. Thus it becomes clear that (6.10) and (6.7) are equivalent.

Note that (6.8) just states that is -self-concordant, by Denition 6.4. We will say that is self-concordant, without specifying , if is -self-concordant for some 0. In general (6.9) make its much simpler to prove that a function is self-concordant. By Theorem 6.11 this will be the case if and only if the quotient 3 (x)[h, h, h] (2 (x)[h, h])
2

(6.11)

The -self-concordancy condition bounds the third order term in terms of the second order term in the Taylor expansion. Hence, if it is satised, it makes that the second order Taylor expansion locally provides a good quadratic approximation 9

is bounded above by 42 when x runs through the domain of and h through all vectors in Rn . Note that the condition for -self-concordancy is homogeneous in h: if it holds for some h then it holds for any h, with R.

of (x). The latter property makes that Newtons method behaves well on selfconcordant functions. This will be shown later on. Recall that the denition of the -self-concordance property applies to every three times dierentiable convex function with an open domain. Keeping this in mind we can already give two more examples of self-concordant multivariate functions.
Example 6.12 By way of example consider the multivariate function
n

(x) := (x) 1 = , xi xi

log xi ,
i=1

with 0 < x Rn . Then, with e denoting the all-one vector, 1 2 (x) = 2, x2 xi i

3 (x) 2 = 3, x3 xi i
n

and all other second and third order partial derivatives are zero. Hence we have for any h Rn
n

2 (x)[h, h] = Hence, putting i :=


hi xi 2

i=1

h2 i , x2 i

3 (x)[h, h, h] =

i=1

2h3 i . x3 i

we get
n

(x)[h, h] = Applying the inequality



n i=1

i ,
i=1

3 (x)[h, h, h] = 2
n

n i=1

3 i .

3 i

i=1

|i |

2 i

3 2

(6.12)

i=1

we obtain

thus proving that is 1-self-concordant.

3 3 (x)[h, h, h] 2 2 (x)[h, h] 2 ,

Example 6.13 With as dened in Example 6.9 we now consider


n

(x) :=
i=1

(xi ),

with e < x Rn . Letting h Rn , we prove that () := (x + h) is 1-self-concordant. This goes as follows. One has
n

() =
i=1

(xi + hi ).

Hence
n n n

() =
i=1

hi (xi + hi ),

() =
i=1

h2 (xi + hi ), i

() =
i=1

h3 (xi + hi ). i

So we have, also using (6.2),


n n

(0) =
i=1

h2 (xi ) = i
i=1

h2 i (1 + xi )
n 2

(0) =
i=1

h3 (xi ) = i
i=1

(1 + xi )3

2h3 i

Putting i := hi /(1 + xi ) we thus have (0) =


i=1

i ,

(0) = 2

n i=1

3 i .

It remains to show that | (0)| complete.

3 2 ( (0)) 2

. But this follows from (6.12). Hence the proof is

10

In what follows we use the following notations: g(x) := (x), x D and H(x) := 2 (x), x D. As we will see in the next section, under a very weak assumption the matrix H(x) is always positive denite. As a consequence it denes an inner product, according to v, w x := v T H(x)w, v, w Rn . (6.13) The induced norm is denoted as v
x x.

So we have v Rn . (6.14)

:=

v T H(x)v,

Of course, this norm depends on x D. We call it the local Hessian norm of v at x D. Using this notation, the inequality (6.7) can be written as 3 (x)[h, h, h] 2 h
3 x

We conclude this section with the following characterization of the self-concordance property. Lemma 6.14. A three times dierentiable closed convex function with open domain D is -self-concordant if and only if 3 (x)[h1 , h2 , h3 ] 2 h1 holds for any x D and all h1 , h2 , h3 Rn . Proof. This statement follows from a general property of three-linear forms. For the proof we refer to Lemma A.2 in the Appendix. 2
x

h2

h3

We conclude this section with one more characterization of self-concordance, leaving the proof to the reader. Exercise 6.8. Given a three times dierentiable convex function as before, and with x,h dened as usual, one has that is -self-concordant if and only if 1 d , x D, h Rn , dom x,h . (6.15) d ()
x,h

Prove this. 11

Solution: One has

d  1 d () x,h

() x,h 2 () x,h
3 2

. 2

Hence, (6.15) is equivalent to (6.8), which completes the proof.

6.5

Positive deniteness of the Hessian matrix

In this section we deal with an interesting, and important consequence of Lemma 6.2. Before dealing with it we introduce a useful function. Let x D and 0 = d Rn be such that x + d D. Fixing v, we dene for 0 1, q() := v T H(x + d)v = v
2 x+d

(6.16)

Then q() is nonnegative and continuous dierentiable. The derivative to is given by q () = v T 3 (x + d)[d] v = 3 (x + d)[d, v, v]. Using Lemma 6.14 we obtain |q ()| = 3 (x + d)[d, v, v] 2 d If q() > 0 this implies |q ()| d log q() q () 2 d = = d q() q() In the special case where v = d we have d
1

x+d

2 x+d

= 2 d

x+d

q().

x+d

(6.17)

x+d

= q() 2 , and hence we then have


3

|q ()| 2 q() 2 . If q() > 0 this implies d d 1 q() = q () 2q() 2


3

(6.18)

(6.19)

Theorem 6.15. Let the closed convex function with open domain D be -selfconcordant. If D does not contain a straight line then the Hessian 2 (x) is positive denite at any x D. Proof. Suppose that H(x) is not positive denite for some x D. Then there exists a nonzero vector d Rn such that dT H(x)d = 0 or, equivalently, d x = 0. 2 Let q() := d x+d , just as in (6.16) with v = d. Then q(0) = 0 and q() is 3 nonnegative and continuously dierentiable. Now (6.18) gives q () 2q() 2 . 12

We claim that this implies q() = 0 for every 0 such that x + d D. This is a consequence of the following claim. Claim: Let I = [0, a) for some a > 0 and q : I R+ . If q(0) = 0 and q () 3 2q() 2 for every I then q() = 0 for every I. Proof.
1

Assume q(1 ) > 0 for some 1 I. Let 0 := min { : q() > 0, (, 1 ]} .

Since q is continuous and q(0) = 0, we have 0 0 < 1 and q(0 ) = 0. Now dene h(t) := 1 q(1 t) , t [0, 1 0 ).

Then, since 1 t (0 , 1 ], the denition of 0 implies that h(t) is well dened and positive. Note that h(t) goes to if t approaches 1 0 . On the other hand we have 3 1 2q(1 t) 2 1 q (1 t) = , h (t) = 3 3 2 q(1 t) 2 2 q(1 t) 2 and hence h(t) h(0) + t for all t [0, 1 0 ). Since h(0) + t remains bounded when t approaches 1 0 we have a contradiction. Hence the claim is proved. 2 Thus we have shown that q() = 0 for every 0 such that x + d D. This implies that (x + d) is linear in , because we have for some , 0 , 1 (x + d) = (x) + dT g(x) + 2 q() = (x) + dT g(x). 2 Since D does not contain a straight line there exists an such that x + d belongs to the boundary of D. We may assume that > 0 (else replace d by d). Since lim (x + d) = (x) + dT g(x), which is nite, this gives a conict with the barrier property of on D. Thus the proof is compete. 2

Corollary 6.16. If is closed and self-concordant, and D does not contain a line, then (x) has a unique minimizer. From now on it will be assumed that the hypothesis of Theorem 6.15 is satised. So the domain D does not contain a straight line. As a consequence we have x D, h Rn :
1 This

= 0 h = 0.

proof is due to Ir. P. Sonneveld and Dr. A. Almendral.

13

6.6

Some basic inequalities

From now on we assume that is strictly convex. By Theorem 6.15 this is the case if is closed and self-concordant, and D does not contain a line. The Newton step at x is given by x = H(x)1 g(x). (6.20) Suppose that x is a minimizer of (x) on D. A basic question is how we can measure the distance from x to x ? One obvious measure for the distance is the Euclidean norm x x . But x is unknown! So this measure can not be computed without knowing the minimizer. Therefore we might use instead the Euclidean norm of x, i.e., x , which vanishes only if x = x . However, instead of the Euclidean norm we use the local Hessian norm x x of x at x, as introduced in Section 6.4, to measure the distance from x to x . In what follows we denote this quantity by (x). Thus we have (x) := x
x

xT H(x)x =

g(x)T H(x)1 g(x).

(6.21)

Exercise 6.9. If the full Newton step at x D is feasible, i.e., if x + x D, then we have (x + x) (x) (x)2 . Prove this.

Solution: Since is convex, we have (x + x) (x) + xT g(x). Using (6.20) and (6.21) we get xT g(x) = g(x)T H(x)1 g(x) = (x)2 . 2

Lemma 6.17. Let x D and R+ and d Rn such that x + d D. Then d x 1 + d


x

x+d

d x 1 d
x

;
x

the left inequality holds for all such that 1 + d such that 1 d x > 0. Proof. Let q() := d
2 x+d ,

> 0 and the right for all

just as in (6.16) with v = d. Then, from (6.19),


1

q () dq() 2 . = 3 d 2q() 2 Consequently, if x + d D then q(0) 2 q() 2 q(0) 2 + . 14


1 1 1

Since q(0) 2 = d

and q() 2 = d

x+d ,

this gives

1 1 1 + , d x d x+d d x or, equivalently, 1 d d x Hence, if 1 + d


x x

1 d
x+d

1 + d d x

> 0 we obtain d x 1 + d d
x+d

and if 1 d

> 0 we obtain d
x+d

d x 1 d

,
x

proving the lemma.

Exercise 6.10. With h Rn xed, dene () := Then () := h

1
x+h

22 (x + h)[h, h] 2

3 (x + h)[h, h, h]
3

,
2

and hence | ()| . Derive Lemma 6.17 from this.


Solution:

Lemma 6.18. Let x and d be such that x D, x + d D and d we have, for any nonzero v Rn , (1 d
2 x)

< 1. Then

x+d

v x 1 d

.
x

(6.22)

Proof. Let q() := v x+d , just as in (6.16). Then q(0) = v 2 v x+d . Hence we may write log v v
x+d x

2 x

and q(1) =

1 q(1) 1 1 log = (log q(1) log q(0)) = 2 q(0) 2 2 15

1 0

d log q() d

d.

By (6.17) we have log and log v v


x+d x 1

d log q() d

2 d

x+d

. Also using Lemma 6.17 this implies


1 x ) |=0

d x 1 d v v
x+d x

d = log (1 d
1

= log

1 1 d

Since the log function is monotonically increasing, we obtain from the above inequalities that v x+d 1 1 d x . v x 1 d x This proves the lemma. 2

d x 1 d

d = log (1 d

x) .

Exercise 6.11. If x and d are such that x D, x + d D and d (1 d


2 x)

< 1, then

H(x)

H(x + d)

H(x) (1 d
x)

2,

Derive this from Lemma 6.18.


Solution:

Lemma 6.19. Let x D and d Rm . If d

<

then x + d D.

1 Proof. Since d x < , we have from Lemma 6.18 that H(x + d) is bounded for all 0 1, and thus (x + d) is bounded. On the other hand, takes innite values on the boundary of the feasible set, by Lemma 6.2. As a consequence we must have x + d D. 2

6.7

Quadratic convergence of Newtons method

In Example ?? we already have seen an example where Newtons method converges quadratically fast if the method starts in the neighborhood of the minimizer. In this section we show that in case the function that is minimized is self-concordant this behavior of Newtons method can be quantied nicely. More precisely, we can specify accurately a region around the minimizer where Newtons method is quadratically convergent by means of our proximity measure (x). 16

Let x+ := x + x denote the iterate after the Newton step at x. Recall that the Newton step at x is given by x = H(x)1 g(x)

where H(x) and g(x) are the Hessian matrix and the gradient of (x), respectively.

Recall from (6.21) that we measure the distance from x to the minimizer x of (x) by the quantity (x) = x
x

g(x)T H(x)1 g(x).

Note that if x = x then g(x) = 0 and hence (x) = 0; whereas in all other cases (x) will be positive. The Newton step at x+ is denoted as x+ . After the Newton step we have (x+ ) = x+
x+

= H(x+ )1 g(x+ )

x+

g(x+ )T H(x+ )1 g(x+ ).

We are now are ready to prove our rst main result on Newtonss behavior on self-concordant functions. Theorem 6.20. If (x)
1

then x+ is feasible. Moreover, if (x) < (x) 1 (x)


2

then

(x+ )

Proof. The feasibility of x+ follows from Lemma 6.19, since x

To prove the second statement in the theorem we denote the Newton step at x+ shortly as v. So v := H(x+ )1 g(x+ ). For 0 1 we consider the function k() := v T g(x + x) (1 )v T g(x). Note that k(0) = 0 and k(1) = g(x+ )T H(x+ )1 g(x+ ) = (x+ )2 . Taking the derivative of k to we get, also using H(x)x = g(x), k () = v T H(x + x)x + v T g(x) = v T (H(x + x) H(x)) x. By Exercise 6.11, H(x + x) H(x) 1 (1 x 17
x) 2

= (x) 1/.

1 H(x).

Now applying the generalized Cauchy inequality in the Appendix (Lemma A.1) we get v T (H(x + x) H(x)) x 1 (1 x
x 2 x)

Hence, combining the above results, and using x k () Therefore, since k(0) = 0,
1

= (x), we may write (x).

1 (1 (x))
2

k(1) (x) v

1 (1 (x))
2

1 d = v

(x)2 . 1 (x)

Since v = H(x+ )1 g(x+ ), we have, by Lemma 6.18, v


x

v x+ 1 x

=
x

(x+ ) . 1 (x)

Since k(1) = (x+ )2 , it follows by substitution, (x+ )2 = k(1) (x)2 (x+ ) . 1 (x) 1 (x) 2

Dividing both sides by (x+ ) the lemma follows.

Corollary 6.21. If (x) (x). Corollary 6.22. If (x) 2 3 2 (x) .

1 2

3
1 3

5 0.3820 then x+ is feasible and (x+ )


2 3 2 (x)

then x+ is feasible and (x+ )

6.8

Algorithm with full Newton steps

1 Assuming that we know a point x D with (x) 3 we can easily obtain a point x D such that (x) , for prescribed > 0, with the algorithm in Figure 6.1. In this algorithm we use the notations introduced before. So x denotes the Newton step with respect to at x, given by (6.20), and (x) is the value of the proximity measure at x as given by (6.21). We assume that is not linear or quadratic. Then > 0. Due to Exercise 6.6 we may always assume that 4 . We will assume this 9 from now on.

18

Input: An accuracy parameter (0, 1); x D such that (x) while (x) do x := x + x endwhile
1 3 .

Figure 6.1. Algorithm with full Newton steps The following theorem gives an upper bound for the number of iterations required by the algorithm. Theorem 6.23. Let x D and (x) steps requires at most
1 3 .

Then the algorithm with full Newton 1

log2 3.4761 log

iterations. The output is a point x D such that (x) .


1 Proof. Let x0 D be such that (x0 ) 3 Starting at x0 we repeatedly apply full Newton steps until the k-iterate, denoted as xk , satises (xk ) , where > 0 is the prescribed accuracy parameter. We can estimate the required number of Newton steps by using Corollary 6.22. To simplify notation we dene for the 3 moment 0 = (x0 ) and = 2 . Note that 1. It then follows that

(xk ) (xk1 ) This gives (xk ) 2

(xk2 )
2

2 2

2+4++2
2k

2k

k+1

2k 1 3

= 2 2 0 we obtain

2 0

2k

Using the denition of and 0

2 0

3 2
k 3 2 4

1 3 = . 3 4

Hence, we certainly have (xk ) if this reduces to 2k log

. Taking logarithms at both sides

3 4

log .

19

log log2 log 3 4

Dividing by log 3 (which is negative!) we get 2k 4 . Thus we nd that after no more than log2 log log 3 4

log 3 log 4

, or, equivalently, k 1

= log2 (3.4761 log ) = log2 3.4761 log

iterations the process will stop and the output will be an x D such that (x) . 2

6.9

Linear convergence of the damped Newton method

In this section we consider the case where x D lies outside the region where the Newton process is quadratically convergent. More precisely, we assume that 1 (x) > 3 . In that case we perform a damped Newton step, with damping factor , and the new iterate is given by x+ = x + x. In the algorithm of Figure 6.2 we use = 1/(1 + (x)) as a default step size.

Input: x D such that (x) > while (x) > :=


1 1+(x) 1 3 1 3 .

do

x := x + x endwhile

Figure 6.2. Algorithm with damped Newton steps In the next theorem we use the function (t) := t log(1 + t), t > 1. (6.23)

Note that this is a strictly convex nonnegative function, which is minimal at t = 0, and (0) = 0. The lemma shows that with an appropriate choice of we can guarantee a xed decrease in after the step. 20

Theorem 6.24. Let x D and := (x). If := (x) (x + x) Proof. Dene

1 1+

then

() . 2

() := (x) (x + x). Then () = g(x + x)T x () = xT H(x + x)x = 2 (x + x)[x, x]

() = 3 (x + x)[x, x, x]. () 2 x Hence, also using Lemma 6.17, () 2 x


3 x x) 3

Now using that is -self-concordant we deduce from the last expression that
3 x+x

(1 x

(1 )

23

3.

This information on the third derivative of () is used to prove the theorem, by integrating three times. By integrating once we obtain

() (0)

(1 )

23

3 d =

(1 ) 2

2 =0

(1 )

+ 2 .

Since (0) = 2 (x)[x, x] = 2 , we obtain () (1 )


2.

By integrating once more we derive an estimate for ():

() (0)

(1 )

d =

(1 )

=
=0

+ . (1 )

Since (0) = g(x)T x = xH(x)x = 2 , we obtain () + + 2 . (1 )

Finally, in the same way we derive an estimate for (). Using that (0) = 0 we have

()

1 + + 2 d = 2 log (1 ) + + 2 2 . (1 ) 21

One may easily verify that the last expression is maximal for a = of this value yields (a) 1 2 log 1 1 + + =

1 1+ .

Substitution

1 1 ( log (1 + )) = 2 () , 2 2

which is the desired inequality.

Since (t) is monotonically increasing for positive t, and > 1/(3), the following result is an immediate consequence of Theorem 6.24. Corollary 6.25. If (x) >
1 3

then x+ is feasible and 1 2 1 3 = 0.0457 1 > . 2 222

()

The next result is an obvious consequence of this corollary.


1 Theorem 6.26. Let x D and (x) > 3 . If x denotes the minimizer of (x), then the algorithm with damped Newton steps requires at most

222 (x0 ) (x ) iterations. The output is a point x D such that (x)


1 3 .

In order to obtain a solution such that (x) , after the algorithm with damped Newton steps we can proceed with full Newton steps. Due to Theorem 6.23 and Theorem 6.24 we can obtain such a solution after a total of at most 222 (x0 ) (x ) +
2

log 3.4761 log

(6.24)

iterations. Note the drawback of the above iteration bound: usually we have no a priori knowledge of (x ) and the bound cannot be calculated at the start of the algorithm. But in many cases we can derive a good estimate for (x0 ) (x ) and we obtain an upper bound for the number of iterations before starting the optimization process.
Example 6.27 Consider the function : (1, ) R dened by (x) = x log(1 + x), x , 1+x x > 0. We established earlier that is 1-self-concordant, in Example 6.10. One has (x) = (x) = 1 (1 + x)2 .

22

Therefore, (x) =

(x)2 = x2 = |x| . (x) Note that x = 0 is the unique minimizer. The Newton step at x is given by (x) x = = x(1 + x), (x) and a full Newton step yields x+ = x x(1 + x) = x2 . The Newton step is feasible only if x2 > 1, i.e. only if (x) < 1. Note that the theory guarantees feasibility in that case. Moreover, if the Newton step is feasible then (x+ ) = (x)2 , which is better than the theoretical result of Theorem 6.20. When we take a damped Newton step, with the default step size = is given by x+ = x x(1 + x) = 1 + |x| 0,
2x2 1x 1 , 1+(x)

the next iterate

if x > 0 if x < 0.

Thus we nd in this example that the damped Newton step is exact if x > 0. Also, if 1 < x < 0 then 2x2 < x2 , 1x and hence then the full Newton step performs better than the damped Newton step. Finally observe that if we apply Newtons method until (x) then the output is a point x such that |x| . Example 6.28 We now consider the function (x) introduced in Example 6.13:
n n

(x) :=
i=1

(xi ) =
i=1

(xi log(1 + xi )) ,

with e < x Rn . The gradient and Hessian of are x1 xn x ; ...; g(x) = = 1 + x1 1 + xn e+x 1 e 1 H(x) = diag = diag ; ...; (1 + x1 )2 (1 + xn )2 (e + x)2 We already established that is 1-self-concordant. One has n x2 = x . (x) = g(x)T H(x)1 g(x) = i
i=1

This implies that x = 0 is the unique minimizer. The Newton step at x is given by and a full Newton step yields x = H(x)1 g(x) = x(e + x),

x+ = x x(e + x) = x2 . The Newton step is feasible only if x2 > e, i.e., if x2 < e; this certainly holds if (x) < 1. Note that the theory guarantees feasibility only in that case. Moreover, if the Newton step is feasible then (x+ ) = x2 x x (x)2 , and this is again better than the theoretical result of Theorem 6.20. When we take a damped Newton step, with the default step size = is given by x(e + x) . x+ = x 1+ x
1 , 1+(x)

the next iterate

If we apply Newtons method until (x) then the output is an x such that x .

23

Example 6.29 Consider the (univariate) function f : (0, ) R dened by This is the logarithmic barrier function of the entropy function x log x. It can be easily shown that f is 1-self-concordant. One has x1 x+1 f (x) = + log x, f (x) = . x x2 Therefore, |x 1 + x log x| f (x)2 (x 1 + x log x)2 (x) = = = . (x) f 1+x 1+x Note that (1) = 0, which implies that x = 1 is the unique minimizer. The Newton step at any x > 0 is given by x (x 1 + x log x) f (x) = . x = f (x) 1+x So a full Newton step yields x (2 x log x) x (x 1 + x log x) = . x+ = x 1+x 1+x
1 When we take a damped Newton step, with the default step size = 1+(x) , the next iterate is given by x (x 1 + x log x) . x+ = x (1 + x) (1 + x) We conclude this example with a numerical experiment. If we start at x = 10 we get as output the gures in the following tableau. In this tableau k denotes the iteration number, xk the k-th iterate, (xk ) is the proximity value at xk and k the step size in the k + 1-th iteration.

f (x) = x log x log x, x > 0.

k xk f (xk ) (xk ) k 0 10.00000000000000 20.72326583694642 9.65615737513337 0.09384245791391 1 7.26783221086343 12.43198234403589 7.19322142387618 0.12205211457924 2 5.04872746432087 6.55544129967853 4.97000287092924 0.16750410705319 3 3.33976698811526 2.82152744553701 3.05643368252612 0.24652196443090 4 2.13180419256384 0.85674030296950 1.55140872104182 0.39194033937129 5 1.39932346194914 0.13416824208214 0.56132642454284 0.64048105782415 6 1.07453881397326 0.00535871156275 0.10538523300312 1.00000000000000 7 0.99591735745291 0.00001670208774 0.00577372342963 1.00000000000000 8 0.99998748482804 0.00000000015663 0.00001769912592 1.00000000000000 9 0.99999999988253 0.00000000000000 0.00000000016613 1.00000000000000 If we start at x = 0.1 the output becomes k 0 1 2 3 4 5 6 7 8 9 10 xk 0.10000000000000 0.14945506622819 0.22112932596124 0.32152237588997 0.45458940014373 0.61604926491198 0.78531752299982 0.96323307457328 0.99897567517041 0.99999921284500 0.99999999999954 f (xk ) 2.07232658369464 1.61668135596306 1.17532173793649 0.76986051286674 0.42998027395695 0.18599661844608 0.05188170346324 0.00137728412903 0.00000104977911 0.00000000000062 0.00000000000000 (xk ) 1.07765920479347 1.05829223631865 1.00679545093710 0.90755746327638 0.74937259761986 0.53678522950535 0.30270971353625 0.05199249905660 0.00144861398705 0.00000111320527 0.00000000000066 k 0.48131088953032 0.48583965986703 0.49830688998873 0.52423060340338 0.57163351098592 0.65070901307522 1.00000000000000 1.00000000000000 1.00000000000000 1.00000000000000 1.00000000000000

24

Observe that in both cases the number of iterations is signicantly less than the theoretical bound in (6.24).

6.10

Further estimates

In the above analysis we found an upper bound for the number of iterations that the algorithm needs to yield a feasible point x such that (x) . But what can be said about (x) (x ), and what about x x ? The aim of this section is to provide answers to these questions. We start with the following lemma. Lemma 6.30. Let x D and d Rn such that x + d D. Then d x 1+ d
2 x

dT (g(x + d) g(x))

d x 1 d

,
x x)

(6.25)

( d x ) ( d (x + d) (x) dT g(x) 2 2 In the right-hand side inequalities it is assumed that d x < 1. Proof. We have
1 1

(6.26)

dT (g(x + d) g(x)) = Using Lemma 6.17 we may write d x 1+ d


2 1

dT H(x + d)d d =
0 0

2 x+d

d.

=
x 0 1

d d

2 x x) 2 x x)

1 2 d

(1 + d (1 d

d
0

2 x+d 2

2 d

d x 1 d

.
x

From this the inequalities in (6.25) immediately follow. To obtain the inequalities in (6.26) we write
1

(x + d) (x) dT g(x) =

dT (g(x + d) g(x)) d.

Now using the inequalities in (6.25) we obtain


1 0

dT (g(x + d) g(x)) d

d x 0 1 d ( d x ) = 2 25

d =
x

log(1 d 2

x)

and
1 0

dT (g(x + d) g(x)) d

d x 1 + d 0 ( d x ) . = 2

d =
x

log(1 + d 2

x)

This completes the proof.

As usual, for each x D, (x) = x x , with x denoting the Newton step at 1 x. We now prove that if (x) < for some x D then must have a minimizer. Note that this surprising result expresses that some local condition on provides us with a global property, namely the existence of a minimizer. Theorem 6.31. Let (x) < x in D.
1

for some x D. Then has a unique minimizer

Proof. The proof is based on the observation that the level set L := {y D : (y) (x)} , with x as given in the theorem, is compact. This can be seen as follows. Let y D. Writing y = x + d, with d Rn , Lemma 6.30 implies the inequality (y) (x) dT g(x) + ( d 2
x)

= dT H(x)x +

( d 2

x)

where we used that, by denition, the Newton step x at x satises H(x)x = g(x). Since dT H(x)x d x x x = d x (x) ( d x ) . 2 Now let y = x + d L. Then (y) (x), whence we obtain (y) (x) d
x

we thus have

(x) +

d which implies

(x) +

( d 2

x)

0,

( d x ) (x) < 1. d x

(6.27)

Putting := d x one may easily verify that ()/ is monotonically increasing for > 0 and goes to 1 if . Therefore, since (x) < 1, we may conclude from (6.27) that d x can not be arbitrary large. In other words, d x is bounded above. This means that the set of vectors d such that x + d L is bounded. This 26

implies that the level set L itself is bounded. Since this set is also closed, the set L is compact. Hence has a minimal value in L, and this value is attained at some x L. Since is convex, x is a (global) minimizer of , and by Corollary 6.16, this minimizer is unique. 2

Example 6.32 Consider (x) :=

log xi ,
i=1

with 0 < x Rn . We established in Example 6.12 that is 1-self-concordant, and the rst and second order derivatives are given by g(x) = (x) = Therefore, e , H(x) = 2 (x) = diag x
n

e x2

T H(x)1 g(x) = (x) = g(x)

1=
i=1

n.

We conclude from this that has no minimizer (cf. Exercise 6.12).

The next example shows that the result of Theorem 6.31 is sharp.
Example 6.33 With 0 xed, consider the function f : (0, ) R dened by f (x) = x log x, x > 0. This function is 1-self-concordant. One has
f (x) =

1 , x 1 x

f (x) =

1 . x2

Therefore, (x) =

x2

= |1 x| .

Thus, for = 0 we have (x) = 1 for each x > 0. Since f0 (x) = log x, f0 (x) has no minimizer. On the other hand, if > 0 then ( 1 ) = 0 < 1 and x = 1 is a minimizer.
1 Exercise 6.12. Let (x) for all x D. Then is unbounded (from below) and hence has no minimizer in D. Prove this. (Hint: use Theorem 6.24.)

Solution:

Exercise 6.13. Let (x)


Solution:

for all x D. Then D is unbounded. Prove this.


2

The proof of the next theorem requires the result of the following exercise. 27

Exercise 6.14. For s < 1 one has

(s) = sup {st (t)} ,


t>1

whence (s) + (t) st, Prove this.


Solution:

s < 1, t > 1.

(6.28)
2

Theorem 6.34. Let x D be such that (x) < minimizer of . Then, with := (x),

and let x denote the unique

() () (x) (x ) 2 2 () () = x x x = . 1 + 1

(6.29) (6.30)

Proof. The left inequality in (6.29) follows from Theorem 6.24, because is minimal at x . Furthermore, from (6.26) in Lemma 6.30, with d = x x, we get the right inequality in (6.29): (x ) (x) dT g(x) + ( d x ) 2 ( d x ) d x+ 2 1 = 2 ( d x + ( d () , 2

x ))

where the second inequality holds since dT g(x) = dT H(x)x d


x

= d

(x) = d

(6.31)

and the fourth inequality follows from (6.28) in Exercise 6.14. For the proof of (6.30) we rst derive from (6.31) and inequalty (6.25) in Lemma 6.30 that 2 d x dT (g(x ) g(x)) = dT g(x) d x , 1+ d x
2 The

property of below means that (t) is the so-called conjugate function of (t).

28

where we used that g(x ) = 0. Dividing by d d x 1+ d

we get

which gives rise to the right inequality in (6.30), since it follows now that d
x

() = . 1

Note that the left inequality in (6.30) is trivial if d x 1, because then d x 1 1/, whereas 1+ < . Thus we may assume that 1 d x > 0. For 0 1, consider k() := g(x d)T H(x)1 g(x). One has k(0) = 0 and k(1) = (x)2 = 2 . From Exercise 6.11 and the Cauchy inequality we get k () = dT H(x d)H(x)1 g(x) = dT H(x d)x Hence we have
1

(x)
x)

(1 d

2.

2 = k(1)

x) 2

(1 d

d =

d x . 1 d x

After dividing both sides by this implies d Thus the proof is complete.
x

. 1 + 2

29

30

Chapter 7

Minimization of a linear function over a closed convex domain

7.1

Introduction

In this chapter we consider the problem of minimizing a linear function over a closed convex domain D: (P ) min cT x : x D . We assume that we have a self-concordant barrier function : D R, where D = int D, and also that H(x) = 2 (x) is positive denite for every x D. For each > 0 we dene cT x + (x),

(x) := and we consider the problem (P )

xD

inf { (x) : x D} .

We denote the gradient and Hessian matrix of (x) as g (x) and H (x), respectively. Then we may write g (x) := (x) = and H (x) := 2 (x) = 2 (x) = H(x). An immediate consequence of (7.2) is 3 (x) = 3 (x). So it becomes clear that the second and third derivatives of (x) coincide with the second and third derivatives of (x), and do not depend on . Assuming that (x) is -self-concordant, it follows that (x) is -self-concordant as well. 31 (7.2) c c + (x) = + g(x) (7.1)

The minimizer of (x), if it exists, is denoted as x(). When runs though all positive numbers then x() runs through the so-called central path of (P ). We expect that x() converges to an optimal solution of (P ) when approaches 0, since then the linear term in the objective function of (P ) dominates the remaining part. Therefore, our aim is to use the central path as a guideline to the (set of) optimal solution(s) of (P ). This approach is likely to be feasible, because since (x) is self-concordant its minimizer can be computed eciently. The Newton step at x D with respect to (x) is given by x = H(x)1 g (x). Just as in the previous chapter we measure the distance of x D to the -center x() by the local norm of x. So for this purpose we use the quantity (x) dened by (x) = x
x

xT H(x) x =

g (x)T H(x)1 g (x) = g (x)

H 1

Before presenting the algorithm we need to deal with two issues. First, when is small enough? We want to have the guarantee that the algorithm generates a feasible point whose objective value deviates no more than from the optimal value, where > 0 is some prescribed accuracy parameter. Second, we need to know what the eect is of an update of on our proximity measure (x). We start with the second issue.

7.2

Eect of a -update
g+ (x) = = c c c + (x) = + (x) = + g(x) + (1 ) (1 ) 1 1 c + g(x) g(x) = 1 (g (x) g(x)) . 1

Let := (x) and + = (1 ). Our aim is to estimate + (x). We have

Hence, denoting H(x) shortly as H, we may write + (x) = = 1 g (x) g(x) 1 1 g (x) 1
H 1 (x) H 1

+ g(x)

(x)

1 ( (x) + (x)) . 1

H 1

(7.3)

At present we have no means to obtain an upper bound for the quantity (x). Therefore, we use the following denition. 32

Denition 7.1. Let 0. The self-concordant function is called a -barrier if (x)2 = g(x)
2 H 1

x D.

(7.4)

An immediate consequence of this denition and (7.3) is the following lemma, which requires no further proof. Lemma 7.2. If is a self-concordant -barrier then (x) + + (x) . 1 In what follows we shall say that is a -barrier function if it satises (7.4). If is also -self-concordant then we say that is a (, )-self-concordant barrier function (SCB). Before proceeding we prove the next theorem, which provides some other characterizations of the -barrier property that will be used later on. Theorem 7.3. Let be two times continuously dierentiable and 0. Then the following three conditions are equivalent. (x)2 , x D ()2 (), x,h x,h (0)2 x,h (0), x,h
n

x D, h R , dom x,h x D, h Rn .

(7.5) (7.6) (7.7)

Proof. Obviously (7.6) implies (7.7), because 0 dom x,h . On the other hand, by replacing x by x + h in (7.7) we obtain (7.6). Hence it follows that (7.6) and (7.7) are equivalent. It remains to prove the equivalence of these statements with (7.5). This requires a little more eort. From the denition (6.21) of (x) it follows that (7.5) can be equivalently stated as x
2 x

= g(x)T H(x)1 g(x) ,


2 T

x D.
2

(7.8)

Assuming that this holds, and using H(x)x = g(x) we may write (0)2 = g(x)T h x,h
2 x 2 x

=
2

H(x)1 g(x)

H(x) h
2

= xT H(x)h h = (0) x,h x h


2 x

= ( x, H(x)h x ) (by (7.11)) (by (6.6)). 33

(using the Cauchy-Schwarz inequality)

This shows that (7.11) implies (7.7). It remains to prove the converse implication. Assuming that (7.7) holds, we write x
2 x

= g(x)T H(x)1 g(x) = g(x)T (x) = x,x (0) x,x (0) = = xT H(x)x
x

(by (6.6)) (by (7.7)) (by (6.6)) (by (6.14)).

x
x

This implies x plete.

, which is equivalent to (7.11). Hence the proof is com2

In the next three exercises the reader is invited to compute and for barrier functions of three important closed convex sets, namely the nonnegative orthant in Rn , the Lorentz cone Ln and the semidenite cone Sn . These are given by + + the nonnegative orthant: Rn = {x Rn : x 0} ; + the Lorentz cone:3

the semidenite cone:

Ln = x Rn : xn +

n1

x2 i
i=1

Sn = A Rnn : A = AT , xT Ax 0, x Rn . + Although Sn is dened as a set of matrices it ts in our framework if one realizes + that we have a one-to-one correspondence between n n matrices and vectors in 2 Rn , namely by associating to every n n matrix the concatenation of its columns, in their natural order. Exercise 7.1. Prove that
Solution: Consider (x) :=
n i=1

log xi is a (1, n)-SCB for Rn . +


n

log xi ,
i=1

0 < x Rn .

3 The cone is called after the Dutch physician Hendrik Antoon Lorentz who, together with his student Pieter Zeeman received the Noble prize in 1902 for their work on the so-called Zeemaneect. The cone is also called quadratic cone, or second order cone, or ice-cream cone.

34

Denoting the all-one vector as e, the rst and second order derivatives are given by g(x) = (x) = e , x H(x) = 2 (x) = diag

e . x2

One easily veries that is 1-self-concordant. Moreover, (x) =

g(x)T H(x)1 g(x) =

eT e diag (x2 ) = eT e = e = n. T x x

This shows that (x) is an n-barrier for Rn . Note that it follows that there is no x + int Rn = Rn such that (x) < 1/ = 1. This is in agreement with the fact that has ++ + no minimizer. 2

Exercise 7.2. Prove that log x2 x1:n1 n

is a (1, 2)-SCB for Ln . +

Solution: It is convenient to use the following representation of the cone Ln : + Ln = {(x, t) Rn : + Denote (x, t) = log t2 x

x t} . (x, t) int Ln . +

Let (x, t) int Ln , and (h, ) a nonzero vector in Rn . Denote () = (t+ )2 x + h 2 . + Note that () is quadratic in . So () = 0. We need to compare the second and third derivative to of the function () := (x + h, t + ) = log () at = 0. Some tedious, but straightforward calculations yield (using () = 0) (0) = t2 x 2 , (0) = (0) , (0)
2

(0) = 2 t hT x ,
(0)2 (0) (0) , (0)2

(0) = 2 2 h

(0) =

(0) = 2

(0)3 3(0) (0) (0) (0)3 2

The inequality ( (0)) 2 (0) is equivalent to 2(0) (0) ( (0)) , which holds if and only if

2 h

2 2

t x

t hT x

(h, ) Rn .

This certainly holds if 2 h 2 0. So we may assume 2 h 2 > 0. The above inequality is homogeneous in (h, ). Thus we may take = 1, and for a similar reason also t = 1. The worst case occurs if hT x = h x , whence the inequality reduces to

1 h

1 x

(1 h

x )2 ,

h Rn1 ,
2

whose validity can be easily checked. Finally, we need to check self-concordance. We have
2 2 (0)3 3(0) (0) (0) ( (0)) = (0))3 ( ( (0)2 (0) (0))3

35

Putting =

(0) (0) (0)2

we write ( (0)) (2 3)2 . 3 = ( (0)) (1 )3


2

1 Since 2(0) (0) ( (0)) we have 2 . Using this one may easily check that when 1 the right hand side expression is maximal for = 0. Hence we obtain 2

( (0)) 4. ( (0))3 So we may conclude that (x, t) is 1-self-concordant. 2

Exercise 7.3. Prove that log det X is a (1, n)-SCB for Sn . +


Solution: Recall that Sn = X Rnn : X = X T , xT Xx 0, x Rn . + Let For X int Sn and Y Sn we consider + (X) := log det X, X int Sn . +

() := (X + Y ) = log det (X + Y ) We may write () = log det (X + Y ) = log det X 2 I + X 2 Y X 2 X 2 = 2 log det X 2 log det I + X 2 Y X 2
n
1 1 1 1 1 1 1

= log det X log

i=1
1

(1 + i ) = log det X
1

log (1 + i ) ,
i=1

where i are the eigenvalues of X 2 Y X 2 . Taking derivatives to we get


n

() =

i=1

i , 1 + i

() =
i=1

2 i , (1 + i )2

() = 2

i=1

3 i . (1 + i )3

Hence we have ( (0)) = (0)


2

i=1 n

i 2 i=1 i

n,

2 2 n 3 ( (0)) i = ni=1 3 4 (0))3 ( 2 i=1 i

proving that (X) is a 1-self-concordant n-barrier for the semidenite cone Sn . +

36

Exercise 7.4. Show that each of the above self-concordant functions is logarithmically homogeneous. Denoting the function as , this means that there exists R such that (tx) = (x) log t, x D, t > 0. (7.9) Also show that if a self-concordant function satises (7.9) then it is a -barrier with = .
Solution: The barrier function for Rn is +
n

(x) := Hence we have


n

log xi ,
i=1

0 < x Rn .

(tx) =

i=1

log txi =

i=1

(log t + log xi ) = (x) n log t,

so (7.9) holds with = n. The barrier function for Ln is + (x, ) = log s2 x Hence (tx) = log t2 2 tx

(x, ) int Ln . +
2

= log t2 2 x

= (x) 2 log t, is

so (7.9) holds with = 2. Finally, the barrier function for (X) := log det X, Hence

Sn +

X int Sn . +

(tX) = log det (tX) = log det ((tIn ) X) log (det (tIn ) det X)

= log det (X) log det (tIn ) = (X) log tn = (X) n log t,

so (7.9) holds with = n. This proves the rst statement in the exercise. Note that in all cases we found that = . We next show that this holds in general. Dierentiating (7.9) with respect to t gives (tx)T x = . t For t = 1 this gives (x)T x = . Dierentiating once more, now with respect to x, yields (x) + 2 (x)x = 0. Hence, since (x) = g(x), 2 (x) = H(x) and H(x)x = g(x), we obtain g(x)T x = , Therefore, x
2 x

x = x.

= xT H(x)x = g(x)T x = .

37

It follows that (x) satises (7.11) with = . By Theorem 7.3 (or better: by its proof) it follows that (x) is a -barrier. It is worth pointing out that we have equality in (7.11), and hence also in (7.5), for all x D. 2

Before proceeding to the next section, we introduce the so-called Dikin-ellipsoid at x, and using this we give a new characterization of our proximity measure (x). Denition 7.4. For any x D the Dikin-ellipsoid at x is dened by Ex := {d Rn : Lemma 7.5. For any x D one has max dT g(x) : d Ex = (x). d
x

1} .

Proof. Due to Denition 7.4 the maximization problem in the lemma can be reformulated as max dT g(x) : dT H(x)d 1 . If g(x) = 0 then the lemma is obviously true, because then (x) = 0. So we may assume that g(x) = 0 and (x) = 0. In that case any optimal solution d will certainly satisfy dT H(x)d = 1. Hence, if d is optimal then g(x) = H(x)d, R,

where is a Lagrange multiplier. This implies d = H(x)1 g(x) = x, where x denotes the Newton step at x with respect to . Now dT H(x)d = 1 implies xH(x)x = 2 . Since we also have xH(x)x = (x)2 , it follows that = (x). So we get x d= , (x) whence, using H(x)x = g(x), dT g(x) = proving the lemma. g(x)T x xT H(x)x (x)2 = = = (x), (x) (x) (x) 2

For future use we also state the following result. Lemma 7.6. If is a self-concordant -barrier then we have dT g(x)
2

dT H(x)d, d Rn , x D. 38

Proof. The inequality in the lemma is homogeneous in d. Hence we may assume that dT H(x)d = 1. Now Lemma 7.5 implies that dT g(x) (x). Hence we obtain dT g(x)
2

(x)2 . By Denition 7.1 this implies the lemma.

Exercise 7.5. If is a self-concordant -barrier then g(x)g(x)T Derive this from Lemma 7.6.
Solution: The statement in Lemma 7.6 can be written as which implies that g(x)g(x)T dT g(x)g(x)T d dT H(x)d, d Rn , x D, H(x). 2

H(x),

x D.

Exercise 7.6. Prove that if is self-concordant then is a -barrier if and only if ((x)[h]) 2 (x)[h, h],
2

x D, h Rn .

(7.10)

Solution: Since (x)[h] = (0) and 2 (x)[h, h] = (0), this is an obvious consex,h x,h quence of (7.7). 2

We conclude this section with one more characterization of the -barrier property, leaving the proof to the reader. Exercise 7.7. Let be a self-concordant function and x,h dened as usual, then has the -barrier property if and only if d d Prove this.
Solution: One has d d () x,h 1 =

() x,h

1 ,

x D, h Rn , dom x,h .

(7.11)

() x,h ()2 x,h

. 2

Hence, (7.11) is equivalent to (7.6), which completes the proof.

Assuming that (P ) has x as optimal solution, we proceed with estimating the objective value cT x in terms of and (x). This is the subject in the next section. 39

7.3

Estimate of cT x cT x

For the analysis of our algorithm we will need some more lemmas. Lemma 7.7. Let be a self-concordant -barrier and x D and x + d D. Then dT g(x) . Proof. Consider the function q() = dT g(x + d), [0, 1).

Observe that q(0) = dT g(x). So we need to show that q(0) . If q(0) 0 there is nothing to prove. Therefore, assume that q(0) > 0. Since (x) is a -barrier, we have by Lemma 7.6, for any [0, 1), q () = dT H(x + d)d 1 T d g(x + d)
2

1 2 (q()) .

Therefore, q() is increasing and hence positive for [0, 1]. Therefore, we may write 1 1 1 1 1 q () 1 1 d = < . = 2 q() 0 q(0) q(1) q(0) (q()) 0 This implies q(0) < , completing the proof of the lemma. 2

Before proceeding we recall the definition of a dual norm.

Definition 7.8. Given any norm $\|\cdot\|$ on $\mathbb{R}^n$, the corresponding dual norm $\|\cdot\|^*$ is defined by
\[ \|s\|^* = \max\left\{ s^T x : \|x\| \le 1 \right\}. \]

Exercise 7.8. Let $H$ be a positive definite $n\times n$ matrix, and $\|\cdot\|_H$ the corresponding norm on $\mathbb{R}^n$, so $\|x\|_H = \sqrt{x^T H x}$ for all $x \in \mathbb{R}^n$. The dual norm is given by
\[ \|x\|_H^* = \|x\|_{H^{-1}}. \]
Prove this.

Solution: By Definition 7.8, and since $\|x\|_H = \|H^{1/2}x\|_2$, we have
\[ \|s\|_H^* = \max\left\{ s^T x : \|H^{1/2}x\|_2 \le 1 \right\} = \max\left\{ (H^{-1/2}s)^T\,H^{1/2}x : \|H^{1/2}x\|_2 \le 1 \right\}. \]
Substituting $y = H^{1/2}x$ this amounts to maximizing $(H^{-1/2}s)^T y$ with $y$ belonging to the unit ball. Hence the optimal solution is given by
\[ y = \frac{H^{-1/2}s}{\|H^{-1/2}s\|_2} \]
and the optimal value equals
\[ \frac{(H^{-1/2}s)^T H^{-1/2}s}{\|H^{-1/2}s\|_2} = \|H^{-1/2}s\|_2 = \sqrt{s^T H^{-1}s} = \|s\|_{H^{-1}}. \]
This completes the proof. $\Box$

For any $x \in D$ we denote the dual norm of the local norm $\|\cdot\|_x$ as $\|\cdot\|_x^*$. Due to Exercise 7.8, $\|\cdot\|_x^*$ is the (local) norm determined by $H(x)^{-1}$. So,
\[ \|d\|_x^* = \sqrt{d^T H(x)^{-1}d}, \qquad d \in \mathbb{R}^n. \]

Exercise 7.9. Let $a, b \in \mathbb{R}^n$. Prove that $a^T b \le \|a\|_x^*\,\|b\|_x$.

Solution: Let $\|\cdot\|$ be any norm on $\mathbb{R}^n$. Then Definition 7.8 implies that for any two vectors $s$ and $x$ in $\mathbb{R}^n$ we have $s^T x \le \|s\|^*\,\|x\|$. This implies the desired result. $\Box$

We are now ready for our second lemma in this section.

Lemma 7.9. Let $\lambda := \lambda(x) < 1$ and let $x^*$ be an optimal solution of $(P)$. Then
\[ c^T x \le c^T x^* + \mu\left( \nu + \frac{\lambda\,(\sqrt{\nu}+\lambda)}{1-\lambda} \right). \]

Proof. First we consider the case where $x = x(\mu)$. Since then $g_\mu(x) = 0$, we derive from (7.1) that $c = -\mu\,g(x)$. Since $x \in D$ and $x^* \in D$, using Lemma 7.7 with $d = x^* - x = x^* - x(\mu)$, we get
\[ c^T x(\mu) - c^T x^* = -c^T d = \mu\,g(x)^T d \le \mu\nu. \]
This proves the lemma in case $\lambda = 0$. Now let us turn to the general case. Then, using (7.1) once more, we obtain $c = \mu\,(g_\mu(x) - g(x))$. Also using the inequality in Exercise 7.9 we may write
\[ c^T x - c^T x(\mu) = \mu\,(g_\mu(x)-g(x))^T(x - x(\mu)) \le \mu\,\|g_\mu(x)-g(x)\|_x^*\,\|x - x(\mu)\|_x, \]
where $\|\cdot\|_x^*$ denotes the (local) norm determined by $H(x)^{-1}$. Since $\|g_\mu(x)\|_x^* = \lambda(x) = \lambda$ and $\|g(x)\|_x^* \le \sqrt{\nu}$, we have
\[ \|g_\mu(x) - g(x)\|_x^* \le \|g_\mu(x)\|_x^* + \|g(x)\|_x^* \le \lambda + \sqrt{\nu}. \]
Moreover, by Theorem 6.34,
\[ \|x - x(\mu)\|_x \le \frac{\lambda}{1-\lambda}. \]
Substitution gives
\[ c^T x - c^T x(\mu) \le \frac{\mu\,\lambda\,(\sqrt{\nu}+\lambda)}{1-\lambda}. \]
Hence we may write
\[ c^T x - c^T x^* = c^T(x(\mu) - x^*) + c^T(x - x(\mu)) \le \mu\nu + \frac{\mu\,\lambda\,(\sqrt{\nu}+\lambda)}{1-\lambda}, \]
proving the lemma. $\Box$

7.4

Algorithm with full Newton steps

We assume that we know a point $x^0 \in D$ and $\mu^0 > 0$ such that $\lambda_{\mu^0}(x^0) \le \tau$. Then we decrease $\mu = \mu^0$ by a factor $1-\theta$, where the barrier update parameter $\theta$ is a suitable number in the interval $(0,1)$, and perform a full Newton step. This process is repeated until $\mu$ is small enough, i.e., until $\mu \le \varepsilon$ for some small positive number $\varepsilon$. The algorithm is as described in Figure 7.1.

    Input:
      An accuracy parameter $\varepsilon > 0$;
      a proximity parameter $\tau \in (0, \tfrac13)$;
      an update parameter $\theta$, $0 < \theta < 1$;
      $x^0 \in D$ and $\mu^0 > 0$ such that $\lambda_{\mu^0}(x^0) \le \tau$.
    begin
      $x := x^0$; $\mu := \mu^0$;
      while $\mu > \varepsilon$ do
        $\mu := (1-\theta)\mu$;
        $x := x + \Delta x$;
      endwhile
    end

Figure 7.1. Algorithm with full Newton steps

The number of iterations is completely determined by $\theta$, $\mu^0$ and $\varepsilon$, according to the following lemma.

Lemma 7.10. The number of iterations of the algorithm does not exceed the number
\[ \frac{1}{\theta}\,\log\frac{\mu^0}{\varepsilon}. \]

Proof. The algorithm stops when $\mu \le \varepsilon$. After the $k$-th iteration we have $\mu = (1-\theta)^k\mu^0$, where $\mu^0$ denotes the initial value of $\mu$. Hence the algorithm will have stopped if $k$ satisfies $(1-\theta)^k\mu^0 \le \varepsilon$. Taking logarithms on both sides, this gives
\[ k\,\log(1-\theta) \le \log\frac{\varepsilon}{\mu^0}. \]
Since $\log(1-\theta) < 0$ this is equivalent to
\[ k \ge \frac{1}{-\log(1-\theta)}\,\log\frac{\mu^0}{\varepsilon}. \]
Since $-\log(1-\theta) \ge \theta$, this certainly holds if
\[ k \ge \frac{1}{\theta}\,\log\frac{\mu^0}{\varepsilon}, \]
which implies the lemma. $\Box$
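The scheme of Figure 7.1 is short enough to be spelled out in code. The following sketch (our own, under the assumption that the starting point lies on the central path, so that the theory applies) runs the method for the log barrier of the positive orthant; all names are our own.

```python
import numpy as np

# Sketch of the full-Newton-step method of Figure 7.1 for
# min { c^T x : x > 0 } with barrier phi(x) = -sum(log x_i),
# so nu = n and kappa = 1.

def full_newton_method(c, x0, mu0, eps, theta):
    x, mu = x0.copy(), mu0
    while mu > eps:
        mu *= 1.0 - theta                      # barrier parameter update
        g = c / mu - 1.0 / x                   # gradient of phi_mu
        H = np.diag(1.0 / x**2)                # Hessian of phi_mu
        x += -np.linalg.solve(H, g)            # full Newton step
    return x, mu

n = 3
c = np.ones(n)                                 # with x0 = ones, mu0 = 1 the
nu = n                                         # start lies on the central path
theta = 1.0 / (2.0 + 8.0 * np.sqrt(nu))        # value used in Theorem 7.11
x, mu = full_newton_method(c, x0=np.ones(n), mu0=1.0, eps=1e-4, theta=theta)
print(x, mu)   # x tracks the minimizer x(mu) = mu / c as mu -> 0
```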

7.4.1

Analysis of the algorithm with full Newton steps

In this section we prove the following theorem.

Theorem 7.11. If $\tau = \frac19$ and $\theta = \frac{1}{2+8\sqrt{\nu}}$, then the algorithm with full Newton steps is well-defined and requires not more than
\[ 2\,(1+4\sqrt{\nu})\,\log\frac{\mu^0}{\varepsilon} \]
iterations. The output is a point $x \in D$ such that
\[ c^T x \le c^T x^* + \varepsilon\left( \nu + \frac{1+9\sqrt{\nu}}{72} \right), \]
where $x^*$ denotes an optimal solution of $(P)$.

Proof. We need to find values of $\tau$ and $\theta$ that make the algorithm well-defined. At the start of the first iteration we have $x = x^0 \in D$ and $\mu = \mu^0$ such that $\lambda_\mu(x) \le \tau$. When the barrier parameter is updated to $\mu^+ = (1-\theta)\mu$, Lemma 7.2 gives
\[ \lambda_{\mu^+}(x) \le \frac{\lambda_\mu(x) + \theta\sqrt{\nu}}{1-\theta} \le \frac{\tau + \theta\sqrt{\nu}}{1-\theta}. \tag{7.12} \]
Then after the Newton step, the new iterate is $x^+ = x + \Delta x$ and
\[ \lambda_{\mu^+}(x^+) \le \left( \frac{\lambda_{\mu^+}(x)}{1-\lambda_{\mu^+}(x)} \right)^2. \tag{7.13} \]
The algorithm is well-defined if we choose $\tau$ and $\theta$ such that $\lambda_{\mu^+}(x^+) \le \tau$. To get the lowest iteration bound, we need at the same time to maximize $\theta$. From (7.13) we deduce that $\lambda_{\mu^+}(x^+) \le \tau$ certainly holds if $\lambda_{\mu^+}(x)/(1-\lambda_{\mu^+}(x)) \le \sqrt{\tau}$, which is equivalent to
\[ \lambda_{\mu^+}(x) \le \frac{\sqrt{\tau}}{1+\sqrt{\tau}}. \tag{7.14} \]
According to (7.12) this—and hence $\lambda_{\mu^+}(x^+) \le \tau$—will hold if
\[ \frac{\tau + \theta\sqrt{\nu}}{1-\theta} \le \frac{\sqrt{\tau}}{1+\sqrt{\tau}}. \]
This leads us to the following condition on $\theta$:
\[ \theta \le \frac{\sqrt{\tau} - \tau(1+\sqrt{\tau})}{\sqrt{\tau} + \sqrt{\nu}\,(1+\sqrt{\tau})}, \qquad\text{i.e.}\qquad \frac{1}{\theta} \ge \frac{\sqrt{\tau} + \sqrt{\nu}\,(1+\sqrt{\tau})}{\sqrt{\tau} - \tau(1+\sqrt{\tau})}. \]
The question is which value of $\tau \ge 0$ yields the largest possible value of $\theta$, because this value will minimize the iteration bound of Lemma 7.10. Of course, this value depends on the so-called complexity number $\sqrt{\nu}$. Some elementary analysis makes clear that the optimal value of $\sqrt{\tau}$ lies between 0.27 and 0.30. Table 7.1 shows for some values of $\sqrt{\tau}$ the smallest possible value for $1/\theta$. Probably the best value is $\sqrt{\tau} = 0.29715$, with $1/\theta = 1.62723 + 7.10323\sqrt{\nu}$, since this has the smallest possible coefficient of the complexity number (which may be large in practice). To simplify the presentation we will work with $\theta = 1/(2+8\sqrt{\nu})$, which is a lower bound for the largest possible value of $\theta$. This is compatible with $\sqrt{\tau} = 1/3$, which gives $\tau = \frac19$. Thus we have justified the choice of the values of $\tau$ and $\theta$ in the theorem.

    $\sqrt{\tau}$      value of $1/\theta$
    0.27     $1.52184 + 7.15828\sqrt{\nu}$
    0.28     $1.55860 + 7.12504\sqrt{\nu}$
    0.29     $1.59770 + 7.10701\sqrt{\nu}$
    0.30     $1.63934 + 7.10383\sqrt{\nu}$
    0.31     $1.68379 + 7.11535\sqrt{\nu}$
    0.32     $1.73130 + 7.14162\sqrt{\nu}$
    0.33     $1.78221 + 7.18286\sqrt{\nu}$
    0.34     $1.83688 + 7.23949\sqrt{\nu}$
    0.35     $1.89573 + 7.31212\sqrt{\nu}$
    0.36     $1.95925 + 7.40160\sqrt{\nu}$

Table 7.1. Some values of $1/\theta$ as a function of $\sqrt{\tau}$.

Now that $\theta$ is given, the iteration bound is immediate from Lemma 7.10. The last statement in the theorem is implied by Lemma 7.9, because at termination of the algorithm we have $\lambda(x) \le \frac19$ and $\mu \le \varepsilon$. Hence, denoting $\lambda = \lambda(x)$, Lemma 7.9 implies that
\[ c^T x \le c^T x^* + \mu\left( \nu + \frac{\lambda(\sqrt{\nu}+\lambda)}{1-\lambda} \right) \le c^T x^* + \varepsilon\left( \nu + \frac{\frac19(\sqrt{\nu}+\frac19)}{1-\frac19} \right) = c^T x^* + \varepsilon\left( \nu + \frac{1+9\sqrt{\nu}}{72} \right). \]
This completes the proof. $\Box$

7.5

Algorithm with damped Newton steps

The method that we considered in the previous sections is in practice rather slow. This is due to the fact that the barrier update parameter $\theta$ is rather small. For example, in the case of linear optimization the set $D$ is the intersection of $\mathbb{R}^n_+$ and an affine space $\{x : Ax = b\}$, for some $A$ and $b$. From Exercise 7.1 we know that the logarithmic barrier function $-\sum_{i=1}^n \log x_i$ is a 1-self-concordant $n$-barrier for $\mathbb{R}^n_+$. In that case we have $\kappa = 1$ and $\nu = n$, and hence the value of $\theta$ is given by $\theta = \frac{5}{9+36\sqrt{n}}$. Assuming $\mu^0 = 1$ in Theorem 7.11, this leads to the iteration bound
\[ 2\,(1+4\sqrt{n})\,\log\frac{n}{\varepsilon} = O\!\left( \sqrt{n}\,\log\frac{n}{\varepsilon} \right), \]
which is up till now the best known bound for linear optimization.

In practice one is tempted to accelerate the algorithm by taking larger values of $\theta$. But this is not justified by the theory, and in fact may cause the algorithm to fail, because the full Newton step may yield an infeasible point. However, by damping the Newton step we can keep the iterates feasible. In this section we investigate the resulting method, which is in practice much faster than the full-Newton step method.

So we consider in this section the case where $\theta$ is some small (but fixed) constant in the interval $(0,1)$, for example $\theta = 0.5$ or $\theta = 0.99$, and where the new iterate is obtained from
\[ x^+ = x + \alpha\,\Delta x, \]
where $\Delta x$ is the Newton step at $x$ and where $\alpha$ is the so-called damping factor, which is also taken from the interval $(0,1)$, but which has to be carefully chosen. The algorithm is described in Figure 7.2.

    Input:
      A proximity parameter $\tau = \frac{1}{3\kappa}$;
      an accuracy parameter $\varepsilon > 0$;
      an update parameter $\theta$, $0 < \theta < 1$;
      $x^0 \in D$ and $\mu^0 > 0$ such that $\lambda_{\mu^0}(x^0) \le \tau$.
    begin
      $x := x^0$; $\mu := \mu^0$;
      while $\mu > \varepsilon$ do
        $\mu := (1-\theta)\mu$;
        while $\lambda_\mu(x) > \tau$ do
          $\alpha := \dfrac{1}{1+\lambda_\mu(x)}$;
          $x := x + \alpha\,\Delta x$;
        endwhile
      endwhile
    end

Figure 7.2. Algorithm with damped Newton steps

We refer to the first while-loop in the algorithm as the outer loop and to the second while-loop as the inner loop. Each execution of the outer loop is called an outer iteration and each execution of the inner loop an inner iteration. The main task in the analysis of the algorithm is to derive an upper bound for the number of iterations in the inner loop, because the number of outer iterations follows from Lemma 7.10.
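As an illustration, the following sketch (our own, for the log barrier of the positive orthant with $\kappa = 1$) implements one inner loop of Figure 7.2; the function name and test data are our own choices.

```python
import numpy as np

# One inner loop of Figure 7.2: after the update mu := (1 - theta) mu,
# damped Newton steps x := x + alpha * dx with alpha = 1 / (1 + lambda(x))
# are taken until lambda(x) <= tau.

def damped_inner_loop(c, x, mu, tau=1.0 / 3.0):
    while True:
        g = c / mu - 1.0 / x
        H = np.diag(1.0 / x**2)
        dx = -np.linalg.solve(H, g)
        lam = np.sqrt(dx @ H @ dx)
        if lam <= tau:
            return x
        x = x + dx / (1.0 + lam)     # damped step keeps x > 0

c = np.ones(4)
x = np.array([4.0, 0.2, 1.0, 2.5])  # far from the central path
x = damped_inner_loop(c, x, mu=0.5)
print(x)                             # close to x(mu) = mu / c = 0.5 * ones
```

Note that the damped step stays inside the Dikin ellipsoid, since $\|\alpha\Delta x\|_x = \lambda/(1+\lambda) < 1$; for the log barrier this guarantees $x > 0$ throughout.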

7.5.1

Analysis of the algorithm with damped Newton steps

As we will see, in the analysis of the algorithm many results can be used that we already obtained in the analysis of the algorithm for minimizing a self-concordant function with damped Newton steps, in Section 6.9. Due to the choice of the damping factor $\alpha$ in the algorithm, Theorem 6.24 implies that in each inner iteration the decrease in the value of $\phi_\mu$ satisfies
\[ \phi_\mu(x) - \phi_\mu(x+\alpha\Delta x) \ge \frac{\omega\left( \kappa\,\lambda_\mu(x) \right)}{\kappa^2}, \qquad \omega(\lambda) := \lambda - \log(1+\lambda). \]
Since during each inner iteration $\lambda_\mu(x) > \tau = \frac{1}{3\kappa}$, we obtain
\[ \phi_\mu(x) - \phi_\mu(x+\alpha\Delta x) > \frac{\omega(\frac13)}{\kappa^2} = \frac{\frac13 - \log\frac43}{\kappa^2} = \frac{0.0457\ldots}{\kappa^2} > \frac{1}{22\kappa^2}. \]
Thus we see that each inner iteration decreases the value of $\phi_\mu$ by at least $\frac{1}{22\kappa^2}$. This implies that we can easily find an upper bound for the number of inner iterations during one outer iteration if we know the difference between the values of $\phi_\mu$ at the start and at the end of one outer iteration. Since $\phi_{\mu^+}(x)$ is minimal at $x = x(\mu^+)$, this difference is not larger than
\[ \phi_{\mu^+}(x) - \phi_{\mu^+}(x(\mu^+)), \]
where $x$ denotes the iterate at the start of an outer iteration and $\mu^+ = (1-\theta)\mu$ the value of the barrier parameter after the $\mu$-update. The proofs of the next two lemmas follow similar arguments as used in the proof of Theorem 2.2 in [4].

Lemma 7.12. Let $0 < \mu$. Then we have
\[ \frac{d\,\phi_\mu(x(\mu))}{d\mu} = -\frac{c^T x(\mu)}{\mu^2} = \frac{g(x(\mu))^T x(\mu)}{\mu}. \]

Proof. Denoting the derivative of $x(\mu)$ with respect to $\mu$ as $x'(\mu)$, we may write
\[ \frac{d\,\phi_\mu(x(\mu))}{d\mu} = \frac{d}{d\mu}\left( \frac{c^T x(\mu)}{\mu} + \phi(x(\mu)) \right) = -\frac{c^T x(\mu)}{\mu^2} + \frac{c^T x'(\mu)}{\mu} + g(x(\mu))^T x'(\mu). \]
The definition of $x(\mu)$, as minimizer of $\phi_\mu(x)$, implies
\[ \frac{c}{\mu} + g(x(\mu)) = 0. \]
Hence we obtain
\[ \frac{c^T x'(\mu)}{\mu} + g(x(\mu))^T x'(\mu) = 0, \]
whence
\[ \frac{d\,\phi_\mu(x(\mu))}{d\mu} = -\frac{c^T x(\mu)}{\mu^2}, \]
which implies the lemma.

Lemma 7.13. Let $x \in D$, $\lambda_\mu(x) \le 1/(3\kappa)$ and $\mu^+ = (1-\theta)\mu$. Then we have
\[ \phi_{\mu^+}(x) - \phi_{\mu^+}(x(\mu^+)) \le \frac{1}{13\kappa^2} + \frac{\theta\nu}{1-\theta}. \]

Proof. Fixing $x \in D$, we define $f(\mu) = \phi_\mu(x) - \phi_\mu(x(\mu))$. Then we need to find an upper bound for $f(\mu^+)$. According to the Mean Value Theorem there exists a $\xi \in (\mu^+, \mu)$ such that
\[ f(\mu^+) = f(\mu) + f'(\xi)\,(\mu^+ - \mu). \tag{7.15} \]
Let us first consider $f'(\mu)$. We have
\[ f'(\mu) = \frac{d\,\phi_\mu(x)}{d\mu} - \frac{d\,\phi_\mu(x(\mu))}{d\mu} = -\frac{c^T x}{\mu^2} - \frac{d\,\phi_\mu(x(\mu))}{d\mu}. \tag{7.16} \]
Using Lemma 7.12 we get
\[ f'(\mu) = \frac{c^T x(\mu) - c^T x}{\mu^2} = \frac{c^T(x(\mu)-x)}{\mu^2} = \frac{g(x(\mu))^T(x - x(\mu))}{\mu}. \]
Now applying Lemma 7.7 twice, with $d = x - x(\mu)$ and $d = x(\mu) - x$ respectively, we obtain $|f'(\mu)| \le \nu/\mu$. Hence, since $\xi \in (\mu^+, \mu)$, we get
\[ |f'(\xi)| \le \frac{\nu}{\xi} \le \frac{\nu}{\mu^+}. \]
Substitution into (7.15) yields
\[ f(\mu^+) \le f(\mu) + \frac{\nu\,(\mu - \mu^+)}{\mu^+} = f(\mu) + \frac{\theta\nu}{1-\theta}. \]
In other words,
\[ \phi_{\mu^+}(x) - \phi_{\mu^+}(x(\mu^+)) \le \phi_\mu(x) - \phi_\mu(x(\mu)) + \frac{\theta\nu}{1-\theta}. \]
Since $\lambda_\mu(x) \le 1/(3\kappa)$, we derive from Theorem 6.34 that
\[ \phi_\mu(x) - \phi_\mu(x(\mu)) \le \frac{1}{\kappa^2}\left( -\frac13 - \log\frac23 \right) = \frac{0.0721318\ldots}{\kappa^2} < \frac{1}{13\kappa^2}. \]
Hence the lemma follows. $\Box$

Theorem 7.14. The algorithm with damped Newton steps requires not more than
\[ 22\kappa^2\left( \frac{1}{13\kappa^2} + \frac{\theta\nu}{1-\theta} \right)\frac{1}{\theta}\,\log\frac{\mu^0}{\varepsilon} \]
iterations. The output is a point $x \in D$ such that
\[ c^T x \le c^T x^* + \varepsilon\left( \nu + \frac{1+3\sqrt{\nu}}{6} \right), \]
where $x^*$ denotes an optimal solution of $(P)$.

Proof. Since each inner iteration decreases the value of $\phi_\mu$ by at least $\frac{1}{22\kappa^2}$, it is an immediate consequence of Lemma 7.13 that the number of inner iterations between two successive $\mu$-updates does not exceed the number
\[ 22\kappa^2\left( \frac{1}{13\kappa^2} + \frac{\theta\nu}{1-\theta} \right). \]
Using Lemma 7.10, the iteration bound in the theorem follows. The last statement in the theorem follows from Lemma 7.9. At termination of the algorithm we have $\lambda(x) \le 1/(3\kappa) \le \frac13$ and $\mu \le \varepsilon$. Hence, denoting $\lambda = \lambda(x)$, Lemma 7.9 implies
\[ c^T x \le c^T x^* + \mu\left( \nu + \frac{\lambda(\sqrt{\nu}+\lambda)}{1-\lambda} \right) \le c^T x^* + \varepsilon\left( \nu + \frac{\frac13(\sqrt{\nu}+\frac13)}{1-\frac13} \right) = c^T x^* + \varepsilon\left( \nu + \frac{1+3\sqrt{\nu}}{6} \right). \]
This completes the proof. $\Box$

It is interesting to compare the iteration bounds that we obtained in this chapter for full-Newton and damped-Newton steps. When initialized with the same $x^0 \in D$ and $\mu^0 > 0$, these bounds are given by
\[ 2\,(1+4\sqrt{\nu})\,\log\frac{\mu^0}{\varepsilon} \qquad\text{and}\qquad 22\kappa^2\left( \frac{1}{13\kappa^2} + \frac{\theta\nu}{1-\theta} \right)\frac{1}{\theta}\,\log\frac{\mu^0}{\varepsilon} = \left( \frac{22}{13\theta} + \frac{22\kappa^2\nu}{1-\theta} \right)\log\frac{\mu^0}{\varepsilon}, \]
respectively. Neglecting the factor $\log\frac{\mu^0}{\varepsilon}$, we see that the first bound is $O(\sqrt{\nu})$. On the other hand, when assuming $\theta = \Theta(1)$, the second bound is $O((\kappa\sqrt{\nu})^2)$. This shows that from a theoretical point of view the full-Newton step method is more efficient than the damped-Newton step method. In practice, however, the converse holds. This phenomenon has become known as the irony of interior-point methods [10, page 51]. Also note that in both cases the quantity $\kappa\sqrt{\nu}$ is solely responsible for the iteration bound, or complexity, of the algorithm. That is why we followed [3] and called this the complexity number of $\phi$.

Exercise 7.10. Verify that the complexity number does not change if we scale $\phi(x)$ by a positive scalar.

Solution: Let $\alpha > 0$. As we will see in more detail in Lemma 8.2, replacing $\phi$ by $\alpha\phi$ changes the parameters according to $\kappa \to \kappa/\sqrt{\alpha}$ and $\nu \to \alpha\nu$. Hence the complexity number $\kappa\sqrt{\nu}$ is invariant under this scaling. $\Box$

7.6

Adding equality constraints

In many cases the vector $x$ of variables in $(P)$ not only has to belong to $D$ but also has to satisfy a system of equality constraints. The problem then becomes
\[ (P_=) \qquad \min\left\{ c^T x : Ax = b,\ x \in D \right\}. \]
We assume that $A$ is an $m\times n$ matrix with $\operatorname{rank}(A) = m$. This problem can be solved without much extra effort. The search direction has to be designed in such a way that feasibility is maintained. Given a feasible $x$, we take as search direction $\Delta x$ the direction that minimizes the second-order Taylor polynomial at $x$ subject to the condition $A\Delta x = 0$. Thus we consider the problem
\[ \min\left\{ \phi_\mu(x) + \Delta x^T g_\mu(x) + \tfrac12\,\Delta x^T H(x)\,\Delta x : A\Delta x = 0 \right\}. \]
This gives rise to the system
\[ H(x)\,\Delta x + g_\mu(x) = A^T y, \qquad A\Delta x = 0, \]
whence, denoting $H(x)$ as $H$,
\[ \Delta x = H^{-1}A^T\left( AH^{-1}A^T \right)^{-1}AH^{-1}g_\mu(x) - H^{-1}g_\mu(x), \]
or, equivalently,
\[ H^{\frac12}\Delta x = -\left( I - H^{-\frac12}A^T\left( AH^{-1}A^T \right)^{-1}AH^{-\frac12} \right)H^{-\frac12}g_\mu(x) = -P_{AH^{-\frac12}}\,H^{-\frac12}g_\mu(x), \]
where $P_{AH^{-1/2}}$ denotes the orthogonal projection onto the null space of $AH^{-\frac12}$. Note that if the system $Ax = b$ is void, i.e., $A = 0$ and $b = 0$, then $\Delta x$ is just the old direction.

Denoting the feasible region of $(P_=)$ as $\mathcal{P}$ and its interior as $\mathcal{P}^0$, one easily understands that the restriction $\phi_{\mathcal{P}}$ of $\phi$ to $\mathcal{P}^0$ is a $\kappa$-self-concordant $\nu$-barrier for $\mathcal{P}^0$. Moreover, $\Delta x$, as given above, is precisely the Newton direction for $(\phi_\mu)_{\mathcal{P}}$ at $x \in \mathcal{P}^0$. Hence, essentially the same full-Newton step method and damped-Newton step method as before can be used to solve the above problem in polynomial time. More efficient methods can be obtained by using different schemes, like adaptive $\mu$-updates, a predictor-corrector method, etc. We do not work this out further.

Exercise 7.11. Verify that the restriction $\phi_{\mathcal{P}}$ of $\phi$ to $\mathcal{P}^0$ is a $\kappa$-self-concordant $\nu$-barrier for $\mathcal{P}^0$.

Solution: Write the affine space $\{x : Ax = b\}$ as $\{\bar{x} + Nu : u \in \mathbb{R}^{n-m}\}$, where $A\bar{x} = b$ and the columns of $N$ span the null space of $A$. The derivatives of $u \mapsto \phi(\bar{x}+Nu)$ in a direction $v$ coincide with the derivatives of $\phi$ in the direction $h = Nv$. Since the defining inequalities of a $\kappa$-self-concordant $\nu$-barrier hold for all directions $h \in \mathbb{R}^n$, they hold in particular for all directions in the null space of $A$, which proves the claim. $\Box$
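The constrained Newton direction above can also be computed by solving the KKT system directly, which avoids forming $AH^{-1}A^T$ explicitly. The following sketch (our own; function names and data are our own choices) does this for the log barrier:

```python
import numpy as np

# Search direction for (P) with added constraints Ax = b: minimize the
# quadratic model subject to A dx = 0 by solving the KKT system
# [H A^T; A 0] [dx; lam] = [-g; 0] (lam = -y in the notation above).

def projected_newton_step(A, g, H):
    m, n = A.shape
    K = np.block([[H, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([-g, np.zeros(m)])
    sol = np.linalg.solve(K, rhs)
    return sol[:n]                      # dx; sol[n:] are the multipliers

# Example with the log barrier of the positive orthant:
A = np.array([[1.0, 1.0, 1.0]])         # the affine space fixes sum(x)
x = np.array([0.5, 1.0, 1.5])
c, mu = np.ones(3), 0.7
g = c / mu - 1.0 / x
H = np.diag(1.0 / x**2)
dx = projected_newton_step(A, g, H)
print(A @ dx)                           # ~0: the direction stays feasible
```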


Chapter 8

Solving convex optimization problems

8.1

Introduction

In this chapter we show how the results of the previous chapter can be used to solve a convex optimization problem of the form
\[ \inf\{ f(x) : x \in \mathcal{F} \}, \qquad \mathcal{F} = \{ x \in \mathbb{R}^n : g_j(x) \le 0,\ 1 \le j \le m \}. \tag{8.1} \]
It will be assumed that the functions $f(x)$ and $g_j(x)$ are continuously differentiable. First we observe that without loss of generality we may assume that $f(x)$ is a linear function, because we can introduce a new variable $\tau$, add the constraint $f(x) - \tau \le 0$ to the problem and then minimize $\tau$. So, assuming that $f(x) = c^T x$ for some suitable vector $c$, we may instead consider the following problem:
\[ (CP) \qquad \inf\left\{ c^T x : x \in \mathcal{F} \right\}, \tag{8.2} \]
with $\mathcal{F}$ as defined in (8.1). The Lagrange–Wolfe dual of $(CP)$ is given by
\[ (CD) \qquad \sup\left\{ c^T x + \sum_{j=1}^m y_j g_j(x) \;:\; c + \sum_{j=1}^m y_j\nabla g_j(x) = 0,\ y_j \ge 0,\ j = 1,\ldots,m \right\}. \]
We will furthermore assume that $(CP)$ satisfies the Slater condition, namely $\mathcal{F}^0 := \operatorname{int}\mathcal{F} \ne \emptyset$. It is clear from the previous chapter that this problem is tractable from a computational point of view whenever we have a self-concordant barrier function for the interior $\mathcal{F}^0$ of the domain of $(CP)$. The aim of this chapter is to show that in many cases such a self-concordant barrier function can be obtained, but that this often requires a reformulation of the constraint functions in $(CP)$.

8.2

Getting a self-concordant barrier for $\mathcal{F}$

As we show in this section, a self-concordant barrier for $\mathcal{F}^0$ can be easily obtained if we know self-concordant barriers for the subsets of $\mathbb{R}^n$ determined by the constraint functions. Let us define
\[ \mathcal{F}_j := \{ x \in \mathbb{R}^n : g_j(x) \le 0 \}, \tag{8.3} \]
and suppose that $\phi_j(x)$ is a self-concordant barrier function for $\mathcal{F}_j$. Then
\[ \phi(x) = \sum_{j=1}^m \phi_j(x) \tag{8.4} \]
is a self-concordant barrier function for $\mathcal{F}$. This follows from the following lemma.

Lemma 8.1. If $\phi_j$ is a $\kappa_j$-self-concordant $\nu_j$-barrier for $\mathcal{F}_j$, where $j \in \{1,\ldots,m\}$, then $\phi$, as just defined, is a $\kappa$-self-concordant $\nu$-barrier for $\mathcal{F} = \cap_{j=1}^m \mathcal{F}_j$, with
\[ \kappa = \max_{1\le j\le m}\kappa_j, \qquad \nu = \sum_{j=1}^m \nu_j. \]

Proof. The lemma is obvious if $m = 1$. Below we prove the lemma for $m = 2$; the extension to larger values of $m$ is straightforward, by using induction on $m$. So suppose $m = 2$ and $\phi(x) = \phi_1(x) + \phi_2(x)$, where $x \in \mathcal{F}_1\cap\mathcal{F}_2$. Writing $\sigma_i = \nabla^2\phi_i(x)[h,h]$, $i \in \{1,2\}$, we may bound, for any $h \in \mathbb{R}^n$,
\[ \frac{\nabla^3\phi(x)[h,h,h]}{\left( \nabla^2\phi(x)[h,h] \right)^{3/2}} \le \frac{\nabla^3\phi_1(x)[h,h,h] + \nabla^3\phi_2(x)[h,h,h]}{\left( \nabla^2\phi_1(x)[h,h] + \nabla^2\phi_2(x)[h,h] \right)^{3/2}} \le \frac{2\kappa_1\sigma_1^{3/2} + 2\kappa_2\sigma_2^{3/2}}{(\sigma_1+\sigma_2)^{3/2}}. \]
The last expression is homogeneous in $[\sigma_1;\sigma_2]$. So, since $\sigma_i \ge 0$, we may assume $\sigma_1+\sigma_2 = 1$. With $\sigma = \sigma_1$, the last expression then gets the form
\[ 2\kappa_1\sigma^{3/2} + 2\kappa_2(1-\sigma)^{3/2}, \qquad \sigma \in [0,1]. \]
This function is convex in $\sigma$, and hence its maximal value occurs either for $\sigma = 0$ or for $\sigma = 1$. Hence the maximal value is given by $2\max\{\kappa_1,\kappa_2\}$. This proves the statement on $\kappa$. On the other hand, by Lemma 7.6,
\[ \frac{\left( \nabla\phi(x)[h] \right)^2}{\nabla^2\phi(x)[h,h]} = \frac{\left( \nabla\phi_1(x)[h] + \nabla\phi_2(x)[h] \right)^2}{\sigma_1+\sigma_2} \le \frac{\left( \sqrt{\nu_1\sigma_1} + \sqrt{\nu_2\sigma_2} \right)^2}{\sigma_1+\sigma_2} \le \nu_1+\nu_2. \]
The last inequality is due to the Cauchy–Schwarz inequality, since $\sigma_1+\sigma_2 = 1$. Thus the lemma has been proved. $\Box$

Recall from Theorem 7.11 and Theorem 7.14 that the complexity of the algorithms that we considered in the previous chapter is an increasing function of $\sqrt{\nu}$. Therefore, it is quite important to get an SCB with $\nu$ as small as possible. In this respect it is worth recalling from Exercise 6.6 that a positive multiple of a self-concordant function is again a self-concordant function. The next lemma shows that a similar statement holds for self-concordant barrier functions.

Lemma 8.2. Let $\phi$ be a $\kappa$-self-concordant $\nu$-barrier (or shortly a $(\kappa,\nu)$-SCB) and $\alpha \in \mathbb{R}$, $\alpha > 0$. Then $\alpha\phi$ is a $\left( \frac{\kappa}{\sqrt{\alpha}},\ \alpha\nu \right)$-SCB.

Proof. According to the definitions of $\kappa$ and $\nu$ we have
\[ \frac{\left( \nabla^3\phi(x)[h,h,h] \right)^2}{\left( \nabla^2\phi(x)[h,h] \right)^3} \le 4\kappa^2, \qquad \frac{\left( \nabla\phi(x)[h] \right)^2}{\nabla^2\phi(x)[h,h]} \le \nu, \qquad x \in D,\ h \in \mathbb{R}^n. \]
Denoting $\tilde\phi = \alpha\phi$, this implies
\[ \frac{\left( \nabla^3\tilde\phi(x)[h,h,h] \right)^2}{\left( \nabla^2\tilde\phi(x)[h,h] \right)^3} = \frac{\alpha^2\left( \nabla^3\phi(x)[h,h,h] \right)^2}{\alpha^3\left( \nabla^2\phi(x)[h,h] \right)^3} \le \frac{4\kappa^2}{\alpha} \]
and
\[ \frac{\left( \nabla\tilde\phi(x)[h] \right)^2}{\nabla^2\tilde\phi(x)[h,h]} = \frac{\alpha\left( \nabla\phi(x)[h] \right)^2}{\nabla^2\phi(x)[h,h]} \le \alpha\nu, \qquad x \in D,\ h \in \mathbb{R}^n. \]
This proves the statement in the lemma. $\Box$

It thus follows that instead of the SCB in (8.4) we can also use the SCB that is obtained by multiplying each term $\phi_j(x)$ in the sum defining $\phi(x)$ by a positive multiplier $\alpha_j$. This yields
\[ \phi(x) = \sum_{j=1}^m \alpha_j\phi_j(x), \qquad \alpha_j > 0, \]
which is a $(\kappa,\nu)$-SCB with $\kappa = \max_j \frac{\kappa_j}{\sqrt{\alpha_j}}$ and $\nu = \sum_{j=1}^m \alpha_j\nu_j$. So, it might be useful to use (positive) multipliers $\alpha_j$ that minimize $\kappa\sqrt{\nu}$. It has been argued that the optimal choice is $\alpha_j = \kappa_j^2$, which gives $\kappa = 1$ and $\nu = \sum_{j=1}^m \kappa_j^2\nu_j$ [3, pages 48–49]. Later on (cf. Lemma 9.10) we show that there exist positive multipliers $\alpha_j$ such that
\[ \kappa\sqrt{\nu} = \sqrt{\sum_{j=1}^m \kappa_j^2\nu_j}, \]

and that this is the best possible (i.e., minimal) value for $\kappa\sqrt{\nu}$.

In many problems one or more constraints have the form
\[ \sum_{i=1}^p h_i(x) \le t, \tag{8.5} \]
where also $t$ is a decision variable in the problem. In such cases it is often convenient to replace this constraint by the equivalent system of inequalities
\[ h_i(x) \le t_i\ (i = 1,\ldots,p), \qquad \sum_{i=1}^p t_i \le t. \tag{8.6} \]

Exercise 8.1. Prove that $x$ and $t$ satisfy (8.5) if and only if there exist $t_i$ $(i = 1,\ldots,p)$ such that $x$ and $t$ satisfy (8.6).

Solution: It is obvious that (8.6) implies (8.5). On the other hand, if $x$ and $t$ satisfy (8.5), then defining $t_i = h_i(x)$, also (8.6) is satisfied. $\Box$

Exercise 8.2. Prove that
\[ -\log\left( t - \sum_{i=1}^p t_i \right) \]
is a $(1,1)$-self-concordant barrier function for the linear constraint in (8.6).

Solution: Denote $\bar{x} = (t; t_1,\ldots,t_p)$ and define the linear function $g(\bar{x}) = t - \sum_{i=1}^p t_i$. Letting $\bar{h} = (h_0; h_1,\ldots,h_p)$, we define
\[ \psi(\alpha) := -\log\left( t + \alpha h_0 - \sum_{i=1}^p (t_i + \alpha h_i) \right) = -\log g(\bar{x} + \alpha\bar{h}). \]
Then, writing $g(\bar{h}) = h_0 - \sum_i h_i$ for the linear part of $g$ applied to $\bar{h}$,
\[ \psi'(\alpha) = -\frac{g(\bar{h})}{g(\bar{x}+\alpha\bar{h})}, \qquad \psi''(\alpha) = \frac{g(\bar{h})^2}{g(\bar{x}+\alpha\bar{h})^2}, \qquad \psi'''(\alpha) = -\frac{2\,g(\bar{h})^3}{g(\bar{x}+\alpha\bar{h})^3}, \]
and hence
\[ \psi'(0) = -\frac{g(\bar{h})}{g(\bar{x})}, \qquad \psi''(0) = \frac{g(\bar{h})^2}{g(\bar{x})^2}, \qquad \psi'''(0) = -\frac{2\,g(\bar{h})^3}{g(\bar{x})^3}. \]
Therefore,
\[ \frac{|\psi'''(0)|}{\psi''(0)^{3/2}} = 2, \qquad \frac{\psi'(0)^2}{\psi''(0)} = 1, \]
proving the claim. $\Box$

        $f(x)$                                 $\psi(x,t)$                                   $\kappa$  $\nu$
    1   $-\log x$, $x>0$                       $-\log(t+\log x) - \log x$                    1         2
    2   $e^x$                                  $-\log(\log t - x) - \log t$                  1         2
    3   $x\log x$, $x>0$                       $-\log(t - x\log x) - \log x$                 1         2
    4   $\frac1x$, $x>0$                       $-\log(tx-1)$                                 1         2
    5   $x^p$, $x>0$, $p\ge1$                  $-\log(t^{1/p}-x) - \log t$                   1         2
    6   $-x^p$, $x>0$, $0\le p\le1$            $-\log(x^p+t) - \log x$                       1         2
    7   $|x|^p$, $p\ge1$                       $-\log(t^{2/p}-x^2) - 2\log t$                1         4

Figure 8.1. Self-concordant barriers for some 2-dimensional sets

It follows from Lemma 8.1 that if we know $(\kappa_i,\nu_i)$-SCBs $\psi_i(x,t_i)$ for the respective epigraphs of $h_i(x)$ $(1\le i\le p)$ then
\[ \sum_{i=1}^p \psi_i(x,t_i) - \log\left( t - \sum_{i=1}^p t_i \right) \]
is a $(\kappa,\nu)$-SCB with $\kappa = \max_i\{\kappa_i, 1\}$ and $\nu = 1 + \sum_{i=1}^p \nu_i$.

8.3

Tools for proving self-concordancy

Recall that the epigraph of a function $f : D \to \mathbb{R}$ is defined by
\[ \operatorname{epi}(f) = \{ (x,t) : x \in D,\ f(x) \le t \}, \]
and that $f$ is a convex function if and only if $\operatorname{epi}(f)$ is a convex set (cf. Exercise ??). In many cases the domain of a problem is described by inequalities of the form $f(x) \le t$. The domain is then (part of) the intersection of the epigraphs of convex functions. The table in Figure 8.1 shows some examples of self-concordant barriers for 2-dimensional sets that are epigraphs of simple convex functions [5, page 22]. One easily verifies that in all cases one has $f(x) < t$ if and only if $(x,t)$ belongs to the domain of $\psi(x,t)$. However, to prove that the functions $\psi(x,t)$ in this table are SCBs, with the given values of $\kappa$ and $\nu$, is a nontrivial and tedious task. This task is enormously simplified when using the following lemma, which is a powerful tool in proving self-concordancy of functions.

Lemma 8.3. Let $f(x) \in C^3(D)$ be a convex function, where $D$ is an open convex set in $\operatorname{int}\mathbb{R}^n_+$. If there exists a scalar $\beta \ge 0$ such that
\[ \left| \nabla^3 f(x)[h,h,h] \right| \le 3\beta\, h^T\nabla^2 f(x)\,h\;\sqrt{\sum_{i=1}^n \frac{h_i^2}{x_i^2}}, \qquad x \in D,\ h \in \mathbb{R}^n, \tag{8.7} \]
then we have for each $\alpha > 0$ that
\[ \phi(x) := \alpha f(x) - \sum_{i=1}^n \log x_i \]
is $\kappa$-self-concordant on $D$, with $\kappa = 1$ if $\beta \le 1$ and $\kappa = \sqrt{\beta^3/(3\beta-2)}$ otherwise. Moreover, for $q \ge 1$,
\[ \psi(x,t) := -q\log(t - f(x)) - \sum_{i=1}^n \log x_i \]
is $\kappa$-self-concordant on $\operatorname{epi} f = \{(x,t) : f(x) \le t\}$, where
\[ \kappa = \max\left\{ 1,\ \frac{\beta^2+\beta+1}{\sqrt{3\beta^2+4\beta+2}} \right\}. \]

Proof. We start by proving the first part of the lemma. Note that if $f(x)$ satisfies (8.7) then $\alpha f(x)$ satisfies (8.7) as well. Hence, without loss of generality we may assume that $\alpha = 1$. Straightforward calculations yield
\[ \nabla\phi(x)^T h = \nabla f(x)^T h - \sum_{i=1}^n \frac{h_i}{x_i}, \tag{8.8} \]
\[ h^T\nabla^2\phi(x)\,h = h^T\nabla^2 f(x)\,h + \sum_{i=1}^n \frac{h_i^2}{x_i^2}, \tag{8.9} \]
\[ \nabla^3\phi(x)[h,h,h] = \nabla^3 f(x)[h,h,h] - 2\sum_{i=1}^n \frac{h_i^3}{x_i^3}. \tag{8.10} \]
Since $f$ is convex, the two terms on the right-hand side of (8.9) are nonnegative, so we may abbreviate
\[ h^T\nabla^2\phi(x)\,h = a^2 + b^2, \qquad a^2 = h^T\nabla^2 f(x)\,h, \qquad b^2 = \sum_{i=1}^n \frac{h_i^2}{x_i^2}, \tag{8.11} \]
with $a, b \ge 0$. Because of (8.7) we have $|\nabla^3 f(x)[h,h,h]| \le 3\beta a^2 b$. Applying the inequality $\|\cdot\|_3 \le \|\cdot\|_2$ (see Exercise 8.3) to the vector with entries $h_i/x_i$ we get
\[ \left| \sum_{i=1}^n \frac{h_i^3}{x_i^3} \right| \le \left( \sum_{i=1}^n \frac{h_i^2}{x_i^2} \right)^{3/2} = b^3. \]
Thus we can bound the right-hand side of (8.10) as follows:
\[ \left| \nabla^3\phi(x)[h,h,h] \right| \le 3\beta a^2 b + 2b^3, \qquad\text{whence}\qquad \frac{\nabla^3\phi(x)[h,h,h]}{2\left( h^T\nabla^2\phi(x)\,h \right)^{3/2}} \le \frac{3\beta a^2 b + 2b^3}{2\,(a^2+b^2)^{3/2}}. \]
The last expression is homogeneous in $a$ and $b$, so we may assume without loss of generality that $a^2+b^2 = 1$; then $\kappa$ is the maximal value of
\[ f(b) := \tfrac12\left( 3\beta a^2 b + 2b^3 \right) = \tfrac12\,b\left( 3\beta + (2-3\beta)b^2 \right), \qquad 0 \le b \le 1. \]
One has
\[ \frac{2f(b)}{b} = 3\beta + (2-3\beta)b^2 = 3\beta(1-b^2) + 2b^2. \]
If $\beta \le 1$ then $f(b)$ is monotonically increasing for $b \in [0,1]$. Since $f(1) = 1$ we conclude that $\kappa = 1$ if $\beta \le 1$. Otherwise we have $f'(b) = 0$ if and only if $b^2 = \beta/(3\beta-2)$. For this value of $b$ we get the maximal value of $f(b)$. By substitution of this value we obtain
\[ \kappa = \tfrac12\,b\left( 3\beta + (2-3\beta)\frac{\beta}{3\beta-2} \right) = \beta b = \beta\sqrt{\frac{\beta}{3\beta-2}} = \sqrt{\frac{\beta^3}{3\beta-2}}. \]
Hence the first part of the lemma has been proved.

Now we turn to the proof of the second part of the lemma. Let
\[ \bar{x} = \begin{pmatrix} t \\ x \end{pmatrix}, \qquad \bar{h} = \begin{pmatrix} h_0 \\ h \end{pmatrix},\ h \in \mathbb{R}^n, \qquad g(\bar{x}) = t - f(x). \]
Then, when denoting $\psi(x,t)$ as $\psi(\bar{x})$, we may write
\[ \psi(\bar{x}) = -q\log g(\bar{x}) - \sum_{i=1}^n \log x_i, \tag{8.12} \]
\[ \nabla\psi(\bar{x})^T\bar{h} = -q\,\frac{\nabla g(\bar{x})^T\bar{h}}{g(\bar{x})} - \sum_{i=1}^n \frac{h_i}{x_i}, \tag{8.13} \]
\[ \bar{h}^T\nabla^2\psi(\bar{x})\,\bar{h} = -q\,\frac{\bar{h}^T\nabla^2 g(\bar{x})\,\bar{h}}{g(\bar{x})} + q\,\frac{\left( \nabla g(\bar{x})^T\bar{h} \right)^2}{g(\bar{x})^2} + \sum_{i=1}^n \frac{h_i^2}{x_i^2}, \tag{8.14} \]
\[ \nabla^3\psi(\bar{x})[\bar{h},\bar{h},\bar{h}] = -q\,\frac{\nabla^3 g(\bar{x})[\bar{h},\bar{h},\bar{h}]}{g(\bar{x})} + 3q\,\frac{\left( \bar{h}^T\nabla^2 g(\bar{x})\,\bar{h} \right)\nabla g(\bar{x})^T\bar{h}}{g(\bar{x})^2} - 2q\,\frac{\left( \nabla g(\bar{x})^T\bar{h} \right)^3}{g(\bar{x})^3} - 2\sum_{i=1}^n \frac{h_i^3}{x_i^3}. \tag{8.15} \]
Since $f(x)$ is convex, $g(\bar{x})$ is concave, and hence the three terms on the right-hand side of (8.14) are nonnegative. Therefore, (8.14) can be rewritten as
\[ \bar{h}^T\nabla^2\psi(\bar{x})\,\bar{h} = a^2 + b^2 + c^2, \qquad a,b,c \ge 0, \]
with
\[ a^2 = -q\,\frac{\bar{h}^T\nabla^2 g(\bar{x})\,\bar{h}}{g(\bar{x})}, \qquad b^2 = q\,\frac{\left( \nabla g(\bar{x})^T\bar{h} \right)^2}{g(\bar{x})^2}, \qquad c^2 = \sum_{i=1}^n \frac{h_i^2}{x_i^2}. \]
Note that since $t$ appears linearly in $g(\bar{x})$ we have $\nabla^2 g(\bar{x})[\bar{h},\bar{h}] = -\nabla^2 f(x)[h,h]$ and $\nabla^3 g(\bar{x})[\bar{h},\bar{h},\bar{h}] = -\nabla^3 f(x)[h,h,h]$. Therefore, due to (8.7) it follows that
\[ \left| \nabla^3 g(\bar{x})[\bar{h},\bar{h},\bar{h}] \right| \le 3\beta\left( -\bar{h}^T\nabla^2 g(\bar{x})\,\bar{h} \right)\sqrt{\sum_{i=1}^n \frac{h_i^2}{x_i^2}} = 3\beta\,\frac{a^2 g(\bar{x})}{q}\,c, \qquad\text{which gives}\qquad q\,\frac{\left| \nabla^3 g(\bar{x})[\bar{h},\bar{h},\bar{h}] \right|}{g(\bar{x})} \le 3\beta a^2 c. \]
By substitution into (8.15) we obtain
\[ \left| \nabla^3\psi(\bar{x})[\bar{h},\bar{h},\bar{h}] \right| \le 3\beta a^2 c + \frac{3}{\sqrt{q}}\,a^2 b + \frac{2}{\sqrt{q}}\,b^3 + 2c^3. \]
Hence, since $q \ge 1$, we have
\[ \frac{\nabla^3\psi(\bar{x})[\bar{h},\bar{h},\bar{h}]}{2\left( \bar{h}^T\nabla^2\psi(\bar{x})\,\bar{h} \right)^{3/2}} \le \frac{3\beta a^2 c + 3a^2 b + 2b^3 + 2c^3}{2\,(a^2+b^2+c^2)^{3/2}}. \]
If $a = 0$ then the right-hand side is at most 1. Otherwise, we may assume without loss of generality that $a^2+b^2+c^2 = 1$, and we solve $\kappa$ from
\[ \kappa = \max\left\{ \tfrac12\left( 3\beta a^2 c + 3a^2 b + 2b^3 + 2c^3 \right) : a^2+b^2+c^2 = 1 \right\}. \]
The optimality conditions are, with $\sigma$ a Lagrange multiplier,
\[ 3a\,(\beta c + b) = 2\sigma a, \qquad \tfrac32 a^2 + 3b^2 = 2\sigma b, \qquad \tfrac32\beta a^2 + 3c^2 = 2\sigma c. \]
Since $a \ne 0$, the first condition gives $2\sigma = 3(b + \beta c)$. Substitution into the other two conditions yields
\[ a^2 = 2\beta bc, \qquad \beta a^2 = 2bc + 2(\beta-1)c^2. \]
If $c = 0$ we are back in the previous case, with $a = 0$ and $b = 1$, and then $\kappa = 1$. If $c \ne 0$ then we obtain $\beta^2 b = b + (\beta-1)c$, which gives $\beta = 1$ or $c = (\beta+1)b$. If $\beta = 1$ then $a^2 = 2bc$ and the objective becomes
\[ \tfrac12\left( 3a^2c + 3a^2b + 2b^3 + 2c^3 \right) = \tfrac12\left( 6bc^2 + 6b^2c + 2b^3 + 2c^3 \right) = (b+c)^3, \]
while the constraint becomes $a^2+b^2+c^2 = (b+c)^2 = 1$. So $\kappa = 1$ if $\beta = 1$. Otherwise we have $a^2 = 2\beta bc$ and $c = (\beta+1)b$. Substitution of these relations into $a^2+b^2+c^2 = 1$ gives
\[ 2\beta(\beta+1)b^2 + b^2 + (\beta+1)^2 b^2 = 1, \]
whence $b^2 = 1/(3\beta^2+4\beta+2)$. Thus we obtain
\[ \kappa = \left( 3\beta^2(\beta+1)^2 + 3\beta(\beta+1) + 1 + (\beta+1)^3 \right)b^3 = (\beta^2+\beta+1)(3\beta^2+4\beta+2)\,b^3 = \frac{\beta^2+\beta+1}{\sqrt{3\beta^2+4\beta+2}}. \]
This completes the proof of the lemma. $\Box$
Exercise 8.3. Recall that the $p$-norm of a vector $x \in \mathbb{R}^n$ is defined by
\[ \|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}, \]
where $p \ge 1$. Prove that if $q > p \ge 1$ then $\|x\|_q \le \|x\|_p$, for any vector $x$.

Solution: For any $\lambda > 0$ and for any $p \ge 1$ we have $\|\lambda x\|_p = \lambda\|x\|_p$, as one easily verifies. Therefore it is sufficient for the proof to show that $\|x\|_q \le \|x\|_p$ holds whenever $\|x\|_p = 1$. Hence we only need to show that
\[ \max\left\{ \|x\|_q : \sum_{i=1}^n |x_i|^p = 1 \right\} \le 1. \]
We will frequently use that if $a \in \mathbb{R}_+$ then we have for any $r > 0$ that $a \le 1$ if and only if $a^r \le 1$. Hence the condition on $x$ in the above problem implies $|x_i| \le 1$ for each $i$. Since $q > p$ this gives $|x_i|^{q-p} \le 1$, whence $|x_i|^q \le |x_i|^p$, for each $i$. Thus it follows that
\[ \sum_{i=1}^n |x_i|^q \le \sum_{i=1}^n |x_i|^p = 1, \]
and hence also
\[ \|x\|_q = \left( \sum_{i=1}^n |x_i|^q \right)^{1/q} \le 1. \]
This completes the proof. $\Box$

Remark: A different approach might be to show that the derivative of $\|x\|_p$ with respect to $p$ is nonpositive. Writing
\[ \|x\|_p = e^{\frac{1}{p}\log\left( \sum_{i=1}^n |x_i|^p \right)}, \]
we can equally well show that the exponent is decreasing in $p$. This amounts to showing that
\[ -\frac{1}{p^2}\log\left( \sum_{i=1}^n |x_i|^p \right) + \frac{1}{p}\,\frac{\sum_{i=1}^n |x_i|^p\log|x_i|}{\sum_{i=1}^n |x_i|^p} \le 0, \]
which is equivalent to
\[ \sum_{i=1}^n |x_i|^p\log|x_i|^p \le \left( \sum_{i=1}^n |x_i|^p \right)\log\left( \sum_{i=1}^n |x_i|^p \right). \]
This inequality is indeed true: since $|x_i|^p \le \sum_{j=1}^n |x_j|^p$ for each $i$, we have $|x_i|^p\log|x_i|^p \le |x_i|^p\log\left( \sum_{j=1}^n |x_j|^p \right)$, and summing over $i$ gives the inequality.
The above results only give values for $\kappa$, and not for $\nu$. In this respect the following lemma is of high importance.

Lemma 8.4. Let $f(x) \in C^3(D)$ be a convex function, where $D$ is an open convex set on which $f(x) < 0$. The so-called logarithmic barrier function associated with $f(x)$ is given by
\[ \phi(x) = -\log(-f(x)). \]
If this function is self-concordant then it is an SCB with $\nu = 1$.

Proof. Let $x \in D$ and $h \in \mathbb{R}^n$. Defining $\psi(\alpha) = \phi(x+\alpha h) = -\log\left( -f(x+\alpha h) \right)$, we need to show that $\psi'(0)^2 \le \psi''(0)$. One has
\[ \psi'(\alpha) = \frac{\nabla f(x+\alpha h)[h]}{-f(x+\alpha h)}, \qquad \psi''(\alpha) = \frac{\left( \nabla f(x+\alpha h)[h] \right)^2 - \nabla^2 f(x+\alpha h)[h,h]\,f(x+\alpha h)}{f(x+\alpha h)^2}, \]
and hence
\[ \psi'(0) = \frac{\nabla f(x)[h]}{-f(x)}, \qquad \psi''(0) = \frac{\left( \nabla f(x)[h] \right)^2 - \nabla^2 f(x)[h,h]\,f(x)}{f(x)^2}. \]
Since $\nabla^2 f(x)[h,h] \ge 0$ and $f(x) < 0$ we have $-\nabla^2 f(x)[h,h]\,f(x) \ge 0$, and thus it follows that $\psi'(0)^2 \le \psi''(0)$. Hence the lemma follows. $\Box$

Note that if $f(x)$ is a convex quadratic function, say
\[ f(x) = x^T A x + b^T x + c \]
for some positive semidefinite $n\times n$ matrix $A$, a vector $b \in \mathbb{R}^n$ and a scalar $c$, then $f(x)$ satisfies condition (8.7) with $\beta = 0$. Hence it immediately follows from Lemma 8.3 that
\[ \phi(x) := \alpha f(x) - \sum_{i=1}^n \log x_i \]
is 1-self-concordant on $\operatorname{int}\mathbb{R}^n_+$, for every $\alpha > 0$, and, also using Lemma 8.4,
\[ \psi(x,t) := -q\log(t - f(x)) - \sum_{i=1}^n \log x_i \]
is a $(1, n+q)$-SCB on $\operatorname{epi} f$, for every $q \ge 1$.
For the current purpose it is of special interest to consider the case where the function $f$ is univariate. An immediate consequence of Lemma 8.3 is the following corollary.

Corollary 8.5. Let $f(x) \in C^3(D)$ be a convex function, where $D$ is an open convex set in $\mathbb{R}_+$. If there exists a $\beta$ such that
\[ |x f'''(x)| \le 3\beta f''(x), \qquad x \in D, \tag{8.16} \]
then, for each $\alpha > 0$,
\[ \phi(x) := \alpha f(x) - \log x \tag{8.17} \]
is $\kappa$-self-concordant on $D$, with $\kappa = 1$ if $\beta \le 1$ and $\kappa = \sqrt{\beta^3/(3\beta-2)}$ otherwise. Moreover, for $q \ge 1$,
\[ \psi(x,t) := -q\log(t - f(x)) - \log x \tag{8.18} \]
is $\kappa$-self-concordant on $\operatorname{epi} f = \{(x,t) : f(x) \le t\}$, where
\[ \kappa = \max\left\{ 1,\ \frac{\beta^2+\beta+1}{\sqrt{3\beta^2+4\beta+2}} \right\}. \]
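Condition (8.16) is easy to check numerically. The following sketch (our own, not from the text) evaluates the ratio $|x f'''(x)|/(3 f''(x))$ for two rows of the table in Figure 8.1, recovering the values of $\beta$ derived in Section 8.4:

```python
import numpy as np

# Numerical check of condition (8.16), |x f'''(x)| <= 3*beta*f''(x):
# f = -log x gives beta = 2/3, f = x log x gives beta = 1/3.

xs = np.linspace(0.05, 20.0, 2000)

# f(x) = -log x:  f'' = 1/x^2, f''' = -2/x^3, so |x f'''| / (3 f'') = 2/3
ratio1 = np.abs(xs * (-2.0 / xs**3)) / (3.0 / xs**2)
# f(x) = x log x: f'' = 1/x,  f''' = -1/x^2, so |x f'''| / (3 f'') = 1/3
ratio2 = np.abs(xs * (-1.0 / xs**2)) / (3.0 * (1.0 / xs))

print(ratio1.max(), ratio2.max())   # 2/3 and 1/3, both <= 1,
                                    # so Corollary 8.5 gives kappa = 1
```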
8.4

Application to the functions in the table of Figure 8.1

In this section we show how the results in the table of Figure 8.1 can be obtained from Corollary 8.5 and Lemma 8.4. It may be worth noting that the expression
\[ \frac{\beta^2+\beta+1}{\sqrt{3\beta^2+4\beta+2}} \]
is monotonically increasing in $\beta$ and takes the value 1 if $\beta = 1$. Hence, if $f(x)$ satisfies condition (8.16) with $\beta \le 1$, then $\psi(x,t)$ is 1-self-concordant.

$\psi(x,t) = -\log(t + \log x) - \log x$

In this case one has $f(x) = -\log x$. Hence
\[ f'(x) = -\frac1x, \qquad f''(x) = \frac{1}{x^2}, \qquad f'''(x) = -\frac{2}{x^3}. \]
So condition (8.16) yields
\[ \frac{2}{x^2} \le \frac{3\beta}{x^2}, \qquad x > 0. \]
This holds for $\beta = \frac23 \le 1$, and hence Corollary 8.5 implies that $\psi(x,t)$ is 1-self-concordant. By Lemma 8.4 the term $-\log(t+\log x)$ contributes 1 to $\nu$, and so does the term $-\log x$; hence $\nu = 2$.

$\psi(x,t) = -\log(\log t - x) - \log t$

It is convenient to interchange the role of the variables $x$ and $t$ and to replace $t$ by $-t$. This yields the function $\psi(x,t) = -\log(t + \log x) - \log x$. As we just established, this is a $(1,2)$-self-concordant barrier function on its domain.

$\psi(x,t) = -\log(t - x\log x) - \log x$

In this case one has $f(x) = x\log x$. Hence
\[ f'(x) = 1 + \log x, \qquad f''(x) = \frac1x, \qquad f'''(x) = -\frac{1}{x^2}. \]
So condition (8.16) yields
\[ \frac1x \le \frac{3\beta}{x}, \qquad x > 0. \]
This holds for $\beta = \frac13 \le 1$, and hence Corollary 8.5 implies that $\psi(x,t)$ is 1-self-concordant. Moreover, just as in the previous case, $\nu = 2$.

$\psi(x,t) = -\log(tx - 1)$

We first write
\[ \psi(x,t) = -\log\left( t - \frac1x \right) - \log x, \]
and apply Corollary 8.5 to the function $f(x) = \frac1x$. Then
\[ f'(x) = -\frac{1}{x^2}, \qquad f''(x) = \frac{2}{x^3}, \qquad f'''(x) = -\frac{6}{x^4}. \]
So condition (8.16) yields
\[ \frac{6}{x^3} \le 3\beta\,\frac{2}{x^3}, \qquad x > 0. \]
This holds for $\beta = 1$, and hence Corollary 8.5 implies that $\psi(x,t)$ is 1-self-concordant and, again, $\nu = 2$.

$\psi(x,t) = -\log\left( t^{1/p} - x \right) - \log t$, $p \ge 1$

It is convenient to interchange the role of the variables $x$ and $t$. So we consider the function $-\log\left( x^{1/p} - t \right) - \log x$, which arises, after the affine substitution $t \to -t$, from the barrier (8.18) for the convex function $f(x) = -x^{1/p}$. We have
\[ f'(x) = -\frac1p\,x^{\frac1p-1}, \qquad f''(x) = \frac1p\left( 1-\frac1p \right)x^{\frac1p-2}, \qquad f'''(x) = -\frac1p\left( 1-\frac1p \right)\left( 2-\frac1p \right)x^{\frac1p-3}. \]
So condition (8.16) yields
\[ \frac1p\left( 1-\frac1p \right)\left( 2-\frac1p \right)x^{\frac1p-2} \le 3\beta\,\frac1p\left( 1-\frac1p \right)x^{\frac1p-2}, \qquad x > 0, \]
which is equivalent to
\[ 2 - \frac1p \le 3\beta. \]
Since $p \ge 1$ this holds for $\beta = \frac13\left( 2-\frac1p \right) \le \frac23 \le 1$, and hence Corollary 8.5 implies that $\psi(x,t)$ is 1-self-concordant and, again, $\nu = 2$.

$\psi(x,t) = -\log(x^p + t) - \log x$, $0 \le p \le 1$

With $f(x) = -x^p$ we have
\[ f'(x) = -p\,x^{p-1}, \qquad f''(x) = p(1-p)x^{p-2}, \qquad f'''(x) = -p(1-p)(2-p)x^{p-3}. \]
So condition (8.16) yields
\[ p(1-p)(2-p)x^{p-2} \le 3\beta\,p(1-p)x^{p-2}, \qquad x > 0, \]
which is equivalent to $2-p \le 3\beta$. This holds for $\beta = \frac13(2-p) \le \frac23 \le 1$, and hence Corollary 8.5 implies that $\psi(x,t)$ is 1-self-concordant and, once more, $\nu = 2$.

$\psi(x,t) = -\log\left( t^{2/p} - x^2 \right) - 2\log t$, $p \ge 1$

One has
\[ \psi(x,t) = \underbrace{-\log\left( t^{1/p} + x \right) - \log t}_{\psi_1(x,t)}\; + \;\underbrace{-\log\left( t^{1/p} - x \right) - \log t}_{\psi_2(x,t)}. \]
Note that the domain of $\psi_2(x,t)$ consists of all pairs $(x,t)$ such that $t^{1/p} > x$, and the domain of $\psi_1(x,t)$ consists of all pairs $(x,t)$ such that $t^{1/p} > -x$. Hence, the domain of $\psi(x,t)$ consists of all pairs $(x,t)$ such that $|x| < t^{1/p}$ with $t > 0$, which is equivalent to $|x|^p < t$. So the domain of $\psi(x,t)$ is the epigraph of $|x|^p$, which is the intersection of the domains of $\psi_1(x,t)$ and $\psi_2(x,t)$. It therefore suffices (cf. Lemma 8.1) to show that $\psi_1(x,t)$ and $\psi_2(x,t)$ are SCBs.

We start by considering $\psi_2(x,t)$. As we did in previous cases, we interchange the roles of $x$ and $t$, which leads to the function $-\log\left( x^{1/p} - t \right) - \log x$, treated via $f(x) = -x^{1/p}$ as above; condition (8.16) reduces to $2 - \frac1p \le 3\beta$, which holds for $\beta = \frac13\left( 2-\frac1p \right) \le \frac23$. Hence Corollary 8.5 implies that $\psi_2(x,t)$ is 1-self-concordant and, again, $\nu = 2$. Since $\psi_1(x,t) = \psi_2(-x,t)$, with $x$ ranging over $\mathbb{R}$, also $\psi_1(x,t)$ is 1-self-concordant with $\nu = 2$. Hence it follows that $\psi(x,t)$, being the sum of $\psi_1(x,t)$ and $\psi_2(x,t)$, is 1-self-concordant with $\nu = 4$.
8.5

Application to other convex problems

In the examples below $A$ denotes an $m\times n$ matrix and $c$ and $b$ are $n$- and $m$-dimensional vectors, respectively.

8.5.1

Entropy minimization

The entropy minimization problem is given by
\[ \min\left\{ c^T x + \sum_{i=1}^n x_i\log x_i : Ax = b,\ x \ge 0 \right\}. \]
Rewriting this problem as
\[ \min\left\{ c^T x + \sum_{i=1}^n t_i : Ax = b,\ x_i\log x_i \le t_i\ (i = 1,\ldots,n),\ x \ge 0 \right\}, \]
the objective function becomes linear, and we have an SCB for each of the inequality constraints.

Exercise 8.4. Show that the Lagrange dual of the above problem is the unconstrained problem
\[ \max\left\{ b^T y - \sum_{i=1}^n e^{a_i^T y - c_i - 1} \right\}, \]
where $a_i$ denotes the $i$-th column of $A$.

Solution: The Lagrange function of the problem is
\[ L(x,y) = c^T x + \sum_{i=1}^n x_i\log x_i + y^T(b - Ax) = b^T y + \sum_{i=1}^n \left( x_i\log x_i + (c_i - a_i^T y)\,x_i \right). \]
Minimizing over $x_i \ge 0$ gives $\log x_i + 1 + c_i - a_i^T y = 0$, i.e., $x_i = e^{a_i^T y - c_i - 1}$, with minimal value $-x_i = -e^{a_i^T y - c_i - 1}$. Hence the dual objective is $b^T y - \sum_{i=1}^n e^{a_i^T y - c_i - 1}$. $\Box$

The dual problem can be modelled as
\[ \min\left\{ \sum_{i=1}^n t_i - b^T y : e^{a_i^T y - c_i - 1} \le t_i,\ 1 \le i \le n \right\}, \]
for which we have the following SCB for the domain:
\[ -\sum_{i=1}^n \log\left( \log t_i + 1 + c_i - a_i^T y \right) - \sum_{i=1}^n \log t_i. \]
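For concreteness, the primal barrier above is easy to evaluate. The following sketch (our own helper, not part of the text) builds the SCB for the reformulated entropy problem from the third row of Figure 8.1:

```python
import numpy as np

# Barrier for the reformulated entropy problem: for each i the
# constraint x_i log x_i <= t_i gets the SCB
#   -log(t_i - x_i log x_i) - log x_i.

def entropy_barrier(x, t):
    if np.any(x <= 0):
        return np.inf                      # outside the domain
    slack = t - x * np.log(x)
    if np.any(slack <= 0):
        return np.inf
    return -np.sum(np.log(slack)) - np.sum(np.log(x))

x = np.array([0.5, 1.0, 2.0])
t = x * np.log(x) + 0.3                    # strictly feasible slacks
print(entropy_barrier(x, t))               # finite inside the domain
```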
8.5.2

Extended entropy minimization

A more general entropy programming problem is defined as
\[ \min\left\{ c^T x + \sum_{i=1}^n g_i(x_i) : Ax = b,\ x \ge 0 \right\}, \]
where we assume that the scalar functions $g_i \in C^3$ satisfy
\[ |g_i'''(x_i)| \le 3\beta_i\,\frac{g_i''(x_i)}{x_i}, \qquad i = 1,\ldots,n. \]
This holds in the case of entropy minimization, where we have $g_i(x_i) = x_i\log x_i$. The condition guarantees (by Corollary 8.5) that
\[ -\log(t_i - g_i(x_i)) - \log x_i \]
is an SCB for the constraint $g_i(x_i) \le t_i$, $x_i \ge 0$. Reformulating the problem as
\[ \min\left\{ c^T x + \sum_{i=1}^n t_i : Ax = b,\ g_i(x_i) \le t_i\ (i = 1,\ldots,n),\ x \ge 0 \right\}, \]
we thus have the following SCB for the domain:
\[ -\sum_{i=1}^n \log(t_i - g_i(x_i)) - \sum_{i=1}^n \log x_i. \]
8.5.3

$\ell_p$-norm optimization

Let $\{I_k\}_{k=1}^r$ be a partition of the index set $\{1,\ldots,n\}$ into $r$ classes. If $p_i \ge 1$, for $i = 1,\ldots,n$, then the primal $\ell_p$-norm optimization problem has the following form:
\[ \max\left\{ b^T y : \sum_{i\in I_k} \frac{1}{p_i}\left| a_i^T y - c_i \right|^{p_i} + b_k^T y - d_k \le 0,\ k = 1,\ldots,r \right\}, \]
where (for all $i$ and $k$) $a_i$, $b_k$, and $b$ are $m$-dimensional vectors, and $c_i$ and $d_k$ are real numbers. The problem can be reformulated as
\[ \max\left\{ b^T y : s_i^{p_i} \le t_i,\ |a_i^T y - c_i| \le s_i\ (i = 1:n),\ \sum_{i\in I_k}\frac{t_i}{p_i} \le d_k - b_k^T y\ (k = 1:r) \right\}. \]
An SCB for the problem is given by
\[ -\sum_{k=1}^r \log\left( d_k - b_k^T y - \sum_{i\in I_k}\frac{t_i}{p_i} \right) - \sum_{i=1}^n \left[ \log\left( t_i - s_i^{p_i} \right) + \log\left( s_i^2 - (a_i^T y - c_i)^2 \right) + \log\left( s_i^2 t_i \right) \right]. \]
An alternative way is to replace the constraint $s_i^{p_i} \le t_i$ by $t_i^{1/p_i} \ge s_i$ and to use the SCB
\[ -\sum_{k=1}^r \log\left( d_k - b_k^T y - \sum_{i\in I_k}\frac{t_i}{p_i} \right) - \sum_{i=1}^n \left[ \log\left( t_i^{1/p_i} - s_i \right) + \log\left( s_i^2 - (a_i^T y - c_i)^2 \right) + \log\left( s_i^2 t_i \right) \right]. \]

Exercise 8.5. Show that the Lagrange dual of the above problem is the problem
\[ \min\left\{ c^T x + d^T z + \sum_{k=1}^r z_k\sum_{i\in I_k}\frac{1}{q_i}\left| \frac{x_i}{z_k} \right|^{q_i} : Ax + Bz = b,\ z \ge 0 \right\}, \]
where $A$ is the matrix whose columns are $a_i$, and the positive number $q_i$ is such that $\frac{1}{p_i} + \frac{1}{q_i} = 1$ $(i = 1:n)$, whereas $B$ is the matrix whose columns are $b_k$ $(k = 1:r)$.

An alternative way of formulating the dual problem is as follows:
\[ \min\left\{ c^T x + d^T z + \sum_{i=1}^n \frac{t_i}{q_i} : s_i^{q_i}z_k^{1-q_i} \le t_i\ (i \in I_k,\ k = 1:r),\ |x| \le s,\ Ax + Bz = b,\ z \ge 0,\ s \ge 0 \right\}. \]

Exercise 8.6. Assuming $q_i \ge 1$, show that
\[ -\log\left( t_i - s_i^{q_i}z_k^{1-q_i} \right) - \log z_k - \log s_i \]
is an SCB for the constraint $s_i^{q_i}z_k^{1-q_i} \le t_i$. Using this, find an SCB for the dual $\ell_p$-norm optimization problem.

8.5.4

Geometric optimization

As in the previous case, let $\{I_k\}_{k=1}^r$ be a partition of the index set $\{1,\ldots,n\}$ into $r$ classes. Let $a_i \in \mathbb{R}^m$, $c_i \in \mathbb{R}$ $(i = 1,\ldots,n)$ and $b \in \mathbb{R}^m$. The geometric optimization problem is then given by
\[ \max\left\{ b^T y : \sum_{i\in I_k} e^{a_i^T y - c_i} \le 1,\ k = 1,\ldots,r \right\}. \]
Here $e$ denotes the natural base of the logarithm (and not the all-one vector!).

Exercise 8.7. Find an SCB for the domain of the geometric optimization problem.

Solution: Introducing variables $s_i$, the $k$-th constraint can be rewritten as $e^{a_i^T y - c_i} \le s_i$ $(i \in I_k)$ together with $\sum_{i\in I_k} s_i \le 1$. Using the barrier from the second row of the table in Figure 8.1 for each exponential constraint, one obtains the SCB
\[ -\sum_{i=1}^n \left[ \log\left( \log s_i + c_i - a_i^T y \right) + \log s_i \right] - \sum_{k=1}^r \log\left( 1 - \sum_{i\in I_k} s_i \right). \Box \]

Remark: The geometric optimization problem occurs frequently in another, equivalent form, the so-called posynomial form. Note that
\[ \sum_{i\in I_k} e^{\sum_{j=1}^m a_{ij}y_j - c_i} = \sum_{i\in I_k} e^{-c_i}\prod_{j=1}^m e^{a_{ij}y_j} = \sum_{i\in I_k} \xi_i\prod_{j=1}^m \sigma_j^{a_{ij}}, \]
where $\xi_i = e^{-c_i} > 0$ and $\sigma_j = e^{y_j} > 0$. The above polynomial is called a posynomial because all coefficients $\xi_i$ and all variables $\sigma_j$ are positive. Observe that the substitution $\sigma_j = e^{y_j}$ convexifies the posynomial.

The dual geometric programming problem is given by
\[ \min\left\{ c^T x + \sum_{k=1}^r\sum_{i\in I_k} x_i\log\frac{x_i}{\sum_{j\in I_k}x_j} : Ax = b,\ x \ge 0 \right\}, \]
where $A$ is an $m\times n$ matrix and $c$ and $b$ are $n$- and $m$-dimensional vectors, respectively. Note that the entropy minimization problem of Section 8.5.1 is closely related: it arises as a special case (take $c = 0$, $r = 1$, $\sum_{i\in I_k} x_i = 1$, and $A$ and $b$ appropriately). Using the same trick as before, we claim that
\[ -\sum_{k=1}^r \log\left( t_k - \sum_{i\in I_k} x_i\log\frac{x_i}{\sum_{j\in I_k}x_j} \right) - \sum_{i=1}^n \log x_i \]
is a $(1, n+r)$-SCB for this problem.

We conclude this section by presenting a proof of this claim, which is this time complicated by the fact that the objective function is not separable. We may write the function above as
\[ \sum_{k=1}^r \left[ -\log\left( t_k - \sum_{i\in I_k} x_i\log\frac{x_i}{\sum_{j\in I_k}x_j} \right) - \sum_{i\in I_k}\log x_i \right]. \]
Thus, by Lemma 8.1, it suffices for the proof of the claim to show that the function
\[ \psi(x,t) = -\log\left( t_k - \sum_{i\in I_k} x_i\log\frac{x_i}{\sum_{j\in I_k}x_j} \right) - \sum_{i\in I_k}\log x_i \tag{8.19} \]
is 1-self-concordant, for some fixed $k$. Now we can use Lemma 8.3, so that we only have to verify that condition (8.7) in this lemma is satisfied by
\[ f(x) := \sum_{i\in I_k} x_i\log\frac{x_i}{\sum_{j\in I_k}x_j} = \sum_{i\in I_k} x_i\log x_i - \left( \sum_{i\in I_k}x_i \right)\log\left( \sum_{i\in I_k}x_i \right) \]
and $\beta = 1$. For simplicity, we drop the subscript $i \in I_k$ in the sums below. Condition (8.7) then reduces to the following inequality:
\[ \left| \sum\frac{h_i^3}{x_i^2} - \frac{\left( \sum h_i \right)^3}{\left( \sum x_i \right)^2} \right| \le 3\left( \sum\frac{h_i^2}{x_i} - \frac{\left( \sum h_i \right)^2}{\sum x_i} \right)\sqrt{\sum\frac{h_i^2}{x_i^2}}, \tag{8.20} \]
where $x_i > 0$ and $h_i \in \mathbb{R}$.

To prove this inequality we introduce variables $\delta_i$ according to
\[ \delta_i = \frac{1}{\sqrt{x_i}}\left( h_i - x_i\,\frac{\sum h_j}{\sum x_j} \right). \tag{8.21} \]
Note that $\sum\sqrt{x_i}\,\delta_i = 0$. Writing $\rho = \frac{\sum h_j}{\sum x_j}$, we have $h_i = \sqrt{x_i}\,\delta_i + x_i\rho$, and hence
\[ \sum\frac{h_i^3}{x_i^2} = \sum\frac{\delta_i^3}{\sqrt{x_i}} + 3\rho\sum\delta_i^2 + 3\rho^2\sum\sqrt{x_i}\,\delta_i + \rho^3\sum x_i = \sum\frac{\delta_i^3}{\sqrt{x_i}} + 3\rho\sum\delta_i^2 + \frac{\left( \sum h_i \right)^3}{\left( \sum x_i \right)^2}. \]
In the same way one finds
\[ \sum\frac{h_i^2}{x_i} = \sum\delta_i^2 + \frac{\left( \sum h_i \right)^2}{\sum x_i}. \]
Hence the inequality (8.20) can be rewritten as
\[ \left| \sum\frac{\delta_i^3}{\sqrt{x_i}} + 3\rho\sum\delta_i^2 \right| \le 3\left( \sum\delta_i^2 \right)\sqrt{\sum\frac{h_i^2}{x_i^2}}. \]
Since, by (8.21), $\frac{\delta_i}{\sqrt{x_i}} = \frac{h_i}{x_i} - \rho$, we may write
\[ \sum\frac{\delta_i^3}{\sqrt{x_i}} + 3\rho\sum\delta_i^2 = \sum\delta_i^2\left( \frac{h_i}{x_i} - \rho \right) + 3\rho\sum\delta_i^2 = \sum\delta_i^2\,\frac{h_i}{x_i} + 2\rho\sum\delta_i^2, \]
whence
\[ \left| \sum\frac{\delta_i^3}{\sqrt{x_i}} + 3\rho\sum\delta_i^2 \right| \le \sum\delta_i^2\,\frac{|h_i|}{x_i} + 2|\rho|\sum\delta_i^2. \]
Now using the obvious inequality
\[ \frac{|h_i|}{x_i} \le \sqrt{\sum\frac{h_j^2}{x_j^2}} \]
and the inequality
\[ |\rho| = \frac{\left| \sum h_j \right|}{\sum x_j} \le \frac{\sum x_j\,\frac{|h_j|}{x_j}}{\sum x_j} \le \sqrt{\sum\frac{h_j^2}{x_j^2}}, \]
the latter because a convex combination never exceeds the maximum, we obtain
\[ \left| \sum\frac{\delta_i^3}{\sqrt{x_i}} + 3\rho\sum\delta_i^2 \right| \le 3\left( \sum\delta_i^2 \right)\sqrt{\sum\frac{h_i^2}{x_i^2}}. \]
The last expression is precisely the right-hand side of (8.20). Hence the proof is complete. $\Box$
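Inequality (8.20) can also be probed numerically. The following sketch (our own script; the sampling scheme is our own choice) tests it on random data:

```python
import numpy as np

# Random check of inequality (8.20): for x > 0 and arbitrary h, the
# third-order term of f(x) = sum x_i log(x_i / sum x_j) is bounded by
# 3 * (second-order term) * sqrt(sum h_i^2 / x_i^2).

rng = np.random.default_rng(0)
for _ in range(10000):
    n = rng.integers(2, 6)
    x = rng.uniform(0.1, 5.0, n)
    h = rng.normal(size=n)
    lhs = abs(np.sum(h**3 / x**2) - np.sum(h)**3 / np.sum(x)**2)
    rhs = 3.0 * (np.sum(h**2 / x) - np.sum(h)**2 / np.sum(x)) \
              * np.sqrt(np.sum(h**2 / x**2))
    assert lhs <= rhs + 1e-9
print("inequality (8.20) held on all samples")
```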


Remark: Starting from inequality (8.20) the proof can be finished in an alternative way, as follows. Dividing the whole inequality by $\sum x_i$ and then substituting first $h_i = y_i x_i$ and thereafter $t_i = x_i/\sum x_j$, we get the equivalent inequality
\[ \left| \sum y_i^3 t_i - \left( \sum y_i t_i \right)^3 \right| \le 3\left( \sum y_i^2 t_i - \left( \sum y_i t_i \right)^2 \right)\sqrt{\sum y_i^2}, \]
where the $y_i$ are arbitrary and the $t_i$ positive with $\sum t_i = 1$. Since $\sum y_i t_i = E(y)$ can be interpreted as the expected value of a random variable $y$ (taking the value $y_i$ with probability $t_i$), and since replacing $y$ by $-y$ shows that one side of the inequality suffices, the inequality can be rewritten as
\[ E(y^3) - E(y)^3 \le 3\left( E(y^2) - E(y)^2 \right)\sqrt{\sum y_i^2}, \]
relating the variance of $y$ to a third moment. From
\[ \left( E(y^2) - E(y)^2 \right)\sqrt{\sum y_i^2} \ge \left( E(y^2) - E(y)^2 \right)\max_i y_i = E\!\left( (y-E(y))^2\max_i y_i \right) \ge E\!\left( (y-E(y))^2\,y \right) = E(y^3) - 2E(y)E(y^2) + E(y)^3 \]
and
\[ 2\left( E(y^2) - E(y)^2 \right)\sqrt{\sum y_i^2} \ge 2\left( E(y^2) - E(y)^2 \right)E(y) = 2E(y)E(y^2) - 2E(y)^3, \]
we get, by adding these two inequalities,
\[ 3\left( E(y^2) - E(y)^2 \right)\sqrt{\sum y_i^2} \ge E(y^3) - E(y)^3, \]
i.e., inequality (8.20) follows.

Exercise 8.8. Let $\alpha \in \mathbb{R}$, with $\alpha \le 0$, and let $A$ denote an $m\times n$ matrix and $b \in \mathbb{R}^m$. Find an SCB for the following optimization problem in the vector $x \in \mathbb{R}^n$ of variables:
\[ \max\left\{ \prod_{i=1}^n e^{\alpha\left( e^{x_i} \right)} : Ax = b,\ x \ge 0 \right\}. \]
Hint: replace the objective function by its logarithm.

Chapter 9

Conic optimization

9.1

Introduction

In Section ?? we defined a conic optimization problem as a problem of the form
\[ (CP) \qquad \inf\left\{ c^T x : Ax - b \in K_1,\ x \in K_2 \right\}, \]
where $A$ is an $m\times n$ matrix, $b \in \mathbb{R}^m$ and $c \in \mathbb{R}^n$, and where $K_1$ and $K_2$ denote convex cones. It was also mentioned that the Lagrange dual of $(CP)$ is given by
\[ (CD) \qquad \sup\left\{ b^T y : c - A^T y \in K_2^*,\ y \in K_1^* \right\}. \]
Here $K_1^*$ denotes the dual cone of $K_1$ (and $K_2^*$ the dual cone of $K_2$). Recall that the dual cone $K^*$ of a convex cone $K \subseteq \mathbb{R}^n$ is defined by
\[ K^* = \left\{ x \in \mathbb{R}^n : x^T y \ge 0,\ \forall y \in K \right\}. \tag{9.1} \]

Exercise 9.1. Prove that the duality is symmetric. In other words, show that the dual problem of $(CD)$ is (equivalent to) $(CP)$.

Solution: Up to a sign change in the objective, $(CD)$ has the same format as $(CP)$:
\[ -\inf\left\{ (-b)^T y : (-A^T)y - (-c) \in K_2^*,\ y \in K_1^* \right\}. \]
Applying the primal–dual correspondence once more, the dual of this problem is
\[ -\sup\left\{ (-c)^T x : (-b) - (-A^T)^T x \in (K_1^*)^*,\ x \in (K_2^*)^* \right\} = \inf\left\{ c^T x : Ax - b \in K_1,\ x \in K_2 \right\}, \]
where we used that $(K^*)^* = K$ for a closed convex cone $K$. This is exactly $(CP)$. $\Box$

Now let $x$ be feasible for $(CP)$ and $y$ for $(CD)$. The duality gap $c^T x - b^T y$ satisfies
\[ c^T x - b^T y = \underbrace{\left( c - A^T y \right)}_{\in K_2^*}{}^T\underbrace{x}_{\in K_2} + \underbrace{(Ax - b)}_{\in K_1}{}^T\underbrace{y}_{\in K_1^*} \ge 0. \]
Hence the weak duality property holds (cf. Theorem ??). Strong duality occurs if and only if the duality gap vanishes, i.e., if and only if $x$ and $y$ satisfy
\[ \left( c - A^T y \right)^T x = 0, \qquad (Ax - b)^T y = 0. \tag{9.2} \]
We are especially interested in cases where strong duality occurs. We introduce the following two properties:

(P.1) The primal problem is strictly feasible and bounded below;
(D.1) The dual problem is solvable (i.e., has an optimal solution).

An immediate consequence of the Lagrange duality theorem (cf. Theorem ??) is that if property (P.1) holds then (D.1) holds as well, and then the optimal values of $(CP)$ and $(CD)$ are equal. Using the symmetry mentioned in Exercise 9.1, it follows that when defining the properties

(P.2) The primal problem is solvable;
(D.2) The dual problem is strictly feasible and bounded above;

then (D.2) implies (P.2), and also that the optimal values of $(CP)$ and $(CD)$ are equal. As a consequence we may state the following result without further proof.

Theorem 9.1. If $(CP)$ and $(CD)$ are both strictly feasible then $(CP)$ and $(CD)$ are both solvable and the optimal values are equal.

The aim of this chapter is to present a method that decides in polynomial time whether or not $(CP)$ and $(CD)$ have optimal solutions with vanishing duality gap. Before proceeding we first want to show that the conic optimization model is general enough to cover a wide class of nonlinear optimization problems.
9.2

Every optimization problem can be modelled as a conic problem

Suppose we are given an arbitrary optimization problem, where the aim is to minimize a function $f(x)$ over a given closed domain $X$:
\[ \min\{ f(x) : x \in X \}. \tag{9.3} \]
As far as $x$ is concerned, this problem has the same set of optimal solutions as—and hence is equivalent to—the problem
\[ \min\{ \tau : f(x) \le \tau,\ x \in X \}. \]
So, defining
\[ Z := \{ z = (x;\tau) : f(x) \le \tau,\ x \in X \}, \]
it suffices to consider problems of the form
\[ \min\left\{ c^T z : z \in Z \right\}, \tag{9.4} \]
where $c$ is a suitably chosen vector. Now let $\bar{Z}$ denote the convex hull of $Z$. Then, since the objective function $c^T z$ is linear, minimizing $c^T z$ over $\bar{Z}$ yields a problem whose set of optimal solutions contains the optimal solutions of (9.4), and with the same optimal value. Hence we consider the problem $\min\{ c^T z : z \in \bar{Z} \}$. Now define
\[ K := \left\{ (z;t) : z \in t\bar{Z},\ t \ge 0 \right\}. \]
Then $K$ is a convex cone. Because if $(z;t) \in K$ and $\lambda \ge 0$ then $\lambda(z;t) = (\lambda z;\lambda t)$, and since $z \in t\bar{Z}$ implies $\lambda z \in (\lambda t)\bar{Z}$, we have $\lambda(z;t) \in K$. Moreover, if $(z_1;t_1) \in K$ and $(z_2;t_2) \in K$ then
\[ z_1 + z_2 \in t_1\bar{Z} + t_2\bar{Z} = (t_1+t_2)\,\frac{t_1\bar{Z} + t_2\bar{Z}}{t_1+t_2} \subseteq (t_1+t_2)\bar{Z}. \]
The last inclusion is due to the fact that $\bar{Z}$ is convex. Hence it follows that $(z_1;t_1) + (z_2;t_2) \in K$. Thus we have shown that $K$ is a convex cone. We conclude that our problem can be reformulated as
\[ \min\left\{ c^T z + 0\cdot t : t = 1,\ (z;t) \in K \right\}, \]
which is a conic formulation of the given problem, because the objective function is a linear function of the variables $z$ and $t$, and the constraints are the linear constraint $t = 1$ and the conic constraint $(z,t) \in K$. Note that by writing the linear constraint $t = 1$ as $t - 1 \in \{0\}$, the conic formulation gets the form of $(CP)$, with $K_1 = \{0\}$ and $K_2 = K$.

The above reasoning proves that every optimization problem can be put in the form of $(CP)$. Note, however, that the transformation to the conic form requires a description of the convex hull of the intersection of the epigraph of $f$ with the set $\{(x;\tau) : x \in X,\ \tau \in \mathbb{R}\}$. In general, it is hard to find such a description. Moreover, to be able to solve the resulting conic problem efficiently, we need a self-concordant barrier function for the cone $K$. The existence of such a barrier function is guaranteed by the following extremely important result, which we cite without proof [8, Section 2.5].

Theorem 9.2. Every closed convex cone has a self-concordant, logarithmically homogeneous barrier function.

Recall that a self-concordant function is called logarithmically homogeneous if it satisfies (7.9).

Theorem 9.2 is an existence theorem. It does not tell us how to get a suitable barrier function, let alone a barrier function that is computationally tractable. The good news, however, is that many practical problems can be put in the conic form with a cone that is a direct product of the standard cones, and for these cones we already know self-concordant, logarithmically homogeneous barrier functions. Although in general it may be far from obvious to get a conic formulation of a given nonlinear optimization problem, in many cases this task—though not trivial—is not very difficult. We present a simple example.

Example 9.3 Let
\[ z^* = \min\left\{ y_1 + y_2 : (y_1-1)^2 + (y_2-2)^2 \le 4,\ y_1y_2 \ge 4,\ y_1 \ge 0 \right\}. \]
How to model this as a conic optimization problem? One has
\[ (y_1-1)^2 + (y_2-2)^2 \le 4 \iff \begin{pmatrix} y_1-1 \\ y_2-2 \\ 2 \end{pmatrix} \in L^3_+, \qquad y_1y_2 \ge 4,\ y_1 \ge 0 \iff \begin{pmatrix} y_1 & 2 \\ 2 & y_2 \end{pmatrix} \in S^2_+. \]
Replacing the $2\times2$ matrix by the vector obtained by concatenating its columns, these constraints can be put in the form $c - A^T y \in L^3_+\times S^2_+$, with
\[ c = (1;\,2;\,2;\,0;\,2;\,2;\,0), \qquad A^T = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ -1 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & -1 \end{pmatrix}. \]
Hence, with $b = -(1;1)$, the problem can be formulated as a conic optimization problem (in the standard dual form):
\[ -z^* = \sup\left\{ b^T y : c - A^T y \in K \right\}, \qquad K = L^3_+\times S^2_+. \tag{9.5} \]
Since the dual problem of the dual problem is the primal problem, the dual of (9.5) has the standard primal form:
\[ -z^* = \inf\left\{ c^T x : Ax = b,\ x \in K^* = K \right\}, \]
where we used that $L^3_+$ and $S^2_+$ are self-dual, and therefore $K^* = K$. So the dual problem is given by
\[ -z^* = \inf\left\{ x_1 + 2x_2 + 2x_3 + 2x_5 + 2x_6 : \begin{array}{l} x_1 - x_4 = -1 \\ x_2 - x_7 = -1 \end{array},\ (x_1;x_2;x_3) \in L^3_+,\ (x_4;x_5;x_6;x_7) \in S^2_+ \right\}, \]
which, by eliminating the variables $x_6$ and $x_7$ (using $x_6 = x_5$ by symmetry and $x_7 = 1 + x_2$), can be reformulated as
\[ -z^* = \inf\left\{ x_1 + 2x_2 + 2x_3 + 4x_5 : \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \in L^3_+,\ \begin{pmatrix} 1+x_1 & x_5 \\ x_5 & 1+x_2 \end{pmatrix} \in S^2_+ \right\}. \]
Note that $x_1 = x_2 = -1$, $x_3 = \sqrt2$, $x_5 = 0$ is feasible, with objective value $2\sqrt2 - 3$. A better value is obtained by taking $x_1 = x_2 = x_3 = 0$ and $x_5 = -1$, whose value is $-4$. This solution is optimal. This follows since $y_1 = y_2 = 2$ is feasible for (9.5) with $b^T y = -4$, which is the same value. So the optimal value of the original problem is $z^* = 4$.
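The claims in Example 9.3 are easy to verify numerically. The following sketch (our own check, not part of the text) tests feasibility of the two candidate points:

```python
import numpy as np

# y = (2, 2) satisfies both original constraints and attains y1 + y2 = 4.
y1, y2 = 2.0, 2.0
print((y1 - 1)**2 + (y2 - 2)**2 <= 4)        # ball constraint
Y = np.array([[y1, 2.0], [2.0, y2]])
print(np.all(np.linalg.eigvalsh(Y) >= -1e-9))  # [[y1,2],[2,y2]] is PSD

# The dual feasible point x1 = x2 = x3 = 0, x5 = -1 from the text:
x1 = x2 = x3 = 0.0; x5 = -1.0
print(x3 >= np.hypot(x1, x2))                # (x1, x2, x3) in L^3
S = np.array([[1 + x1, x5], [x5, 1 + x2]])
print(np.all(np.linalg.eigvalsh(S) >= -1e-9))  # PSD block
print(x1 + 2*x2 + 2*x3 + 4*x5)               # objective value -4
```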
9.3

Solution method

We already know that an -solution (if it exists) of (CP ) can be found in polynomial time by the full-Newton step algorithm provided that the following two conditions are satised. (i) we have a -self-concordant -barrier for the cone K = K1 K2 , and (ii) we know a point x int K and > 0 such that (x) <
1 3 .

If the cone K is a direct product of the standard cones, condition (i) can be easily met. But, in general a starting point x for the algorithm, as required by condition (ii) is not available. Our rst aim is to show that condition (ii) can be met by embedding (CP ) and its dual problem (CD) in a larger, self-dual problem (SP ). In fact we will show that if int K int K is nonempty then we have as a starting point the point on the central path of (SP ) corresponding to = 1. So we can use our algorithm for solving (SP ). Having a solution of (SP ), we will show that this gives us optimal solutions for (CP ) and (CD) if these exist, and otherwise we can decide that (CP ) and (CD) are not both solvable with vanishing duality gap. The latter occurs if (CP ) and (CD) are either both unbounded or infeasible, or they have dierent optimal values, or they have equal optimal values but the optimal value is not attained in at least one of the two problems. 77

9.4

Reduction to an inequality system

We ask the question whether $(CP)$ and $(CD)$ have optimal solutions with vanishing duality gap. We claim that this holds if and only if the inequality system
\[ Ax - b \in K_1, \qquad c - A^T y \in K_2^*, \qquad b^T y - c^T x \in \mathbb{R}_+, \qquad x \in K_2, \qquad y \in K_1^* \tag{9.6} \]
has a solution. Observe that the first two relations imply that $x$ and $y$ are feasible for $(CP)$ and $(CD)$ respectively. By weak duality this implies $c^T x - b^T y \ge 0$. This, together with the third relation, yields $b^T y = c^T x$, proving the claim.

If $\tau = 1$, the following inequality system is equivalent to (9.6), as can easily be verified:
\[ \begin{pmatrix} 0_{m\times m} & A & -b \\ -A^T & 0_{n\times n} & c \\ b^T & -c^T & 0 \end{pmatrix}\begin{pmatrix} y \\ x \\ \tau \end{pmatrix} \in K_1\times K_2^*\times\mathbb{R}_+, \qquad x \in K_2,\ y \in K_1^*,\ \tau \ge 0. \tag{9.7} \]
This system is homogeneous: if $(y,x,\tau)$ is a solution, then so is $\lambda(y,x,\tau)$, for any $\lambda \ge 0$. So, the system has a solution with $\tau = 1$ if and only if it has a solution with $\tau > 0$. Hence, defining the matrix $M$ and the vector $z$ by
\[ M := \begin{pmatrix} 0 & A & -b \\ -A^T & 0 & c \\ b^T & -c^T & 0 \end{pmatrix}, \qquad z := \begin{pmatrix} y \\ x \\ \tau \end{pmatrix}, \tag{9.8} \]
we have proved the following result.

Theorem 9.4. The problems $(CP)$ and $(CD)$ have optimal solutions with vanishing duality gap if and only if the following system has a solution:
\[ Mz \in K^* := K_1\times K_2^*\times\mathbb{R}_+, \qquad z \in K := K_1^*\times K_2\times\mathbb{R}_+, \qquad \tau > 0. \tag{9.9} \]

The new variable $\tau$ is called the homogenizing variable. Thus our task has been reduced to finding a solution of the system (9.9), or proving that no solution exists. Note that the matrix $M$ is skew-symmetric. In the above theorem we used that the dual cone of a direct product of cones is equal to the direct product of the duals of the respective cones. The proof of this elementary result is left to the reader.

Exercise 9.2. Let $K_i$, $i = 1,\ldots,r$, be a finite number of cones. Then
\[ (K_1\times\cdots\times K_r)^* = K_1^*\times\cdots\times K_r^*. \]
Prove this.

Solution: Let $K := K_1\times\cdots\times K_r$. The elements $x$ of $K$ have the form
\[ x = (x_1,\ldots,x_r), \qquad x_1 \in K_1,\ \ldots,\ x_r \in K_r. \]
For each $i$, suppose that $K_i \subseteq \mathbb{R}^{m_i}$ and let
\[ y = (y_1,\ldots,y_r), \qquad y_1 \in \mathbb{R}^{m_1},\ \ldots,\ y_r \in \mathbb{R}^{m_r}. \]
Then
\[ x^T y = x_1^T y_1 + \ldots + x_r^T y_r. \tag{9.10} \]
Recall that
\[ K_i^* = \left\{ y_i \in \mathbb{R}^{m_i} : y_i^T x_i \ge 0\ \forall x_i \in K_i \right\}, \qquad i = 1,\ldots,r. \]
Hence, if $y = (y_1,\ldots,y_r) \in K_1^*\times\cdots\times K_r^*$ then all terms on the right-hand side of (9.10) are nonnegative for every $x \in K$, and so we have $y \in K^*$. On the other hand, if $y = (y_1,\ldots,y_r) \in K^*$ then we have
\[ x_1^T y_1 + \ldots + x_r^T y_r \ge 0, \qquad \forall\,x = (x_1,\ldots,x_r),\ x_i \in K_i. \]
Now let $i \in \{1,\ldots,r\}$. By taking $x_j = 0$ for all $j \ne i$ we get
\[ x_i^T y_i \ge 0, \qquad \forall x_i \in K_i. \]
This implies $y_i \in K_i^*$, for each $i$. Hence $y \in K_1^*\times\cdots\times K_r^*$, which completes the proof. $\Box$
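The matrix $M$ of (9.8) is easy to assemble, and its skew-symmetry (used repeatedly below) can be checked directly. A minimal sketch, with our own function name and test data:

```python
import numpy as np

# Assemble the skew-symmetric matrix M of (9.8) from the data A, b, c,
# and check z^T M z = 0 (Exercise 9.4 below).

def build_M(A, b, c):
    m, n = A.shape
    return np.block([
        [np.zeros((m, m)), A,                -b.reshape(m, 1)],
        [-A.T,             np.zeros((n, n)),  c.reshape(n, 1)],
        [b.reshape(1, m), -c.reshape(1, n),   np.zeros((1, 1))],
    ])

A = np.array([[1.0, 2.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])
c = np.array([3.0, 4.0])
M = build_M(A, b, c)
print(np.allclose(M, -M.T))          # skew-symmetric
z = np.random.default_rng(1).normal(size=5)
print(abs(z @ M @ z) < 1e-12)        # z^T M z = 0 for every z
```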
9.5

Interior-point condition

The method we are going to use for solving the system (9.9) is an interior-point method, and for this we need the system to satisfy the interior-point condition.

Definition 9.5. We say that a system of equalities and conic inequalities satisfies the interior-point condition (IPC) if it has a feasible solution that satisfies all conic inequality constraints strictly.

The system (9.9) does not satisfy the IPC. For suppose $z = (y,x,\tau)$ is an interior solution of (9.9). Since $\tau > 0$, we deduce from $z \in K_1^*\times K_2\times\mathbb{R}_+$ and $Mz \in K_1\times K_2^*\times\mathbb{R}_+$ that $x/\tau$ is feasible for $(CP)$ and $y/\tau$ is feasible for $(CD)$. Hence, by weak duality, $(c^T x - b^T y)/\tau \ge 0$, whence $c^T x - b^T y \ge 0$. But since $z = (y,x,\tau)$ is an interior solution of (9.9) we have $Mz \in \operatorname{int}\left( K_1\times K_2^*\times\mathbb{R}_+ \right)$, whence $b^T y - c^T x > 0$, a contradiction.

The rest of our approach highly depends on the following assumption.

Assumption 9.6.
\[ \operatorname{int} K\cap\operatorname{int} K^* \ne \emptyset. \tag{9.11} \]

Exercise 9.3. Let $e \in \operatorname{int} K$ and $\gamma \in \mathbb{R}_+$. Then the set
\[ S = \left\{ x \in K^* : e^T x \le \gamma \right\} \]
is bounded. Prove this.

Solution: Let $0 \ne x \in K^*$. Since $e$ belongs to the interior of the cone $K$, there is an $\alpha > 0$ such that
\[ e - \alpha\,\frac{x}{\|x\|} \in K. \]
Since $x$ is in the cone dual to $K$, it follows that
\[ x^T\left( e - \alpha\,\frac{x}{\|x\|} \right) \ge 0, \qquad\text{which implies}\qquad e^T x \ge \alpha\,\frac{x^T x}{\|x\|} = \alpha\,\|x\|. \]
Hence, if $x \in S$ then $e^T x \le \gamma$ implies $\alpha\|x\| \le \gamma$, whence $\|x\| \le \gamma/\alpha$. This shows that the set $S$ is bounded. $\Box$
We introduce one more nonnegative variable $\vartheta$, add it to the vector $z$, and also extend the matrix $M$ with one extra column and row, according to
\[ \bar{M} := \begin{pmatrix} M & r \\ -r^T & 0 \end{pmatrix}, \qquad \bar{z} := \begin{pmatrix} z \\ \vartheta \end{pmatrix}, \tag{9.12} \]
where
\[ r := e - Me, \qquad e \in \operatorname{int} K\cap\operatorname{int} K^*. \tag{9.13} \]
Note that since the matrix $M$ is skew-symmetric, $\bar{M}$ is skew-symmetric as well.

Exercise 9.4. If $M$ is an $n\times n$ skew-symmetric matrix and $z \in \mathbb{R}^n$, then $z^T M z = 0$. Prove this.

Solution: Using that $z^T M z$ is a scalar and $M^T = -M$, we may write
\[ z^T M z = \left( z^T M z \right)^T = z^T M^T z = -z^T M z, \]
which implies $z^T M z = 0$. $\Box$

The matrix $\bar{M}$ has order $m+n+2$. We denote
\[ \bar{e} := \begin{pmatrix} e \\ 1 \end{pmatrix}, \qquad \bar{q} := \begin{pmatrix} 0_{m+n+1} \\ e^Te+1 \end{pmatrix}. \]
Crucial for the embedding technique is that the following system satisfies the IPC:
\[ \bar{M}\bar{z} + \bar{q} \in \bar{K}^* := K^*\times\mathbb{R}_+, \qquad \bar{z} \in \bar{K} := K\times\mathbb{R}_+. \tag{9.14} \]

Theorem 9.7. The system (9.14) satisfies the IPC. Moreover, $(CP)$ and $(CD)$ have optimal solutions with vanishing duality gap if and only if this system has a solution with $\vartheta = 0$ and $\tau > 0$.

Proof. We first prove that (9.14) satisfies the IPC. The vector $\bar{e}$ does the work, because taking $\bar{z} = \bar{e}$ we have
\[ \bar{M}\bar{e} + \bar{q} = \begin{pmatrix} M & r \\ -r^T & 0 \end{pmatrix}\begin{pmatrix} e \\ 1 \end{pmatrix} + \begin{pmatrix} 0 \\ e^Te+1 \end{pmatrix} = \begin{pmatrix} Me + r \\ -r^Te + e^Te + 1 \end{pmatrix}. \]
The definition of $r$ implies $Me + r = e$ and
\[ -r^Te + e^Te + 1 = -(e - Me)^Te + e^Te + 1 = e^TMe + 1 = 1, \]
where we used $e^TMe = 0$, which holds since $M$ is skew-symmetric (cf. Exercise 9.4). We conclude that
\[ \bar{M}\bar{e} + \bar{q} = \bar{e}. \tag{9.15} \]
Since $e \in \operatorname{int} K$ and $e \in \operatorname{int} K^*$, this proves the first statement in the theorem.

To prove the second statement we first recall from Theorem 9.4 that $(CP)$ and $(CD)$ have optimal solutions with vanishing duality gap if and only if the system (9.9) has a solution. Thus it suffices to prove that (9.9) has a solution if and only if (9.14) has a solution with $\vartheta = 0$ and $\tau > 0$. In order to prove this, it is convenient to write (9.14) in terms of $z$ and $\vartheta$:
\[ \begin{pmatrix} M & r \\ -r^T & 0 \end{pmatrix}\begin{pmatrix} z \\ \vartheta \end{pmatrix} + \begin{pmatrix} 0 \\ e^Te+1 \end{pmatrix} \in K^*\times\mathbb{R}_+, \qquad \begin{pmatrix} z \\ \vartheta \end{pmatrix} \in K\times\mathbb{R}_+. \tag{9.16} \]
Obviously, if $\bar{z} = (z;0)$ satisfies (9.16) this implies $Mz \in K^*$ and $z \in K$; hence, if $\tau > 0$, $z$ satisfies (9.9). On the other hand, if $z$ satisfies (9.9) then $Mz \in K^*$, $z \in K$ and $\tau > 0$; as a consequence, $\bar{z} = (z;0)$ satisfies (9.16) if and only if
\[ -r^Tz + e^Te + 1 \ge 0, \]
i.e., if and only if $r^Tz \le e^Te + 1$. If $r^Tz \le 0$ this certainly holds. Otherwise, if $r^Tz > e^Te+1$, then we multiply $z$ by the positive scalar $(e^Te+1)/r^Tz$; after that the above inequality is satisfied. Since a cone is invariant under positive multiplication, and the system (9.9) is homogeneous, this is sufficient for our goal. $\Box$
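The identity (9.15) is the whole point of the construction, and it can be verified mechanically. The following sketch (our own, for the self-dual orthant case $K = K^* = \mathbb{R}^{m+n+1}_+$ with $e$ the all-one vector) does so:

```python
import numpy as np

# Embedding (9.12)-(9.15): r = e - M e,  Mbar = [[M, r], [-r^T, 0]],
# qbar = (0, ..., 0, e^T e + 1), and then Mbar ebar + qbar = ebar
# with ebar = (e; 1).

rng = np.random.default_rng(2)
k = 5
B = rng.normal(size=(k, k))
M = B - B.T                          # any skew-symmetric M of order k
e = np.ones(k)
r = e - M @ e
Mbar = np.block([[M, r.reshape(k, 1)], [-r.reshape(1, k), np.zeros((1, 1))]])
qbar = np.concatenate([np.zeros(k), [e @ e + 1.0]])
ebar = np.concatenate([e, [1.0]])
print(np.allclose(Mbar @ ebar + qbar, ebar))   # True, as in (9.15)
```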
9.6

Embedding into a self-dual problem

We are now ready to present the conic optimization problem that we are going to solve, and whose solution can be used to decide whether both $(CP)$ and $(CD)$ are solvable or not. We consider the following problem:
\[ (SP) \qquad \min\left\{ \bar{q}^T\bar{z} : \bar{M}\bar{z} + \bar{q} \in \bar{K}^*,\ \bar{z} \in \bar{K} \right\}, \]
where $\bar{M}$ is as defined in (9.12) and
\[ e = \begin{pmatrix} e_1 \\ e_2 \\ 1 \end{pmatrix}, \qquad \bar{q} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ e^Te+1 \end{pmatrix}, \qquad \bar{M}\bar{e} + \bar{q} = \bar{e}, \tag{9.17} \]
with
\[ e_1 \in \operatorname{int} K_1\cap\operatorname{int} K_1^*, \qquad e_2 \in \operatorname{int} K_2\cap\operatorname{int} K_2^*. \]
Due to (9.8), (9.9), (9.12) and (9.14), the constraints in this problem can be written out as
\[ \bar{z} = \begin{pmatrix} y \\ x \\ \tau \\ \vartheta \end{pmatrix} \in \bar{K} = \begin{pmatrix} K_1^* \\ K_2 \\ \mathbb{R}_+ \\ \mathbb{R}_+ \end{pmatrix}, \qquad \bar{M}\bar{z} + \bar{q} \in \bar{K}^* = \begin{pmatrix} K_1 \\ K_2^* \\ \mathbb{R}_+ \\ \mathbb{R}_+ \end{pmatrix}. \]
Let us first establish that in the usual sense this problem is very easy to solve and that its optimal value is zero.

Lemma 9.8. For every optimal solution of $(SP)$ one has $\vartheta = 0$.

Proof. Due to the definitions of $\bar{q}$ and $\bar{z}$ one has $\bar{q}^T\bar{z} = \left( e^Te+1 \right)\vartheta$; hence all feasible objective values are nonnegative. On the other hand, since $\bar{q} \in \bar{K}^*$, the zero vector ($\bar{z} = 0$) is feasible, and yields zero as objective value, which is therefore the optimal value. $\Box$

More important for our goal is the next result.

Theorem 9.9. The problems $(CP)$ and $(CD)$ are both solvable with vanishing duality gap if and only if the problem $(SP)$ has an optimal solution with $\tau > 0$.

Proof. According to Theorem 9.7, $(CP)$ and $(CD)$ have optimal solutions with vanishing duality gap if and only if the system (9.14) has a solution with $\vartheta = 0$ and $\tau > 0$. Since the constraints in $(SP)$ are the same as in (9.14), and since by Lemma 9.8 every optimal solution of $(SP)$ satisfies $\vartheta = 0$, the theorem follows. $\Box$

We conclude that $(CP)$ and $(CD)$ have optimal solutions with duality gap zero if and only if $(SP)$ has an optimal solution with $\tau > 0$.

Exercise 9.5. Assume that $\bar{K}^* = \bar{K}$. Prove that then the dual problem of $(SP)$ is in essence the same problem as $(SP)$. This is expressed by saying that $(SP)$ is a self-dual problem.

Solution: The dual problem of $(SP)$ is given by
\[ \max\left\{ -\bar{q}^T\bar{y} : \bar{q} - \bar{M}^T\bar{y} \in \bar{K}^*,\ \bar{y} \in \bar{K} \right\}. \]
Since $\bar{K}^* = \bar{K}$, $\bar{M}^T = -\bar{M}$, and maximizing $-\bar{q}^T\bar{y}$ is equivalent to minimizing $\bar{q}^T\bar{y}$, the dual problem can be written as
\[ \min\left\{ \bar{q}^T\bar{y} : \bar{M}\bar{y} + \bar{q} \in \bar{K},\ \bar{y} \in \bar{K} \right\}. \]
This shows that $(SP)$ is self-dual. $\Box$

We associate to any vector $\bar{z} \in \mathbb{R}^{m+n+2}$ its slack vector
\[ s(\bar{z}) := \bar{M}\bar{z} + \bar{q}. \tag{9.18} \]
Then we have
\[ \bar{z}\ \text{is feasible for}\ (SP) \iff \bar{z} \in \bar{K}\ \ \&\ \ s(\bar{z}) \in \bar{K}^*. \tag{9.19} \]
If $\bar{z}$ is feasible then, by the skew-symmetry of $\bar{M}$,
\[ \bar{q}^T\bar{z} = \left( s(\bar{z}) - \bar{M}\bar{z} \right)^T\bar{z} = \bar{z}^Ts(\bar{z}). \]
Hence we have: a feasible $\bar{z}$ is optimal for $(SP)$ if and only if $\bar{z}^Ts(\bar{z}) = 0$. The latter condition is the so-called complementarity condition for $(SP)$. Since $s(\bar{e}) = \bar{e}$ and $e \in \operatorname{int} K\cap\operatorname{int} K^*$, we have $s(\bar{e}) \in \operatorname{int}\bar{K}^*$ and $\bar{e} \in \operatorname{int}\bar{K}$. This means that the vector $\bar{e}$ is an interior point for $(SP)$, and hence it can be used as a starting point for an interior-point method for solving $(SP)$. However, the full-Newton step method requires that the starting point is close to the central path of the problem. But the central path depends on the barrier function. So we first need to deal with barrier functions for conic optimization problems.

Exercise 9.6. Let $\bar{z}$ be feasible for $(SP)$. Prove that
\[ \bar{e}^T\bar{z} + \bar{e}^Ts(\bar{z}) = \bar{e}^T\bar{e} + \bar{q}^T\bar{z} = (1+\vartheta)\,\bar{e}^T\bar{e}. \tag{9.20} \]
Derive from this that the vectors $\bar{z}$ and $s(\bar{z})$ are bounded if $0 \le \vartheta \le 1$.

Solution: By the orthogonality property, $\bar{u}^T\bar{M}\bar{u} = 0$ for all $\bar{u} \in \mathbb{R}^{m+n+2}$. Taking $\bar{u} = \bar{z} - \bar{e}$ we obtain
\[ (\bar{z} - \bar{e})^T\left( s(\bar{z}) - s(\bar{e}) \right) = 0. \]
Since $s(\bar{e}) = \bar{e}$ and $\bar{z}^Ts(\bar{z}) = \bar{q}^T\bar{z} = \vartheta\,\bar{e}^T\bar{e}$, expanding this product gives
\[ \bar{e}^T\bar{z} + \bar{e}^Ts(\bar{z}) = \bar{e}^T\bar{e} + \bar{q}^T\bar{z} = (1+\vartheta)\,\bar{e}^T\bar{e}. \]
Since the vector $\bar{e}$ belongs to $\operatorname{int}\bar{K}\cap\operatorname{int}\bar{K}^*$, Exercise 9.3 implies that the vectors $\bar{z} \in \bar{K}$ and $s(\bar{z}) \in \bar{K}^*$ are bounded if $0 \le \vartheta \le 1$. $\Box$

Self-concordant barrier function for (SP )


(SP ) min q T z : s = M z + q, s K, z K .

The problem (SP ) can be reformulated as follows.

In (SP ) the constraints are the equality constraints M z s = q and the conic constraints s K, z K . Let and be SCBs for the cones K and K , respectively. Then (s) + (z) is an SCB that we can use to solve (SP ). This is a consequence of the following lemma. Lemma 9.10. If i is a i -self-concordant i -barrier for Ki , where i {1, 2}, then 1 + 2 is a -self-concordant -barrier for K1 K2 , with = max {1 , 2 } , = 1 + 2 .

There exist positive numbers 1 and 2 such that 1 1 + 2 2 is a (, )-SCB for K1 K2 for which the complexity number is given by = 2 1 + 2 2 . 1 2

Proof. Let x = [x1 ; x2 ] K1 K2 and h = [h1 ; h2 ] with hi in the same space as xi , i {1, 2}, and (x) = 1 (x1 ) + 2 (x2 ). Then, with i = 2 i (xi )[hi , hi ], we have
3 (x)[h, h, h]
3

(2 (x)[h, h]) 2

3 1 (x1 )[h1 , h1 , h1 ] + 3 2 (x2 )[h2 , h2 , h2 ]

3 3

(2 1 (x1 )[h1 , h1 ] + 2 2 (x2 )[h2 , h2 ]) 2

2 2 21 1 + 22 2

(1 + 2 ) 2

84

This function is convex in , and hence its maximal value is either 21 or 22 . On the other hand,
1 1 1 1

The last expression is homogeneous in [1 ; 2 ]. So, since i 0, without loss of generality we may assume 1 + 2 = 1. With = 1 , the expression then gets the form 3 3 21 2 + 22 (1 ) 2 , [0, 1].

(1 (x1 )[h1 ] + 2 (x2 )[h2 ]) ((x)[h]) = 2 2 (x)[h, h] 1 (x1 )[h1 , h1 ] + 2 2 (x2 )[h2 , h2 ]

2 2 2 2 1 1 + 2 2

1 + 2

1 + 2 .

The last inequality follows by using the Cauchy-Schwarz inequality. Now let 1 > 0 and 2 > 0. Then also 1 1 + 2 2 is a SCB for K1 K2 , with = max Hence, dening = 2 = max
1 2 ,

1 , 2 1 2

= 1 1 + 2 2 .

2 2 1 , 2 1 2

(1 1 + 2 2 ) = max 2 1 + 1

2 , 2 (1 + 2 ) . 2

Note that the rst argument is decreasing in and the second argument is increasing in . Moreover, both arguments go to innity, the rst if 0 and the second if . Hence there is a unique value of where both arguments are equal, and one easily veries that this value is given by = 2 /2 . For this value we get 1 2 2 = 2 1 + 2 2 , 1 2 and hence the lemma follows.
n

The remaining task is to nd SCBs for K and K . This is in general a nontrivial task. But if the cone K is a direct product of a nite number of copies of standard cones then this task becomes easy, because then K = K, and we know already SCBs for the three standard cones (cf. Exercises 7.1, 7.2 and 7.3). This is based on Exercise 9.2 and the following three exercises. Exercise 9.7. The standard cone Rn is self-dual. +
Solution:

In Exercise 7.1 we saw that i=1 log xi is a (1, n)-SCB for Rn . By Lemma + 9.10 this is a direct consequence of the fact that log x is a (1, 1)-SCB for R+ .

Exercise 9.8. The standard cone Ln is self-dual. + 85

Cone Rn + Ln + Sn +

Barrier = n
n i=1

log xi
2

n 2 n

1 1 1

= log x2 x1:n1 n n = log det X n

n 2 n

Table 9.1. SCBs for the three standard cones

Solution:

Exercise 9.9. The standard cone Sn is self-dual. +


Solution:

For ease of reference the SCBs for the standard cones are summarized in Table 9.1. For building a SCB for the direct product of such cones we can use Lemma 9.10 repeatedly, which also enables us to compute the complexity number of the resulting SCB in a simple way. Exercise 9.10. 4 Let be one of the barrier functions in Table 9.1. If the corresponding standard cone is denoted as K, then we have, for any x K and for any t > 0, 1 (x) t 1 2 (tx) = 2 2 (x) t 2 (x)x = (x) (tx) = xT (x) = . Derive this from the fact that is logarithmically homogeneous, as dened in Exercise 7.4.
Solution: Due to Exercise 7.4 we have (tx) = (x) log t, x K, t > 0. (9.21)

Dierentiating both sides in (9.21) with respect to x we obtain t(tx) = (x),


4 See

(9.22)

[8] and/or [9].

86

which for t = 1 leads to the third identity in the exercise. Finally, dierentiating both sides in (9.21) with respect to t we obtain xT (tx) = . t Taking t = 1 we get the fourth identity in the exercise. 2

which gives the rst identity in the exercise. Dierentiating both sides in (9.22) once more with respect to x yields the second identity, whereas dierentiation of both sides in (9.22) with respect to t yields (tx) + t2 (tx)x = 0,

If K is a (nite) direct product of standard cones then K is a so-called symmetric cone. For the moment we use this name without explaining it. An explanation will be given later on, in Section 10.5. If the underlying cones in a conic optimization problem are symmetric we call it a symmetric optimization problem. This will be the subject of the next chapter.

87

88

Chapter 10

Symmetric Optimization

Let the cone K be the direct product of the three standard cones Rn , Ln and Sn , + + + for suitable (and probably dierent) values of n. Since Rp Rq = Rp+q , we may + + + assume that there is only one linear cone Rn . The complexity number of the barrier + function for this cone is n . Each second order cone has complexity number 2. Finally, we also may assume that there is one semidenite cone (since the direct product of semidenite matrices is semidenite), say Sn . Hence, if n denotes the + number of second order cones in K, the complexity number for K becomes = n + 2n + n . Since each standard cone has = 1, we may take = 1 and then = n + 2n + n . (10.1)

What can be said about the starting point e? Each component in the direct product that composes K is a standard cone. As will become clear soon, each standard cone has a natural candidate for its component in the vector e: the all-one vector in Rn , + the vector (0n1 ; 2) in Ln , and + the identity matrix In in Sn , where In denotes the identity matrix of size + n n. By taking for e the direct product of such vectors (for the respective cones) we have eT e = n + 2n + n = . Hence, combing this with (10.1) we obtain that = eT e. 89 (10.3) (10.2)

Note that the cone underlying (SP ) equals K K , and K = K. Hence the -value of the barrier function of (SP ) is 2. We claim that e is the point on the central path for = 1. The proof of this claim is postponed to the subsequent sections. Starting the algorithm at this point, with 0 = 1, according to Theorem 7.11 the algorithm with full Newton steps requires at most 2 2 1 + 4 2 log iterations and the output is a point z such that q T z = eT e = (as usual, denotes the last coordinate of z) and (z) 1 , with /. 9

10.1

Self-dual problems over the standard cones


(SP ) min q T z : s = M z + q, s K, z K

In this section we consider the case where the cone K in is one of the standard cones. For the moment we relax the assumption that q has the special form as given in (9.17). We only assume that (i) there exists a vector e int K int K ; (ii) q = e M e K. A consequence of these assumptions is that (SP ) satises the IPC. We rst establish that Lemma 9.8 is still valid. Lemma 10.1. The optimal value of (SP ) equals 0. Proof. Let (s; z) be feasible for (SP ). Then, as before, q T z = sT z, since M is skew-symmetric. Since s K and z K we have sT z 0. Hence q T z 0. But since q K, the zero vector is feasible. Hence the lemma follows. 2 Note that the problem (SP ) as considered in Section 9.6 satises the conditions (i) and (ii), as is clear from (9.17).

10.1.1

On the structure of the matrix M

Recall from the above discussion that the matrix M in (SP ) is skew-symmetric and also that the vector q belongs to the cone K. Before proceeding to the special case that K is a product of standard cones, we need to point out that more can be said about the structure of the matrix M . Since in that case K = K, the constraints in (SP ) are z K, M z + q K. 90

Assuming that K RN , the matrix M has size N N . We denote the columns of M as Mi , with i {1, . . . , N }. Then the constraint M z + q K can be written as
N i=1

zi Mi + q K.
N

We already know that q K. Hence we have i=1 zi Mi q + K K + K. As is usual, we denote the smallest linear subspace of RN that contains K by span K. So we see that the constraints in (SP ) imply that
N

i=1

zi Mi span K.

In what follows we make the following assumption. Assumption 10.2. The columns of the matrix M belong to span K. In the next three sections we deal subsequently with linear constraints, conic quadratic constraints and semidenite constraints, which are constraints of the above form with K equal to Rm , Lm and Sm , respectively. For these cones we + + + have span Rm = Rm , span Lm = Rm , span Sm = Sm . (10.4) + + + where Sm denotes the linear space of all symmetric matrices. As a consequence Assumption 10.2 will be restrictive only in the third case, when we deal with semidefinite constraints. Exercise 10.1. Prove (10.4).

Solution: The statement is obvious for Rm , because the m unit vectors (with a 1 in one + position and zeros elsewhere) belong to Rm , and they form a basis of Rm . For Lm we use + + the vectors (0; 1), and the m 1 vectors (ei ; 1), where ei runs through the unit vectors in Rm1 . These m vectors all belong to Lm and they generate the whole of Rm . Finally, it + is obvious that span Sm Sm , because all matrices in Sm are symmetric. On the other + + hand, as is well known, every symmetric m m matrix S can be decomposed as
m

S=
i=1

i bi bT , i

where each i is a (real) eigenvalue of S and bi a corresponding (nonzero) eigenvector. Since the matrices bi bT are symmetric and positive semidenite this proves that span Sm Sm . + i

10.1.2

Linear cone

We start by considering the case where K = Rm , for some m 0. Since K = K, + by Exercise 9.7, the self-dual embedding problem is given by (SP ) min q T z : s = M z + q, s 0, z 0 . 91

We choose e to be the all-one vector of length m and q accordingly, so e = (1; . . . ; 1),


m

q = e M e.

(10.5)

times

Lemma 10.3. The sum of the coordinates of the vector e, as dened above, equals the -value of the SCB for Rm . + Proof. The sum of the elements in e equals m. From Table 9.1 it is clear that this 2 is precisely the -value of the SCB of Rm . +

The barrier function for (SP ) is given by (z, s) = qT z log si . log zi i=1 i=1
n n

Denoting x = [s; z] we may write g (x) = (x) = H(x) = 2 (x) = 0


1

s1 z 1 0 diag z 2

; .

diag s2 0

The Newton step x = [s; z] follows from 1 min (x) + xT g (x) + xT H(x)x : Ax = 0 , 2 where A = I M . This gives rise to the system, for some vector u of Lagrange multipliers, H(x)x + g (x) = AT u, Ax = 0. (10.6) Substitution, while denoting S = diag s and Z = diag z, gives S 2 0 0 Z 2 s z + 0
1

s1 z 1

I MT

u,

M z s = 0.

Since s1 = S 1 e, z 1 = Z 1 e and M T = M this is equivalent to S 2 s S 1 e = u, Z 2 z + 1 q Z 1 e = M u, s = M z. By elimination of u we obtain Z 2 z + 1 q Z 1 e = M S 2 s S 1 e , Z 2 z + 1 q Z 1 e = M S 2 M z S 1 e , 92 s = M z.

We can also eliminate s form the rst equation, which gives s = M z.

This can be rewritten as Z 2 + M S 2 M T z = Z 1 e M S 1 e + 1 q , s = M z. (10.7)

Note that we can solve z from the rst equation and since the matrix Z 2 + M S 2 M T is positive denite the solution is unique. Then we may obtain s from s = M z. Recall that the minimizer of (z, s) is the point on the central path corresponding to . Obviously, x = [s; z] is the (unique!) minimizer if and only if z = 0, which happens if and only if Z 1 e M S 1 e + 1 q = 0, Eliminating q from the rst equation we get Z 1 e M S 1 e + 1 (s M z) = 0, which is equivalent to Z 1 e 1 s = M S 1 e 1 z , s = M z + q. s = M z + q, s = M z + q.

Taking inner products with S 1 e 1 z at both sides in the rst equation we obtain T S 1 e 1 z Z 1 e 1 s = 0, which is equivalent to e 1 Sz
T

S 1 Z 1 e 1 Zs = 0.

Since S 1 Z 1 is a positive denite (diagonal) matrix, and Sz = Zs, this implies e 1 Zs = 0. Hence the -center is (uniquely!) determined by the equations zs = e, s = M z + q, s, z 0, (10.8) where zs denotes the componentwise product (also known as the Hadamard product) of the vectors z and s. Surprisingly enough, the above system is satised if = 1 and z = s = e. This implies that (e; e) is the -center of (SP ) for = 1. Hence, 1 (e; e) = 0, and so we can start the full-Newton step algorithm of Figure 7.1 with x0 = (e; e) and 0 = 1. For future use we conclude this section with the following lemma, which provides an alternative proof of (10.8). Lemma 10.4. Let s, z int Rm and > 0. Then + s1 1 z Equality holds if and only if zi si = , i {1, . . . , m} . 93 (10.9)
T

z 1 1 s 0.

Proof. We may write s1 1 z


T

z 1 1 s = s1 z 1

2m z T s + 2 =

i=1

1 zi si

zi si

0.

This proves the inequality in the lemma. Obviously, equality holds if and only if (10.9) holds, and hence the lemma follows. 2

Example of embedding problem Consider the case where the problems (P ) and (D) are determined by A= The matrix M is given by 1 0 , b= 1 1 , c= 2 .

and the vector r by

0 0 1 1 0 A b 0 0 0 1 M = AT 0 c = 1 0 0 2 T T b c 0 1 1 2 0 1 0 1 1 r = e Me = 1 1 1 2 0 0 1 1 0 0 0 1 1 0 0 2 1 1 2 0 1 0 0 3 1 0 0 3 0 1 0 = . 0 3 0 0 0 0 5

Thus, we obtain

Hence, the self-dual problem (SP ) gets the form 0 z1 0 0 1 1 1 0 0 0 1 0 z2 0 min 5 : 1 0 0 2 0 z3 + 0 0, 1 1 2 0 3 z4 0 5 z5 1 0 0 3 0 94

M =

q=

Note that the all-one vector is feasible for this problem and that its surplus vector also is the all-one vector.

z1 z 2 z3 0 . z4 = z5 =

1.6 1.4
s

1.6 1.4
z1

1.2 1

1.2 1 0.8
s2 T

s5

0.8 0.6 0.4

s z3 z2 

0.6

s3

0.4 0.2
s1 

s s4

0.2

0 0

20

E iteration number

40

60

0 0

20

E iteration number

40

60

Figure 10.1. Output full-Newton step algorithm for the sample problem. Full-Newton step algorithm applied to the sample problem The actual values of the output of the algorithm are z = ( 1.5999; 0.0002; 0.8000; 0.8000; 0.0002 )

s(z) = ( 0.0001; 0.8000; 0.0002; 0.0002; 1.0000 ). The left plot in Figure 10.1 shows how the coordinates of the vector z := (z1 ; z2 ; z3 ; z4 = ; z5 = ), which contains the variables in the problem, develop in the course of the algorithm. The right plot does the same for the coordinates of the surplus vector s(z) := (s1 ; s2 ; s3 ; s4 ; s5 ). Observe that z and s(z) converge smoothly to the limit point of the central path of the sample problem.

10.1.3

Second-order cone

Next we consider the case where K = Lm for some m 2. Since K = K, by + Exercise 9.8, the self-dual embedding problem is given by (SP ) min q T z : s = M z + q, s Lm , z Lm . + +

In this case we choose e and q as follows. e = ( 0; . . . ; 0 ; 2 ),


m1

q = e M e.

(10.10)

times

95

The barrier function is now given by qT z (z, s) = log s2 m


m1 m1

s2 i
i=1

log

2 zm

2 zi i=1

We need to compute the gradient and Hessian matrix of this function. Let us do this rst for the term
m1 2 f (z) := log zm 2 zi i=1

It is convenient to dene the matrix Qm as the diagonal matrix with Qmm = 1,


2 Then we have zm m1 2 i=1 zi

Qii = 1, i = 1, . . . , m 1.

(10.11)

= z T Qm z, and hence z1 . . .

f (z) =

2 2Qm z = T TQ z z m z Qm z

Furthermore, if i < m then

zm1 zm

ij zj 2zi zj + T = , TQ z T Q z)2 zi z m z Qm z (z m where ij is the Kronecker delta, and mj zj 2zm zj + T = , TQ z T Q z)2 zm z m z Qm z (z m Thus we obtain 2 (z T Qm z) zm1 z1 zm z1 4 4Qm z(Qm z)T (z T Qm z)
2

j = 1, . . . , m,

j = 1, . . . , m.

2 f (z) =

2 z1 . . .

. . . z1 zm1 z1 zm . . .. . . . . . 2 ... zm1 zm1 zm 2 . . . zm zm1 zm

The expression for 2 f (z) can be simplied to 2 f (z) =

2 Qm . T z Qm z

4Qm zz T Qm 2Qm 2Qm = . T TQ z T Q z)2 z m z Qm z (z m

Yet we are ready to write down the gradient and Hessian of (z, s). To simplify notation we introduce the following notations. g(z) = 2Qm z z T Qm z (10.12)

96

and h(z) =

4Qm zz T Qm (z T Qm z)
2

2Qm . TQ z z m

(10.13)

Exercise 10.2. Prove that if z int Lm then h(z) is positive denite.


Solution: Let u Rm . Then uT h(z)u =
2

2 uT Qm z uT Qm uz T Qm z 2uT Qm u 4uT Qm zz T Qm u T . =2 2 z Qm z (z T Qm z) (z T Qm z)2

Hence we need to show that 2 uT Qm z


2

uT Qm uz T Qm z 0,

u Rm , z int Lm , +

and that equality holds if and only if u = 0. Since z int Lm we have z T Qm z > 0, and + hence the inequality certainly holds if uT Qm u < 0. So we may assume that uT Qm u 0, which means that u Lm . To proceed we introduce the following notation. + u := u1:m1 , z := z1:m1 . (10.14)

Due to the denition of Qm the inequality that we need to prove can be written as 2 zm um uT z Since u, z Lm we have + inequality we obtain
2

u2 u m u

2 2 zm

(10.15)

z zm and

um . Using also the Cauchy-Schwarz z 0.


2 2 zm

zm um uT z zm um u Hence, it suces to show that 2 (zm um u z )2 u2 u m

and that equality holds if and only if u = 0. The last inequality can be simplied to (zm um u which is equivalent to (zm um u z )2 + (um z zm u )2 0. z )2 2zm um u z u2 z m
2 2 zm u 2

This proves the inequality (10.15). It remains to show that equality holds only if u = 0. If equality holds then we obviously have zm um = u z . Suppose u = 0. Then u Lm + implies um > 0. We also have zm > z 0, since z int Lm . Now zm um = u z implies u z > 0, whence u > 0 and z > 0. But zm um = u z and zm > z > 0 2 imply um < u , contradicting u Lm . Hence the proof is complete. +

Exercise 10.3. One has h(z) = Im if and only if z = e. Prove this. 97

Solution: Recall from (10.13) that the Hessian at z is given by h(z) = 4Qzz T Q 2Q T . z Qz (z T Qz)2

Since Q2 = Im , multiplying both sides from the left and the right with Q yields that h(z) = Im holds if and only if 4zz T 2Q T = Im . z Qz (z T Qz)2 Putting = z T Qz, this can be written as 4zz T = 2 Im + 2Q. The right-hand side is a diagonal matrix whose rst m1 entries equal 2 2 whereas the last entry equals 2 + 2. The left-hand side is a diagonal matrix only if z has one nonzero entry. Since z Lm , the nonzero entry must be zm . Thus it follows that 2 2 = 0 and + 2 2 + 2 = 4zm , which gives = 2, whence z = e. 2

Denoting x = [s; z] we now may write g (x) = (x) = 0


1

g(s) g(z)

H(x) = 2 (x) =

h(s) 0

0 h(z)

According to (10.6), the Newton step x = [s; z] follows from H(x)x + g (x) = AT u, Substitution gives h(s) 0 0 h(z) s z + 0
1

Ax = 0,

A = I M .

g(s) g(z)

MT

u,

M z s = 0.

Since M T = M this is equivalent to h(s)s + g(s) = u, Eliminating u, we obtain h(z)z + 1 q + g(z) = M (h(s)s + g(s)) , s = M z. h(z)z + 1 q + g(z) = M u, s = M z.

We can also eliminate s from the rst equation, which gives h(z)z + 1 q + g(z) = M (h(s)M z + g(s)) , and this can be rewritten as h(z) + M h(s)M T z = g(z) + M g(s) 1 q, 98 s = M z, s = M z,

where we used again that M T = M . The matrices h(z) and h(s) are positive denite, and hence so is h(z) + M h(s)M T . Therefore, we can solve z from the rst equation and this solution is unique. After this we obtain s from s = M z. Recall that the minimizer of (z, s) is the point on the central path corresponding to . Obviously, x = [s; z] is the (unique!) minimizer if and only if z = 0, which happens if and only if g(z) M g(s) + 1 q = 0, Eliminating q from the rst equation we get g(z) M g(s) + 1 (s M z) = 0, which is equivalent to g(z) + 1 s = M g(s) + 1 z , s = M z + q, s = M z + q. s = M z + q. (10.16)

Taking inner products with g(s) + 1 z at both sides in the rst equation we obtain g(s) + 1 z which is equivalent to 2Qm s 1 z sT Qm s
T T

g(z) + 1 s = 0,

(10.17)

2Qm z 1 s z T Qm z

= 0.

(10.18)

To proceed we need the following lemma, where we use the following notation. s := s1:m1 , Lemma 10.5. Let s, z Lm . Then + zT s
2

z := z1:m1 .

(10.19)

sT Qm s

z T Qm z = s2 s m

2 zm z

(10.20)

Equality holds if and only if one of the following two properties is satised. sm z = zm s and sT z = s z . (10.21) (10.22)

zm s + sm z = 0.

Proof. The equality in (10.20) is due to the denition of Qm . Using (10.19) the inequality in (10.20) can be written as zm sm + sT z
2

s2 s m

2 zm z

(10.23)

Since s, z Lm we have z zm and s sm . Using also the Cauchy-Schwarz + inequality this implies zm sm + sT z zm sm s 99 z 0.

Hence, it suces if (zm sm s This can be simplied to 2zm sm s which is equivalent to (sm z zm s ) 0. This proves (10.20), and also that this inequality holds with equality if and only if (10.21) holds. It remains to show that (10.21) and (10.22) are equivalent. The second relation in (10.21) implies that the vectors s and z are linearly dependent and sT z 0. If these vectors are both zero then (10.22) holds. Therefore, let us consider the case where at least one of these two vectors is nonzero. Without loss of generality we may assume that z = 0. Then we have s = for some 0. z Hence s = z . Also using sm z = zm s we obtain sm z = zm z , whence sm = zm . So we have zm s + sm z = zm z + zm z = 0, which proves (10.22). On the other hand, if (10.22) holds it is immediately clear that sm z = zm s . Using this we may write
2 zm sm sT z = (zm s)T sm z = (zm s)T (zm s) = zm s 2 2

z ) s2 s m

2 zm z

z s2 z m

2 zm s

= zm sm s

z .

After dividing both sides by zm sm this gives sT z = s z . Thus we have shown the equivalence of (10.22) and (10.21), and hence the proof is complete. 2

Lemma 10.6. Let s, z int Lm and > 0. Then + 2Qm s 1 z sT Qm s Equality holds if and only if zm s + sm z = 0 and z T s = 2. (10.24)
T

2Qm z 1 s z T Qm z

0.

Proof. Recall that s, z int Lm means that sT Qm s > 0, and z T Qm z > 0. This + 100

implies sm > 0, zm > 0 and by Lemma 10.5, also z T s > 0. We may write 2Qm s 1 z sT Qm s = =
T

2Qm z 1 s z T Qm z 2 sT Qm s z T Qm z + sT Qm s z T Qm z + 1 T z s 2

4 1 4s z + 2 zT s sT Qm sz T Qm z 4 1 4 T + 2 zT s z s 2 zT s 2 0, = zT s where the rst inequality is due to (10.20). By Lemma 10.5 this inequality holds with equality if and only zm s + sm z = 0. Since the second inequality holds with equality if and only if z T s = 2, this proves the lemma. 2

4sT Qm Qm z T Q sz T Q z s m m T

Again we are surprised by the fact that the above system is satised if = 1 and z = s = e, as one easily veries. This implies that (e; e) is the -center of (SP ) for = 1. Hence, 1 (e; e) = 0, and so we can start the full-Newton step algorithm of Figure 7.1 with x0 = (e; e) and 0 = 1. For ease of notation we recall that if x, y Rm then the Jordan x y product of x and y is dened by ym x + xm y xy = (10.26) xT y

Yet we return to the equation (10.18). Application of Lemma 10.6 yields that the -center of (SP ) is the unique solution of the following system of equations. 0m1 zm s + sm z , s = M z + q, z Lm , s Lm . = (10.25) + + T 2 z s

It is well-known that the second order cone consists of the squares in the Jordan algebra sense. We will not use this interesting fact, and leave its proof as an exercise to the reader. Exercise 10.4. Prove that if s Rm then s Lm if and only if s = x x for some + x Rm .

Solution: Let s = x x for some x Rm . Then we have s = 2xm x; x

101

We need to show that s sm , which is equivalent to the obvious inequality 2 |xm | x x2 + x 2 . On the other hand, if s Lm then s sm and we need to nd a vector + m x Rm such that s = 2xm x, sm = x2 + x 2 . m This implies x2 = sm m whence This gives s 2 4x2 m
2

4x4 4x2 sm + s m m

= 0.
2

Note that right-hand side is nonnegative, since s 2x2 = sm m

2x2 sm m

= s2 s m

Lm . +

Thus we obtain

s2 s 2 . m

If s2 s 2 > 0 this gives at least one positive value for xm . Taking x = s/(2xm ) it m follows that s = x x, as desired. If s2 s 2 = 0 we take xm = sm /2. If xm = 0, we m have s = 0 and hence s = 0. Then x = 0 yields s = x x. Otherwise we proceed as before and take x = s/(2xm ), whence again s = x x. 2

Using the Jordan product, one may easily verify that (10.25) can be restated as z s = e e, s = M z + q, z K, s K,

(10.27)

where e is as dened in (10.10). This makes clear once more that z = s = e satises this equation if = 1. Another interesting fact is the following relation between the interior element e of the second order cone and the -value of its barrier function. Lemma 10.7. The sum of the elements in e e, with e as dened in (10.10), equals the -value of the SCB for Lm . + Proof. One has e e = (0m1 ; 2), so the sum of the elements in e e equals 2. From Table 9.1 it is clear that this is precisely the -value of the SCB of Lm . 2 +

10.1.4

Semidenite cone

We proceed by considering the case where K = Sm for some m 2. Since K = K, + by Exercise 9.9, the self-dual embedding problem is given by (SP ) min q T z : s = M z + q, s Sm , z Sm . + +

Some words about the notation are necessary at this place. Recall from page 34 that the cone Sm consists of all positive semidenite matrices. By associating to + 102

every m m symmetric matrix X the concatenation of its columns, in their natural order, we get a vector of length m2 . We denote this vector by the small letter x. Obviously the mapping X x is linear and one-to-one. For the moment we explore this 1-1 correspondence and we consider x and X as representing the same object. This allows us to use the same vector notation as before, and simplies the presentation. The mapping X x is denoted as vec () and its inverse mapping as mat () . So we may write x = vec (X) , X = mat (x) . Using the above conventions we choose e and q as follows. e = vec (E) , E = Im , q = e M e Sm . + (10.28)

Note that in general the vector e M e does not necessarily belong to Sm . + Lemma 10.8. The sum of the elements in e, as dened by (10.28), equals the -value of the SCB for Sm . + Proof. The sum of the elements in e equals m. From Table 9.1 it is clear that this is precisely the -value of the SCB of Rm . 2 +

Observe that the rst constraint in (SP ) requires that mat (M z + q) Sm . + When denoting the i-th column as Mi this means that zi mat (Mi ) + mat (q) Sm . +

Due to (10.28) we have mat (q) Sm . We will further assume in this section + that mat (Mi ) is a symmetric matrix, for each i. Since span Sm = Sm , this is in + agreement with Assumption 10.2. When S, Z are any two m m symmetric matrix, and s = vec (S) , z = vec (Z) , then 2
m m m n n

sT z =

sk zk =

Sij Zij =

k=1

i,j=1

i=1

(10.29) This shows that Tr (SZ) is the natural inner product of two symmetric matrices S and Z. It is (therefore) also represented as S, Z . The corresponding norm is the well-known Frobenius norm, which satises S, S = sT s = s . S F = 103

j=1

Sij Zij =

(XY )ii = Tr (XY ) .

i=1

Obviously, the trace function is commutative, i.e., Tr (ZS) = Tr (SZ). The barrier function for (SP ) can now be written as (z, s) = qT z log det S log det Z.

Recall that in the cases of linear and conic quadratic optimization we could nd a nice characterization of the central path, as provided by (10.8) and (10.27), respectively. Our aim is to nd a similar result in the present case and to show that (e; e) is the point on the central path of (SP ) for = 1. Then, denoting x = [s; z], just as in the previous cases, we may write 0 g(s) h(s) + , H(x) = 2 (x) = g (x) = (x) = 1 q g(z) 0 Let g(s) and h(s) denote the gradient and Hessian of log det S, respectively.

0 h(z)

As before we denote the Newton step as x = [s; z]. At this stage an important remark is in order. Obviously the Newton step must be such that the new iterate is feasible for (SP ) again. This is the case only if mat (s) and mat (z) are symmetric matrices. Therefore we have to add constraints to the minimization problem 1 min (x) + xT g (x) + xT H(x)x : Ax = 0 , 2

with A = I M , so as to guarantee that mat (s) and mat (z) are symmetric, because otherwise this natural condition will not be satised. Note that the constraint Ax = 0 means that s = M z. At this stage we invoke Assumption 10.2. According to this assumption the columns of M correspond to symmetric matrices. As a consequence also mat (M z) is a symmetric matrix. Hence s will automatically be symmetric. Unfortunately this is not true for z. So the question is now how to model the condition that mat (z) must be symmetric. Clearly this requires that for all i and all j = i the (i, j)-entry and the (j, i)-entry in mat (z) are equal. This is a linear condition. Hence there exists an m2 m2 matrix Tm such that mat (z) is symmetric if and only if Tm z = 0. For example, for m = 2 and m = 3 we may take 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 T2 = 0 1 1 0 , T3 = 2 0 0 0 0 0 0 0 0 0 . 2 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 104

Note that the symmetry condition only concerns the o-diagonal entries in mat (z) : there are no conditions on the diagonal elements. This is expressed by the fact that the corresponding rows in Tm are zero.
The matrix Tm is not unique. When multiplying all entries in the same row with a positive scalar we still have that mat (z) is symmetric if and only if Tm z = 0. So, without loss of generality we may take all nonzero diagonal elements equal to 2 1/2, just as in the above two examples. Then Tm is symmetric and Tm = Tm . The proof is left as an exercise. Exercise 10.5. Verify that we may take all nonzero diagonal elements of Tm equal 2 to 1/2 and that the matrix Tm is then symmetric. Also show that Tm = Tm .
Solution: Let Z be a arbitrary m m matrix. We want the matrix Tm to be such that Z is symmetric if and only Tm z = 0, where z = vec (Z) . Z is symmetric if and only if the 1 mapping Z 2 Z Z T maps Z to zero. Applying this mapping twice to Z we obtain

1 2

1 Z ZT 2

1 Z ZT 2

1 Z ZT , 2

showing that the mapping is idempotent. It remains to nd a matrix representation of this mapping when acting on z = vec (Z) . By the denition of the vec () operator we have z = vec (Z) if and only if z(j1)m+i = Zij , Hence we have
(Tm z)(j1)m+i = 1 2

i, j {1, . . . , m} .

z(j1)m+i z(i1)m+j ,

i, j {1, . . . , m} ,

whence

Tm


(j1)m+i,(j1)m+i

= = =

1 , 2

i = j, i = j, otherwise. 2 (10.30)

Tm (j1)m+i,(i1)m+j Tm (j1)m+i,k

1, 2 0,

This proves the result in the exercise.

Exercise 10.6. Let z = vec (Z) , where Z is any m m matrix. Prove that Z is skew-symmetric if and only if Tm z = z.
Solution: Recall that any matrix Z can be written as the sum of a symmetric and a skew-symmetric matrix, as follows: Z= 1 Z + ZT 2 + 1 Z ZT , 2

105

and that this decomposition is unique. We may write


z = Tm z + I Tm z. Since Tm (I Tm ) z = 0, the matrix mat ((I Tm ) z) is symmetric. On the other hand, since Tm (Tm s) = Tm z, the symmetric part of the matrix mat (Tm z) is zero, and hence mat (Tm z) is skew-symmetric. Hence we may conclude that

mat

I Tm z =

1 Z + ZT , 2
1 2

mat Tm s =

1 Z ZT . 2 2

Since Z is skew-symmetric if and only Z =

Z Z T , the statement follows.

Instead of the matrix Tm we use below the matrix Tm that arises from 2 Tm by deleting all zero rows and the rows with the negative nonzero entry left of the diagonal. So we have 0 1 0 1 0 0 0 0 0 1 1 0 0 1 0 0 0 1 0 0 . 0 1 1 0 , T3 = T2 = 2 2 0 0 0 0 0 1 0 1 0 One may easily verify that the size of Tm is 1 m(m 1) m2 . 2

Exercise 10.7. We still have that Z = vec (z) is symmetric if and only Tm z = 0. T T Prove this, and also that Tm is a right inverse of Tm (i.e., Tm Tm is the identity T matrix) and Tm Tm = Tm .
Solution: Any row of Tm is either equal to a row of Tm or to minus a row of Tm , and Tm has no other nonzero rows. Hence the row spaces of Tm and Tm are equal. Hence Tm z = 0 if and only if Tm z = 0, which implies the rst statement. For each pair (i, j), with i > j {1, . . . , m} the matrix contains a row, and the corresponding row, which we label by the pair (i, j), is given by

(Tm )(i,j),(j1)m+i = (Tm )(i,j),(i1)m+j = (Tm ):,(j1)m+i =

1 , 2 1 2 ,

i > j, i > j, otherwise. (10.31)

0,

Obviously, the rows of Tm are orthogonal and their Euclidean length is 1. Hence it follows T T that Tm Tm is the identity matrix. So it remains to show that Tm Tm = Tm . Let (i, j) and (u, v) be arbitrary pairs of indices from the the index set {1, . . . , m}. Then the T ((i, j), (u, v))-entry of Tm Tm is given by (Tm )k,(j1)m+i) (Tm )k,(v1)m+u .
k

Due to (10.31) a term in the above sum is nonzero only if (u, v) = (i, j) or (u, v) = (j, i), with i = j, and in both cases there is precisely one index k for which his happens. In the

106

rst case, when (u, v) = (i, j), the term equals 1/2 and in the second case the term equals 1/2. Thus we obtain
T Tm Tm T Tm Tm (j1)m+i,(j1)m+i

1 , 2

i = j, i = j, otherwise. 2

(j1)m+i,(i1)m+j T Tm Tm

= 1, 2 = 0,

:,

T Comparing this with (10.30) we see that Tm Tm = Tm .

Exercise 10.8. Let u be any vector of length skew-symmetric.

1 2 m(m

T 1). Prove that Tm u is

T Solution: When denoting w = Tm u, the (i, j)-entry of w is given by

w(j1)m+i =
p>q

T Tm

(j1)m+i,(p,q)

u(p,q)

=
p>q

(Tm )(p,q),(j1)m+i u(p,q)

Due to (10.31), the entry of Tm that appears under the last sum is nonzero only if i = p and j = q or j = p and i = q. Hence, since p > q it follows that w(j1)m+i = w(j1)m+i =
1 2 1 2

u(i,j) , u(i,j) ,

i > j, i < j. 2

This shows that mat (w) is skew-symmetric.

It may now be clear that in the present case the Newton step should be dened as the optimal solution of the minimization problem 1 min (x) + xT g (x) + xT H(x)x : Ax = 0, Tm z = 0 , 2 where, as before, x = [s; z]. Since A = problem require that I M s 0 Tm z I M , the constraints in this = 0.

Using vectors us and uz as Lagrange multipliers, the optimality conditions are H(x)x + g (x) = I M 0
T

us uz 107

Tm

I M 0

s z

Tm

= 0.

In other words,
h(s) 0 s z + 0
1

0 h(z)

g(s) g(z)

0
T Tm

us uz

I M

s z

0 Tm

= 0.

Thus we have the following system of linear equations in the variables s, z, us , and us : g(s) s h(s) 0 I 0 1 T 0 h(z) M Tm z = q + g(z) . (10.32) u I 0 MT 0 0 s 0 uz 0 Tm 0 0 Despite the fact that we do not have explicit expressions for h(s) and h(z), we know that the matrices h(z) and h(s) are positive denite, since these are the Hessians of self-concordant functions. Having this said, we invite the reader to convince himself (or herself) that the above system determines s and z uniquely. One possible way for doing this is to make the following exercise. Exercise 10.9. Prove that the system (10.32) has a unique solution.
Solution: We use matrices H and U dened by H := h(s) 0 0 h(z) , U= I MT 0 Tm

and the vectors x, u, and r dened as follows: x := s z , u := us uz , r := g(s)


1

q + g(z)

Then the system (10.32) can be written as H UT U In other words, Hx + U T u = r U x = 0. The rst equation implies x = H 1 U T u + r . Substitution into the second equation yields U H
1

x u

r 0

(10.33)

U T u + r = 0, or (10.34)

U H 1 U T u = U H 1 r.

108

The right-hand side is known, and the matrix U H 1 U T is nonsingular. The latter can be shown as follows. If not, then there exists a nonzero vector w such that U H 1 U T w = 0. This implies wT U H 1 U T w = 0. Since H is positive denite, H 2 exists. Thus it follows that 2 T 1 1 1 wT U H 1 U T w = H 2 U T w H 2 U T w = H 2 U T w = 0.
2
1

is nonsingular this implies U w = 0. Since U has Hence H U w = 0. Since H full row rank the columns of U T are linearly independent, so it follows that w = 0. This contradiction proves that U H 1 U T is nonsingular. So we can solve u = (us ; uz ) from (10.34), and the solution is unique. Now x follows from (10.33), again uniquely. 2

1 2

1 2

To proceed we need to compute the gradients g(s) and g(z). For that purpose we use the following lemma.5 Lemma 10.9. Let S int Sm . Then the gradient of f (S) := log det S is given by + S 1 . Proof. Let H Sm be such that S + H int Sm . Using well-known properties of + the determinant we write f (S + H) f (S) = log det (S + H) log det S = log det (S + H) + log det S 1 = log det S 1 det (S + H) = log det I + S 2 HS 2 .
1 1 1 1

Since S +H int Sm , the matrix I +S 2 HS 2 is positive denite. Let 1 , . . . , m + 1 1 denote the eigenvalues of the matrix S 2 HS 2 . Then the eigenvalues of I + 1 1 S 2 HS 2 are 1 + 1 , . . . , 1 + m and these are positive. Since the determinant of a symmetric matrix equals the product of its eigenvalues, we may write
m m

log det I + S

1 2

HS

1 2

= log
i=1

(1 + i ) =
i=1

log (1 + i ) .

Using log(1 + t) t for t > 1 and that the trace of a symmetric matrix equals the sum of its eigenvalues we arrive at
m

f (S + H) f (S)

i = Tr S 2 HS 2
i=1

= Tr S 1 H = S 1 , H .

This shows that S 1 is a subgradient of f at S. Since f is dierentiable, the subgradient is unique, and hence S 1 represents the gradient of f at S. 2
5 The

given proof is due to E. de Klerk [2, Lemma A.1.1].

109

Corollary 10.10. One has g(s) = vec S 1

and g(z) = vec Z 1 .

Remark: When S is a symmetric m m matrix like in the previous lemma, then 1 f (S) = log det S is a function of 2 m(m + 1) variables. Hence one would expect the gradient of f (S), to be a vector of length 1 m(m + 1). One might ask how the gradient 2 can be given by S 1 , whereas vec S 1 is a vector of length m2 . We illustrate this phenomenon by a simple example and show how this question can be resolved. Let m = 2. Then det S = s11 s22 s2 , and hence, by taking partial derivatives of f with respect to 12 s11 , s12 and s22 , respectively, we obtain

s22 f (S) = 1 det S 2s12 s11 Now let us perturb S slightly by a symmetric matrix H. Due to the denition of the gradient the rst order approximation for f (S + H) at S satises

h22 f (S + H) f (S) f (S), h12 h11 But this gives exactly the same value as
T

1 (s22 h22 2s12 h12 + s11 h11 ) . det S

s22 S 1 , H = vec S 1
T

h22 h12 h12 h11 .

vec (H) =

1 det S

s12 s12 s11

So we have

A similar argument works for larger values of m, as one easily may verify.

f (S + H) f (S) S 1 , H .

Yet we draw our attention to the question how we can recognize that s and z are the -centers of (SP ). Since the -centers of (SP ) minimize (z, s), the solution at the -centers of (10.32) will have z = 0, and hence also s = 0. Since the solution of (10.32) is unique, this happens if and only if there exist (unique) vectors us and uz such that I M 0
T Tm

us uz

g(s)
1

q + g(z)

(10.35)

Lemma 10.11. If s and z are the -centers of (SP ) then we have g(s) + 1 z
T

g(z) + 1 s = 0. 110

(10.36)

Proof. According to (10.35) we have us = g(s)


T M us = 1 q g(z) + Tm uz .

(10.37) (10.38)

By eliminating us we obtain
T M g(s) = 1 q + g(z) Tm uz .

As we established before, Assumption 10.2 implies that the matrices corresponding to the columns of M are symmetric. Hence we have Tm M = 0. We also know that Tm q = 0 and Tm g(z) = 0 (due to Corollary 10.10). Hence, when multiplying the T above equation from the left with Tm , since Tm Tm is the identity matrix we obtain uz = 0. Thus we arrive at M g(s) = 1 q + g(z). Replacing q by s M z this implies or, equivalently, g(z) + 1 s = M g(s) + 1 z . Since M is skew-symmetric, this implies (10.36), which is the desired result. 2 M g(s) = 1 (s M z) + g(z),

By using Corollary 10.10 we now can restate (10.36) as vec S 1 1 z and by (10.29) this is equivalent to S 1 1 Z, Z 1 1 S = 0. Lemma 10.12. Let S, Z int Sm and > 0. Then + Equality holds if and only if SZ = Im . Proof. We may write S 1 1 Z, Z 1 1 S = S 1 , Z 1 1 Z, Z 1 1 S 1 , S + 2 Z, S 1 2m + 2 Z, S = S 1 , Z 1 2m 1 1 1 = Tr S Z + 2 Tr (ZS) 1 1 1 2 1 1 = Tr S 2 Z 1 S 2 Im + 2 S 2 ZS 2 . 111 (10.39)
T

vec Z 1 1 s = 0,

S 1 1 Z, Z 1 1 S 0.

Dening D = S 2 ZS 2 , it is clear that D is symmetric and positive denite, and 1 1 moreover, S 2 Z 1 S 2 = D1 . We thus may proceed as follows. S 1 1 Z, Z 1 1 S = Tr D1 = Tr
1 1

2 1 Im + 2 D 1 1 D2
2 F T

D 2 1 1 D2

D 2

1 1 D2

= D 2

0.

This proves the inequality in the lemma, and also that equality holds if and only 1 1 if D 2 D 2 = 0. The last identity is equivalent to D = Im , which means 1 1 1 S 2 ZS 2 = Im . Multiplying both sides from the left and the right with S 2 we get Z = S 1 , which means that ZS = SZ = Im . This proves the lemma. 2

Yet we return to the equation (10.36). As a consequence of Lemma 10.12 we conclude that the -center of (SP ) is the (unique) solution of the following system of equations. ZS = Im , s = M z + q, z Sm , s Sm . (10.40) + + As in the previous two cases, the above system is satised if = 1 and z = s = e. This implies that (e; e) is the -center of (SP ) for = 1, and hence, 1 (e; e) = 0. So we can start the full-Newton step algorithm of Figure 7.1 with x0 = (e; e) and 0 = 1. The next exercise serves to obtain an explicit expression for the Hessian matrix. It uses the (usual) Kronecker product of matrices (see, e.g., [6]). Exercise 10.10. Prove that the Hessian matrix h(z) is given by h(z) = Z 1 Z 1 . Also show that h(z) is positive denite.
Solution: Due to Corollary 10.10 we know that g(z) = Z 1 . Applying the product rule to Z 1 Z = Im we obtain Z Z 1 Z 1 Z + Z 1 = Z + Z 1 Eij = 0, Zij Zij Zij where Eij denotes the m m matrix with 1 in the position (i, j) and zeros elsewhere. Hence we obtain Z 1 = Z 1 Eij Z 1 , Zij

112

whence

g(z) = Z 1 Eij Z 1 . Zij

Note that h(z) is an m2 m2 matrix, whose rows and columns are indexed by the pairs (i, j). The column in this matrix indexed by the pair (i, j) is equal to vec Z 1 Eij Z 1 . Putting T = Z 1 , the ((p, q), (i, j))-entry in h(z) is therefore given by (T Eij T )pq =
u,v

Tpu (Eij )uv Tvq = Tpi Tjq = Tip Tjq = (T Epq T )ij ,

(10.41)

showing also that h(z) is symmetric, as it should. We proceed by establishing that h(z) is positive denite. Let U and V be two (not necessarily symmetric) matrices of size m m, and u = vec (U ) , v = vec (V ) . Then, using that T is symmetric we may write uT h(z)v =
(p,q) (i,j)

Upq Tpi Tjq Vij


T Uqp Tpi i,q p j

= =
i,q

Vij Tjq

UT T

qi

(V T )iq

=
q

UT T V T

qq

= Tr U T T V T . Hence, taking v = u, it follows that uT h(z)u = Tr U T T U T Moreover, since T = Z


1

= Tr

T 2 UT 2
T

T 2 UT 2

= T 2 U T 2 0.
F

is positive denite, u h(z)u equals zero only if U = 0.

It remains to show that h(z) is equal to the given Kronecker product. From (10.41), and since h(z) is symmetric, we know that the ((i, j), (p, q))-entry in h(z) is given by Tpi Tjq = Tip Tjq , T = Z 1 . (10.42)

By way of example, consider rst the case where m = 2. Then, using the notation T = Z 1 again, the Hessian becomes

h(z) =

T11 T11 T21 T11 T11 T21 T21 T21

T12 T11 T22 T11 T12 T21 T22 T21

T11 T12 T21 T12 T11 T22 T21 T22

T12 T12 T22 T12 T12 T22 T22 T22

T11 T T12 T T21 T T22 T

= T T.

Hence it follows from (10.42) that

This relation also holds for larger values of m, as we now show. Note that if t = vec (T ) then we have t(j1)m+i = Tij , i, j {1, . . . , m} . h(z)(j1)m+i,(q1)m+p = Tip Tjq .

113

Fixing j and q, we get h(z)(j1)m+1:(j1)m+m,(q1)m+1:(q1)m+m = T Tjq . Thus we may write h(z) = as was to be shown.

T11 T . . . T1m T . . . . . . Tm1 T . . . Tmm T

= T T, 2

Due to the result of Exercise 10.10 we are able to present an explicit expression for the Hessian matrix H(x), with x = (s; z). We have H(x) = S 1 S 1 0 1 0 Z Z 1 .

10.2

The general self-dual symmetric case


(SP ) min q T z : s = M z + q, s K, z K

In the previous sections we have shown that the conic optimization problem

can be solved eciently if the cone K is a standard cone, i.e., linear, quadratic or semidenite. In each of these cases it has become clear that (i) the cone K is self-dual; (ii) there exists a -self-concordant -barrier for the cone K, and (iii) there exist a point e int K such that (s; z) = (e; e) is the point on the central path of (SP ) with respect to for = 1. Due to (iii), we have 1 (e; e) = 0, and hence x0 = (e; e) can be used as starting point for the full-Newton step algorithm. Note that (iii) implies that q = e M e.

We now want to show in this section that the above properties can be extended to every cone that is the direct product of (a nite number of) standard cones: K = K1 . . . Kr , (10.43)

where each Ki (i = 1, . . . , r) is a standard cone. This requires some preparation.

To simplify the notation we restrict ourselves for the moment to the case r = 3 and we assume that K1 is a semidenite cone, K2 a quadratic cone, and K3 a linear cone. Letting these cones be Sn , Ln and Rn , respectively, we have + + + K = Sn Ln Rn . + + + 114

We dene n := n + n + n . The analysis below can be straightforwardly extended to the cases where r > 3. Any z K can be decomposed into its three natural components as follows: z z = z , z Sn , z Ln , z Rn . + + + z

From Exercise 9.2 it follows that K is self-dual, because each component of K is self-dual. For each component we have a SCB, according to Table 9.1. Using the notation for these SCBs that was introduced in Table 9.1, we know from Lemma 9.10 that (z) = (z ) + (z ) + (z ) n n n is a SCB for K. Lemma 9.10 also gives the complexity number of . Since = 1 for each component, the complexity number of is , where = n + 2 + n . (10.44)

As we know, each standard cone has a unit element, as given by (10.5) for the linear cone, by (10.10) for the second-order cone and by (10.28) for the semidenite cone. We denote these elements as e , e , and e , respectively, and we dene e := e ; e ; e . Then we have e int K. We assume that q is given by q := e M e. Then z = e is strictly feasible for (SP ), with s = e. Partitioning the rows of M according to the three components in K, and similarly for q, we write M q M = M , q = q . M q (10.46) (10.45)

At this stage we recall Assumption 10.2, namely that the columns of the matrix M belong to span K. Due to (10.4) this is the case if and only if Tn M = 0, where Tn is as dened in Section 10.1.4.

(10.47)

Since the columns of M and also e belong to K, equation (10.46) implies that q span K. We make the stronger assumption that q K. 115 (10.48)

The same argument, namely that the columns of M and q belong to K, yields that M u span K, u Rn . (10.49) While denoting x = (s; z), our aim in this section is to show that if we use the barrier function (x) = qT z + (s) + (z) qT z = + (s ) + (s ) + (s ) + (z ) + (z ) + (z ) n n n n n n

then z = e and s = e yield the point on the central path of (SP ) for = 1. This will imply that also in this general case we can start the full-Newton step algorithm of Figure 7.1 with x0 = (e; e) and 0 = 1, since 1 (e; e) = 0. To prove this we need to know the gradient g (x) and the Hessian H(x) of (x) at x = (s, z). As in the previous sections, let g(z) and h(z) denote the gradient and Hessian of (z), respectively. Then, just as there we may write h(s) 0 g(s) 0 , , H(x) = + g (x) = 0 h(z) g(z) 1 q where in the present case g(z) and H(z) are given by h (z ) 0 0 g (z ) g(z) = g (z ) , h(z) = 0 h (z ) 0 , 0 0 h (z ) g (z ) and similar expressions hold for g(s) and H(s).

To proceed we need to dene the Newton step x = [s; z] at a strictly feasible pair x = (s; z). To keep the iterates feasible for (SP ) we need to require that s = M z and also s span K and z span K. Due to (10.49) the condition s span K is satised automatically if s = M z. But z span K is satised if and only if Tn z = 0. Hence we dene the Newton step as the optimal solution of the minimization problem 1 min (x) + xT g (x) + xT H(x)x : Ax = 0, Tn z = 0 . 2 where, as before, A = I M . The constraints in this problem require that s In 0 0 Mn ,n Mn ,n Mn ,n s 0 In 0 Mn ,n Mn ,n Mn ,n s (10.50) z = 0. 0 0 In Mn ,n Mn ,n Mn ,n 0 0 0 Tn 0 0 z z 116

Using vectors us = u ; u ; u and uz as Lagrange multipliers, the optimality s s s conditions consist of (10.50) together with the equation In 0 0 0 0 In 0 0 0 0 In 0 Mn ,n Mn ,n Mn ,n T u s

H(x)x + g (x) =

In 0 0 In 0 0 = Mn ,n Mn ,n Mn ,n Mn ,n Mn ,n Mn ,n Mn ,n u s u s us = . T M us + Tn uz M us M us

Mn ,n Mn ,n Mn ,n u s Mn ,n Mn ,n Mn ,n u s uz Tn 0 0 0 0 0 0 u s In 0 u s T Mn ,n Tn us Mn ,n 0 uz 0

Putting all these constraints together, we get

0 h(s) 0 I 0 0 0 h(z) M
T Tn

s s s z z z u s u s u s uz

g(s)

0 0 I 0 0 M
T

1 q + g(z)

(10.51)

0 0 0 0 0

0 0 0

0 0

0 Tn 0

Comparing this with (10.32), we observe that both systems have the same structure. Just as there, and in exactly the same way, one shows that (10.51) has a unique solution. Since the minimizer of (z, s) is characterized by the fact that z = 0, it follows that the point on the central path corresponding to is characterized by the fact 117

that it satises the following system. 0 0 u I s 0 u s T Mn ,n Mn ,n Mn ,n Tn u s 0 uz Mn ,n Mn ,n Mn ,n Mn ,n Mn ,n Mn ,n 0

g (s ) g (s ) = 1 . (10.52) q + g (z ) 1 q + g (z ) 1 q + g (z )

g (s )

Lemma 10.13. For any solution of (10.52) one has uz = 0. Proof. The fourth equation resulting from (10.52) reads as follows:
T Mn ,n u + Mn ,n u + Mn ,n u = 1 q + g (z ) + Tn uz . s s s

We can use exactly the same argument as in the proof of Lemma 10.13: since Tn M = 0, the left-hand side expression vanishes if it is multiplied from the left with Tn . The same holds for q , since q Sn , and, due to Lemma 10.10 also T T for g (z ) Sn . It thus follows that Tn Tn uz = 0. Since Tn Tn is the identity + matrix, we obtain uz = 0. 2

Now that we know that uz = 0 in (10.52) it follows that the -center of (SP ) is characterized by the fact that I M us = g(s)
1

q + g(z)

(10.53)

Lemma 10.14. If s and z are the -centers of (SP ) then we have g(s) + 1 z
T

g(z) + 1 s = 0.

(10.54)

Proof. According to (10.53) we have us = g(s) M us =


1

(10.55) (10.56)

q g(z).

We can now proceed as in the proof of Lemma 10.11. By eliminating us we obtain M g(s) = 1 q + g(z). Replacing q by s M z this implies M g(s) = 1 (s M z) + g(z), 118

or, equivalently, g(z) + 1 s = M g(s) + 1 z . Since M is skew-symmetric, this implies (10.36), which is the desired result. 2

Using the notations of this section we may rewrite (10.54) as follows. g (s ) z T g (z ) s

This implies

g (s ) g (s )

+ 1 z z
T

g (z ) g (z )

+ 1 s = 0. s

g (s ) + 1 z

g (z ) + 1 s
T

+ g (s ) + 1 z

g (z ) + 1 s
T

+ g (s ) + 1 z

g (z ) + 1 s = 0.

At this stage we invoke the Lemmas 10.4, 10.6 and 10.12. These lemmas imply that each of the three terms in the left-hand side expression is nonnegative. Since their sum vanishes, we conclude that each of these terms equals zero. But then the same lemmas imply that we have S Z = e s z = e e , s z = e , (10.57)

where S = mat (s ) and Z = mat (z ) . If = 1 these equations are satised if S = Z = e , s = z = e , s = z = e ,

and hence the vector (e; e), with e as given by (10.59), represents the point on the central path of (SP ) for = 1, as desired. We trust that the above results are convincing enough to be allowed to state without further proof analogous results for cases where the cone K in (SP ) is any symmetric cone of the form (10.43): K = K1 . . . Kr . From Exercise 9.2 it follows that K is self-dual, because each Ki is self-dual. For each Ki we have a (i , i )-SCB, according to Table 9.1. Denoting these SCBs as i , we know from Lemma 9.10 that (s) = 1 (s1 ) + . . . + r (sr ) 119

is a SCB for K, where the components si of s are dened by s = s1 ; . . . ; sr , si Ki (i = 1, . . . , r).

Lemma 9.10 also gives the complexity number of . Since i = 1 for each i, the complexity number of is , where = 1 + . . . + r , (10.58)

with i denoting the -value for i . Now let ei be the unit element of Ki , as given by (10.5) for the linear cones, by (10.10) for the second-order cones and by (10.28) for the semidenite cones. Dening e := e1 ; . . . ; er , q := e M e, (10.59)

we have e int K and z = e, s = e is strictly feasible for (SP ). Then we have the following result. Theorem 10.15. If we use the barrier function (x) = qT z + (s) + (z),

then z = e and s = e yield the point on the central path of (SP ) for = 1. We conclude this section with two simple, but important lemmas. We know already that a feasible solution (s, z) of (SP ) is optimal if and only if sT z = 0. More generally we have the following result. Lemma 10.16. Let (s; z) be a feasible solution of (SP ). Then this solution is optimal if and only if T si z i = 0, i = 1, . . . , r. (10.60) Proof. We have sT z =
i=1 The terms in the sum at the right are all nonnegative, since si Ki and z i Ki . T Therefore, s z = 0 holds if and only if all these terms vanish, which proves the lemma. 2

si z i .

Lemma 10.17. Let (s; z) be the -center of (SP ) for some > 0. Then si z i = i ,
T

i = 1, . . . , r. 120

(10.61)

As a consequence we have sT z = . Proof. If Ki is a linear cone then we deduce from (10.57) that si z i equals times the sum of the coordinates in ei . This sum equals i , by Lemma 10.3, thus proving (10.61) if the cone Ki is linear. If Ki is a second-order cone then (10.57) yields that si z i = ei ei . By the denition (10.26) of the Jordan product, the last coordinate of si z i is equal to T si z i . The last coordinate of ei ei equals 2, which is precisely the -value of the SCB of the second-order cone. Hence also in that case (10.61) holds. Finally, if Ki is a semidenite cone then (10.57) yields that S i Z i = E i . Hence si z i = S i , Z i = Tr S i Z i = Tr E i . Since E i is a diagonal matrix, Tr E i equals the sum of the coordinates of the vector ei = vec E i , which by Lemma 10.8 is equal to i . Hence, again we have si z i = i . This completes the proof of the rst statement in the lemma. The second statement in the lemma is now a direct consequence of the rst statement and (10.58), because using these we obtain
r T T T

sT z =
i=1

si z i =
i=1

i =
i=1

i = . 2

Hence the proof is complete.

One might be inclined to think that Lemma 10.17, by letting going to zero, implies Lemma 10.16. This is not true, however. As becomes clear in the next section, when goes to zero in Lemma 10.17, the -center converges to a specic optimal solution, whereas Lemma 10.16 holds true for every optimal solution. Most important for our approach is that the limit point of the central path is such that it enables us to decide whether or not the original problems (CP ) and (CD) are solvable or not. Moreover, if (CP ) and (CD) are solvable then optimal solutions can be derived from this specic optimal solution of (SP ).

10.3

Back to the general symmetric case


(CP ) inf cT x : Ax b K1 , x K2 ,

We started this chapter by considering the symmetric optimization problem

and its dual problem (CD)


sup bT y : c AT y K2 , y K1 .

121

We showed that these problems can be embedded in the self-dual problem (SP ) min q T z : M z + q K, z K ,

with the skew-symmetric matrix M is as dened in (9.12) and, by (9.17), q = e Me = and 0 eT e K K , e int K int K , (10.62)

We proved in Theorem 9.9 that (CP ) and (CD) have optimal solutions with vanishing duality gap if and only if the self-dual problem (SP ) has an optimal solution with > 0. In what follows we restrict ourselves to the case where K1 and K2 are (nite) direct products of standard cones. Then it is obvious that K is a (nite) direct product of standard cones as well, and K = K. But then the results of the previous section apply, and we may conclude that we can obtain an -solution z = (y ; x ; ; ) of (SP ) in polynomial time by using the full-Newton step algorithm of Figure 7.1. In light of Theorem 9.9 it would be highly desirable that > 0 if and only if there exists an optimal solution of (SP ) with > 0. The aim of this section is to show that this desire is fullled. In order to prove this we need to further explore properties of the central path of (SP ). Just as before, we denote s(z) by s. So s = M z + q. Exercise 10.11. Show that if (s, z) is the -center for some > 0, then = .
Solution: With z = z() and s = s() we have sT z = , by Lemma 10.17, and also sT z = q T z = eT e, by the denition (9.17) of q. As eT e = , which follows from (10.1) and (10.3), we obtain = . 2

K=

K1 K2 R+ R+

z=

y x

K =

K1 K2 R+ R+

(10.63)

In the present case, according to (10.63), the cone K has the form
K = K1 K2 R+ R+ ,

and the variables and live in the last two cones in this product. With z 122

partitioned as above, we partition the vectors e and s in the same way as follows: K1 y int K1 e1 K x e int K 2 2 s= , s = K = K = 2 , R+ 1 int R+ R+ int R+ 1

where e1 and e2 are the unit element of the cones K1 = K 1 and K2 = K 2 , respectively. Exercise 10.12. Using the result of Exercise 9.6 show that if 1 then the vectors s() and z() are bounded in size.
Solution: With z = z() and s = s() we know from (9.20) that eT z + eT s = (1 + )eT e. By Exercise 10.11 we have = . Hence, since 1 and eT e = , we have eT z + eT s 2. Since e int K int K , z K and s K both eT s and eT z are nonnegative. Hence it follows from Exercise 9.3 that both s and z are bounded in size. 2

We rst concentrate on the behavior of the central path when approaches zero. This has sense because during the execution of the full-Newton step algorithm the iterates stay very close to the central path, due to the fact that (s; z) stays very 1 small (smaller than 4 ). Theorem 10.18. Let z() denote the -center of (SP ) and s() := s(z()). If (CP ) and (CD) have optimal solutions with vanishing duality gap then we have for each > 0 that 1 () . (10.64) Proof. Let (, z ) be an arbitrary optimal solution of (SP ). We partition the s relevant vectors according to y y() y () y x x() x () x z() = , s() = , z = , s = . () () () ()

Since it is supposed that (CP ) and (CD) have optimal solutions with vanishing duality gap, we may assume that = 1 (see Section 9.4). 123

Applying the orthogonality property (cf. Exercise 9.4) to the vector z() z , while using that s() s = M (z() z ), we get (z() z )T (s() s) = 0. Since z is optimal we have sT z = 0. Using also that s()T z() = we derive that z()T s + s()T z = . This means that y()T y + y ()T y + x()T x + x ()T x + () + () + () + () = . Each of the eight terms at the left is the inner product of an element in some cone and an element in its dual cone. Therefore, each of these terms is nonnegative. This implies that each of the terms does not exceed . In particular we have () . Now using that = 1 and () () = we get () () (). Since () > 0, this implies (10.64). 2

We now show that a similar property holds for the iterates generated by the full-Newton step algorithm. Theorem 10.19. Let z = (y; x; ; ) result from a Newton step in the algorithm of Figure 7.1. If (CP ) and (CD) have optimal solutions with vanishing duality gap then 8 . (10.65) 9 Proof. Applying Theorem 6.34 to the present situation, using that = 1, we obtain x x x , 1+ 1

where x = (s; z), x = (s(); z() and = (x) = (s; z). During the course of the algorithm we have after each iteration that 1 , as was established in Section 9 7.4.1. Thus we obtain (s; z) (s(); z()) This means that [(s; z) (s(); z())] H(s, z) [(s; z) (s(); z())] 124
T (s; z)

1 . 8

1 , 64

or, equivalently, s s() z z() This can be written as (s s()) h(s) (s s()) + (z z()) h(z) (z z()) Both term at the left-hand side are nonnegative, hence (z z()) h(z) (z z()) This in turn means that y x y() x() () () T h(y) 0 0 0 0 h(x) 0 0 0 0 h() 0 0 0 0 h() y x y() x() () ()
T T T T

h(s) 0 0 h(z)

s s() z z()

1 . 64

1 . 64

1 . 64

1 . 64

Using the same argument as before, we obtain that [ ()] h() [ ()] Since h() = 2 it follows that 1 whence it follows that 1 () , 8
T

1 . 64

7 () 9 . 8 8 So, by using Theorem 10.18 we may conclude that This proves the theorem. 8 8 () . 9 9 2

We thus have shown that the -solution z = (y ; x ; ; ) of (SP ) that is generated by the full-Newton step algorithm of Figure 7.1 has the property that > 0 if there exists an optimal solution of (SP ) with > 0. This makes that if (CP ) and (CD) have optimal solutions with vanishing duality gap then these can be found by the algorithm. 125

10.4

What if = 0?

So far we focussed on the case where the optimal solution of (SP ) has > 0, in which case it can be used to obtain optimal solutions of (CP ) and (CD). If it happens that the optimal solution has = 0, then we know for sure that (CP ) and (CD) do not have both optimal solutions with vanishing duality gap. For each of the problems there are several situations that may occur: 1. the problem is solvable (i.e., has an optimal solution); 2. the problem has a nite optimal value which is not attained; 3. the problem is unbounded; 4. the problem is infeasible. The question that we consider in this section is whether or not we can decide which of the situations occurs if = 0 in the optimal solution of (SP ). Recall from (9.7) and (9.16) that the constraint in (SP ) can follows. K1 0 y 0 A b e1 Ae2 + b AT 0 c e2 + AT e1 c x 0 K2 , + T b cT 0 1 bT e1 + cT e2 0 R+ eT e R+ 0 be written out as K1 y x K 2 R+ R+

where we used that K2 = K2 , and where e = (e1 ; e2 ; 1; 1), with ei denoting the unit element of Ki , for i {1, 2}, and where the last row of the constraint matrix equals minus the transpose of its last column.

Hence, since = 0, if the optimal solution z = (y ; x ; ; ) generated by the algorithm is such that = 0 it follows that x and y satisfy Ax K1 , AT y K2 , x K2 , y K1 , (10.66) (10.67) (10.68)

bT y cT x 0.

To deal with the status of the problems (CP ) and (CD) we need the following denition. Denition 10.20. We say that x is a decreasing (constant) ray of the primal problem (CP ) if x K2 , Ax K1 , and cT x < 0 (cT x = 0). Analogously, y is called an increasing (constant) ray of the dual problem (CD) if y K1 , AT y K2 , and bT y > 0 (bT y = 0). Using this denition and (10.66) (10.68) we can now prove the following lemma. 126

Lemma 10.21. If = 0 then one of the following three cases arises: (i) x is a constant ray for (CP ) and y is a constant ray for (CD); (ii) x is a decreasing ray for (CP ); (iii) y is an increasing ray for (CD). Proof. It is clear from (10.66) that if cT x < 0 then (ii) happens, and from (10.67) that if bT y > 0 then (iii) occurs. The only other possibility that remains is that cT x 0 and bT y 0. This implies bT y cT x 0. Combining this with (10.68) we see that bT y cT x = 0, whence we have 0 cT x = bT y 0. Hence bT y = cT x = 0, and we are in the situation described by (i). This completes the proof. 2

Lemma 10.22. If (CP ) has a decreasing ray then (CD) is infeasible, and if (CP ) is feasible then (CP ) is unbounded. Analogously, if CD) has an increasing ray then (CP ) is infeasible, and if (CD) is feasible then (CD) is unbounded. Proof. We only prove the rst statement in the lemma, because the proof of the second statement uses the same line of arguments. So, let us suppose that (P ) has a decreasing ray, and let it be given by x . So we have x K2 , Ax K1 , and cT x < 0. If (CD) is feasible, then there exists y K1 such that c AT y K2 . It follows that T y T Ax 0, c AT y x 0,

because y K1 and Ax K1 imply the rst inequality, and c AT y K2 and x K2 the second inequality. These inequalities imply cT x AT y
T

x = y T Ax 0,

but this contradicts the fact that cT x < 0. This contradiction shows that (CD) must be infeasible. Now let x be feasible for (CP ). Then Ax b K1 and x K2 . Since x K2 , we have for arbitrary 0 that x K2 , and, since Ax K1 also Ax K1 . Hence A(x + x ) b = (Ax b) + Ax K1 , x + x K2 . This shows that x + x is feasible for (CP ), for all 0. On the other hand, cT (x + x ) = cT x + cT x , 127

and since cT x < 0 this goes to minus innity if increases to innity. Therefore, (CP ) is unbounded. This completes the proof. 2

An immediate consequence of the above two lemmas is the following theorem, which requires no further proof. Theorem 10.23. If cT x < 0 then (CD) is infeasible and (CP ) is either infeasible or unbounded. Similarly, if bT y > 0 then (CP ) is infeasible and (CD) is either infeasible or unbounded. So, if cT x < 0 or bT y > 0 then both (CP ) and CD) are infeasible or unbounded. This implies that the optimal value of both problems is either or . In other words, (CP ) and CD) have no nite optimal values. It remains to deal with the case where cT x 0 and bT y 0. Theorem 10.24. Let cT x 0 and bT y 0. Then for both (CP ) and CD) one of the following three possibilities arise: (i) the problem has innitely many optimal solutions; (ii) the problem has a nite optimal value which is not attained; (iii) the problem is infeasible or unbounded. Proof. Let cT x 0 and bT y 0. Then, by Lemma 10.21, x is a constant ray for (CP ) and y is a constant ray for (CD). Now let (CP ) be feasible, and x a feasible solution. Then it is clear from the proof of Lemma 10.22 that x + x is primal feasible for each 0. Hence, the primal feasible region contains the half line {x + x : 0}

and, since cT x = 0, along this half line the objective function is constant. Therefore, if (CP ) has an optimal solution then this solution belongs to a half line in the optimal set. So, in that case (CP ) has innitely many optimal solutions, which means that (i) holds. If (CP ) has no optimal solution, then it can have a nite optimal value, but this value is not attained, i.e, (ii) holds. The remaining cases are covered by (iii). 2

According to Theorem 10.24, if cT x 0 and bT y 0 we have for both the primal and the dual problem three possibilities. Altogether this gives rise to nine possible situations. The question arises if each of these situations can occur. Figure 10.2 shows these possibilities. The three possibilities for (CP ) are indicated as P(i), P(ii), and P(iii), and similarly for (CD). 128

P(i) D(i) D(ii) D(iii) Example 10.25 Example 10.26 Example 10.27

P(ii) Example 10.26

P(iii) Example 10.27

Example 10.28

Figure 10.2. Nine possible situations when cT x 0 and bT y 0. Example 10.25 Let the dual problem be given 1 + y2 max y2 : c AT y 0 0 which means by 0 0 3 y1 y2 S+ , y2 0

So every feasible solution is optimal, and hence the problem is solvable with optimal value 0. The primal problem is given by x11 x12 x13 3 min x11 : x12 x22 x23 S+ , x22 = 0, x11 + 2x23 = 1 x13 x23 x33 which is equivalent to x11 0 x13 min x11 : 0 0 0 S3 , x11 = 1 + x13 0 x33

max {y2 : y1 0, y2 = 0} .

Again, every feasible solution is optimal. So the problem is solvable with optimal value 1. We conclude that both problems can be solvable and have dierent optimal values! N.B. Neither the primal nor the dual problem is strictly feasible! Example 10.26 Consider the problem 1 T 0 1 x1 min cT x : Ax b 0 0 x2 0 min x1 : x1 1 1 x2 129

Changing to the matrix notation of the matrices in S2 this can be written as + S2 + .

0 0 0 1

x1 x2

0 1 2 S+ . 1 0

Due to the denition of S2 this is equivalent to + min {x1 : x1 x2 1, x1 0, x2 0} . The optimal value of this problem is 0, but this value is not attained. Note that the
x2

x1

Figure 10.3. Feasible region of the problem in Example 10.26 problem is strictly feasible (take x1 = x2 = 2). So this example shows that a conic problem can be strictly feasible and bounded and at the same time unsolvable! The dual problem is 0 y11 1 y 21 max bT y 1 y12 0 y22 y11 y 21 : y12 y22 0 , 0

S2 , c AT y +

1 y11 y22

which is equivalent to

max y21 y12 :

y11 y12 y21 y22

S2 , y11 = 1, y22 = 0 . +

Since y22 = 0, it follows that y12 = y21 = 0. Hence the dual problem is solvable and the optimal value is 0, which is the same as for the primal problem. Note that this is in agreement with the Conic Duality Theorem. Example 10.27 Consider x1 3 min x2 : Ax b x2 L+ x1 130

This can be written as min x2 : which is equivalent to min {x2 : x2 = 0, x1 0} , The optimal set is the ray {(x1 ; 0) : x1 0}. The dual problem is max bT 0 : AT 1 + 3 2 =c 0 1 , L3 + . x2 + x2 x1 , 1 2

The feasibility conditions for this problem are equivalent to the system 2 + 2 3 , 1 2 1 + 3 = 0, 2 = 1.

One easily veries that this system is infeasible. We conclude that a conic problem can be solvable and possess an infeasible dual! Example 10.28 Let the dual problem be given by max y1 : c AT y y1 1 1 0 S2 + ,

Note that the 2 2 matrix is cannot be positive semidenite. So the problem is infeasible. The primal problem is given by T x11 x11 0 x 1 x 21 21 2 T min c x 1 = 0, x S+ . : Ax b 1 0 0 0 x12 1 x12 x22 x22 0 Changing to the matrix notation of the matrices in S2 this becomes + min x12 + x21 : 1 x12 x21 x22 S2 + .

For every R, x12 = x21 = and x22 = 2 yields a feasible solution with objective value 2. Hence the problem is unbounded.

10.5

Scaling

We start this section with some denitions. Let K be any convex pointed cone with int K = . The automorphism group of K is dened as the set G(K) of all invertible linear transformations G : span K span K such that G(K) = K. 131

Exercise 10.13. Find a cone K and an invertible linear transformation G of span K such that G(K) K, whereas G is not an automorphism of K.
Solution: Consider the case where K = L2 and + G= 0 0 1
1 2

Obvious G is nonsingular. Moreover, x = (x1 ; x2 ) K if and only if |x1 | x2 . Hence, if 1 x K then Gx = ( 2 x1 ; x2 ) K. So G(K) K. However, there is no x K such that Gx = (1; 1). Since (1; 1) K, this is an appropriate example. 2

Exercise 10.14. An invertible linear transformation G of span K is an automorphism of K if and only G(K) K and G1 (K) K. Prove this.
Solution: Let x K. Since G1 (K) K we have y := G1 x K. This implies x = Gy G(K), proving that G(K) K. Hence G(K) = K. 2

The cone K is called homogeneous if for all x, y int K there exists G G(K) such that Gx = y. Denition 10.29. The cone K is called symmetric if it is self-dual and homogeneous. We know already that each standard cone is self-dual. In this section we will prove that these cones are also homogeneous, and hence they are symmetric. We will also show that for any given pair x, y int K there exists an automorphism G of K such that G(x) = G1 (y). Now let s, z int K be a strictly feasible pair for (SP ) and > 0. Moreover, let G be an automorphism of K such that G(s) = G1 (z). We dene Gs G1 z v(s, z) := = . (10.69)

The vector v(s, z), which is shortly denoted as v, is called the scaling point of s and z with respect to . It follows that z= Gv, s= 132 G1 v. (10.70)

Therefore, we may write s = Mz + q Dening M = GM G, it thus follows that s = Mz + q Furthermore, qT z =


T

M Gv + q Gq v = GM Gv + ,

G1 v =

Gq q= , v = M v + q. Gv = q T v

G1 q

Since the optimal value of (SP ) equals 0, the following problem is equivalent to (SP ): (SSP ) min q T z : s = M z + q K, z K . Do we have a starting point for solving (SSP )? Yes: z = v and s = v. Let z denote the Newton step at v. Then the Newton iterate in the v-space is v + , z and the corresponding Newton iterate in the z-space is G (v + ) = Gv + G = z + G, z z z hence z = G. z Note that q int K, and M is skew-symmetric.

The just described scaling technique is called NT-scaling, after their inventors Nesterov and Todd [9]. At this stage we cannot make clear its usefulness, but in the next three sections, where we apply the NT-scaling to the three standard cones, it will become clear that from a computational point of view it has a big advantage. To nd the NT-scaling for a given cone we have to nd for given s, z int K an automorphism of K such that G(s) = G1 (z). It will turn out that we can always nd a G that is symmetric and positive denite. Having G, we can nd the scaling point v = v(s, z) of s and z from (10.69).

10.5.1

The linear case

In this case the computation of the scaling point of z, s int Rm is quite simple. + As we will see much simpler than in the other two cases! But in all cases we follow the same scheme: Given the cone K, we rst characterize the automorphisms of K, then nd an automorphism such that G(s) = G1 (z). Having done this we can construct the scaling point v of s and z. 133

Rm +

Let the matrix G represent an automorphism of Rm . This is the case if G Rm = + + and x Rm + Gx Rm , G1 x Rm . + +

Since each unit vector ei belong to Rm , it follows that Gei is nonnegative, and + hence the columns of G are nonnegative. So all entries in G are nonnegative. This is also true for the inverse G1 of G. Exercise 10.15. The inverse of a nonsingular nonnegative matrix is nonnegative if and only if every row and every column contains precisely one positive element. Prove this.
Solution: Let the given matrix of size m m and denoted as A, and let B its inverse. Then AB = I. Consider any row of A. Since A is invertible this row is nonzero, and hence it contains at least one positive element. Suppose it contains 2 positive elements. Without loss of generality we assume that the row is the rst row of A, and the rst two elements are positive. The columns of B dierent from its rst column are orthogonal the rst row of A. Since these columns are nonnegative, the rst two elements of these columns must be zero. But then these columns generate a space whose dimension is at most m 2. This contradicts the fact that B is invertible. This proves that every row of A contains precisely one positive element. It follows that A has precisely m positive entries. Hence, since A is invertible, each of its columns also contains precisely one positive element. 2

Since the matrix of any automorphism G is nonsingular, it follows from Exercise 10.15 that up to a permutation of its rows and columns G must be a diagonal matrix with positive diagonal. Since Gs = G1 z holds if and only of G2 s = z, it is clear that this holds if z G2 := diag . s We conclude that G = diag and Gs v= = zs . z s ,

Here the advantage of the NT-scaling becomes already apparent: if z is the center of (SP ) then v is the all-one vector, which is the unit element of Rm . This + makes that the Hessian matrix h(v) is the identity matrix, which is well-conditioned. During the course of the algorithm we know that z is always close to the current -center. As a consequence v will be close to the all-one vector, and the Hessian at v will be close to the identity matrix, which is from a computationally point of view very attractive. 134

10.5.2

The quadratic case

In this section we use the matrix Qm that was dened in (10.11), but for simplicity of notation we omit the subscript m and write Q instead of Qm . Then we have for any z Rm :
m1 2 zm 2 zi = z T Qz, i=1

whence Recall from Exercise 10.14 that a nonsingular matrix G is an automorphism of Lm + if and only if G(Lm ) Lm and G1 (Lm ) Lm . We start by considering the rst + + + + condition. Lemma 10.30. If G(Lm ) Lm then there exists some 0 such that + + GT QG Q Sm . + Proof. Due to (10.71) we have G(Lm ) Lm if and only if + + z T Qz 0, zm 0 This certainly holds if G satises z T Qz 0 and zm 0 (Gz)m 0. (10.74) At this stage we invoke Lemma A.4 in the appendix, which is known as the Slemma. Note that Q satises the hypothesis of Lemma A.4: taking for z the unit vector in the cone Lm (i.e. e = 0m1 ; 2 ), we have z T Qz = 2 > 0. So, the + S-lemma implies that (10.73) holds if and only if (10.72) holds for some 0. This proves the lemma. 2 z T GT QGz 0, (10.73) z T GT QGz 0, (Gz)m 0. (10.72) z Lm + z T Qz 0, zm 0. (10.71)

Lemma 10.31. If G is an automorphism of the cone Lm then there exists some + > 0 such that 1 (10.75) GT QG = Q, GT QG1 = Q. Proof. Since G(Lm ) Lm , Lemma 10.30 implies that there exists a nonnegative + + scalar 1 and a positive semidenite matrix S1 such that GT QG = 1 Q + S1 . 135

Since also G1 (Lm ) Lm , the same lemma implies the existence of a nonnegative + + scalar 2 and a positive semidenite matrix S2 such that GT QG1 = 2 Q + S2 . We multiply the rst equation from the left with GT and from the right with G1 . Then using also the second equation we obtain Q = 1 GT QG1 + GT S1 G1 = 1 (2 Q + S2 ) + GT S1 G1 , which gives (1 1 2 )Q = 1 S2 + GT S1 G1 . The matrix at the right-hand side is positive semidenite, whereas the left-hand side is indenite, unless 1 1 2 = 0. We conclude that both sides are zero, whence 1 2 = 1 and S1 = S2 = 0. Hence the lemma follows. 2

Lemma 10.32. Let u = (; um ) int Lm and A int Sm1 . Then u + + Gu = A u T u um

is an automorphism of the cone Lm if and only if + A= uT Qu Im1 + uuT um + uT Qu . (10.76)

Proof. If the given Gu is an automorphism, then Lemma 10.31 implies that, for some > 0, 1 GQG = Q, G1 QG1 = Q, where we used that Gu is symmetric.6 If we multiply the second equation from the 1 left with QG we obtain G1 = QGQ. This means that G QGQ = Im . In other words, A u A u = Im . uT um T um u This is equivalent to A2 = Im1 + uuT A = um u u u2 uT u = > 0. m
6 Note

(10.77) (10.78) (10.79)

that both equations are equivalent, because multiplication of the rst equation from the left and the right with G1 yields the second equation.

136

We rst show that the above system is equivalent with A2 = (uT Qu)Im1 + uuT . (10.80)

The equation (10.79) can be restated as = uT Qu, which is positive since u int Lm . As a consequence, (10.77) is just the same as (10.80). Thus it suces to + show that (10.77) and (10.79) imply (10.78). This can be seen as follows. From (10.77) we derive that u is an eigenvector of A2 with eigenvalue + uT u. Due to (10.79) this eigenvalue equals + uT u = u2 . Hence A has u as eigenvector with m eigenvalue um . This means that (10.78) holds. Thus we have shown that if Gu is an automorphism of the cone Lm then (10.80) holds. Since A2 is positive denite, + its square root is uniquely dened, and one may easily verify that it is given by (10.76). Hence the proof of the only if statement in the lemma is complete. It remains to show that the given matrix Gu is an automorphism of Lm . We rst + show that Gu is nonsingular. For any x = (; xm ) Rm we have x Gu x = Hence, if Gu x = 0 then A + xm u = 0, x uT x + xm um = 0. The rst equation, when using the denition of A, implies uT Qu x + (T x) u u um + uT Qu + xm u = 0. (10.81) (10.82) A u uT um x xm = A + xm u x uT x + xm um .

Putting = uT Qu, and using (10.82), we obtain (xm um ) u + xm u = 0. x um + This gives, after some straightforward reductions, x= Substitution into (10.82) yields xm uT u + xm um = 0. um + xm u. um +

Since T u = 2 u2 = um + u um this gives xm = 0. Since m > 0, we conclude that xm = 0. But then (10.81) implies x = 0, because A is positive denite. Hence x = 0, which proves that Gu is nonsingular. 137

We nally need to show that Gu (Lm ) Lm and G1 (Lm ) Lm . We do this for + + u + + the rst inclusion; the second inclusion is obtained in the same way. Let z Lm . + Then z T Qz 0 and zm 0. Since Gu QGu = Q it follows that (Gu z)T Q(Gu z) = z T Qz 0. So it remains to verify if (Gu z)m 0. But this is obvious, because since z Lm and u Lm we have + + (Gu z)m = uT z + um zm = uT z 0. Hence the proof is complete. 2

Observe that Gu em = u, G1 = u

1 GQu . uT Qu

Lemma 10.33. Let s = (; sm ) int Lm , z = (; zm ) int Lm and s z + + u= 2 z + Qs z T Qz sT Qs + z T s , = z T Qz . sT Qs

Then u int Lm and Gu s = G1 z. + u Proof. We have Gu s = G1 z if and only if G2 s = z. With A as given by (10.76), u u we have G2 = u A u T u um A u T u um = A2 + uuT A + um u u T T 2 T u A + um u um + u u .

Using (10.77) (10.78) we get G2 = u Im1 + 2uT u 2um u 2um uT 2u2 m = Q + 2uuT ,

where = uT Qu. So we need to nd u such that Q + 2uuT s = z. This gives 2(uT s)u = z + Qs, z + Qs . (10.83) 2uT s Now observe that what we have shown so far is that if G2 s = z then u satises u 1 (10.83). Recall that G1 = GQu . Hence, since G2 z = s we have G2 z = 2 s. u u Qu Thus we may conclude that Qu can be expressed as follows. u= Qu = 2 s + Qz . 2uT Qz 138 whence

Multiplying both sides with Q, while using that Q2 = Im , we obtain u= z + Qs . 2uT Qz (10.84)

Comparing this with the expression for u in (10.83) we conclude that uT s = uT Qz. Substituting the expression for u, as given by (10.83), into (10.85) we get which implies 2 sT Qs = z T Qz. Since > 0, we obtain = z T Qz . sT Qs (10.86) z T s + sT Qs z T Qz + z T s = , Ts 2u 2uT s (10.85)

Now that the value of has been xed, it remains to nd the vector u itself. This is achieved by using (10.83) and that = uT Qu. This yields (z + Qs) Q (z + Qs) 4 (uT s) which implies 2 sT Qs + z T Qz + 2z T s = 4 uT s Due to (10.86) we get z T Qz sT Qs + z T s = 2 uT s
2 2 2 T

= ,

Note that the expression at the left is positive, because s int Lm and z int Lm . + + So we can solve uT s from this equation. Because also u int Lm , we have uT s > 0. + Thus we obtain 1 uT s = z T Qz sT Qs + z T s . 2 Substitution into (10.83) yields the vector u: u= 2 This proves the lemma. z + Qs z T Qz sT Qs + zT s 2 .

139

We can now compute the scaling point v of s and z. With u as in Lemma 10.33 one has Gu s v= , where Gu = with A= Im1 + uuT , um + u= z + Qs 2 ( sT Qs + z T s) , = z T Qz . sT Qs A u T u um ,

Exercise 10.16. Prove that the matrix Gu is positive denite.

Solution: Due to the lemma on the Schur complement (cf. Lemma A.5 in the appendix), Gu is positive denite if and only if A is positive denite. We have A uu T u uT uu T = Im1 + , um um um + um um + u uT , = Im1 + um um +

uu T um

Im1

u uT um u m +

Since the eigenvalues of uuT are uT u and 0, the smallest eigenvalue of the last matrix equals um um + uT u uT u um + 1 =  =  . u m um + um um + um um + The last expression being positive, this proves the statement. 2

Exercise 10.17. If z is the -center of (SP ) then v = e. Prove this.

Solution: We know from (10.27) that z is the -center of (SP ) if z s = e e. Recall from (10.26) that sm z + zm s 0 zs= = zT s 2

140

So we have sm z = zm s, As a consequence,

z T s = 2.

z T Qz = sT Qs

2 s2 zm z m 2 2 s sm sm

2 2

2 2 1 s2 zm zm s m 2 2 s 2 sm sm

zm . sm

It follows that z = Qs, Hence z T s = sT Qs = This implies sT Qs = 2 , z T Qz = 2. z T Qz . s= Qz .

Substitution in the expression for u yields z + Qs u= 2 sT Qs z T Qz + z T s Hence we have s u= , 2 The matrix A is therefore given by 2 ssT 2 Now we can compute Gu s: A u T u um s sm A + sm u s u s + um sm
T sm 2

z + Qs 2Qs Qs = = = . 8 2 2 (2 + 2)

sm um = . 2

A=

Im1 +

Im1 +

2 ssT 2 sm + 2

Gu s =

The two components in the last vector are considered separately. Starting with uT s +um sm we write
T

u s + um sm

s2 sT s sT Qs s2 2 T s s m +m = = = = 2. = 2 2 2 2 2

141

The rst component in Gu s can be reduced as follows. A + sm u = s + s 2 ssT s s sm 2 2 sm + 2

1 = 2 1 = 2 1 = 2 1 = 2

1 = 2 1 = 2 2 s 2 = 0. Hence we obtain 1 Gu s v= = This completes the proof. 0 2 =

2 sT s sm s sm + 2 2 sT s s2 2 sm 2 m 2 + sm + 2 2 T s Qs sm 2 2 + s sm + 2 2 sm 2 2 + s sm + 2 2 + sm 2 2 s sm + 2 2 +

0 2

= e. 2

Recall from Exercise 10.17 that the Hessian at e is the identity matrix.

10.5.3

The semidenite case

Let U be any nonsingular mm matrix. We consider the mapping GU : Sm Sm dened by GU (S) = U T SU, S Sm . Obviously this mapping is linear and nonsingular. The inverse mapping is given by GU 1 (S) = U T SU 1 , S Sm . So we have G1 (S) = GU 1 (S). Moreover, if S Sm then GU (S) Sm and + + U GU 1 (S) Sm . Due to Exercise we may conclude that GU (Sm ) = Sm , showing + + + that GU is an automorphism of Sm . + Now let S, Z int Sm . Then we need to nd U such that GU (S) = G1 (Z). + U Hence U should be such that U T SU = U T ZU 1 . This is equivalent to U 2 SU 2 = Z. 142
T

This equation is satised if U 2 = S 2 S 2 ZS 2 because U 2 SU 2 = S 2 S 2 ZS 2 = S 2 S 2 ZS 2


1 1 1 1 1 1 1 1 1 2

S 2 ,

(10.87)

1 2

S 2 S S 2 S 2 ZS 2 S 2 ZS 2
1 1 1 1 2

1 2

S 2

1 2

S 2

= S 2 S 2 ZS 2 S 2 = Z. Since U 2 is symmetric, we conclude that U= S


1 2
1 2

S ZS

1 2

1 2

1 2

1 2

(10.88)

yields an automorphism GU such that GU (S) = G1 (Z). U Hence the scaling point of S and Z with respect to > 0 is given by V = GU (S) 1 = S 2 S 2 ZS 2
1 1 1 1 2 1 2 1 2

S 2

S S 2 S 2 ZS 2

1 2

S 2

It is of interst to consider the case where Z is the -center of (SP ). Then we have SZ = ZS = Im . Hence Z = S 1 . Therefore, S 2 ZS 2 = Im . This implies U= whence V = S
1 2 1 1

S ZS

1 2

1 2

1 2

1 2

1 2

= S 2 (Im ) 2 S 2

1 2

= 4 S 2 ,

1 1 1 1 GU (S) 1 = 4 S 2 S 4 S 2

= Im .

Recall from Exercise 10.10 that the Hessian at V is given by V 1 V 1 . Hence, if V = Im , then the Hessian is the identity matrix of size m2 m2 . Thus we may conclude that if Z is the -center of (SP ) then the Hessian at V in (SSP ) is equal to the identity matrix. Exercise 10.18. Let S, Z int Sm . Then we have + Z 2 Z 2 SZ 2 Prove this. 143
1 1 1

1 2

Z 2 = S 2 S 2 ZS 2

1 2

S 2 .

Solution: The above relation is equivalent to S 2 ZS 2


1 1 1 1 1 2

= S 2 Z 2 Z 2 SZ 2

1 2

Z2S2.

Putting T = S 2 Z 2 , the last equation can be stated as TTT


1 2

= T TTT

1 2

TT

Since T T T and T T T are symmetric, the matrices at both sides are symmetric, and they are also positive denite. Hence without losing something we may take squares. This gives the equivalent relation TTT = T TTT proving the statement.
1 2

TT T TTT

1 2

TT = TTT, 2

Exercise 10.19. Let S, Z int Sm and + D = Z 2 Z 2 SZ 2 Then D1 Z = SD. Prove this.


1 1 1

1 2

Z 2 = S 2 S 2 ZS 2

1 2

S 2 .

Solution: Note that the last equality is due to Exercise 10.18. Moreover, one has D = U 2 , T with U 2 as dened by (10.87). By the construction of U 2 we have U 2 SU 2 = Z. Since 2 1 U is symmetric, this implies DSD = Z, which gives SD = D Z. 2

Exercise 10.20. Let S, Z int Sm , D as in Exercise 10.19, and + W = D 2 ZD 2 = D 2 SD 2 . Then W 2 = D 2 ZSD 2 . Moreover, W 2 and ZS have the same eigenvalues. Prove this.
1 1 1 1 1 1 1 1

Solution: From Exercise 10.19 we know that D1 Z = SD. This implies that D 2 ZD 2 = 1 1 D 2 SD 2 , as given above. It follows that W 2 = D 2 ZD 2 D 2 SD 2 = D 2 ZSD 2 . Note that W is symmetric and positive denite, while ZS is not. But since Z and S are positive denite it can be shown that all eigenvalues of ZS are real and positive. We need
1 1 1 1 1 1

144

to prove the stronger statement that the eigenvalues of ZS are the same as those of W 2 . This goes as follows. As is well known, the eigenvalues of any m m matrix A are the roots of its characteristic equation: det (A Im ) = 0. Since det (A Im ) = det D 2 (A Im ) D 2
1 1 1 1

= det D 2 AD 2 Im ,

it follows that A and D 2 AD 2 have the same characteristic equation and hence the same eigenvalues. This proves the statement. 2

145

146

Appendix A

Some technical lemmas

We start with a slightly generalized version of the well-known Cauchy-Schwarz inequality. The classical Cauchy-Schwarz inequality follows by taking A = M = I in the next lemma (where I is the identity matrix). Lemma A.1 (Generalized Cauchy-Schwarz inequality). If A, M are symmetric matrices with xT M x xT Ax, x Rn , then aT M b
2

aT Aa

bT Ab , a, b Rn .

Proof. Note that xT Ax 0, x Rn , so A is positive semidenite. Without loss of generality assume that A is positive denite. Otherwise A + I is positive denite for all > 0, and we take the limit as 0, with a and b xed. The lemma is trivial if a = 0 or b = 0, so we assume that a and b are nonzero. Putting := then it follows from aT M b = that 1 2 (a + b)T M (a + b) (a b)T M (a b) 16 1 2 (a + b)T M (a + b) + (a b)T M (a b) . 16 Now using the hypothesis of the lemma we obtain 1 2 2 aT M b (a + b)T A(a + b) + (a b)T A(a b) 16 1 1 T 2 2 = 2aT Aa + 2bT Ab = a Aa + bT Ab . 16 4 aT M b
2
4

aT Aa , bT Ab

1 (a + b)T M (a + b) (a b)T M (a b) 4

147

When replacing a by a/ and b by b this implies a Mb


T 2

M (b)

1 4

1 T a Aa + 2 bT Ab 2

= aT Aa

bT Ab , 2

which was to be shown.

The following lemma gives an estimate for the spectral radius of a symmetric homogeneous trilinear form. The proof is due to Jarre [5]. Lemma A.2 (Spectral Radius for Symmetric Trilinear Forms). Let a symmetric homogeneous trilinear form M : Rn Rn Rn R be given by its coecient matrix M Rnnn . Let A : Rn Rn R be a symmetric bilinear form, with matrix A Rnn , and > 0 a scalar such that M [x, x, x]2 A[x, x]3 = x Then |M [x, y, z]| x
A 6 A

, x Rn .

, x, y, z Rn .

Proof. Without loss of generality we assume that = 1. Otherwise we replace A by 3 A. As in the proof of Lemma A.1 we assume that A is positive denite. Then, using the substitution M [x, y, z] := M [A 2 x, A 2 y, A 2 z] we can further assume that A = I is the identity matrix and we need to show that |M [x, y, z]| x under the hypothesis |M [x, x, x]| x
2
1 1 1

z
3 2

, x, y, z Rn .

For x Rn denote by Mx the (symmetric) matrix dened by y T Mx z := Mx [y, z] := M [x, y, z], y, z Rn . It is sucient to show that |M [x, y, y]| x
2 2 2

, x Rn .

, x, y Rn ,

because the remaining part follows by applying Lemma A.1, with M = Mx , for xed x. Dene := max {M [x, y, y] : 148 x
2

= y

= 1}

where and are the Lagrange multipliers. From this we deduce that = /2 and = , by multiplying from the left with xT , 0 and 0, y T , and thus we nd My y = x, 2My x = y,

and let x and y represent a solution of this maximization problem. The necessary optimality conditions for x and y imply that My y 0 2 x = + , 2My x 2y 0

2 which implies that My y = 2 y. Since My is symmetric, it follows that y is an eigenvector of My with the eigenvalue , which gives that

= y T My y = M [y, y, y]. This completes the proof. 2

We proceed with a short proof of the so-called S-lemma. The following lemma will be used in its proof. Lemma A.3. Let P and Q be symmetric matrices such that Tr (P ) 0 and Tr (Q) < 0. Then there exists a vector e such that eT P e 0 and eT Qe < 0. Proof. Since Q is symmetric, we have Q = U T U for some orthogonal matrix U and diagonal matrix . Note that Tr (Q) = Tr U T U = Tr U U T = Tr () . Now let be a random n-dimensional vector with independent entries taking values 1 with probabilities 1 . Then 2 UT UT
T T

Q UT = UT

U T U U T = T = Tr () = Tr (Q)

P U T = T U P U T . = Tr (P ) 0. Hence there 2

The expectation of the latter quantity is Tr U P U T exists at least one vector such that U T have eT P e 0 and eT Qe = Tr (Q) < 0. 149
T

P U T 0. Putting e = U T , we

Lemma A.4 (S-lemma). Let A and B be symmetric matrices and assume that y T Ay > 0 for some vector y.7 Then the implication z T Az 0 z T Bz 0 is valid if and only if B A for some 0. Proof. The if part is obvious: if B A for some 0 and z T Az 0, then T T z Bz z Az 0. Therefore, assuming that A has a positive eigenvalue and z T Az 0 z T Bz 0, it suces to show that 0, where (P ) := max { : B A
,

I, 0} .

We rst establish that is bounded above. This goes as follows. Since A has a positive eigenvalue, there exists > 0 and nonzero x such that Ax = x. Now we have 2 xT x xT (B A) x = xT Bx x xT Bx, which implies that cannot exceed the largest eigenvalue of B. Next we show that (P ) is strictly feasible. Taking = 1 and small enough (smaller than the smallest eigenvalue of B A) one has B A = B A I and > 0. Thus we have shown that (P ) is strictly feasible and above bounded. By the conic duality theorem this implies that the dual problem is solvable and has the same optimal value. So we have (D) = min{Tr (BX) : Tr (AX) 0, Tr (X) = 1, X 0}.
X

Let X be an optimal solution of (D). Since X that X = DDT . We thus may write

0, there exists a matrix D such

= Tr (BX ) = Tr BDDT = Tr DT BD 1 = Tr (X ) = Tr DDT = Tr DT D . Assume that < 0. Setting P = DT AD and Q = DT BD, we see that the matrices P and Q satisfy the premise of Lemma A.3. Hence there exists a vector e such that T T eT P e = (De) A (De) 0 and eT Qe = (De) B (De) < 0. But this contradicts the premise of the S-lemma (with z = De). Thus we have shown that 0. This proves the lemma. 2

0 Tr (AX ) = Tr ADDT = Tr DT AD

We nally we present the so-called lemma on the Schur complement and its proof. Lemma A.5 (Schur complement lemma). Let the symmetric block matrix A be given by P QT A= Q R
7 This

simply means that A has a positive eigenvalue.

150

and let R be positive denite. Then A is positive (semi)denite if and only if the matrix P QT R1 Q is positive (semi)denite. Proof. The matrix A is positive semidenite if and only if for all vectors u and v (of appropriate sizes) one has u v
T

P QT Q R

u v

= uT P u + 2uT Qv + v T Rv 0.

Fixing u, this holds if and only if for all v one has uT P u + 2uT Qv + v T Rv 0. Since R is positive denite, the left-hand side expression is strictly convex in v, and minimal if Rv + QT u = 0. So the minimal value is achieved at v = R1 QT u. Substituting this value of v we obtain that A is positive semidenite if and only if one has uT P u uT QR1 QT u 0 u This is equivalent to uT P QR1 QT u 0 u,

which proves that A is positive semidenite if and only if P QR1 QT is positive semidenite. The above reasoning can be easily adapted to justify that semidenite can be replaced by denite.8

8 An

alternative proof is given by the following relation: I QT R1 0 I P QT Q R I 0 R1 Q I = P QR1 QT 0 0 R .

151

152

Bibliography
[1] S. Boyd and L. Vandenberghe. Convex optimization. Cambridge University Press, Cambridge, 2004. [2] E. de Klerk. Interior Point Methods for Semidenite Programming. PhD thesis, TU Delft, The Netherlands, 1997. [3] F. Glineur. Topics in convex optimization: interior-point methods, conic duality and approximations. PhD thesis, Facult Polytechnique de Mons, 2001. e [4] D. den Hertog. Interior Point Approach to Linear, Quadratic and Convex Programming, volume 277 of Mathematics and its Applications. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1994. [5] F. Jarre. Interior-Point Methods via Self-concordance or relative Lipschitz condition. Bayerischen Julius-Maximilians-Universitt, Wrzburg, Germany, a u March 1994. Habilitationsschrift. [6] H. Ltkepohl. Handbook of matrices. John Wiley & Sons, Chichester, 1996. u [7] Y. Nesterov. Introductory Lectures on Convex Optimization, volume 87 of Applied Optimization. Kluwer Academic Publishers, Dordrecht, The Netherlands, 2004. [8] Y.E. Nesterov and A.S. Nemirovskii. Interior Point Polynomial Methods in Convex Programming : Theory and Algorithms. Number 13 in Studies in Applied Mathematics. SIAM, Philadelphia, USA, 1994. [9] Y.E. Nesterov and M.J. Todd. Primal-dual interior-point methods for selfscaled cones. SIAM Journal on Optimization, 8(2):324 364, 1998. [10] James Renegar. A mathematical view of interior-point methods in convex optimization. MPS/SIAM Series on Optimization. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2001. [11] J.F. Sturm. Primal-dual Interior-point Approach to Semi-denite Programming. PhD thesis, Erasmus University, Rotterdam, NL, 1997.

153

Anda mungkin juga menyukai