
Nonlinear Programming Models

http://gol.dsi.unifi.it/users/schoen

Introduction

NLP problems

$$\min f(x), \qquad x \in S \subseteq \mathbb{R}^n$$

A global minimum (or global optimum) is any $x^\star \in S$ such that

$$f(x) \ge f(x^\star) \quad \forall x \in S.$$

Standard form:

$$\min f(x) \quad \text{s.t. } h_i(x) = 0,\ i = 1, \dots, m; \quad g_j(x) \le 0,\ j = 1, \dots, k.$$

A local minimum is any $\bar{x} \in S$ such that, for some $\varepsilon > 0$,

$$x \in S \cap B(\bar{x}, \varepsilon) \Rightarrow f(x) \ge f(\bar{x}),$$

where $B(\bar{x}, \varepsilon) = \{x \in \mathbb{R}^n : \|x - \bar{x}\| \le \varepsilon\}$ is a ball in $\mathbb{R}^n$. Any global optimum is also a local optimum, but the converse is generally false.

Convex Functions

A set $S \subseteq \mathbb{R}^n$ is convex if

$$x, y \in S \Rightarrow \lambda x + (1 - \lambda) y \in S$$

for all choices of $\lambda \in [0, 1]$. Let $\Omega \subseteq \mathbb{R}^n$ be a non-empty convex set. A function $f : \Omega \to \mathbb{R}$ is convex iff

$$f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$$

for all $x, y \in \Omega$ and $\lambda \in [0, 1]$. Every convex function is continuous in the interior of $\Omega$; it might be discontinuous, but only on the boundary. If $f$ is continuously differentiable, then it is convex iff

$$f(y) \ge f(x) + (y - x)^T \nabla f(x)$$

for all $x, y \in \Omega$.

If $f$ is twice continuously differentiable, then it is convex iff its Hessian matrix

$$\nabla^2 f(x) := \left[ \frac{\partial^2 f}{\partial x_i \partial x_j} \right]$$

is positive semidefinite:

$$\nabla^2 f(x) \succeq 0 \iff v^T \nabla^2 f(x) v \ge 0 \quad \forall v \in \mathbb{R}^n.$$

Example: an affine function is convex (and concave). For a quadratic function ($Q$: symmetric matrix)

$$f(x) = \frac{1}{2} x^T Q x + b^T x + c$$

we have

$$\nabla f(x) = Qx + b, \qquad \nabla^2 f(x) = Q,$$

so $f$ is convex iff $Q \succeq 0$.
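The Hessian test above is easy to apply numerically: a quadratic is convex iff all eigenvalues of $Q$ are nonnegative. A minimal sketch with two made-up matrices:

```python
import numpy as np

# Quadratic f(x) = 1/2 x^T Q x + b^T x + c has constant Hessian Q,
# so f is convex iff Q is positive semidefinite (all eigenvalues >= 0).
Q_convex = np.array([[2.0, 0.5],
                     [0.5, 1.0]])
Q_nonconvex = np.array([[1.0, 0.0],
                        [0.0, -0.5]])   # one negative eigenvalue

def is_convex_quadratic(Q, tol=1e-12):
    """PSD test on the symmetric Hessian of a quadratic."""
    return bool(np.all(np.linalg.eigvalsh(Q) >= -tol))

convex_flag = is_convex_quadratic(Q_convex)        # True
nonconvex_flag = is_convex_quadratic(Q_nonconvex)  # False
```

`eigvalsh` is used because $Q$ is symmetric; a small tolerance guards against round-off on the zero eigenvalue boundary.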

Maximization

With a slight abuse of terminology, a problem

$$\max_{x \in S} f(x)$$

is called a convex optimization problem iff $S$ is a convex set and $f$ is concave on $S$. A problem in standard form

$$\min f(x) \quad \text{s.t. } h_i(x) = 0,\ i = 1, \dots, m; \quad g_j(x) \le 0,\ j = 1, \dots, k$$

is called convex iff the feasible set is convex and $f$ is a convex function (not to be confused with the minimization of a concave function, or the maximization of a convex function, which are NOT convex optimization problems).

If $f$ is convex, the $h_i(x)$ are affine functions and the $g_j(x)$ are convex functions, then the problem is convex.

Convex optimization is easy; non-convex optimization is usually very hard. Fundamental property of convex optimization problems: every local optimum is also a global optimum (a proof will be given later). Minimizing a positive semidefinite quadratic function on a polyhedron is easy (polynomially solvable); if even a single eigenvalue of the Hessian is negative, the problem becomes NP-hard.

Many (of course not all...) functions are convex:

- affine functions $a^T x + b$
- quadratic functions $\frac{1}{2} x^T Q x + b^T x + c$ with $Q = Q^T$, $Q \succeq 0$
- any norm
- $x \log x$ (while $\log x$ is concave)
- $f$ is convex if and only if, for all $x_0, d \in \mathbb{R}^n$, its restriction to any line, $\phi(\lambda) = f(x_0 + \lambda d)$, is a convex function of $\lambda$
- if $g(x, y)$ is convex in $x$ for all $y$, then $\int g(x, y)\, dy$ is convex
- $\max_i \{a_i^T x + b_i\}$ is convex
- if $f, g$ are convex, then $\max\{f(x), g(x)\}$ is convex
- if $f_a$ is convex for every $a \in A$ (a possibly uncountable set), then $\sup_{a \in A} f_a(x)$ is convex
- if $f$ is convex, so is $f(Ax + b)$
- $\mathrm{Trace}(A^T X) = \sum_{i,j} A_{ij} X_{ij}$ is convex in $X$ (it is linear!)
- $\log \det X^{-1}$ is convex over the set of matrices $X \in \mathbb{R}^{n \times n}$, $X \succ 0$

Data Approximation

Table of contents

- norm approximation
- maximum likelihood
- robust estimation

Norm approximation

Problem:

$$\min_x \|Ax - b\|$$

where $A, b$ are parameters. Usually the system is over-determined, i.e. $b \notin \mathrm{Range}(A)$. For example, this happens when $A \in \mathbb{R}^{m \times n}$ with $m > n$ and $A$ has full rank. $r := Ax - b$ is the residual.

Examples

- $\|r\|_2^2 = r^T r$: least squares (or regression)
- $r^T P r$ with $P \succeq 0$: weighted least squares
- $\|r\|_1 = \sum_i |r_i|$: absolute (or $\ell_1$) approximation

Example: $\ell_1$ norm, matrix $A \in \mathbb{R}^{100 \times 30}$.

[Figure: histogram of the $\ell_1$-norm residuals]

Possible (convex) additional constraints:

- maximum deviation from an initial estimate: $\|x - x_{\mathrm{est}}\| \le \delta$
- simple bounds: $\ell_i \le x_i \le u_i$
- ordering: $x_1 \le x_2 \le \dots \le x_n$

[Figure: histogram of the $\ell_\infty$-norm residuals]

[Figure: histogram of the $\ell_2$-norm residuals]
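As a quick numerical illustration of the $\ell_2$ case (a made-up overdetermined instance, not from the slides): the least-squares solution makes the residual orthogonal to the range of $A$, i.e. $A^T r = 0$.

```python
import numpy as np

rng = np.random.default_rng(0)
# Overdetermined system: A in R^{100x3}, so b is generally not in Range(A).
A = rng.standard_normal((100, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true + 0.01 * rng.standard_normal(100)

# Least-squares (l2) estimate: minimizes ||Ax - b||_2.
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
r = A @ x_ls - b                          # residual vector

# Optimality (normal equations): A^T r = 0 at the l2 solution.
normal_eq_residual = float(np.linalg.norm(A.T @ r))
```

With the small noise level used here, `x_ls` also lands close to `x_true`.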

Variants

$$\min_x \sum_i h(r_i)$$

where $h$ is a scalar penalty, e.g. a linear-quadratic (Huber-type) $h(z)$, quadratic for small residuals and linear for large ones, which reduces the influence of outliers.

[Figure: comparison of penalty functions $h(z)$]

Maximum likelihood

Given a sample $X_1, X_2, \dots, X_k$ and a parametric family of probability density functions $L(\cdot\,; \theta)$, the maximum likelihood estimate of $\theta$ given the sample is

$$\hat{\theta} = \arg\max_\theta L(X_1, \dots, X_k; \theta)$$

Example: linear measurements with additive i.i.d. (independent, identically distributed) noise:

$$X_i = a_i^T \theta + \varepsilon_i \tag{1}$$

so that

$$L(X_1, \dots, X_k; \theta) = \prod_{i=1}^k p(X_i - a_i^T \theta)$$

and (taking the logarithm, which does not change optimum points):

$$\hat{\theta} = \arg\max_\theta \sum_i \log p(X_i - a_i^T \theta)$$

- $\varepsilon \sim N(0, \sigma^2)$, i.e. $p(z) = (2\pi\sigma^2)^{-1/2} \exp(-z^2 / 2\sigma^2)$: the MLE is the $\ell_2$ estimate $\hat{\theta} = \arg\min \|A\theta - X\|_2$
- $p(z) = (1/(2a)) \exp(-|z|/a)$ (Laplacian noise): the MLE is the $\ell_1$ estimate $\hat{\theta} = \arg\min \|A\theta - X\|_1$
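The Gaussian case can be checked numerically: on a small synthetic instance (hypothetical data, assumed known $\sigma$), the least-squares solution maximizes the log-likelihood.

```python
import numpy as np

rng = np.random.default_rng(1)
# Linear measurements X_i = a_i^T theta + eps_i, Gaussian noise:
# the MLE coincides with the least-squares estimate.
A = rng.standard_normal((50, 2))
theta_true = np.array([0.7, -1.3])
sigma = 0.1
X = A @ theta_true + sigma * rng.standard_normal(50)

def log_likelihood(theta):
    resid = X - A @ theta
    return (-0.5 * np.sum(resid**2) / sigma**2
            - len(X) * np.log(sigma * np.sqrt(2 * np.pi)))

theta_mle, *_ = np.linalg.lstsq(A, X, rcond=None)
ll_star = log_likelihood(theta_mle)
# Any perturbation of the estimate strictly decreases the likelihood.
ll_perturbed = log_likelihood(theta_mle + np.array([0.05, 0.0]))
```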

Two further noise models:

- $p(z) = (1/a) \exp(-z/a)\,\mathbf{1}\{z \ge 0\}$ (negative exponential): the estimate can be found by solving the LP problem $\min \mathbf{1}^T (X - A\theta)$ s.t. $A\theta \le X$
- $p$ uniform on $[-a, a]$: the MLE is any $\theta$ such that $\|A\theta - X\|_\infty \le a$

Ellipsoids

$$E = \{x \in \mathbb{R}^n : (x - x_0)^T P^{-1} (x - x_0) \le 1\}$$

where $x_0 \in \mathbb{R}^n$ is the center of the ellipsoid and $P$ is a symmetric positive-definite matrix. Alternative representations:

$$E = \{x \in \mathbb{R}^n : \|Ax - b\|_2 \le 1\}$$

where $A \succ 0$, or

$$E = \{x \in \mathbb{R}^n : x = x_0 + Au,\ \|u\|_2 \le 1\}$$

where $A$ is square and nonsingular (an affine transformation of the unit ball).

Least Squares: $\hat{x} = \arg\min_x \sum_i (a_i^T x - b_i)^2$; but suppose it is known only that

$$a_i \in E_i = \{\bar{a}_i + P_i u : \|u\| \le 1\}$$

where $P_i = P_i^T \succeq 0$. The Robust Least Squares problem is then

$$\hat{x}_r = \arg\min_x \max_{a_i \in E_i} \sum_i (a_i^T x - b_i)^2$$

RLS

It holds, for any scalar $\beta$ and vector $\gamma$ (choosing $y = \gamma/\|\gamma\|$ if $\beta \ge 0$ and $y = -\gamma/\|\gamma\|$ if $\beta < 0$), that

$$\max_{\|y\| \le 1} |\beta + \gamma^T y| = |\beta| + \|\gamma\|.$$

Then:

$$\max_{a_i \in E_i} |a_i^T x - b_i| = \max_{\|u\| \le 1} |\bar{a}_i^T x - b_i + u^T P_i x| = |\bar{a}_i^T x - b_i| + \|P_i x\|$$

Thus the Robust Least Squares problem reduces to

$$\min_x \sum_i \left( |\bar{a}_i^T x - b_i| + \|P_i x\| \right)^2$$

or, in epigraph form,

$$\min_{x,t} \|t\|^2 \quad \text{s.t. } \bar{a}_i^T x - b_i + \|P_i x\| \le t_i, \quad -\bar{a}_i^T x + b_i + \|P_i x\| \le t_i \quad \forall i$$

i.e. $|\bar{a}_i^T x - b_i| + \|P_i x\| \le t_i$: second-order cone constraints (recall $C = \{(x, t) \in \mathbb{R}^{n+1} : \|x\| \le t\}$).

Geometrical Problems

- projections and distances
- polyhedral intersection
- extremal volume ellipsoids
- classification problems

Projection on a set

Given a set $C$, the projection of $x$ on $C$ is defined as:

$$P_C(x) = \arg\min_{z \in C} \|z - x\|$$

If $C = \{z : Az = b,\ f_i(z) \le 0,\ i = 1, \dots, m\}$ with the $f_i$ convex, then $C$ is a convex set and the problem

$$P_C(x) = \arg\min \|x - z\| \quad \text{s.t. } Az = b,\ f_i(z) \le 0,\ i = 1, \dots, m$$

is convex.

Distance between two sets:

$$\mathrm{dist}(C^{(1)}, C^{(2)}) = \min_{x \in C^{(1)},\, y \in C^{(2)}} \|x - y\|$$

If $C^{(j)} = \{x : A^{(j)} x = b^{(j)},\ f_i^{(j)}(x) \le 0\}$, then the minimum distance can be found through a convex model:

$$\min \|x^{(1)} - x^{(2)}\| \quad \text{s.t. } A^{(j)} x^{(j)} = b^{(j)},\ f_i^{(j)}(x^{(j)}) \le 0,\ j = 1, 2$$
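For the special case $C = \{z : Az = b\}$ (no inequality constraints), the projection has a closed form, $P_C(x) = x - A^T (A A^T)^{-1}(Ax - b)$, assuming $A$ has full row rank. A minimal sketch on a hypothetical hyperplane:

```python
import numpy as np

# Projection of x onto the affine set C = {z : Az = b}:
#   P_C(x) = x - A^T (A A^T)^{-1} (A x - b)
A = np.array([[1.0, 1.0, 1.0]])   # hyperplane z1 + z2 + z3 = 1
b = np.array([1.0])
x = np.array([2.0, -1.0, 3.0])

p = x - A.T @ np.linalg.solve(A @ A.T, A @ x - b)

feas = float(A @ p - b)           # 0: p lies in C
# x - p is orthogonal to the subspace {Az = 0}, so p is the
# closest point of C to x.
```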

Polyhedral intersection

Case 1: polyhedra described by means of linear inequalities:

$$P_1 = \{x : Ax \le b\}, \qquad P_2 = \{x : Cx \le d\}$$

- $P_1 \cap P_2 = \emptyset$? It is a linear feasibility problem: $Ax \le b$, $Cx \le d$.
- $P_1 \subseteq P_2$? Just check, for each row $k$ of $C$: $\sup\{c_k^T x : Ax \le b\} \le d_k$.

Case 2: polyhedra (polytopes) described through vertices:

$$P_1 = \mathrm{conv}\{v_1, \dots, v_k\}, \qquad P_2 = \mathrm{conv}\{w_1, \dots, w_h\}$$

- $P_1 \cap P_2 \ne \emptyset$? Need to find $\lambda_1, \dots, \lambda_k, \mu_1, \dots, \mu_h \ge 0$ with $\sum_i \lambda_i = 1$, $\sum_j \mu_j = 1$ and $\sum_i \lambda_i v_i = \sum_j \mu_j w_j$.
- $P_1 \subseteq P_2$? For each $i = 1, \dots, k$ check whether there exist $\mu_j \ge 0$ with $\sum_j \mu_j = 1$ and $\sum_j \mu_j w_j = v_i$.

Minimum-volume enclosing ellipsoid: given $v_1, \dots, v_k \in \mathbb{R}^n$, find an ellipsoid

$$E = \{x : \|Ax - b\| \le 1\}$$

with $A = A^T \succ 0$. The volume of $E$ is proportional to $\det A^{-1}$, so this is a convex optimization problem (in the unknowns $A, b$):

$$\min \log \det A^{-1} \quad \text{s.t. } A = A^T \succ 0, \quad \|A v_i - b\| \le 1,\ i = 1, \dots, k$$

[Figure: a point set and its minimum-volume enclosing ellipsoid]

Maximum-volume inscribed ellipsoid: given $P = \{x : Ax \le b\}$, find an ellipsoid

$$E = \{By + d : \|y\| \le 1\}, \qquad E \subseteq P,$$

of maximum volume (unknowns $B, d$). The inclusion $E \subseteq P$ means, for each row $a_i^T$ of $A$:

$$\sup_{\|y\| \le 1} \{a_i^T B y + a_i^T d\} \le b_i \iff \|B^T a_i\| + a_i^T d \le b_i$$

Difficult variants

These problems are hard: find a maximal volume ellipsoid contained in a polyhedron given by its vertices.

It is already a difficult problem to decide whether a given ellipsoid $E$ contains a polyhedron $P = \{x : Ax \le b\}$. This problem remains difficult even when the ellipsoid is a sphere: it is equivalent to norm maximization over a polyhedron, an NP-hard concave optimization problem.

Linear separation: given two point sets $X_1, \dots, X_k$ and $Y_1, \dots, Y_h$, find a hyperplane $a^T x = t$ such that:

$$a^T X_i \ge 1, \quad i = 1, \dots, k; \qquad a^T Y_j \le -1, \quad j = 1, \dots, h$$

Robust separation

Find a maximal separation:

$$\max_{a : \|a\| \le 1} \left( \min_i a^T X_i - \max_j a^T Y_j \right)$$

i.e.

$$\max t_1 - t_2 \quad \text{s.t. } a^T X_i \ge t_1 \ \forall i, \quad a^T Y_j \le t_2 \ \forall j, \quad \|a\| \le 1$$

i j

Optimality Conditions

Fabio Schoen 2008

$$\min_{x \in S} f(x)$$

where $f : S \to \mathbb{R}$. Let $x_1, x_2 \in S$ and $d = x_2 - x_1$: $d$ is a feasible direction at $x_1$. If there exists $\bar{\varepsilon} > 0$ such that $f(x_1 + \varepsilon d) < f(x_1)$ for all $\varepsilon \in (0, \bar{\varepsilon})$, then $d$ is called a descent direction at $x_1$. Elementary necessary optimality condition: if $x^\star$ is a local optimum, no descent direction may exist at $x^\star$.

If $x^\star \in S$ is a local optimum for $f(\cdot)$ and there exists a neighborhood $U(x^\star)$ such that $f \in C^1(U(x^\star))$, then

$$d^T \nabla f(x^\star) \ge 0 \quad \forall d \text{ feasible direction}$$

Proof. Taylor expansion:

$$f(x^\star + \varepsilon d) = f(x^\star) + \varepsilon\, d^T \nabla f(x^\star) + o(\varepsilon)$$

$d$ cannot be a descent direction, so, if $\varepsilon$ is sufficiently small, $f(x^\star + \varepsilon d) \ge f(x^\star)$. Thus

$$\varepsilon\, d^T \nabla f(x^\star) + o(\varepsilon) \ge 0$$

and, dividing by $\varepsilon$,

$$d^T \nabla f(x^\star) + \frac{o(\varepsilon)}{\varepsilon} \ge 0.$$

General case:

$$\min f(x) \quad \text{s.t. } g_i(x) \le 0,\ i = 1, \dots, m, \quad x \in X \ (X \text{: open set})$$

Tangent cone: $d \in T(\bar{x})$ iff there exists a feasible sequence $\{x_k\} \subseteq S$, $x_k \to \bar{x}$, with

$$\frac{d}{\|d\|} = \lim_k \frac{x_k - \bar{x}}{\|x_k - \bar{x}\|}$$

Some examples

- $S = \mathbb{R}^n \Rightarrow T(x) = \mathbb{R}^n$
- $S = \{x : Ax = b\} \Rightarrow T(x) = \{d : Ad = 0\}$
- $S = \{x : Ax \le b\}$: let $I$ be the set of active constraints at $\bar{x}$, i.e. $a_i^T \bar{x} = b_i$ for $i \in I$ and $a_i^T \bar{x} < b_i$ for $i \notin I$.

In the last case, for $i \in I$ and a feasible sequence $x_k \to \bar{x}$, from $a_i^T x_k \le b_i = a_i^T \bar{x}$:

$$a_i^T \frac{x_k - \bar{x}}{\|x_k - \bar{x}\|} \le 0 \quad \Rightarrow \quad a_i^T d \le 0.$$

Thus if $d \in T(\bar{x})$ then $a_i^T d \le 0$ for all $i \in I$.

Conversely, let $x_k = \bar{x} + \varepsilon_k d$ with $\varepsilon_k \downarrow 0$. If $a_i^T d \le 0$ for all $i \in I$:

$$a_i^T x_k = a_i^T (\bar{x} + \varepsilon_k d) = b_i + \varepsilon_k a_i^T d \le b_i, \quad i \in I$$

$$a_i^T x_k = a_i^T (\bar{x} + \varepsilon_k d) < b_i \quad \text{for } \varepsilon_k \text{ small}, \quad i \notin I$$

so the sequence is feasible. Thus

$$T(\bar{x}) = \{d : a_i^T d \le 0 \ \forall i \in I\}.$$

Example

Let $S = \{(x, y) \in \mathbb{R}^2 : x^2 - y = 0\}$ (a parabola). Tangent cone at $(0, 0)$? Let $\{(x_k, y_k)\} \to (0, 0)$, i.e. $x_k \to 0$, $y_k = x_k^2$:

$$\|(x_k, y_k) - (0, 0)\| = \sqrt{x_k^2 + x_k^4} = |x_k| \sqrt{1 + x_k^2}$$

$$\lim_{x_k \to 0^+} \frac{x_k}{|x_k| \sqrt{1 + x_k^2}} = 1, \qquad \lim_{x_k \to 0^-} \frac{x_k}{|x_k| \sqrt{1 + x_k^2}} = -1$$

so the tangent cone at the origin is the $x$-axis, $\{(d, 0) : d \in \mathbb{R}\}$.

Descent direction

$d \in \mathbb{R}^n$ is a feasible direction at $x \in S$ if there exists $\bar{\varepsilon} > 0$ such that $x + \varepsilon d \in S$ for all $\varepsilon \in [0, \bar{\varepsilon})$. If $d$ is feasible then $d \in T(\bar{x})$, but in general the converse is false. If $f(\bar{x} + \varepsilon d) \le f(\bar{x})$ for all $\varepsilon \in (0, \bar{\varepsilon})$, then $d$ is a descent direction.

Theorem. Let $\bar{x} \in S \subseteq \mathbb{R}^n$ be a local optimum for $\min_{x \in S} f(x)$, with $f \in C^1(U(\bar{x}))$. Then

$$d^T \nabla f(\bar{x}) \ge 0 \quad \forall d \in T(\bar{x})$$

Proof. $\bar{x}$ local optimum $\Rightarrow$ there is a neighborhood $U(\bar{x})$ with $f(x) \ge f(\bar{x})$ for all $x \in U \cap S$. For a feasible sequence $x_k \to \bar{x}$ defining $d$:

$$f(x_k) = f(\bar{x}) + \nabla^T f(\bar{x})(x_k - \bar{x}) + o(\|x_k - \bar{x}\|) = f(\bar{x}) + \nabla^T f(\bar{x})(x_k - \bar{x}) + \|x_k - \bar{x}\|\, o(1).$$

If $k$ is large enough, $x_k \in U(\bar{x})$:

$$f(x_k) - f(\bar{x}) \ge 0 \quad \Rightarrow \quad \nabla^T f(\bar{x})(x_k - \bar{x}) + \|x_k - \bar{x}\|\, o(1) \ge 0$$

Dividing by $\|x_k - \bar{x}\|$ and passing to the limit, $\nabla^T f(\bar{x})\, d \ge 0$.

Examples

Unconstrained problems: every $d \in \mathbb{R}^n$ belongs to the tangent cone at a local optimum, so

$$\nabla^T f(\bar{x})\, d \ge 0 \quad \forall d \in \mathbb{R}^n$$

Choosing $d = e_i$ and $d = -e_i$ we get

$$\nabla f(\bar{x}) = 0$$

NB: the same is true if $\bar{x}$ is a local minimum in the relative interior of the feasible region.

Linear equality constraints:

$$\min f(x) \quad \text{s.t. } Ax = b$$

At a local optimum $\bar{x}$, $\nabla^T f(\bar{x})\, d \ge 0$ for all $d : Ad = 0$; an equivalent statement:

$$\min_d \nabla^T f(\bar{x})\, d = 0 \quad \text{s.t. } Ad = 0$$

(a linear program). From LP duality:

$$\max 0^T \lambda = 0 \quad \text{s.t. } A^T \lambda = \nabla f(\bar{x})$$

so there exists $\lambda$ such that $A^T \lambda = \nabla f(\bar{x})$.

Linear inequalities

$$\min f(x) \quad \text{s.t. } Ax \le b$$

Tangent cone at a local minimum $\bar{x}$: $\{d \in \mathbb{R}^n : a_i^T d \le 0,\ i \in I(\bar{x})\}$. Let $A_I$ be the rows of $A$ associated to active constraints at $\bar{x}$. Then

$$\min_d \nabla^T f(\bar{x})\, d = 0 \quad \text{s.t. } A_I d \le 0$$

From LP duality:

$$\max 0^T \lambda = 0 \quad \text{s.t. } A_I^T \lambda = \nabla f(\bar{x}), \quad \lambda \le 0$$

Thus, at a local optimum, the gradient is a non-positive linear combination of the coefficients of the active constraints.

Farkas Lemma

Let $A$ be a matrix in $\mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. One and only one of the following systems is solvable:

$$\text{(I)} \quad Ax = b, \quad x \ge 0$$

$$\text{(II)} \quad A^T y \le 0, \quad b^T y > 0$$

Geometrical interpretation: either $b$ belongs to the cone $\{z : z = Ax,\ x \ge 0\}$ generated by the columns of $A$, or some $y$ in the polar cone $\{y : A^T y \le 0\}$ defines a hyperplane separating $b$ from that cone.

[Figure: the cone generated by $a_1, a_2$, the vector $b$, and the cone $\{y : A^T y \le 0\}$]

Proof

1) If there exists $x \ge 0$ with $Ax = b$, then $b^T y = x^T A^T y$; thus $A^T y \le 0 \Rightarrow b^T y \le 0$, so (II) is unsolvable.

2) Premise (separating hyperplane theorem): let $C$ and $D$ be two nonempty disjoint convex sets. Then there exist $a \ne 0$ and $\beta$ such that

$$a^T x \le \beta \ \forall x \in C, \qquad a^T x \ge \beta \ \forall x \in D.$$

Now suppose $\{x : Ax = b,\ x \ge 0\} = \emptyset$. Let

$$S = \{y \in \mathbb{R}^m : \exists x \ge 0,\ Ax = y\}.$$

$S$ is closed and convex, and $b \notin S$. From the separating hyperplane theorem there exist $\gamma \in \mathbb{R}^m$, $\gamma \ne 0$, and $\alpha \in \mathbb{R}$ such that

$$\gamma^T y \le \alpha < \gamma^T b \quad \forall y \in S.$$

Since $0 \in S$, $0 \le \alpha$, hence $\gamma^T b > 0$; moreover $\gamma^T A x \le \alpha$ for all $x \ge 0$, which is possible iff $\gamma^T A \le 0$. Letting $y = \gamma$ we obtain a solution of $A^T y \le 0$, $b^T y > 0$.
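The alternative can be checked numerically on a tiny made-up instance: `scipy.optimize.nnls` solves $\min \|Ax - b\|$ s.t. $x \ge 0$, so system (I) is feasible iff the residual is (numerically) zero; otherwise a certificate $y$ for (II) exists.

```python
import numpy as np
from scipy.optimize import nnls

# Farkas lemma on a 2x2 example: exactly one of
#   (I)  Ax = b, x >= 0        (II) A^T y <= 0, b^T y > 0
# is solvable.
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])

b_feasible = np.array([1.0, 1.0])
_, res1 = nnls(A, b_feasible)        # (I) solvable -> residual 0

b_infeasible = np.array([-1.0, 0.0])
_, res2 = nnls(A, b_infeasible)      # (I) unsolvable -> residual > 0
y = np.array([-1.0, 0.0])            # hand-picked certificate for (II)
cert_ok = bool(np.all(A.T @ y <= 0) and (b_infeasible @ y > 0))
```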

Define the linearized cone

$$G(\bar{x}) = \{d \in \mathbb{R}^n : \nabla^T g_i(\bar{x})\, d \le 0,\ i \in I\}.$$

Then $T(\bar{x}) \subseteq G(\bar{x})$. In fact, let $\{x_k\}$ be feasible, $x_k \to \bar{x}$, with

$$d = \lim_k \frac{x_k - \bar{x}}{\|x_k - \bar{x}\|}$$

(here $\|d\| = 1$ without loss of generality). Let $\varepsilon_k = \|x_k - \bar{x}\| \to 0$, so $x_k = \bar{x} + \varepsilon_k d + o(\varepsilon_k)$. If the $i$-th constraint is active ($g_i(\bar{x}) = 0$), then

$$g_i(\bar{x} + \varepsilon_k d) = g_i(\bar{x}) + \varepsilon_k \nabla^T g_i(\bar{x})\, d + o(\varepsilon_k) = \varepsilon_k \nabla^T g_i(\bar{x})\, d + o(\varepsilon_k) \le 0$$

so, dividing by $\varepsilon_k$ and letting $k \to \infty$, $\nabla^T g_i(\bar{x})\, d \le 0$, i.e. $d \in G(\bar{x})$.

Example: with the constraints $-x^3 + y \le 0$, $-y \le 0$, the inclusion can be strict, $G(\bar{x}) \ne T(\bar{x})$; equality of the two cones is a constraint qualification.

Theorem (Karush-Kuhn-Tucker). Let $\bar{x} \in X \subseteq \mathbb{R}^n$, $X \ne \emptyset$ open, be a local optimum for

$$\min f(x) \quad \text{s.t. } g_i(x) \le 0,\ i = 1, \dots, m, \quad x \in X.$$

If a constraint qualification holds at $\bar{x}$, then there exist multipliers $\mu_i \ge 0$, $i \in I$, such that

$$\nabla f(\bar{x}) + \sum_{i \in I} \mu_i \nabla g_i(\bar{x}) = 0.$$

Proof

$\bar{x}$ local optimum: if $d \in T(\bar{x})$ then $d^T \nabla f(\bar{x}) \ge 0$; and $d \in T(\bar{x}) \Rightarrow d^T \nabla g_i(\bar{x}) \le 0$, $i \in I$. Under a constraint qualification ($T(\bar{x}) = G(\bar{x})$) the system

$$\nabla^T f(\bar{x})\, d < 0, \qquad \nabla^T g_i(\bar{x})\, d \le 0 \ \forall i \in I$$

has no solution, so by Farkas' lemma there exist $\mu_i \ge 0$, $i \in I$, with

$$\sum_{i \in I} \mu_i \nabla^T g_i(\bar{x}) = -\nabla^T f(\bar{x}).$$

Constraint qualifications (sufficient conditions for $T(\bar{x}) = G(\bar{x})$):

- polyhedral constraints (all $g_i$ affine);
- linear independence: $X$ open, $g_i$, $i \notin I$, continuous at $\bar{x}$, and $\{\nabla g_i(\bar{x})\}$, $i \in I$, linearly independent;
- Slater condition: $X$ open, $g_i$, $i \in I$, convex and differentiable at $\bar{x}$, $g_i$, $i \notin I$, continuous at $\bar{x}$, and there exists $\hat{x} \in X$ strictly feasible: $g_i(\hat{x}) < 0$, $i \in I$.
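The KKT conditions are easy to verify numerically at a known optimum. A minimal sketch on a hypothetical convex problem, $\min x_1^2 + x_2^2$ s.t. $1 - x_1 - x_2 \le 0$, whose optimum is $x^\star = (1/2, 1/2)$ with multiplier $\mu = 1$:

```python
import numpy as np

# Verify KKT at the optimum of: min x1^2 + x2^2  s.t.  1 - x1 - x2 <= 0
x_star = np.array([0.5, 0.5])
mu = 1.0

grad_f = 2 * x_star                  # gradient of the objective
grad_g = np.array([-1.0, -1.0])      # gradient of g(x) = 1 - x1 - x2
stationarity = grad_f + mu * grad_g  # should vanish at x*

g_val = 1 - x_star.sum()             # constraint is active: g(x*) = 0
complementarity = mu * g_val         # mu * g(x*) = 0
```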

Convex problems

An optimization problem $\min_{x \in S} f(x)$ is a convex problem if:

- $S$ is a convex set, i.e. $x, y \in S \Rightarrow \lambda x + (1 - \lambda) y \in S$ for all $\lambda \in [0, 1]$;
- $f$ is a convex function on $S$, i.e. $f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$ for all $\lambda \in [0, 1]$ and $x, y \in S$.

A problem in standard form

$$\min f(x) \quad \text{s.t. } h_j(x) = 0,\ j = 1, \dots, k, \quad g_i(x) \le 0,\ i = 1, \dots, m$$

is convex if $f$ is convex, the $g_i$ are convex, and the $h_j$ are affine (i.e. of the form $\alpha^T x + \beta$).

Convex problems

Every local optimum is a global one. Proof: let $\bar{x}$ be a local optimum for $\min_S f(x)$ and $x^\star$ a global optimum. $S$ convex $\Rightarrow \lambda x^\star + (1 - \lambda)\bar{x} \in S$; for $\lambda \to 0^+$ this point enters the neighborhood where $\bar{x}$ is minimal, thus

$$f(\bar{x}) \le f(\lambda x^\star + (1 - \lambda)\bar{x}) \le \lambda f(x^\star) + (1 - \lambda) f(\bar{x})$$

hence $f(\bar{x}) \le f(x^\star)$.

For a convex differentiable problem: if $d^T \nabla f(\bar{x}) \ge 0$ for all $d \in T(\bar{x})$, then $\bar{x}$ is a (global) optimum. Proof:

$$f(y) \ge f(\bar{x}) + (y - \bar{x})^T \nabla f(\bar{x}) \quad \forall y \in S.$$

But $y - \bar{x} \in T(\bar{x})$, so

$$f(y) \ge f(\bar{x}) + (y - \bar{x})^T \nabla f(\bar{x}) \ge f(\bar{x}) \quad \forall y \in S.$$

(For convex problems.) The set of global minima of a convex problem is a convex set. In fact, let $\bar{x}$ and $\bar{y}$ be global minima for the convex problem $\min_{x \in S} f(x)$:

$$f(\lambda \bar{x} + (1 - \lambda)\bar{y}) \le \lambda f(\bar{x}) + (1 - \lambda) f(\bar{y}) = \lambda f^\star + (1 - \lambda) f^\star = f^\star$$

where $f^\star$ is the global minimum value. Thus equality holds and the proof is complete.

KKT with equality constraints: let $\bar{x}$ be a local optimum for

$$\min f(x) \quad \text{s.t. } h_j(x) = 0,\ j = 1, \dots, k, \quad g_i(x) \le 0,\ i = 1, \dots, m, \quad x \in X \subseteq \mathbb{R}^n.$$

Let $I$ be the set of active inequalities at $\bar{x}$. If $f$, $g_i$ ($i \in I$), $h_j \in C^1$ and constraint qualifications hold at $\bar{x}$, then there exist $\mu_i \ge 0$, $i \in I$, and $\lambda_j \in \mathbb{R}$, $j = 1, \dots, k$:

$$\nabla f(\bar{x}) + \sum_{i \in I} \mu_i \nabla g_i(\bar{x}) + \sum_{j=1}^{k} \lambda_j \nabla h_j(\bar{x}) = 0$$

Complementarity

KKT equivalent formulation, summing over all inequality constraints (not only the active ones) and adding complementarity:

$$\nabla f(\bar{x}) + \sum_{i=1}^{m} \mu_i \nabla g_i(\bar{x}) + \sum_{j=1}^{k} \lambda_j \nabla h_j(\bar{x}) = 0, \qquad \mu_i g_i(\bar{x}) = 0,\ i = 1, \dots, m, \qquad \mu \ge 0$$

Second-order necessary conditions: if $f, g_i, h_j \in C^2$ at $\bar{x}$ and the gradients of the active constraints at $\bar{x}$ are linearly independent, then there exist multipliers $\mu_i \ge 0$, $i \in I$, and $\lambda_j$, $j = 1, \dots, k$, such that

$$\nabla f(\bar{x}) + \sum_{i \in I} \mu_i \nabla g_i(\bar{x}) + \sum_{j=1}^{k} \lambda_j \nabla h_j(\bar{x}) = 0$$

and

$$d^T \nabla^2 L(\bar{x})\, d \ge 0$$

for all $d$ tangent to the active constraints, where

$$\nabla^2 L(x) := \nabla^2 f(x) + \sum_{i \in I} \mu_i \nabla^2 g_i(x) + \sum_{j=1}^{k} \lambda_j \nabla^2 h_j(x).$$

Sufficient conditions

Let $f, g_i, h_j$ be twice continuously differentiable and let $x^\star, \mu, \lambda$ satisfy:

$$\nabla f(x^\star) + \sum_{i \in I} \mu_i \nabla g_i(x^\star) + \sum_{j=1}^{k} \lambda_j \nabla h_j(x^\star) = 0, \qquad \mu_i g_i(x^\star) = 0, \qquad \mu_i \ge 0$$

$$d^T \nabla^2 L(x^\star)\, d > 0 \quad \forall d \ne 0 : d^T \nabla h_j(x^\star) = 0\ \forall j, \quad d^T \nabla g_i(x^\star) = 0,\ i \in I.$$

Then $x^\star$ is a strict local minimum.

Lagrange Duality

Problem:

$$f^\star = \min f(x) \quad \text{s.t. } g_i(x) \le 0, \quad x \in X$$

Lagrangian:

$$L(x; \mu) = f(x) + \sum_i \mu_i g_i(x), \qquad \mu \ge 0,\ x \in X$$

Relaxation

Given an optimization problem $\min_{x \in S} f(x)$, a relaxation is a problem $\min_{x \in Q} g(x)$ where

$$S \subseteq Q, \qquad g(x) \le f(x) \quad \forall x \in S.$$

Weak Duality: the optimal value of a relaxation is a lower bound on the optimal value of the problem. The Lagrange problem is a relaxation: its feasible set is $X$ (which contains the original one), and if $g(x) \le 0$ and $\mu \ge 0$ then $L(x; \mu) \le f(x)$.

The Lagrange dual function with respect to the constraints $g(x) \le 0$:

$$\phi(\mu) = \inf_{x \in X} L(x, \mu)$$

For every choice of $\mu \ge 0$, $\phi(\mu)$ is a lower bound on the objective value of every feasible solution and, in particular, a lower bound on the global minimum value of the problem.

Example (packing $N$ equal circles of radius $r$ in the unit square, stated as a minimization of $-r$):

$$\min -r \quad \text{s.t. } 4r^2 - (x_i - x_j)^2 - (y_i - y_j)^2 \le 0,\ 1 \le i < j \le N, \qquad 0 \le x_i, y_i \le 1,\ i = 1, \dots, N$$

solution

When $N = 2$, relaxing the first constraint with multiplier $\lambda \ge 0$:

$$\phi(\lambda) = \min_{x, y, r} -r + \lambda \left( 4r^2 - (x_1 - x_2)^2 - (y_1 - y_2)^2 \right), \qquad 0 \le x_i, y_i \le 1$$

The minimum over the positions places the two centers at opposite corners, where $(x_1 - x_2)^2 + (y_1 - y_2)^2 = 2$, so

$$\phi(\lambda) = \min_r \left( -r + 4\lambda r^2 \right) - 2\lambda, \qquad r = \frac{1}{8\lambda}, \qquad \phi(\lambda) = -2\lambda - \frac{1}{16\lambda}$$

This is a lower bound on the optimal value. The best possible lower bound:

$$\bar{\lambda} = \arg\max_{\lambda \ge 0} \phi(\lambda) = \frac{1}{4\sqrt{2}}, \qquad \phi(\bar{\lambda}) = -\frac{\sqrt{2}}{2}$$

Lagrange Dual

Choosing $(x_1, y_1) = (0, 0)$ and $(x_2, y_2) = (1, 1)$, a feasible solution with $r = \sqrt{2}/2$ is obtained. The Lagrange dual gives a bound equal to $\sqrt{2}/2$: the same as the objective function at a feasible solution, so that solution is optimal! (an exception, not the rule!)

The dual problem

$$\max_{\mu \ge 0} \phi(\mu)$$

might: 1. be unbounded; 2. have a finite sup but no max; 3. have a unique maximum attained at a single solution $\bar{x}$; 4. have many different maxima, each connected with a different solution $\bar{x}$.

Equality constraints

$$f^\star = \min f(x) \quad \text{s.t. } h_j(x) = 0,\ j = 1, \dots, k, \quad g_i(x) \le 0,\ i = 1, \dots, m, \quad x \in X$$

Lagrange function:

$$L(x; \lambda, \mu) = f(x) + \mu^T g(x) + \lambda^T h(x)$$

Linear Programming

$$\min c^T x \quad \text{s.t. } Ax \le b$$

$$\phi(\mu) = \min_x c^T x + \mu^T (Ax - b) = -\mu^T b + \min_x (c^T + \mu^T A)\, x$$

$$\min_x (c^T + \mu^T A)\, x = \begin{cases} 0 & \text{if } c^T + \mu^T A = 0 \\ -\infty & \text{otherwise} \end{cases}$$

Lagrange dual function:

$$\phi(\mu) = \begin{cases} -\mu^T b & \text{if } c + A^T \mu = 0 \\ -\infty & \text{otherwise} \end{cases}$$

Lagrange dual:

$$\max -\mu^T b \quad \text{s.t. } \mu^T A + c^T = 0, \quad \mu \ge 0$$

which is equivalent (setting $\lambda = -\mu$) to:

$$\max \lambda^T b \quad \text{s.t. } \lambda^T A = c^T, \quad \lambda \le 0$$

Quadratic Programming (equality constrained):

$$\min \frac{1}{2} x^T Q x + c^T x \quad \text{s.t. } Ax = b$$

$$\phi(\lambda) = \min_x \frac{1}{2} x^T Q x + c^T x + \lambda^T (Ax - b) = -\lambda^T b + \min_x \frac{1}{2} x^T Q x + (c^T + \lambda^T A)\, x$$

QP Case 1

$Q$ has at least one negative eigenvalue, with eigenvector $d$: along $x = \beta d$,

$$\frac{1}{2} \beta^2 d^T Q d + \beta (c^T + \lambda^T A)\, d \to -\infty,$$

so $\min_x \frac{1}{2} x^T Q x + (c^T + \lambda^T A)\, x = -\infty$ and the dual function is identically $-\infty$.

QP Case 2

$Q$ positive definite. The minimum point of the inner problem solves

$$Qx + (c + A^T \lambda) = 0, \quad \text{i.e.} \quad \hat{x} = -Q^{-1}(c + A^T \lambda).$$

Lagrange dual function value:

$$\phi(\lambda) = -\lambda^T b + \frac{1}{2} \hat{x}^T Q \hat{x} + (c^T + \lambda^T A)\, \hat{x} = -\lambda^T b - \frac{1}{2}(c + A^T \lambda)^T Q^{-1} (c + A^T \lambda)$$

Lagrange dual (seen as a min problem):

$$\min_\lambda \lambda^T b + \frac{1}{2} (c + A^T \lambda)^T Q^{-1} (c + A^T \lambda)$$

Optimality conditions:

$$b + A Q^{-1} (c + A^T \lambda) = 0$$

i.e. $A\hat{x} = b$: feasibility of $\hat{x}$. Thus if we find optimal multipliers (a linear system) we get the optimal solution $x$ (thanks to feasibility and weak duality)!
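In practice one solves primal and dual together: stationarity and feasibility form one symmetric linear (KKT) system. A minimal sketch on a made-up instance:

```python
import numpy as np

# Equality-constrained QP: min 1/2 x^T Q x + c^T x  s.t.  A x = b,
# with Q positive definite.  KKT conditions as a linear system:
#   [Q  A^T] [x  ]   [-c]
#   [A   0 ] [lam] = [ b]
Q = np.array([[2.0, 0.0], [0.0, 4.0]])
c = np.array([-1.0, -1.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

n, m = Q.shape[0], A.shape[0]
KKT = np.block([[Q, A.T], [A, np.zeros((m, m))]])
rhs = np.concatenate([-c, b])
sol = np.linalg.solve(KKT, rhs)
x_opt, lam_opt = sol[:n], sol[n:]

feas = float(np.linalg.norm(A @ x_opt - b))                 # primal feasibility
stat = float(np.linalg.norm(Q @ x_opt + c + A.T @ lam_opt)) # stationarity
```

Here the exact solution is $x^\star = (2/3,\ 1/3)$.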

For any problem

$$f^\star = \min f(x) \quad \text{s.t. } g_i(x) \le 0,\ i = 1, \dots, m, \quad x \in X$$

where $X$ is non-empty and compact, if $f$ and the $g_i$ are continuous then the Lagrange dual function

$$\phi(\mu) = \min_{x \in X} f(x) + \mu^T g(x)$$

is concave. Proof: by the Weierstrass theorem the min is attained, and for $\mu = \alpha a + (1 - \alpha) b$, $\alpha \in [0, 1]$:

$$\phi(\alpha a + (1 - \alpha) b) = \min_{x \in X} \left[ \alpha\big(f(x) + a^T g(x)\big) + (1 - \alpha)\big(f(x) + b^T g(x)\big) \right] \ge \alpha \phi(a) + (1 - \alpha) \phi(b).$$

The dual problem

$$\max_{\mu \ge 0} \phi(\mu) = \max_{\mu \ge 0} \min_{x \in X} \big( f(x) + \mu^T g(x) \big)$$

is equivalent to the (semi-infinite) linear program

$$\max z \quad \text{s.t. } z \le f(x) + \mu^T g(x) \ \forall x \in X, \quad \mu \ge 0.$$

Restricted dual (cutting planes): with a finite sample $x_1, \dots, x_k \in X$,

$$\max z \quad \text{s.t. } z \le f(x_j) + \mu^T g(x_j),\ j = 1, \dots, k, \quad \mu \ge 0.$$

Let $\bar{\mu}$ be the optimal solution of the restricted dual. Is it an optimal dual solution, i.e. is it true that $\bar{z} \le f(x) + \bar{\mu}^T g(x)$ for all $x \in X$? Check: look for $\bar{x}$, an optimal solution of

$$\min_{x \in X} f(x) + \bar{\mu}^T g(x);$$

if the inequality fails at $\bar{x}$, the cut generated by $\bar{x}$ is added to the restricted dual and a new solution is computed.

Geometric programming

Unconstrained geometric program:

$$\min_{x > 0} \sum_{k=1}^{m} c_k \prod_{j=1}^{n} x_j^{\alpha_{kj}}, \qquad \alpha_{kj} \in \mathbb{R},\ c_k > 0$$

Transformed problem (substituting $x_j = \exp(y_j)$, $y_j \in \mathbb{R}$, and $\beta_k = \log c_k$):

$$\min_y \sum_{k=1}^{m} c_k \prod_{j=1}^{n} e^{\alpha_{kj} y_j} = \min_y \sum_{k=1}^{m} e^{\alpha_k^T y + \beta_k}$$

which is a convex problem.

Duality example

Dual of

$$\min_x \sum_{k=1}^{m} \exp(\alpha_k^T x + \beta_k).$$

With no constraints, the Lagrange dual function is identical to $f(x)$: strong duality holds, but is useless. Simple transformation:

$$\min_{x, y} \log \sum_{k=1}^{m} \exp y_k \quad \text{s.t. } \alpha_k^T x + \beta_k = y_k,\ k = 1, \dots, m.$$

Dual function (with $A$ the matrix whose rows are the $\alpha_k^T$, and $\beta = (\beta_1, \dots, \beta_m)$):

$$\phi(\mu) = \min_{x, y} \log \sum_{k} \exp y_k + \mu^T (Ax + \beta - y).$$

The minimum over $x$ is finite only if $A^T \mu = 0$, in which case

$$\phi(\mu) = \min_y \log \sum_k \exp y_k + \mu^T (\beta - y).$$

Stationarity in $y$ gives

$$\frac{\exp y_i}{\sum_k \exp y_k} - \mu_i = 0,$$

so Lagrange multipliers exist provided that $\sum_i \mu_i = 1$ and $\mu_i > 0$ for all $i$. Substituting $\nu_j = \exp y_j / \sum_k \exp y_k$ into the Lagrangian, the dependence on $y$ cancels and the dual function reduces to an entropy-like expression in $\mu$.

Lagrange Dual

The Lagrange dual of the geometric program becomes:

$$\max_\mu \beta^T \mu - \sum_k \mu_k \log \mu_k \quad \text{s.t. } \sum_k \mu_k = 1, \quad A^T \mu = 0, \quad \mu \ge 0.$$

Linearly constrained problems: for

$$\min f(x) \quad \text{s.t. } Ax \ge b$$

the Lagrange function is $L(x, \mu) = f(x) + \mu^T (b - Ax)$, and the KKT conditions at $x^\star$ read:

$$\nabla f(x^\star) = A^T \mu, \qquad \mu \ge 0, \qquad Ax^\star \ge b, \qquad \mu^T (b - Ax^\star) = 0.$$

Nonnegativity constraints:

$$\min f(x) \quad \text{s.t. } x \ge 0$$

KKT: there exists $\mu \ge 0$ with $\nabla f(x^\star) = \mu$, i.e. $\mu_j = \partial f(x^\star)/\partial x_j$, $j = 1, \dots, n$, and $\mu^T x^\star = 0$. From complementarity:

$$\frac{\partial f(x^\star)}{\partial x_j} = 0 \quad \text{if } x_j^\star > 0, \qquad \frac{\partial f(x^\star)}{\partial x_j} \ge 0 \quad \text{otherwise.}$$

Box constraints

$$\min f(x) \quad \text{s.t. } \ell \le x \le u \qquad (\ell_i < u_i\ \forall i)$$

KKT: there exist $\mu, \nu \ge 0$ such that

$$\nabla f(x^\star) = \mu - \nu, \qquad \mu^T (\ell - x^\star) = 0, \qquad \nu^T (x^\star - u) = 0.$$

Let $J_\ell = \{j : x_j^\star = \ell_j\}$, $J_u = \{j : x_j^\star = u_j\}$, $J_0 = \{j : \ell_j < x_j^\star < u_j\}$. Then, from complementarity,

$$\frac{\partial f(x^\star)}{\partial x_j} = \mu_j \ge 0,\ j \in J_\ell; \qquad \frac{\partial f(x^\star)}{\partial x_j} = -\nu_j \le 0,\ j \in J_u; \qquad \frac{\partial f(x^\star)}{\partial x_j} = 0,\ j \in J_0.$$

Simplex constraint:

$$\min f(x) \quad \text{s.t. } \mathbf{1}^T x = 1, \quad x \ge 0$$

KKT: there exist $\lambda \in \mathbb{R}$ and $\mu \ge 0$ such that

$$\nabla f(x^\star) = \lambda \mathbf{1} + \mu, \qquad \mathbf{1}^T x^\star = 1, \qquad x^\star \ge 0, \qquad \mu^T x^\star = 0.$$

simplex...

From complementarity, if $x_j^\star > 0$ then $\mu_j = 0$ and

$$\frac{\partial f(x^\star)}{\partial x_j} = \lambda$$

(all equal); otherwise $\partial f(x^\star)/\partial x_j \ge \lambda$. Thus, if $x_j^\star > 0$ and $x_k^\star > 0$,

$$\frac{\partial f(x^\star)}{\partial x_j} = \frac{\partial f(x^\star)}{\partial x_k}.$$

Portfolio selection: given $n$ assets with random returns $R_1, \dots, R_n$, how to invest 1 € in such a way that the resulting portfolio has minimum variance? If $x_j$ denotes the fraction invested in asset $j$, the variance of the portfolio return $P(x) = \sum_{j=1}^{n} R_j x_j$ is

$$\mathrm{Var}(P(x)) = E\big(P(x) - E(P(x))\big)^2 = \sum_{i,j} x_i x_j \,\mathrm{Cov}(R_i, R_j) = x^T Q x.$$

Problem (objective multiplied by 1/2 for simpler computations):

$$\min \frac{1}{2} x^T Q x \quad \text{s.t. } \mathbf{1}^T x = 1, \quad x \ge 0$$

Optimal portfolio

KKT: for all $j, k$ with $x_j^\star, x_k^\star > 0$:

$$\sum_i Q_{ji} x_i^\star = \sum_i Q_{ki} x_i^\star.$$

The vector $Qx$ might be thought of as the vector of marginal contributions to the total risk (which is a weighted sum of the elements of $Qx$). Thus, in the optimal portfolio, all assets held at a positive level give an equal (and minimal) contribution to the total risk.
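When the nonnegativity constraints are inactive, the KKT conditions give the closed form $x^\star = Q^{-1}\mathbf{1} / (\mathbf{1}^T Q^{-1} \mathbf{1})$, and $Qx^\star$ has all components equal. A sketch with a made-up covariance matrix:

```python
import numpy as np

# Minimum-variance portfolio: min 1/2 x^T Q x  s.t.  1^T x = 1.
# If the solution is strictly positive, x* = Q^{-1} 1 / (1^T Q^{-1} 1)
# and the marginal-risk vector Q x* is constant across assets.
Q = np.array([[0.04, 0.01, 0.00],
              [0.01, 0.09, 0.02],
              [0.00, 0.02, 0.16]])   # illustrative covariance matrix
ones = np.ones(3)

w = np.linalg.solve(Q, ones)
x_opt = w / (ones @ w)

marginal = Q @ x_opt                     # equal components at the optimum
spread = float(marginal.max() - marginal.min())
```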

Algorithms for unconstrained local optimization

Most common form of optimization algorithms, line search-based methods: given a starting point $x_0$, a sequence is generated:

$$x_{k+1} = x_k + \alpha_k d_k$$

where $d_k \in \mathbb{R}^n$ is the search direction and $\alpha_k > 0$ the step. Usually $d_k$ is chosen first and then the step is obtained, often from a 1-dimensional optimization.

Trust-region algorithms

A model $m(x)$ and a confidence region $U(x_k)$ containing $x_k$ are defined. The new iterate is chosen as the solution of the constrained optimization problem

$$\min m(x) \quad \text{s.t. } x \in U(x_k).$$

The model and the confidence region are possibly updated at each iteration.

Speed measures

Let $x^\star$ be a local optimum. The error at $x_k$ might be measured, e.g., as

$$e(x_k) = \|x_k - x^\star\| \quad \text{or} \quad e(x_k) = |f(x_k) - f(x^\star)|.$$

If $e(x_k) \le C q^k$ for some $q \in (0, 1)$, $\{x_k\}$ is linearly convergent (converges with order 1); $q$ is the convergence rate. A sufficient condition for linear convergence:

$$\limsup_k \frac{e(x_{k+1})}{e(x_k)} < 1.$$

Superlinear convergence: for every $q \in (0, 1)$ there exists $C$ such that $e(x_k) \le C q^k$; a sufficient condition:

$$\limsup_k \frac{e(x_{k+1})}{e(x_k)} = 0.$$

If, given $p > 1$, there exist $C > 0$ and $q \in (0, 1)$ such that

$$e(x_k) \le C q^{(p^k)}$$

then $\{x_k\}$ is said to converge with order at least $p$; if $p = 2$: quadratic convergence. Sufficient condition:

$$\limsup_k \frac{e(x_{k+1})}{e(x_k)^p} < \infty.$$

Examples: $e(x_k) = 1/k$ and $e(x_k) = 1/k^2$ converge sublinearly; $e(x_k) = 1/2^{2^k}$ converges quadratically.

Descent directions

Let $f \in C^1(\mathbb{R}^n)$, $x_k \in \mathbb{R}^n$ with $\nabla f(x_k) \ne 0$, and let $d \in \mathbb{R}^n$. If

$$d^T \nabla f(x_k) < 0$$

then, since

$$f(x_k + \alpha d) - f(x_k) = \alpha\, d^T \nabla f(x_k) + o(\alpha),$$

if $\alpha$ is small enough, $f(x_k + \alpha d) - f(x_k) < 0$: $d$ is a descent direction. NB: $d$ might be a descent direction even if $d^T \nabla f(x_k) = 0$.

Global convergence: if a sequence $x_{k+1} = x_k + \alpha_k d_k$ is generated in such a way that:

- $L_0 = \{x : f(x) \le f(x_0)\}$ is compact;
- $d_k \ne 0$ whenever $\nabla f(x_k) \ne 0$;
- $f(x_{k+1}) \le f(x_k)$;
- if $d_k \ne 0$ then $\dfrac{|d_k^T \nabla f(x_k)|}{\|d_k\|} \ge \sigma(\|\nabla f(x_k)\|)$ for a forcing function $\sigma$;
- if $\nabla f(x_k) \ne 0$ for all $k$, then $\lim_k \dfrac{d_k^T \nabla f(x_k)}{\|d_k\|} = 0$;

then either there exists a finite index $\bar{k}$ such that $\nabla f(x_{\bar{k}}) = 0$, or otherwise $x_k \in L_0$ and all of its limit points are in $L_0$, $\{f(x_k)\}$ admits a limit, and $\lim_k \nabla f(x_k) = 0$.

Comments. $f(x_{k+1}) \le f(x_k)$: most optimization methods choose $d_k$ as a descent direction; if $d_k$ is a descent direction, choosing $\alpha_k$ sufficiently small ensures the validity of this assumption. $\lim_k d_k^T \nabla f(x_k)/\|d_k\| = 0$: for a normalized direction $d_k$, the scalar product $d_k^T \nabla f(x_k)$ is the directional derivative of $f$ along $d_k$, and it is required that this goes to zero; this can be achieved through precise line searches (choosing the step so that $f$ is minimized along $d_k$).

Gradient Algorithms

An angle condition: require, for some $c \in (0, 1)$,

$$d_k^T \nabla f(x_k) \le -c\, \|d_k\| \|\nabla f(x_k)\|,$$

i.e., recalling that

$$\cos \theta_k = -\frac{d_k^T \nabla f(x_k)}{\|d_k\| \|\nabla f(x_k)\|},$$

that $\cos \theta_k \ge c$: the angle between $d_k$ and $-\nabla f(x_k)$ is bounded away from orthogonality.

General scheme:

$$x_{k+1} = x_k - \alpha_k D_k \nabla f(x_k), \qquad d_k = -D_k \nabla f(x_k)$$

with $D_k \succ 0$, so that

$$d_k^T \nabla f(x_k) = -\nabla^T f(x_k)\, D_k\, \nabla f(x_k) < 0.$$

Steepest Descent

(or gradient method): $D_k := I$, i.e. $x_{k+1} = x_k - \alpha_k \nabla f(x_k)$. If $\nabla f(x_k) \ne 0$ then $d_k = -\nabla f(x_k)$ is a descent direction. Moreover, it is the steepest (w.r.t. the Euclidean norm):

$$\arg\min_{d \in \mathbb{R}^n,\ \|d\| \le 1} \nabla^T f(x_k)\, d = -\frac{\nabla f(x_k)}{\|\nabla f(x_k)\|}.$$

Newton's method

$D_k := [\nabla^2 f(x_k)]^{-1}$. Motivation: minimize the local quadratic model

$$f(x) \approx f(x_k) + \nabla^T f(x_k)(x - x_k) + \frac{1}{2}(x - x_k)^T \nabla^2 f(x_k)(x - x_k).$$

Setting the gradient of the model to zero:

$$\nabla f(x_k) + \nabla^2 f(x_k)(x - x_k) = 0 \quad \Rightarrow \quad x = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k).$$
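A one-dimensional sketch of the Newton iteration on a hypothetical smooth function, $f(x) = \cosh(x)$ with minimum at $x^\star = 0$, where the step is $x - f'(x)/f''(x) = x - \tanh(x)$:

```python
import numpy as np

# Newton's method on f(x) = cosh(x): f'(x) = sinh(x), f''(x) = cosh(x),
# so the Newton step is x - tanh(x).  Errors shrink at least
# quadratically per iteration (here in fact cubically, sinh being odd).
x = 1.0
errors = []
for _ in range(4):
    x = x - np.tanh(x)       # Newton update
    errors.append(abs(x))    # distance from the minimizer x* = 0
```

After four iterations the error is far below machine-level tolerance.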

Step choice

Given $d_k$, how to choose $\alpha_k$ so that $x_{k+1} = x_k + \alpha_k d_k$? Optimal choice (one-dimensional optimization):

$$\alpha_k = \arg\min_{\alpha \ge 0} f(x_k + \alpha d_k).$$

An analytical expression for the optimal step is available only in a few cases, e.g. if $f(x) = \frac{1}{2} x^T Q x + c^T x$ with $Q \succ 0$. Then

$$f(x_k + \alpha d_k) = \frac{1}{2}(x_k + \alpha d_k)^T Q (x_k + \alpha d_k) + c^T (x_k + \alpha d_k).$$

Minimizing w.r.t. $\alpha$:

$$\alpha\, d_k^T Q d_k + (Q x_k + c)^T d_k = 0 \quad \Rightarrow \quad \alpha_k = -\frac{\nabla^T f(x_k)\, d_k}{d_k^T Q d_k}.$$

For steepest descent ($d_k = -\nabla f(x_k)$):

$$\alpha_k = \frac{\|\nabla f(x_k)\|^2}{\nabla^T f(x_k)\, \nabla^2 f(x_k)\, \nabla f(x_k)}.$$

Rules for choosing a step-size (from the sufficient conditions for convergence):

- $f(x_{k+1}) < f(x_k)$
- $\lim_k \dfrac{d_k^T \nabla f(x_k)}{\|d_k\|} = 0$
- $d_k^T \nabla f(x_k + \alpha_k d_k) \to 0$
- $\|x_{k+1} - x_k\| \to 0$

In general it is important to ensure both a sufficient reduction of $f$ and a sufficiently large step $\|x_{k+1} - x_k\|$.

Armijo's rule

Input: $\sigma \in (0, 1)$, $\gamma \in (0, 1/2)$, $\bar{\alpha}_k > 0$.

    alpha := alpha_bar_k;
    while f(x_k + alpha * d_k) > f(x_k) + gamma * alpha * d_k' * grad_f(x_k) do
        alpha := sigma * alpha;
    end
    return alpha

Typical values: $\sigma \in [0.1, 0.5]$, $\gamma \in [10^{-4}, 10^{-3}]$. On exit the returned step satisfies the sufficient-decrease condition

$$f(x_k + \alpha d_k) \le f(x_k) + \gamma \alpha\, d_k^T \nabla f(x_k)$$

(note $d_k^T \nabla f(x_k) < 0$, so the threshold sits strictly below $f(x_k)$).
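The pseudocode translates directly; a minimal sketch, exercised on a hypothetical quadratic with the steepest-descent direction:

```python
import numpy as np

def armijo_step(f, grad_f, x, d, alpha0=1.0, sigma=0.5, gamma=1e-4):
    """Backtracking (Armijo) line search: shrink alpha by sigma until
    f(x + alpha d) <= f(x) + gamma * alpha * d^T grad_f(x)."""
    fx = f(x)
    slope = float(d @ grad_f(x))   # directional derivative (< 0 for descent d)
    alpha = alpha0
    while f(x + alpha * d) > fx + gamma * alpha * slope:
        alpha *= sigma
    return alpha

# Example: f(x) = 1/2 ||x||^2, steepest-descent direction.
f = lambda x: 0.5 * float(x @ x)
grad = lambda x: x
x0 = np.array([3.0, -4.0])
d = -grad(x0)
alpha = armijo_step(f, grad, x0, d)
decrease = f(x0) - f(x0 + alpha * d)
```

On this quadratic the full step `alpha = 1` already satisfies the condition, so no backtracking occurs.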

How to choose the initial step size $\bar{\alpha}_k$? Let $\phi(\alpha) = f(x_k + \alpha d_k)$. A possibility is to choose $\bar{\alpha}_k = \alpha^\star$, the minimizer of a quadratic approximation of $\phi(\alpha)$:

$$q(\alpha) = c_0 + c_1 \alpha + \frac{1}{2} c_2 \alpha^2, \qquad q(0) = c_0 := f(x_k), \qquad q'(0) = c_1 := d_k^T \nabla f(x_k).$$

Then $\alpha^\star = -c_1 / c_2$. Third condition? If an estimate $\bar{f}$ of the minimum of $f(x_k + \alpha d_k)$ is available, choose $c_2$ so that $\min_\alpha q(\alpha) = \bar{f}$:

$$\min_\alpha q(\alpha) = q(-c_1/c_2) = c_0 - \frac{c_1^2}{2 c_2} := \bar{f} \quad \Rightarrow \quad c_2 = \frac{c_1^2}{2(c_0 - \bar{f})}$$

and therefore

$$\bar{\alpha}_k = -\frac{c_1}{c_2} = \frac{2(\bar{f} - f(x_k))}{d_k^T \nabla f(x_k)}$$

(positive, since numerator and denominator are both negative).

Gradient method: $x_{k+1} = x_k - \alpha_k \nabla f(x_k)$. If a sufficiently accurate step size is used, the conditions of the theorem on global convergence are satisfied and the steepest descent algorithm globally converges to a stationary point. "Sufficiently accurate" means exact line search or, e.g., Armijo's rule.

Behaviour of the algorithm when minimizing

$$f(x) = \frac{1}{2} x^T Q x:$$

$$x_{k+1} = x_k - \alpha_k \nabla f(x_k) = x_k - \alpha_k Q x_k = (I - \alpha_k Q)\, x_k,$$

so

$$\|x_{k+1} - 0\|^2 = x_k^T (I - \alpha_k Q)^2 x_k.$$

Analysis

Let $A$ be symmetric with eigenvalues $\lambda_1 \le \dots \le \lambda_n$. Then

$$\lambda_1 \|v\|^2 \le v^T A v \le \lambda_n \|v\|^2 \quad \forall v \in \mathbb{R}^n.$$

Moreover, $\lambda$ is an eigenvalue of $A$ iff $-\lambda$ is an eigenvalue of $-A$, and $\lambda$ is an eigenvalue of $A$ iff $1 + \lambda$ is an eigenvalue of $I + A$. Thus

$$x_k^T (I - \alpha_k Q)^2 x_k \le \max\{(1 - \alpha_k \lambda_1)^2,\ (1 - \alpha_k \lambda_n)^2\}\, x_k^T x_k$$

and therefore

$$\|x_{k+1}\| \le \max\{|1 - \alpha_k \lambda_1|,\ |1 - \alpha_k \lambda_n|\}\, \|x_k\|.$$

...

Eliminating the dependency on k :

max{|1 1 |, |1 n |} =

...

0 and 1 n , 1 + 1 1 + n 1 1 1 n

max{1 1 , 1 + 1 , 1 n , 1 + n }

5 4 3 2 1 00 0.2 0.4

|1 1 | |1 n |

and thus

max{|1 k 1 |, |1 k n |} xk = max{1 1 , 1 + n }

Minimum point:

1 1 = 1 + n

0.6

0.8

i.e.

=

Algorithms for unconstrained local optimization p. 33

2 1 + n

Algorithms for unconstrained local optimization p. 34

Analysis

With the best possible (constant) step,

$$\frac{\|x_{k+1}\|}{\|x_k\|} \le |1 - \alpha \lambda_1| = \left| 1 - \frac{2\lambda_1}{\lambda_1 + \lambda_n} \right| = \frac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1} = \frac{\kappa - 1}{\kappa + 1}$$

where $\kappa = \lambda_n / \lambda_1$ is the condition number of $Q$: $\kappa \gg 1$ (ill-conditioned problem) means very slow convergence, $\kappa \approx 1$ very fast convergence.

Zigzagging

$$\min \frac{1}{2}(x^2 + M y^2)$$

where $M > 0$. Optimum: $x^\star = 0$, $y^\star = 0$. Starting point: $(M, 1)$. With exact line search the iterates satisfy

$$x_{k} = M \left( \frac{M - 1}{M + 1} \right)^{k}, \qquad y_{k} = \left( -\frac{M - 1}{M + 1} \right)^{k}.$$

Convergence is rapid if $M \approx 1$, very slow and zigzagging if $M \gg 1$ or $M \ll 1$.

[Figure: zigzagging iterates of steepest descent on an ill-conditioned quadratic]

Slow convergence and zigzagging are general phenomena (especially when the starting point is near the longest axes of the ellipsoidal level sets).
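The $(\kappa - 1)/(\kappa + 1)$ contraction can be observed directly. A sketch with the constant optimal step $\alpha = 2/(\lambda_1 + \lambda_n)$ on the quadratic above; on this diagonal example both eigencomponents contract by exactly that factor each step:

```python
import numpy as np

# Steepest descent on f(x, y) = (x^2 + M y^2)/2 with the constant
# optimal step alpha = 2/(lambda_1 + lambda_n).  Here kappa = M, and
# every step multiplies each eigencomponent by +/-(M-1)/(M+1).
M = 100.0
Q = np.diag([1.0, M])
alpha = 2.0 / (1.0 + M)
rate = (M - 1.0) / (M + 1.0)    # predicted contraction factor

x = np.array([M, 1.0])          # starting point from the example above
for _ in range(50):
    x = x - alpha * (Q @ x)     # gradient step: grad f(x) = Q x

err = float(np.linalg.norm(x))  # equals rate**50 * ||(M, 1)||
```

With $M = 100$ the rate is $99/101 \approx 0.98$: after 50 steps the error has shrunk by only a factor $\approx e^{-1}$, illustrating the slow convergence on ill-conditioned problems.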

Newton-Raphson method:

$$x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k).$$

Let $x^\star$ be a local optimum, $\nabla f(x^\star) = 0$. Taylor expansion of $\nabla f$ around $x_k$:

$$0 = \nabla f(x^\star) = \nabla f(x_k) + \nabla^2 f(x_k)(x^\star - x_k) + o(\|x^\star - x_k\|).$$

Multiplying by $[\nabla^2 f(x_k)]^{-1}$:

$$0 = (x_k - x_{k+1}) + (x^\star - x_k) + [\nabla^2 f(x_k)]^{-1} o(\|x^\star - x_k\|) = x^\star - x_{k+1} + o(\|x^\star - x_k\|).$$

Thus

$$\|x^\star - x_{k+1}\| = o(\|x^\star - x_k\|), \quad \text{i.e.} \quad \frac{\|x^\star - x_{k+1}\|}{\|x^\star - x_k\|} \to 0.$$

Theorem. Let $f \in C^2(U(x^\star, \varepsilon_1))$, where $U$ is a ball with radius $\varepsilon_1$ and center $x^\star$, and let $\nabla^2 f(x^\star)$ be nonsingular. Then:

1. There exists $\varepsilon > 0$ such that if $x_0 \in U(x^\star, \varepsilon)$, then $\{x_k\}$ is well defined and converges to $x^\star$ at least superlinearly.
2. If moreover there exist $\varepsilon > 0$, $L > 0$, $M > 0$ such that

$$\|\nabla^2 f(x) - \nabla^2 f(y)\| \le L \|x - y\| \quad \text{and} \quad \|(\nabla^2 f(x))^{-1}\| \le M,$$

then the convergence is quadratic:

$$\|x_{k+1} - x^\star\| \le \frac{LM}{2}\, \|x_k - x^\star\|^2.$$

Difficulties

Many things might go wrong: at some iteration $\nabla^2 f(x_k)$ might be singular (for example, if $x_k$ belongs to a flat region where $f(x)$ is constant). Even if it is nonsingular, inverting $\nabla^2 f(x_k)$, or in any case solving a linear system with coefficient matrix $\nabla^2 f(x_k)$, is numerically unstable and computationally demanding. There is no guarantee that $\nabla^2 f(x_k) \succ 0$: the Newton direction might not be a descent direction.

Difficulties

Newton's method just tries to solve the system $\nabla f(x_k) = 0$, and thus might very well be attracted towards a maximum; the method lacks global convergence: it converges only if started near a local optimum.

Newton-type methods

Line search variant: $x_{k+1} = x_k - \alpha_k (\nabla^2 f(x_k))^{-1} \nabla f(x_k)$. Modified Newton method: replace $\nabla^2 f(x_k)$ by $\nabla^2 f(x_k) + D_k$, where $D_k$ is chosen so that $\nabla^2 f(x_k) + D_k$ is positive definite.

Quasi-Newton methods

Consider solving the nonlinear system ∇f(x) = 0. Taylor expansion of the gradient:

∇f(x_k) ≈ ∇f(x_{k+1}) + ∇²f(x_{k+1})(x_k − x_{k+1})

Let

s_k := x_{k+1} − x_k,  y_k := ∇f(x_{k+1}) − ∇f(x_k)

Requiring the new approximate Hessian to satisfy B_{k+1}(x_{k+1} − x_k) = ∇f(x_{k+1}) − ∇f(x_k) gives the quasi-Newton equation: B_{k+1} s_k = y_k. If B_k was the previous approximate Hessian, we ask that:
1. the variation between B_k and B_{k+1} is small;
2. nothing changes along directions which are normal to the step s_k: B_k z = B_{k+1} z for all z with zᵀs_k = 0.

Choosing n − 1 vectors z orthogonal to s_k → n² linearly independent equations in n² unknowns → a unique solution.


Broyden updating

Among all matrices satisfying the quasi-Newton equation, choose the one closest to B_k in Frobenius norm (‖X‖²_F = Tr XᵀX):

min_B ‖B − B_k‖_F  s.t.  B s_k = y_k

It can be shown that the unique solution is given by:

B_{k+1} = B_k + (y_k − B_k s_k) s_kᵀ / (s_kᵀ s_k)

Proof sketch: for any B̄ with B̄ s_k = y_k,

B_{k+1} − B_k = (y_k − B_k s_k) s_kᵀ / (s_kᵀ s_k) = (B̄ − B_k) s_k s_kᵀ / (s_kᵀ s_k)

and since the projector s_k s_kᵀ/(s_kᵀ s_k) has norm 1, ‖B_{k+1} − B_k‖_F ≤ ‖B̄ − B_k‖_F. Uniqueness is a consequence of the strict convexity of the norm and the convexity of the feasible region.

Special situation:
1. the Hessian matrix in optimization problems is symmetric;
2. in gradient methods, when we let x_{k+1} = x_k − α (B_{k+1})⁻¹ ∇f(x_k), it is desirable that B_{k+1} be positive definite.

Symmetry

Broyden's update B_{k+1} = B_k + (y_k − B_k s_k)s_kᵀ/(s_kᵀ s_k) does not preserve symmetry. Remedy: let

C₁ = B_k + (y_k − B_k s_k)s_kᵀ/(s_kᵀ s_k)

then symmetrize,

C₂ = (C₁ + C₁ᵀ)/2

re-impose the quasi-Newton equation,

C₃ = C₂ + (y_k − C₂ s_k)s_kᵀ/(s_kᵀ s_k)

and so on.

PSB update

In the limit the PSB (Powell-Symmetric-Broyden) update is obtained:

B_{k+1} = B_k + [(y_k − B_k s_k)s_kᵀ + s_k(y_k − B_k s_k)ᵀ]/(s_kᵀ s_k) − [s_kᵀ(y_k − B_k s_k)] s_k s_kᵀ/(s_kᵀ s_k)²

Imposing also hereditary positive definiteness, the DFP (Davidon-Fletcher-Powell) update is obtained:

B_{k+1} = B_k + [(y_k − B_k s_k)y_kᵀ + y_k(y_k − B_k s_k)ᵀ]/(y_kᵀ s_k) − [s_kᵀ(y_k − B_k s_k)] y_k y_kᵀ/(y_kᵀ s_k)²

BFGS

Same ideas, but applied to the approximate inverse Hessian H_k ≈ (∇²f(x_k))⁻¹. Inverse quasi-Newton equation:

s_k = H_{k+1} y_k

The resulting update is

H_{k+1} = (I − s_k y_kᵀ/(y_kᵀ s_k)) H_k (I − y_k s_kᵀ/(y_kᵀ s_k)) + s_k s_kᵀ/(y_kᵀ s_k)

BFGS method

x_{k+1} = x_k − α_k H_k ∇f(x_k)
H_{k+1} = (I − s_k y_kᵀ/(y_kᵀ s_k)) H_k (I − y_k s_kᵀ/(y_kᵀ s_k)) + s_k s_kᵀ/(y_kᵀ s_k)
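A minimal sketch of the BFGS method with a backtracking Armijo line search (the Rosenbrock test function, the step constants and the curvature safeguard are illustrative assumptions, not prescriptions from the slides):

```python
import numpy as np

def bfgs(f, grad, x0, tol=1e-8, max_iter=500):
    """BFGS on the inverse-Hessian approximation H, with Armijo backtracking."""
    x = np.asarray(x0, dtype=float)
    n = len(x)
    H = np.eye(n)                      # initial inverse-Hessian approximation
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = -H @ g                     # quasi-Newton direction (descent: H > 0)
        t = 1.0
        while f(x + t * d) > f(x) + 1e-4 * t * (g @ d):   # Armijo condition
            t *= 0.5
        s = t * d
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g
        if y @ s > 1e-12:              # curvature safeguard keeps H positive definite
            rho = 1.0 / (y @ s)
            I = np.eye(n)
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

# Rosenbrock function, minimum at (1, 1)
f = lambda v: (1 - v[0])**2 + 100 * (v[1] - v[0]**2)**2
grad = lambda v: np.array([-2 * (1 - v[0]) - 400 * v[0] * (v[1] - v[0]**2),
                           200 * (v[1] - v[0]**2)])
x_star = bfgs(f, grad, [-1.2, 1.0])
```

Skipping the update when y_kᵀs_k is not sufficiently positive is a common practical substitute for a Wolfe line search: it preserves positive definiteness of H, so every direction is a descent direction.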

Trust region methods

Possible defect of the standard Newton method: the approximation becomes less and less precise as we move away from the current point. Long step → bad approximation. Idea: constrained minimization of a quadratic approximation:

x_{k+1} = arg min_{‖x − x_k‖ ≤ Δ_k} m_k(x)

where Δ_k > 0 is a parameter (the trust-region radius) and m_k(·) is a local model function. First advantage (over pure Newton): the step is always defined (thanks to Weierstrass's theorem). E.g., in Newton trust-region methods

m_k(s) = f(x_k) + sᵀ∇f(x_k) + ½ sᵀ∇²f(x_k)s

while quasi-Newton variants use

m_k(s) = f(x_k) + sᵀ∇f(x_k) + ½ sᵀB_k s

How to choose and update the trust-region radius Δ_k? Given a step s_k, let

ρ_k = (f(x_k) − f(x_k + s_k)) / (m_k(0) − m_k(s_k))

the ratio between the actual reduction and the predicted reduction.

Model updating

The predicted reduction m_k(0) − m_k(s_k) is always non-negative. If ρ_k is small (surely if it is negative) the model and the function strongly disagree → the step must be rejected and the trust region reduced; if ρ_k ≈ 1 it is safe to expand the trust region.

Algorithm (sketch). Data: Δ̄ > 0, Δ₀ ∈ (0, Δ̄), η ∈ [0, 1/4).
for k = 0, 1, ... do
  find the step s_k minimizing the model in the trust region Δ_k and compute ρ_k;
  if ρ_k < 1/4 then Δ_{k+1} = Δ_k/4 (shrink);
  else if ρ_k is close to 1 and ‖s_k‖ = Δ_k then expand, Δ_{k+1} = min(2Δ_k, Δ̄);
  else Δ_{k+1} = Δ_k;
  accept x_{k+1} = x_k + s_k only if ρ_k > η, otherwise x_{k+1} = x_k;
end

How to find the step

The trust-region subproblem is

min_{‖s‖ ≤ Δ} ∇f(x_k)ᵀs + ½ sᵀB_k s

with optimality conditions

∇f(x_k) + B_k s + 2λs = 0,  λ(Δ − ‖s‖) = 0,  λ ≥ 0

Thus either s is in the interior of the ball with radius Δ, in which case λ = 0 and we have the (quasi-)Newton step

s = −B_k⁻¹ ∇f(x_k)

or ‖s‖ = Δ and, if λ > 0,

2λs = −∇f(x_k) − B_k s = −∇m_k(s):

s is parallel to the negative gradient of the model and normal to its contour lines.

Strategy to approximately solve the trust-region subproblem: find the Cauchy point, the minimizer of m_k along the direction −∇f(x_k) within the trust region. First find the direction:

p^s_k = arg min_{‖p‖ ≤ Δ_k} f_k + ∇f(x_k)ᵀp

Finding p^s_k is easy, as there is an analytic solution:

p^s_k = −(Δ_k / ‖∇f(x_k)‖) ∇f(x_k)

For the step size τ_k = arg min_{τ ≥ 0, ‖τ p^s_k‖ ≤ Δ_k} m_k(τ p^s_k): if ∇f(x_k)ᵀB_k∇f(x_k) ≤ 0 (negative curvature direction) take the largest possible step, τ_k = 1; otherwise

τ_k = min{1, ‖∇f(x_k)‖³ / (Δ_k ∇f(x_k)ᵀB_k∇f(x_k))}


Choosing the Cauchy point gives global but extremely slow convergence (similar to steepest descent). Usually an improved point is searched for, starting from the Cauchy one.
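The Cauchy-point formulas above can be transcribed directly (argument names are my own):

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Minimizer of m(p) = g.p + 0.5 p.B.p along -g within the ball ||p|| <= delta."""
    gnorm = np.linalg.norm(g)
    p_s = -(delta / gnorm) * g        # step to the boundary along -g
    gBg = g @ B @ g
    if gBg <= 0:
        tau = 1.0                     # negative curvature: take the full step
    else:
        tau = min(1.0, gnorm**3 / (delta * gBg))
    return tau * p_s
```

For B = I, g = (1, 0) and Δ = 10 the curvature is positive and the Cauchy point is the unconstrained minimizer −g; with B = −I the full boundary step is taken.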


Pattern Search

For smooth optimization, but without knowledge of derivatives. Elementary idea: if x ∈ R² is not a local minimum of f, then at least one of the directions e₁, e₂, −e₁, −e₂ (moving towards E, N, W, S) forms an acute angle with −∇f(x) → is a descent direction. Direct search: explore all the directions in search of one which gives a descent.

Coordinate search

Let D = {±e_i} be the set of coordinate directions and their opposites.

Data: k = 0, Δ₀ an initial step length, x₀ a starting point.
while Δ_k is large enough do
  if f(x_k + Δ_k d) < f(x_k) for some d ∈ D then x_{k+1} = x_k + Δ_k d (step accepted);
  else Δ_{k+1} = 0.5 Δ_k;
  k = k + 1;
end

Pattern search

It is not necessary to explore 2n directions. It is sufficient that the set of directions forms a positive span, i.e. every v ∈ Rⁿ should be expressible as a non-negative linear combination of the vectors in the set. Formally, G is a generating set iff

∀ v ≠ 0 ∈ Rⁿ ∃ g ∈ G : vᵀg > 0

The quality of a generating set is measured by its cosine measure

κ(G) := min_{v≠0} max_{d∈G} vᵀd / (‖v‖ ‖d‖)
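The coordinate-search loop above can be sketched as follows (the test function, tolerance and halving factor are illustrative choices):

```python
import numpy as np

def coordinate_search(f, x0, step=1.0, tol=1e-8, max_iter=100000):
    """Coordinate search: poll the 2n directions +/- e_i, halve the step on failure."""
    x = np.asarray(x0, dtype=float)
    directions = [s * e for e in np.eye(len(x)) for s in (1.0, -1.0)]
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:                 # step length small enough: stop
            break
        for d in directions:
            trial = x + step * d
            ft = f(trial)
            if ft < fx:                # success: accept the step
                x, fx = trial, ft
                break
        else:                          # all 2n polls failed: shrink the step
            step *= 0.5
    return x

x_min = coordinate_search(lambda v: (v[0] - 1)**2 + (v[1] + 2)**2, [0.0, 0.0])
```

No derivatives are ever evaluated; only the 2n polls per iteration drive the search.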

Examples

(Figure: three example generating sets in the plane; their cosine measures are κ ≈ 0.19612 in the first case, κ = 0.5 in the second, κ ≈ 0.7017 in the third.)

Step choice

x_{k+1} = x_k + Δ_k d_k  if f(x_k + Δ_k d_k) < f(x_k) − ρ(Δ_k)  (success)
x_{k+1} = x_k  otherwise  (failure)

where ρ(t) = o(t). We let

Δ_{k+1} = φ_k Δ_k

where φ_k ≥ 1 for successful iterations, φ_k < 1 otherwise. Direct methods possess good convergence properties.

Nelder-Mead simplex

Given a simplex S = {v₁, ..., v_{n+1}} in Rⁿ, let v_r be the worst point: r = arg max_i f(v_i). Let C be the centroid of S \ {v_r}:

C = (1/n) Σ_{i≠r} v_i

The algorithm performs a sort of line search along the direction C − v_r. Let

R = C + (C − v_r)

be the reflection of the worst point along that direction, and let f_best be the best function value in the current simplex. Three cases might occur:

1: Reflection

Check f(R): if it is intermediate, i.e. better than the worst and worse than the best, then accept the reflection: discard the worst point of the simplex and replace it with R.


2: Expansion

If the trial step is an improvement over the best point, f(R) < f_best, then attempt an expansion: try to move R to R̄ = R + (R − C). If successful (f(R̄) < f(R)), accept the expansion and discard the worst point; if unsuccessful, accept R as the new point and discard the worst one.

3: Contraction

If instead the reflected point R is worse than all points in the simplex (possibly except the worst v_r), a contraction step is performed: if f(R) > f(v_r) (R is worse than all points in the simplex) add the point 0.5(v_r + C), otherwise add 0.5(R + C).

Nelder-Mead is not a direct search method (only a single direction at a time is explored). It is widely used by practitioners; however, it may fail to converge to a local minimum: there are examples of strictly convex functions in R² on which the method converges to a non-stationary point. The bad convergence properties are connected to the event that the n-dimensional simplex degenerates into a lower-dimensional subspace. Moreover, the method has a strong tendency to generate directions which are almost normal to that of the gradient. Convergent variants of the Nelder-Mead method do exist.
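The reflect/expand/contract step can be sketched as follows (the shrink step used by full Nelder-Mead implementations is omitted for brevity):

```python
import numpy as np

def nelder_mead_step(simplex, f):
    """One reflection/expansion/contraction step of Nelder-Mead."""
    vals = [f(v) for v in simplex]
    r = int(np.argmax(vals))                    # index of the worst vertex
    worst, f_worst = simplex[r], vals[r]
    f_best = min(vals)
    C = (np.sum(simplex, axis=0) - worst) / (len(simplex) - 1)  # centroid of the rest
    R = C + (C - worst)                         # reflected point
    fR = f(R)
    if fR < f_best:                             # case 2: try to expand
        E = R + (R - C)
        simplex[r] = E if f(E) < fR else R
    elif fR < f_worst:                          # case 1: intermediate, accept R
        simplex[r] = R
    else:                                       # case 3: contraction
        simplex[r] = 0.5 * (worst + C) if fR > f_worst else 0.5 * (R + C)
    return simplex

# one step on f(x, y) = x^2 + y^2 from the unit simplex
S = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
S = nelder_mead_step(S, lambda v: v @ v)
```

Only a single direction (through the centroid) is explored per step, which is exactly why the method is not a direct search method in the sense above.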

Implicit filtering

Let

f(x) = h(x) + w(x)

where h(x) is a smooth function, while w(x) can be considered as an additive, typically random, noise. The method computes a rough estimate of the gradient (finite differences with a large step) and proceeds with an Armijo line search; if unsuccessful, the step for finite differences is reduced.

Data: x₀, a decreasing sequence of difference steps τ₀ > τ₁ > ...
repeat
  OuterIteration = false;
  repeat
    compute f(x_k) and a finite-difference estimate of ∇f(x_k):
      ∇_τ f(x_k) = [(f(x_k + τ e_i) − f(x_k − τ e_i)) / (2τ)]_{i=1..n};
    if ‖∇_τ f(x_k)‖ ≤ τ then OuterIteration = true;
    else perform an Armijo line search along −∇_τ f(x_k): if successful accept the Armijo step, otherwise OuterIteration = true;
    k = k + 1;
  until OuterIteration;
  reduce the difference step τ;
until convergence
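The central finite-difference gradient estimate used by implicit filtering can be written as a short sketch (the deliberately large step h is what filters the noise):

```python
import numpy as np

def fd_gradient(f, x, h):
    """Central finite-difference gradient estimate with step h."""
    g = np.zeros(len(x))
    for i in range(len(x)):
        e = np.zeros(len(x))
        e[i] = 1.0
        g[i] = (f(x + h * e) - f(x - h * e)) / (2.0 * h)
    return g
```

Central differences are exact for quadratics regardless of h, and for smooth h(x) the truncation error is O(h²) while the noise contribution is O(sup|w|/h), which motivates starting with a large step.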

Convergence properties

If:
- ∇²h(x) is Lipschitz continuous,
- lim_{k→∞} (τ_k² + φ(x_k; τ_k)/τ_k) = 0, where φ(x; τ) = sup_{z: ‖z−x‖ ≤ τ} |w(z)|,
- unsuccessful Armijo steps occur at most a finite number of times,

then all limit points of {x_k} are stationary.


Fabio Schoen 2008

http://gol.dsi.unifi.it/users/schoen

Frank-Wolfe method

Let X be a convex set. Consider the problem

min_{x∈X} f(x)

Given x_k ∈ X, choosing a feasible direction d_k corresponds to choosing a point x̄ ∈ X: d_k = x̄ − x_k. Steepest-descent choice:

x̄_k = arg min_{x∈X} ∇ᵀf(x_k)(x − x_k)

(a linear objective with convex constraints, usually easy to solve). If ∇ᵀf(x_k)(x̄_k − x_k) = 0 then

∇ᵀf(x_k)d ≥ 0

for every feasible direction d → first-order necessary conditions hold. Otherwise d_k = x̄_k − x_k is a descent direction along which a step α_k ∈ (0, 1] might be chosen according to Armijo's rule. Generic iteration:

x_{k+1} = x_k + α_k(x̄_k − x_k)

Under mild conditions the method converges to a point satisfying first-order necessary conditions. However it is usually extremely slow (convergence may be sublinear). It might find applications in very large-scale problems in which solving the subproblem for direction determination is very easy (e.g. when X is a polytope).

Gradient projection method

Replace the linearized subproblem with a projection:

x̄_k = [x_k − s_k ∇f(x_k)]⁺

where [·]⁺ denotes projection onto X. The method is slightly faster than Frank-Wolfe, with a linear convergence rate similar to that of (unconstrained) steepest descent. It might be applied when projection is relatively cheap, e.g. when the feasible set is a box. A point x_k satisfies the first-order necessary conditions dᵀ∇f(x_k) ≥ 0 for all feasible directions d iff

x_k = [x_k − s_k ∇f(x_k)]⁺
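When X is a box the projection is a componentwise clip, so the whole method is a few lines (fixed step size and the quadratic example are my own illustrative choices):

```python
import numpy as np

def projected_gradient(grad, x0, lo, hi, s=0.1, n_iter=200):
    """Gradient projection on the box [lo, hi]^n: the projection is a clip."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = np.clip(x - s * grad(x), lo, hi)   # [x - s grad f(x)]^+
    return x

# min (x - 2)^2 + (y + 1)^2 over [0, 1]^2: the optimum is the corner (1, 0)
x_opt = projected_gradient(lambda v: 2 * (v - np.array([2.0, -1.0])),
                           [0.5, 0.5], 0.0, 1.0)
```

At the solution the iteration is a fixed point of x ↦ [x − s∇f(x)]⁺, matching the first-order condition above.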

Barrier Methods

min f(x)  s.t.  g_j(x) ≤ 0, j = 1, ..., r

A barrier is a continuous function which tends to +∞ whenever x approaches the boundary of the feasible region. Examples of barrier functions:

B(x) = −Σ_j log(−g_j(x))  (logarithmic barrier)
B(x) = −Σ_j 1/g_j(x)  (inverse barrier)

Barrier method: let ε_k ↓ 0 and x₀ be strictly feasible, i.e. g_j(x₀) < 0 ∀j. Then let

x_k = arg min_{x∈Rⁿ} (f(x) + ε_k B(x))

Proposition: every limit point of {x_k} is a global minimum of the constrained optimization problem.

Special case: a single constraint (might be generalized). Let x̄ be a limit point of {x_k} (a global minimum). If KKT conditions hold, then there exists a unique λ ≥ 0:

∇f(x̄) + λ∇g(x̄) = 0

If B(x) = φ(g(x)), the solution x_k of

min f(x) + ε_k B(x),  g(x) < 0

satisfies

∇f(x_k) + ε_k φ′(g(x_k))∇g(x_k) = 0

so that lim_k ε_k φ′(g(x_k))∇g(x_k) = −∇f(x̄) = λ∇g(x̄). If lim_k g(x_k) < 0, then φ′(g(x_k))∇g(x_k) → K (finite) and ε_k K → 0; if lim_k g(x_k) = 0 (thanks to the uniqueness of Lagrange multipliers),

λ = lim_k ε_k φ′(g(x_k))

Drawbacks: strong numerical instability (the condition number of the Hessian matrix grows as ε_k → 0); need for an initial strictly feasible point x₀. (Partial) remedy: ε_k is decreased very slowly and the solution of the (k+1)-th problem is obtained by starting an unconstrained optimization from x_k.

Example

min (x − 1)² + (y − 1)²  s.t.  x + y ≤ 1

Logarithmic barrier problem:

min (x − 1)² + (y − 1)² − ε_k log(1 − x − y),  x + y − 1 < 0

Setting the gradient to zero gives

2(x − 1) + ε_k/(1 − x − y) = 0,  2(y − 1) + ε_k/(1 − x − y) = 0

hence x = y. Solving the resulting quadratic, the stationary points are

x = y = (3 − √(1 + 4ε_k))/4 → (1/2, 1/2) as ε_k → 0

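The closed form derived for this example lets us trace the central path numerically (the function name is my own):

```python
import math

def barrier_path(eps):
    """Stationary point x = y of the log-barrier problem for
    min (x-1)^2 + (y-1)^2 s.t. x + y <= 1 (closed form derived in the text)."""
    return (3.0 - math.sqrt(1.0 + 4.0 * eps)) / 4.0

# the path approaches the constrained optimum (1/2, 1/2) as eps -> 0
path = [barrier_path(e) for e in (1.0, 0.1, 0.01, 1e-6)]
```

Each point on the path satisfies the barrier stationarity condition 2(x − 1) + ε/(1 − 2x) = 0 exactly, and the sequence increases monotonically toward 1/2.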

The central path in linear programming

min cᵀx  s.t.  Ax = b, x ≥ 0

Logarithmic barrier on x ≥ 0:

min cᵀx − μ Σ_j log x_j  s.t.  Ax = b, x > 0

The trajectory x(μ) of solutions to the barrier problem is called the central path and leads to an optimal solution of the LP as μ → 0. The starting point is usually associated with μ = ∞ and is the unique solution (the analytic center) of

min −Σ_j log x_j  s.t.  Ax = b, x > 0

Penalty Methods

For equality constrained problems,

min f(x)  s.t.  h_i(x) = 0, i = 1, ..., m

the penalized problem is

min f(x) + ε Σ_i h_i²(x)

i.e. P(x; ε) = f(x) + ε Σ_i h_i²(x). Sequential scheme:

x_{k+1} = arg min P(x; ε_k)

(found with an iterative method initialized at x_k); let ε_{k+1} > ε_k, k := k + 1. If x_{k+1} is a global minimizer of P and ε_k → ∞, then every limit point of {x_k} is a global optimum of the constrained problem.

Exact penalties

There exists a penalty parameter value such that the optimal solution of the penalized problem is the optimal solution of the original one. ℓ₁ penalty function:

P₁(x; ε) = f(x) + ε Σ_i |h_i(x)|

For problems with inequality constraints as well (min f(x), h_i(x) = 0, g_j(x) ≤ 0):

P₁(x; ε) = f(x) + ε Σ_i |h_i(x)| + ε Σ_j max(0, g_j(x))

Augmented Lagrangian: motivation

Given an equality constrained problem, reformulate it as:

min f(x) + (ε/2)‖h(x)‖²  s.t.  h(x) = 0

(same solutions, since the penalty vanishes on the feasible set). The augmented Lagrangian is the ordinary Lagrangian of this reformulation:

L_ε(x, λ) = f(x) + λᵀh(x) + (ε/2)‖h(x)‖²

Its gradient is

∇_x L_ε(x, λ) = ∇f(x) + Σ_i λ_i ∇h_i(x) + ε ∇h(x) h(x) = ∇_x L(x, λ) + ε ∇h(x) h(x)

where L(x; λ) = f(x) + λᵀh(x). Let (x*, λ*) be an optimal (primal and dual) solution. Necessarily ∇_x L(x*, λ*) = 0; moreover h(x*) = 0, thus

∇_x L_ε(x*, λ*) = ∇_x L(x*, λ*) + ε ∇h(x*) h(x*) = 0:

(x*, λ*) is a stationary point of the augmented Lagrangian for every ε. Observe moreover that

∇²ₓₓL_ε(x, λ) = ∇²ₓₓL(x, λ) + ε Σ_i h_i(x)∇²h_i(x) + ε ∇h(x)∇h(x)ᵀ

so that, at (x*, λ*), where h(x*) = 0,

∇²ₓₓL_ε(x*, λ*) = ∇²ₓₓL(x*, λ*) + ε ∇h(x*)∇h(x*)ᵀ

Assume the second-order sufficient conditions hold:

vᵀ∇²ₓₓL(x*, λ*)v > 0  ∀v ≠ 0 : vᵀ∇h(x*) = 0

For v ≠ 0 with vᵀ∇h(x*) = 0,

vᵀ∇²ₓₓL_ε(x*, λ*)v = vᵀ∇²ₓₓL(x*, λ*)v + vᵀ∇h(x*)∇h(x*)ᵀv ε = vᵀ∇²ₓₓL(x*, λ*)v > 0

For v ≠ 0 with vᵀ∇h(x*) ≠ 0,

vᵀ∇²ₓₓL_ε(x*, λ*)v = vᵀ∇²ₓₓL(x*, λ*)v + ε (vᵀ∇h(x*))²

which might be negative for small ε, but is positive as soon as ε is large enough. Thus, if ε is large enough, the Hessian of the augmented Lagrangian is positive definite and x* is a (strict) local minimum of L_ε(·, λ*).
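The method of multipliers suggested by this analysis can be sketched as follows (inner minimization by plain gradient descent; step sizes, iteration counts and the test problem are illustrative assumptions):

```python
import numpy as np

def augmented_lagrangian(grad_f, h, jac_h, x0, eps=10.0, n_outer=20):
    """Method of multipliers sketch: minimize L_eps(., lam) approximately by
    gradient descent, then update the multipliers, lam <- lam + eps * h(x)."""
    x = np.asarray(x0, dtype=float)
    lam = np.zeros(len(h(x)))
    for _ in range(n_outer):
        for _ in range(2000):                       # inner (approximate) minimization
            g = grad_f(x) + jac_h(x).T @ (lam + eps * h(x))   # grad of L_eps in x
            x = x - 1e-2 * g
        lam = lam + eps * h(x)                      # first-order multiplier update
    return x, lam

# min x^2 + y^2 s.t. x + y = 1  ->  x = y = 1/2 with multiplier lam = -1
x_opt, lam_opt = augmented_lagrangian(
    grad_f=lambda v: 2 * v,
    h=lambda v: np.array([v[0] + v[1] - 1.0]),
    jac_h=lambda v: np.array([[1.0, 1.0]]),
    x0=[0.0, 0.0],
)
```

Unlike pure penalty methods, ε stays fixed here: the multiplier update, not ε → ∞, drives the constraint violation to zero, which is the practical payoff of the theory above.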

Inequality constraints

Given the problem

min f(x)  s.t.  g_j(x) ≤ 0, j = 1, ..., p

introduce squared slack variables:

min_{x,s} f(x)  s.t.  g_j(x) + s_j² = 0, j = 1, ..., p

For the general problem (min f(x), h_i(x) = 0, i = 1, ..., m; g_j(x) ≤ 0, j = 1, ..., p) the augmented Lagrangian of the slack reformulation is

min_{x,z} L_ε(x, z; λ, μ) = f(x) + λᵀh(x) + (ε/2)‖h(x)‖² + Σ_j μ_j (g_j(x) + z_j²) + (ε/2) Σ_j (g_j(x) + z_j²)²

Minimization with respect to the z variables can be performed in closed form: setting u_j = z_j² ≥ 0, consider

min_{u_j ≥ 0} μ_j (g_j(x) + u_j) + (ε/2)(g_j(x) + u_j)²

The unconstrained stationarity condition is μ_j + ε(g_j(x) + û_j) = 0; projecting onto u_j ≥ 0:

u_j* = max{0, −μ_j/ε − g_j(x)}

Substituting:

L_ε(x; λ, μ) = f(x) + λᵀh(x) + (ε/2)‖h(x)‖² + (1/(2ε)) Σ_j (max{0, μ_j + ε g_j(x)}² − μ_j²)


Sequential Quadratic Programming (SQP)

For the equality constrained problem min f(x), h_i(x) = 0, the idea is to apply Newton's method to solve the KKT equations. With the Lagrangian

L(x; λ) = f(x) + Σ_i λ_i h_i(x)

the KKT system is

F[x; λ] = (∇f(x) + ∇H(x)ᵀλ, H(x)) = 0

where H(x) = (h₁(x), ..., h_m(x)) and ∇H is its Jacobian. Jacobian of the KKT system:

∇F(x, λ) = [ ∇²ₓₓL(x; λ)  ∇H(x)ᵀ ; ∇H(x)  0 ]

Newton step: (x_{k+1}, λ_{k+1}) = (x_k + d_k, λ_k + δ_k), where

[ ∇²ₓₓL(x_k; λ_k)  ∇H(x_k)ᵀ ; ∇H(x_k)  0 ] (d_k, δ_k) = −(∇f(x_k) + ∇H(x_k)ᵀλ_k, H(x_k))

Existence: the Newton step exists if the Jacobian of the constraint set ∇H(x_k) has full row rank and the Hessian ∇²ₓₓL(x_k; λ_k) is positive definite. In this case the Newton step is the unique solution of the quadratic program

min_d f(x_k) + ∇f(x_k)ᵀd + ½ dᵀ∇²ₓₓL(x_k; λ_k)d  s.t.  ∇H(x_k)d + H(x_k) = 0

whose KKT conditions,

∇²ₓₓL(x_k; λ_k)d_k + ∇H(x_k)ᵀμ_k + ∇f(x_k) = 0,  ∇H(x_k)d_k + H(x_k) = 0

have a unique solution d_k with Lagrange multipliers μ_k = λ_{k+1}. Thus SQP can be seen as a method which minimizes a quadratic approximation of the Lagrangian subject to a first-order approximation of the constraints:

min_d L(x_k, λ_k) + ∇ₓL(x_k, λ_k)ᵀd + ½ dᵀ∇²ₓₓL(x_k; λ_k)d  s.t.  ∇H(x_k)d + H(x_k) = 0

Under the same conditions as before this QP has a unique solution d_k with Lagrange multipliers equal to λ_{k+1}.
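One Newton-KKT (SQP) step can be sketched directly from the linear system above (on an equality-constrained quadratic program a single step is exact, which makes a convenient check):

```python
import numpy as np

def sqp_step(grad_f, hess_L, h, jac_h, x, lam):
    """One Newton step on the KKT system of min f(x) s.t. h(x) = 0:
    [[W, A^T], [A, 0]] (d, dlam) = -(grad f + A^T lam, h)."""
    W = hess_L(x, lam)
    A = jac_h(x)
    m = A.shape[0]
    KKT = np.block([[W, A.T], [A, np.zeros((m, m))]])
    rhs = -np.concatenate([grad_f(x) + A.T @ lam, h(x)])
    sol = np.linalg.solve(KKT, rhs)
    return x + sol[:len(x)], lam + sol[len(x):]

# min x^2 + y^2 s.t. x + y = 1  ->  (0.5, 0.5) with multiplier -1, in one step
x1, lam1 = sqp_step(lambda v: 2 * v,
                    lambda v, lam: 2 * np.eye(2),
                    lambda v: np.array([v[0] + v[1] - 1.0]),
                    lambda v: np.array([[1.0, 1.0]]),
                    np.array([0.0, 0.0]), np.array([0.0]))
```

The same routine, iterated, is the core of an equality-constrained SQP method; practical codes add a line search or trust region for globalization.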

Inequalities

If the original problem is

min f(x)  s.t.  h_i(x) = 0, g_j(x) ≤ 0

the QP subproblem becomes

min_d f_k + ∇f(x_k)ᵀd + ½ dᵀ∇²ₓₓL(x_k, λ_k)d
s.t.  ∇h_i(x_k)ᵀd + h_i(x_k) = 0
      ∇g_j(x_k)ᵀd + g_j(x_k) ≤ 0

Filter Methods

Basic idea: the problem

min f(x)  s.t.  g(x) ≤ 0

can be considered as a problem with two objectives: minimize f(x) and minimize the constraint violation g(x) (the second objective has priority over the first).

Filter

Given the problem

min f(x)  s.t.  g_j(x) ≤ 0, j = 1, ..., k

define the constraint violation

h(x) = Σ_j max{g_j(x), 0}

and consider the two objectives min f(x), min h(x). Let {(f_k, h_k), k = 1, 2, ...} be the observed values of f and h at points x₁, x₂, .... A pair (f_k, h_k) dominates a pair (f_ℓ, h_ℓ) iff

f_k ≤ f_ℓ  and  h_k ≤ h_ℓ

The filter is the set of non-dominated pairs.

The SQP trust-region subproblem is

min_d f_k + ∇f(x_k)ᵀd + ½ dᵀ∇²ₓₓL(x_k; λ_k)d
s.t.  ∇g_j(x_k)ᵀd + g_j(x_k) ≤ 0,  ‖d‖∞ ≤ ρ

(the ∞-norm is used here in order to keep the problem a QP). In traditional (unconstrained) trust-region methods, if the current step is a failure the trust region is reduced → eventually the step becomes a pure gradient step → convergence. Here, diminishing the trust-region radius might lead to infeasible QPs.

Filter methods: algorithm (sketch)

Data: x₀ starting point, trust-region radius ρ, k = 0.
repeat
  solve the QP (with trust region ‖d‖∞ ≤ ρ) and get a step d_k; try x_{k+1} = x_k + d_k;
  if (f_{k+1}, h_{k+1}) is acceptable to the filter then
    accept x_{k+1} and add (f_{k+1}, h_{k+1}) to the filter;
    remove dominated points from the filter;
    possibly increase ρ;
  else
    reject the step and reduce ρ; if the constraints ∇g_j(x_k)ᵀd + g_j(x_k) ≤ 0, ‖d‖∞ ≤ ρ become infeasible, restore feasibility by (approximately) minimizing h starting from x_k;
  set k = k + 1;
until convergence


Fabio Schoen 2008

The global optimization problem

min f(x), x ∈ S ⊆ Rⁿ,  x* = arg min f(x) : f(x*) ≤ f(x) ∀x ∈ S

This definition is unsatisfactory: the problem is ill-posed in x* (two objective functions which differ only slightly might have global optima which are arbitrarily far apart); it is however well-posed in the optimal values: ‖f − g‖∞ ≤ ε ⇒ |f* − g*| ≤ ε. Quite often we are satisfied with looking for f* and search for one or more feasible solutions x̄ such that

f(x̄) ≤ f(x*) + ε

Global optimization in the literature:
- the problem is highly relevant, especially in applications;
- the problem is very hard (perhaps too hard) to solve;
- there are plenty of publications on global optimization algorithms for specific problem classes;
- there are only relatively few papers with relevant theoretical content;
- often elegant theories have produced weak algorithms and, vice versa, the best computational methods often lack a sound theoretical support;
- many global optimization papers get published in applied research journals.

Bazaraa, Sherali, Shetty, Nonlinear Programming: theory and algorithms, 1993: the words "global optimum" appear for the first time on page 99, the second time at page 132, then at page 247: "A desirable property of an algorithm for solving [an optimization] problem is that it generates a sequence of points converging to a global optimal solution. In many cases, however, we may have to be satisfied with less favorable outcomes." After this (in 638 pages) it never appears again; global optimization is never cited. Similar situation in Bertsekas, Nonlinear Programming (1999): 777 pages, but only the definition of global minima and maxima is given. Nocedal & Wright, Numerical Optimization, 2nd edition, 2006: "Global solutions are needed in some applications, but for many problems they are difficult to recognize and even more difficult to locate ... many successful global optimization algorithms require the solution of many local optimization problems, to which the algorithms described in this book can be applied".

Global information

Global optimization is hopeless: without global information no algorithm will find a certifiable global optimum unless it generates a dense sample. There exists a rigorous definition of global information; some examples:
- the number of local optima;
- the global optimum value;
- for global optimization problems over a box, (an upper bound on) the Lipschitz constant L: |f(y) − f(x)| ≤ L ‖x − y‖ ∀x, y;
- an explicit representation of the objective function as the difference between two convex functions (+ convexity of the feasible region).

Complexity

Global optimization is computationally intractable also according to classical complexity theory. Special cases: quadratic programming,

min ½ xᵀQx + cᵀx  s.t.  ℓ ≤ Ax ≤ u

is NP-hard [Sahni, 1974] and, when considered as a decision problem, NP-complete [Vavasis, 1990]. The same holds for quadratic optimization on a hyper-rectangle (A = I) when even only one eigenvalue of Q is negative, and for quadratic minimization over a simplex:

min ½ xᵀQx + cᵀx  s.t.  x ≥ 0, Σ_j x_j = 1

Other hard classes: concave minimization (quantity discounts, scale economies, fixed charges) and combinatorial optimization. E.g., binary linear programming,

min cᵀx  s.t.  Ax = b, x ∈ {0, 1}ⁿ

can be rewritten as the global optimization problem

min cᵀx + K xᵀ(1 − x)  s.t.  Ax = b, x ∈ [0, 1]ⁿ

for K large enough: at the optimum x ∘ (1 − x) = 0, i.e. the solution is binary.


Molecular conformation problems

Minimization of cost functions which are neither convex nor concave. E.g.: finding the minimum-energy conformation of complex molecules: Lennard-Jones micro-clusters, protein folding, protein-ligand docking. Example, Lennard-Jones: the pair potential due to two atoms at X₁, X₂ ∈ R³ is

v(r) = 1/r¹² − 2/r⁶

where r = ‖X₁ − X₂‖. The total energy of a cluster of N atoms located at X₁, ..., X_N ∈ R³ is defined as

Σ_{i=1,...,N} Σ_{j<i} v(‖X_i − X_j‖)

This function has a number of local (non-global) minima which grows like exp(N).

Lennard-Jones potential

(Figure: the Lennard-Jones pair potential with its attractive and repulsive components.)

Molecular potential energy model: E = E_l + E_a + E_d + E_v + E_e, where

E_l = Σ_{i∈L} ½ K_i^b (r_i − r_i⁰)²  (bond lengths)
E_a = Σ_{i∈A} ½ K_i^θ (θ_i − θ_i⁰)²  (bond angles)
E_d = Σ_{i∈T} ½ K_i [1 + cos(n_i φ_i − φ_i⁰)]  (dihedrals)

and E_v, E_e are the non-bonded terms below.

Docking

The non-bonded terms are

E_v = Σ_{(i,j)∈C} v(R_ij)  (van der Waals),  E_e = ½ Σ_{(i,j)∈C} q_i q_j / R_ij  (Coulomb interaction)

Given two macro-molecules M₁, M₂, find their minimal-energy coupling. If no bonds are changed, to find the optimal docking it is sufficient to minimize

E_v + E_e = Σ_{i∈M₁, j∈M₂} v(R_ij) + ½ Σ_{i∈M₁, j∈M₂} q_i q_j / R_ij

Two main families of methods:
1. with global information (structured problems);
2. without global information (unstructured problems).

Structured problems → stochastic and deterministic methods; unstructured problems → typically stochastic algorithms. Every global optimization method should try to find a balance between exploration of the feasible region and refinement of approximations of the optimum.

For Lennard-Jones,

E(X) = Σ_{i=1}^{N−1} Σ_{j=i+1}^{N} (1/‖X_i − X_j‖¹² − 2/‖X_i − X_j‖⁶)

this is a highly structured problem. But is it easy/convenient to use its structure? And how?

LJ is a d.c. function

The map

F₁ : R^{3N} → R₊^{N(N−1)/2},  F₁(X₁, ..., X_N) = (‖X₁ − X₂‖², ..., ‖X_{N−1} − X_N‖²)

has convex components, and

F₂(r₁₂, ..., r_{N−1,N}) = Σ_{ij} (1/r_ij⁶ − 2/r_ij³)

(as a function of the squared distances r_ij) is the difference between two convex functions. Thus LJ(X) can be seen as the difference between two convex functions (a d.c. programming problem). NB: every C² function is d.c., but often its d.c. decomposition is not known. D.c. optimization is very elegant, and there exists a nice duality theory, but algorithms are typically very inefficient.

A cutting plane method

(Just an example, not particularly efficient, useless for high-dimensional problems.) Any unconstrained d.c. problem can be represented as an equivalent problem with a linear objective, a convex constraint and a reverse convex constraint. If g, h are convex, then min g(x) − h(x) is equivalent to

min z  s.t.  g(x) − h(x) ≤ z

which is equivalent to

min z  s.t.  g(x) ≤ w,  h(x) + z ≥ w

with a linear objective, the convex constraint g(x) − w ≤ 0 and the reverse convex constraint h(x) + z − w ≥ 0. In canonical form:

min cᵀx  s.t.  g(x) ≤ 0,  h(x) ≥ 0

Let Ω = {x : g(x) ≤ 0} and C = {x : h(x) ≤ 0}. Hypotheses:

0 ∈ int Ω ∩ int C,  cᵀx > 0 ∀x ∈ Ω \ int C

Fundamental property: if a d.c. problem admits an optimum, at least one optimum belongs to Ω ∩ ∂C.


Assume g(0) < 0, h(0) < 0 and cᵀx > 0 for every feasible x. Let x̄ be a solution of the convex problem

min cᵀx  s.t.  g(x) ≤ 0

If h(x̄) ≥ 0 then x̄ solves the d.c. problem. Otherwise cᵀx > cᵀx̄ for all feasible x. Coordinate transformation y = x − x̄:

min cᵀy  s.t.  ĝ(y) ≤ 0,  ĥ(y) ≥ 0

where ĝ(y) = g(y + x̄), ĥ(y) = h(y + x̄). Then cᵀy > 0 for all feasible solutions; by continuity it is possible to choose x̄ so that ĝ(0) < 0.

Let x̄ be the best known solution and let

D(x̄) = {x ∈ Ω : cᵀx ≤ cᵀx̄}

If D(x̄) ⊆ C then x̄ is optimal. Check: a polytope P (with known vertices) is built which contains D(x̄). If all vertices of P are in C → optimal solution. Otherwise, let V be the best feasible vertex: the intersection of the segment [0, V] with the boundary of C (if feasible) is an improving point x̄; otherwise a cut tangent to Ω at x̄ is introduced in P.

Initialization

Build a polytope P ⊇ D(x̄), i.e. such that

∀y : cᵀy ≤ cᵀx̄, y feasible ⇒ y ∈ P

Step 1

Let V be the vertex of P with largest h(·) value. Surely h(V) > 0 (otherwise we stop with an optimal solution). Moreover h(0) < 0 (0 is in the interior of C), thus the segment from V to 0 must intersect the boundary of C: let

x_k = ∂C ∩ [V, 0]

be the intersection point. It might be feasible (improving) or not.

Step 2

If x_k ∈ Ω, set x̄ := x_k (an improved feasible solution) and iterate; otherwise introduce in P a cut tangent to Ω at x_k and repeat.

D.c. duality

For min_x g(x) − h(x), the problem

min_y h*(y) − g*(y)

(where f*(y) = sup_x (yᵀx − f(x)) denotes the convex conjugate) is the Fenchel-Rockafellar dual. If min g(x) − h(x) admits an optimum, then the Fenchel dual is a strong dual.

A primal/dual algorithm

If x* ∈ arg min g(x) − h(x) then every u* ∈ ∂h(x*) is dual optimal, and if u* ∈ arg min h*(u) − g*(u) then every x* ∈ ∂g*(u*) is primal optimal (∂ denotes the subdifferential). This suggests alternating partial linearizations: given x_k and y_k ∈ ∂h(x_k),

P_k : min_x g(x) − (h(x_k) + (x − x_k)ᵀy_k)
D_k : min_y h*(y) − (g*(y_{k−1}) + x_kᵀ(y − y_{k−1}))

GlobOpt - relaxations

Consider the global optimization problem (P):

min_{x∈X} f(x)

and assume the minimum exists and is finite, and that we can use a relaxation (R):

min_{y∈Y} g(y)

Usually both X and Y are subsets of the same space Rⁿ. Recall: (R) is a relaxation of (P) iff

X ⊆ Y  and  g(x) ≤ f(x) ∀x ∈ X

Branch and bound scheme:
1. Solve the relaxation (R) and let L be its (global) optimum value.
2. (Heuristically) solve the original problem (P) (or, more generally, find a good feasible solution of (P) in X); let U be the best feasible function value known.
3. If U ≤ L then stop: U is a certified optimum for (P).
4. Otherwise split X and Y into two parts and apply the same method to each of them.
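The four steps above can be sketched on a toy 1-D problem, using the Lipschitz-constant bound as the relaxation (the test function, the bound L and the tolerance are illustrative; L must be a valid Lipschitz constant on the interval):

```python
import heapq

def lipschitz_bb(f, lo, hi, L, eps=1e-3):
    """1-D branch & bound; on [a, b] the relaxation is the Lipschitz bound
    min f >= f(m) - L*(b - a)/2, with m the midpoint of the interval."""
    m = 0.5 * (lo + hi)
    U = f(m)                                     # incumbent upper bound
    heap = [(U - L * (hi - lo) / 2, lo, hi)]     # (lower bound, interval)
    while heap:
        lb, a, b = heapq.heappop(heap)
        if lb >= U - eps:                        # certified: U <= global min + eps
            break
        for c, d in ((a, 0.5 * (a + b)), (0.5 * (a + b), b)):  # branch
            mid = 0.5 * (c + d)
            U = min(U, f(mid))                   # bound: update the incumbent
            child_lb = f(mid) - L * (d - c) / 2
            if child_lb < U - eps:               # keep only promising intervals
                heapq.heappush(heap, (child_lb, c, d))
    return U

# f(x) = (x - 1)^2 on [-10, 10]: |f'| <= 22, global minimum value 0
U = lipschitz_bb(lambda x: (x - 1.0) ** 2, -10.0, 10.0, L=22.0)
```

Since the popped lower bound is a valid global lower bound, termination certifies U ≤ f* + ε, exactly the ε-optimality notion introduced earlier.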

Tools

Good relaxations (easy yet accurate) and good upper bounding, i.e. good heuristics for (P). Good relaxations can be obtained, e.g., through convex relaxations and domain reduction.

Convex relaxations

Assume X is convex and Y = X. If g is the convex envelope of f on X, then solving the convex relaxation (R) in one step gives the certified global optimum for (P). g(x) is a convex under-estimator of f on X if:

g(x) is convex,  g(x) ≤ f(x) ∀x ∈ X

g is the convex envelope of f on X if:

g is a convex under-estimator of f,  g(x) ≥ h(x) ∀h : convex under-estimator of f


(Figures: a 1-D example of a function with its convex under-estimator; branching of the domain; bounding, with the upper bound, fathomed regions and lower bounds.)

Bounding

Let

min_{x∈S} f(x)

be a GlobOpt problem where f is convex, while S is non-convex. A relaxation (outer approximation) is obtained by replacing S with a larger set Q ⊇ S. If Q is convex → a convex optimization problem. If the optimal solution of

min_{x∈Q} f(x)

belongs to S, it is optimal for the original problem; in any case its value is a lower bound.

Example

min −x − 2y  s.t.  xy ≤ 3, x ∈ [0, 5], y ∈ [0, 3]

(Figure: the non-convex feasible region.)

Relaxation

We know that

(x + y)² = x² + y² + 2xy

thus

xy = ((x + y)² − x² − y²)/2

On the box, x² ≤ 5x and y² ≤ 3y, so xy ≤ 3 implies the convex constraint

(x + y)² − 5x − 3y ≤ 6

Stronger relaxation

Since (5 − x)(3 − y) ≥ 0 on the box,

15 − 3x − 5y + xy ≥ 0  i.e.  xy ≥ 3x + 5y − 15

Combined with xy ≤ 3 this gives 3x + 5y − 15 ≤ 3, i.e.

3x + 5y ≤ 18

The optimal solution of the resulting convex (linear) relaxation is (1, 3), which is feasible → optimal for the original problem.

Relaxing functions and constraints

How to build convex envelopes of a function, or how to relax a non-convex constraint? Convex envelopes of f(x) → lower bounds; concave envelopes of f(x) → upper bounds. Constraint g(x) ≤ 0: if h(x) is a convex under-estimator of g, then h(x) ≤ 0 is a convex relaxation. Constraint g(x) ≥ 0: if h(x) is concave and h(x) ≥ g(x), then h(x) ≥ 0 is a convex constraint.
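The claim that (1, 3) solves the linear relaxation can be checked by brute-force vertex enumeration (a sketch; in practice an LP solver would be used):

```python
import itertools
import numpy as np

# relaxed feasible set: 0 <= x <= 5, 0 <= y <= 3, 3x + 5y <= 18,
# stored as pairs (a, b) meaning a . (x, y) <= b
constraints = [
    (np.array([3.0, 5.0]), 18.0),
    (np.array([1.0, 0.0]), 5.0), (np.array([-1.0, 0.0]), 0.0),
    (np.array([0.0, 1.0]), 3.0), (np.array([0.0, -1.0]), 0.0),
]
vertices = []
for (a1, b1), (a2, b2) in itertools.combinations(constraints, 2):
    A = np.array([a1, a2])
    if abs(np.linalg.det(A)) < 1e-12:        # parallel constraints: no vertex
        continue
    v = np.linalg.solve(A, np.array([b1, b2]))
    if all(a @ v <= b + 1e-9 for a, b in constraints):
        vertices.append(v)
# the LP optimum of min -x - 2y is attained at a vertex
best = min(vertices, key=lambda v: -v[0] - 2 * v[1])
```

The winning vertex satisfies xy = 3, so it is feasible for the original non-convex problem, closing the gap between relaxation and original problem in one shot.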

Convex envelopes

Definition: a function is polyhedral if it is the pointwise maximum of a finite number of affine functions. (NB: in general, the convex envelope is the pointwise supremum of affine minorants.)

Generating sets

The generating set X(f) of a function f over a convex set P is obtained as follows: given f, first build its convex envelope on P and consider the epigraph {(x, y) : x ∈ P, y ≥ Conv f(x)}. This is a convex set, whose extreme points are denoted by V; X(f) is the set of x-coordinates of the points of V.

Characterization

Let f(x) be continuously differentiable on a polytope P. The convex envelope of f on P is polyhedral if and only if

X(f) = Vert(P)

(the generating set is the vertex set of P). Corollary: let f₁, ..., f_m ∈ C¹(P) and let each f_i possess a polyhedral convex envelope on P. Then

Conv(Σ_i f_i)(x) = Σ_i Conv f_i(x)

iff the generating set of Σ_i Conv(f_i) is Vert(P).

Characterization

If f(x) is such that Conv f(x) is polyhedral, then there exists an affine function h(x) such that:
1. h(x) ≤ f(x) for all x ∈ Vert(P);
2. there exist n+1 affinely independent vertices of P, V₁, ..., V_{n+1}, such that

f(V_i) = h(V_i), i = 1, ..., n+1

and h(x) = Conv f(x) on their convex hull. The condition may be reversed: given m affine functions h₁, ..., h_m such that, for each of them,
1. h_j(x) ≤ f(x) for all x ∈ Vert(P),
2. there exist n+1 affinely independent vertices of P with f(V_i) = h_j(V_i), i = 1, ..., n+1,

then the function φ(x) = max_j h_j(x) is the convex envelope of f iff the generating set of φ is Vert(P) and for every vertex V_i we have φ(V_i) = f(V_i).

Sufficient condition

If f(x) is lower semi-continuous on P and for every x ∈ Vert(P) and every line ℓ_x through x meeting the interior of P, f is concave in a neighborhood of x along ℓ_x, then Conv f(x) is polyhedral.

Application: let f(x) = Σ_{i,j} α_ij x_i x_j. The sufficient condition holds for f on [0, 1]ⁿ → bilinear forms are polyhedral on a hypercube.

Bilinear terms

(Al-Khayyal, Falk (1983)): let x ∈ [ℓ_x, u_x], y ∈ [ℓ_y, u_y]. From (x − ℓ_x)(y − ℓ_y) ≥ 0 we get

xy ≥ ℓ_y x + ℓ_x y − ℓ_x ℓ_y

The convex envelope of xy on [ℓ_x, u_x] × [ℓ_y, u_y] is

φ(x, y) = max{ℓ_y x + ℓ_x y − ℓ_x ℓ_y; u_y x + u_x y − u_x u_y}

No other (polyhedral) function under-estimating xy is tighter. In fact ℓ_y x + ℓ_x y − ℓ_x ℓ_y belongs to the convex envelope: it under-estimates xy and coincides with it at 3 vertices ((ℓ_x, ℓ_y), (ℓ_x, u_y), (u_x, ℓ_y)); analogously for the other affine function. All vertices are interpolated by these 2 under-estimating hyperplanes → they form the convex envelope of xy.

Is everything easy with bilinear terms? Of course not! Many things can go wrong. It is true that, on the hypercube, a bilinear form

Σ_{i<j} α_ij x_i x_j

is polyhedral (easy to see), but we cannot guarantee in general that the generating set of the envelope is the vertex set of the hypercube (in particular, when the α's have opposite signs). If the set is not a hypercube, even a single bilinear term might be non-polyhedral: e.g. xy on the triangle {0 ≤ x ≤ y ≤ 1}. Finding the (polyhedral) convex envelope of a bilinear form on a generic polytope P is NP-hard!
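The Al-Khayyal-Falk under-estimators can be transcribed directly (a sketch):

```python
def mccormick(xl, xu, yl, yu):
    """The two affine under-estimators of x*y on a box (Al-Khayyal & Falk)."""
    under1 = lambda x, y: yl * x + xl * y - xl * yl
    under2 = lambda x, y: yu * x + xu * y - xu * yu
    return under1, under2

u1, u2 = mccormick(0.0, 5.0, 0.0, 3.0)   # the box of the earlier example
```

Validity follows from the identities xy − under1 = (x − xl)(y − yl) ≥ 0 and xy − under2 = (xu − x)(yu − y) ≥ 0 on the box, with equality at the interpolated vertices.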

Fractional terms

A convex under-estimate w of a fractional term x/y over a box [ℓ_x, u_x] × [ℓ_y, u_y] (y > 0) can be obtained through linear inequalities such as

w ≥ ℓ_x/y + x/u_y − ℓ_x/u_y

with analogous inequalities in the remaining cases (the four cases of the original slide depend on the sign of x).

Concave functions

If f(x), x ∈ [ℓ_x, u_x], is concave, then the convex envelope is simply its linear interpolation at the extremes of the interval:

f(ℓ_x) + ((f(u_x) − f(ℓ_x))/(u_x − ℓ_x)) (x − ℓ_x)

General non-convex functions (αBB)

Let f(x) ∈ C² be general non-convex. Then a convex under-estimate on a box [ℓ, u] can be defined as

Φ(x) = f(x) − Σ_{i=1}^n α_i (x_i − ℓ_i)(u_i − x_i),  α_i ≥ 0

How to choose the α_i's? One possibility is the uniform choice α_i = α. Since

∇²Φ(x) = ∇²f(x) + 2 diag(α)

Φ is convex iff ∇²Φ(x) is positive semi-definite on the box, which is obtained iff

α ≥ max{0, −½ min_{x∈[ℓ,u]} λ_min(∇²f(x))}

Key properties

(x) f (x) is convex interpolates f at all vertices of [, u]

Estimation of

U Compute an interval Hessian [H ] : [H (x)]ij = [hL ij (x), hij (x)] in [, u] Find such that [H ] + 2diag() 0. Gerschgorin theorem for real matrices:

Maximum separation:

1 max(f (x) (x)) = 4 (ui i )2 min min hii

i j =i

|hij |

min min hL ii

i U max{|hL ij |, |hij |}

j =i

uj j ui i
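A minimal 1-D sketch of the αBB construction (an illustrative example of mine: f(x) = sin x on [0, 2π], for which f''(x) = −sin x ≥ −1, so the uniform choice α = 1/2 makes φ convex):

```python
import math

l, u = 0.0, 2.0 * math.pi
alpha = 0.5            # alpha = max(0, -0.5 * min f'') = 0.5 for f = sin

def f(x):
    return math.sin(x)

def phi(x):
    # alpha-BB underestimator: f(x) - alpha * (x - l) * (u - x)
    return f(x) - alpha * (x - l) * (u - x)

# phi underestimates f on [l, u] and interpolates it at the endpoints
for i in range(101):
    x = l + (u - l) * i / 100
    assert phi(x) <= f(x) + 1e-12
assert abs(phi(l) - f(l)) < 1e-12 and abs(phi(u) - f(u)) < 1e-12

# maximum separation equals alpha * (u - l)^2 / 4, attained at the midpoint
mid = (l + u) / 2
assert abs((f(mid) - phi(mid)) - alpha * (u - l) ** 2 / 4) < 1e-12
```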

Improvements

new relaxation functions (other than quadratic): replacing each quadratic term with a suitably chosen nonlinear (e.g. exponential) one,

Φ(x; γ) = f(x) − Σ_{i=1}^n φ_i(x_i; γ_i),

gives a tighter underestimate than the quadratic function

partitioning: partition the domain into a small number of regions (hyper-rectangles); evaluate a convex underestimator in each region; join the underestimators to form a single convex function in the whole domain

Range Reduction

Techniques for cutting the feasible region without cutting the global optimum solution. Simplest approaches: feasibility-based and optimality-based range reduction (RR). Let the problem be

min_{x∈S} f(x)

Ranges are obtained by solving

ℓ_i = min_{x∈S} x_i        u_i = max_{x∈S} x_i

for all i = 1, . . . , n and then adding the constraints x ∈ [ℓ, u] to the problem (or to the sub-problems generated during Branch & Bound)


Feasibility Based RR

If S is a polyhedron, RR requires the solution of LPs:

[ℓ'_j, u'_j] = min / max { x_j : Ax ≤ b, x ∈ [L, U] }

Poor man's L.P.-based RR: from every constraint Σ_j a_ij x_j ≤ b_i in which a_ik > 0 we can deduce

x_k ≤ (1/a_ik) ( b_i − Σ_{j≠k} min{a_ij L_j, a_ij U_j} )

Optimality Based RR

Given an incumbent solution x̄ ∈ S, ranges are updated by solving the sequence:

ℓ_i = min { x_i : x ∈ S, f̂(x) ≤ f(x̄) }        u_i = max { x_i : x ∈ S, f̂(x) ≤ f(x̄) }

where f̂(x) is a convex underestimate of f in the current domain. RR can be applied iteratively (i.e., at the end of a complete RR sequence, we might start a new one using the new bounds)
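The poor man's rule above can be sketched in a few lines (a hypothetical helper of mine, assuming a single constraint of the form Σ_j a_j x_j ≤ b and box bounds [L, U]):

```python
def poor_mans_rr(a, b, L, U):
    """Tighten upper bounds from one linear constraint sum_j a[j]*x[j] <= b:
    for each positive coefficient, bound the variable using the smallest
    possible contribution of the other variables."""
    new_U = list(U)
    for k, ak in enumerate(a):
        if ak > 0:
            slack = b - sum(min(a[j] * L[j], a[j] * U[j])
                            for j in range(len(a)) if j != k)
            new_U[k] = min(U[k], slack / ak)
    return new_U

# Example: 2*x0 + x1 <= 4 with x in [0, 10]^2 gives x0 <= 2, x1 <= 4
assert poor_mans_rr([2.0, 1.0], 4.0, [0.0, 0.0], [10.0, 10.0]) == [2.0, 4.0]
```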


generalization

Let

(P)    min_{x∈X} f(x)    s.t. g(x) ≤ 0

and let

(R)    min_{x∈X} f̂(x)    s.t. ĝ(x) ≤ 0

be a convex relaxation of (P):

{x ∈ X : g(x) ≤ 0} ⊆ {x ∈ X : ĝ(x) ≤ 0}
f̂(x) ≤ f(x)    ∀x ∈ X : ĝ(x) ≤ 0

R.H.S. perturbation

Let

(R_y)    φ̂(y) = min_{x∈X} f̂(x)    s.t. ĝ(x) ≤ y

be a perturbation of (R). (R) convex ⇒ (R_y) convex for any y. Let x̄ be an optimal solution of (R) and assume that the i-th constraint is active:

ĝ_i(x̄) = 0

Duality

Assume (R) has a finite optimum at x̄ with value φ̂(0) and Lagrange multipliers λ. Then the hyperplane

H(y) = φ̂(0) − λᵀ y

supports the value function from below:

φ̂(y) ≥ φ̂(0) − λᵀ y    ∀y ∈ R^m

Main result

If (R) is convex with optimum value φ̂(0), constraint i is active at the optimum and the Lagrange multiplier is λ_i > 0 then, if U is an upper bound for the original problem (P), the constraint

ĝ_i(x) ≥ −(U − L)/λ_i

(where L = φ̂(0)) is valid for the original problem (P), i.e. it does not exclude any feasible solution with value better than U.

proof

Problem (R_y) can be seen as a convex relaxation of the perturbed non convex problem

φ(y) = min_{x∈X} f(x)    s.t. g(x) ≤ y

and thus φ̂(y) ≤ φ(y): underestimating (R_y) produces an underestimate of φ(y). Let y := e_i y_i. From duality:

L − λ_i y_i ≤ φ̂(e_i y_i) ≤ φ(e_i y_i)

For any x feasible for (P) with value at most U there exists a y_i ≤ 0 such that the constraint ĝ_i(x) ≤ y_i is active, and U is then an upper bound for φ(e_i y_i); substituting y_i with ĝ_i(x) we deduce

L − λ_i ĝ_i(x) ≤ U

Applications

Range reduction: let x ∈ [ℓ, u] in the convex relaxed problem. If variable x_i is at its upper bound in the optimal solution, then we can deduce

x_i ≥ max{ ℓ_i, u_i − (U − L)/λ_i }

where λ_i is the optimal multiplier associated to the i-th upper bound. Analogously for active lower bounds:

x_i ≤ min{ u_i, ℓ_i + (U − L)/λ_i }
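A tiny numeric illustration of the bound-tightening rule for an active upper bound (my own helper names; λ_i is the multiplier of the active bound, L the relaxation value, U the incumbent value):

```python
def tighten_lower(l_i, u_i, L, U, lam_i):
    """Duality-based range reduction: when x_i <= u_i is active in the
    convex relaxation with multiplier lam_i > 0, every solution better
    than U satisfies x_i >= u_i - (U - L) / lam_i."""
    if lam_i > 0:
        return max(l_i, u_i - (U - L) / lam_i)
    return l_i

# Example: bounds [0, 10], gap U - L = 4, multiplier 2 -> x_i >= 8
assert tighten_lower(0.0, 10.0, 0.0, 4.0, 2.0) == 8.0
```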

Let the constraint

a_iᵀ x ≤ b_i

be active in an optimal solution of the convex relaxation (R). Then we can deduce the valid inequality

a_iᵀ x ≥ b_i − (U − L)/λ_i

Statistical models

The objective function is modeled as a realization of a random process:

f(x) = F(x; ω)

Given observations at x_1, . . . , x_n, the loss is

L(x_1, . . . , x_n; ω) = min_{i=1,n} F(x_i; ω) − min_x F(x; ω)

and the next point to sample is placed in order to minimize the expected loss (or risk):

x_{n+1} = arg min E( L(x_1, . . . , x_n, x_{n+1}; ω) | x_1, . . . , x_n )
        = arg min E( min(F(x_{n+1}; ω), min_{i=1,n} F(x_i; ω)) | x_1, . . . , x_n )

(the term min_x F(x; ω) does not depend on x_{n+1} and can be dropped).


Given k observations (x_1, f_1), . . . , (x_k, f_k), an interpolant is built:

s(x) = Σ_{i=1}^k λ_i φ(||x − x_i||) + p(x)

p: polynomial of a (prefixed) small degree m. φ: radial function like, e.g.:

φ(r) = r
φ(r) = r³

The polynomial p is necessary to guarantee existence of a unique interpolant (i.e. when the matrix {Φ_ij = φ(||x_i − x_j||)} is singular).

Bumpiness

Let f*_k be an estimate of the value of the global optimum after k observations. Let s^y_k be the (unique) interpolant of the data points

(x_i, f_i)  i = 1, . . . , k    and    (y, f*_k)

Idea: the most likely location of y is such that the resulting interpolant has minimum bumpiness. Bumpiness measure:

σ(s^y_k) = (−1)^{m+1} Σ_i λ_i s^y_k(x_i)

TO BE DONE

Stochastic methods

Pure Random Search: random uniform sampling over the feasible region
Best start: like Pure Random Search, but a local search is started from the best observation
Multistart: local searches started from randomly generated starting points
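A minimal Multistart sketch (my own illustration, not from the slides): repeated local searches from uniformly sampled starting points. The local solver here is a crude gradient descent with numerical derivatives; any local optimizer could be substituted.

```python
import math
import random

def local_search(f, x, step=0.01, iters=2000, h=1e-6):
    # plain gradient descent with a central-difference derivative
    for _ in range(iters):
        x -= step * (f(x + h) - f(x - h)) / (2 * h)
    return x

def multistart(f, lo, hi, n_starts=20, seed=0):
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_starts):
        x = local_search(f, rng.uniform(lo, hi))
        if f(x) < best_f:
            best_x, best_f = x, f(x)
    return best_x, best_f

# a two-basin test function: local minimum near x = 1, global near x = -1
f = lambda x: (x * x - 1.0) ** 2 + 0.3 * x
x_star, f_star = multistart(f, -2.0, 2.0)
assert x_star < 0 and f_star < 0
```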


Clustering methods

Given a uniform sample, evaluate the objective function. Sample transformation (or concentration): either a fraction of the worst points is discarded, or a few steps of a gradient method are performed. The remaining points are clustered; from the best point in each cluster a single local search is started.

[Figure: four panels illustrating the phases — uniform sample, sample concentration, clustering, local optimization]

Clustering: MLSL

Sampling proceeds in batches of N points. Given sample points X_1, . . . , X_k ∈ [0, 1]^n, label X_j as clustered iff ∃ Y ∈ {X_1, . . . , X_k}:

||X_j − Y|| ≤ r_k := π^{−1/2} ( Γ(1 + n/2) · σ (log k)/k )^{1/n}

and

f(Y) ≤ f(X_j)

(σ > 0 a parameter). Local searches are started only from non-clustered points.

Simple Linkage

A sequential sample is generated (batches consist of a single observation). A local search is started only from the last sampled point (i.e. there is no recall), unless there exists a sufficiently near sampled point with a better function value.
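The MLSL critical distance can be computed directly (a sketch assuming the standard form with tuning parameter σ; the exact constant on the original slides may differ slightly):

```python
import math

def critical_distance(k, n, sigma=2.0):
    # r_k = pi^(-1/2) * (Gamma(1 + n/2) * sigma * log k / k)^(1/n)
    return ((math.gamma(1.0 + n / 2.0) * sigma * math.log(k) / k) ** (1.0 / n)
            / math.sqrt(math.pi))

# the radius shrinks as the sample grows, so clustering becomes stricter
r10, r100, r1000 = (critical_distance(k, 2) for k in (10, 100, 1000))
assert r1000 < r100 < r10
```

The log k / k factor is what makes the radius vanish asymptotically while still (in the MLSL theory) keeping the expected number of local searches finite.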

Smoothing methods

Given f : R^n → R, the Gaussian transform is defined as:

⟨f⟩_λ(x) = 1/(π^{n/2} λ^n) ∫_{R^n} f(y) exp(−||y − x||²/λ²) dy

When λ is sufficiently large ⟨f⟩_λ is convex. Idea: starting with a large enough λ, minimize the smoothed function and slowly decrease λ towards 0.
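A numerical 1-D illustration of the Gaussian transform (my own sketch: the integral is approximated by a midpoint Riemann sum over a truncated domain):

```python
import math

def gaussian_transform(f, x, lam, half_width=10.0, steps=2000):
    """Approximate <f>_lam(x) = 1/(sqrt(pi)*lam) * int f(y) exp(-(y-x)^2/lam^2) dy
    by a midpoint rule on [x - half_width, x + half_width]."""
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps):
        y = x - half_width + (i + 0.5) * h
        total += f(y) * math.exp(-((y - x) ** 2) / lam ** 2) * h
    return total / (math.sqrt(math.pi) * lam)

f = lambda t: t * t + math.sin(5 * t)    # convex term plus high-frequency wiggles
s = lambda t: gaussian_transform(f, t, lam=2.0)

# with a large lambda the sin(5t) wiggles are averaged out almost entirely,
# so the smoothed function varies far less between the wiggles than f does
assert abs(s(0.3) - s(-0.3)) < abs(f(0.3) - f(-0.3))
```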


Elementary idea: local optimization smooths out many high frequency oscillations


Monotonic Basin-Hopping

k := 0; f* := +∞
while k < MaxIter do
    Xk := random initial solution
    Xk := arg min f(x; Xk)    (local minimization started at Xk)
    fk := f(Xk)
    if fk < f* then f* := fk
    NoImprove := 0
    while NoImprove < MaxImprove do
        X := random perturbation of Xk
        Y := arg min f(x; X)    (local minimization started at X)
        if f(Y) < f* then
            Xk := Y; NoImprove := 0; f* := f(Y)
        else
            NoImprove := NoImprove + 1
    end while
    k := k + 1
end while
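The scheme above can be sketched in Python (an illustrative implementation of mine: the local search is a crude gradient descent with numerical derivatives, acceptance uses a small tolerance, and the double-well test function is my own choice, not from the slides):

```python
import math
import random

def local_min(f, x, step=0.005, iters=2000, h=1e-6):
    # crude gradient descent with a central-difference derivative
    for _ in range(iters):
        x -= step * (f(x + h) - f(x - h)) / (2 * h)
    return x

def mbh(f, lo, hi, max_iter=5, max_no_improve=15, pert=5.0, seed=1):
    rng = random.Random(seed)
    f_best, x_best = float("inf"), None
    for _ in range(max_iter):
        xk = local_min(f, rng.uniform(lo, hi))
        if f(xk) < f_best:
            f_best, x_best = f(xk), xk
        no_improve = 0
        while no_improve < max_no_improve:
            y = local_min(f, xk + rng.uniform(-pert, pert))
            if f(y) < f(xk) - 1e-9:   # monotonic acceptance (with tolerance)
                xk, no_improve = y, 0
                if f(y) < f_best:
                    f_best, x_best = f(y), y
            else:
                no_improve += 1
    return x_best, f_best

# double well: local minimum near x = 1.9, global minimum near x = -2.09
f = lambda x: x ** 4 - 8.0 * x * x + 3.0 * x
x_star, f_star = mbh(f, -6.0, 6.0)
assert x_star < 0 and f_star < -21.5
```

The defining feature, compared with plain Multistart, is the inner loop: perturbations of the current minimizer are accepted only when they improve it, so the search walks monotonically "downhill" across basins.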


References

In this year's course the global optimization part has been expanded, so it is possible that some parts on nonlinear optimization will be skipped. Here is an essential reference list for the material covered during the course:

Mokhtar S. Bazaraa, John J. Jarvis and Hanif D. Sherali, Linear Programming and Network Flows, John Wiley & Sons, 1990.
Dimitri P. Bertsekas, Nonlinear Programming, Athena Scientific, 1999.
Jorge Nocedal and Stephen J. Wright, Numerical Optimization, Springer, 2006.
Mohit Tawarmalani and Nikolaos V. Sahinidis, A Polyhedral Branch-and-Cut Approach to Global Optimization, Mathematical Programming, volume 103, pages 225-249, 2005.
I.P. Androulakis, C.D. Maranas and C.A. Floudas, αBB: A Global Optimization Method for General Constrained Nonconvex Problems, Journal of Global Optimization, volume 7, number 4, pages 337-363, 1995.
A. Rikun, A Convex Envelope Formula for Multilinear Functions, Journal of Global Optimization, volume 10, pages 425-437, 1997.
Andrea Grosso, Marco Locatelli and Fabio Schoen, A Population Based Approach for Hard Global Optimization Problems Based on Dissimilarity Measures, Mathematical Programming, volume 110, number 2, pages 373-404, 2007.
