L. T. Biegler
Chemical Engineering Department
Carnegie Mellon University
Pittsburgh, PA
Unconstrained Optimization
Algorithms
Newton Methods
Quasi-Newton Methods
Constrained Optimization
Karush Kuhn-Tucker Conditions
Special Classes of Optimization Problems
Reduced Gradient Methods (GRG2, CONOPT, MINOS)
Successive Quadratic Programming (SQP)
Interior Point Methods (IPOPT)
Process Optimization
Black Box Optimization
Modular Flowsheet Optimization - Infeasible Path
The Role of Exact Derivatives
Further Applications
Sensitivity Analysis for NLP Solutions
Multi-Scenario Optimization Problems
Introduction
Optimization: given a system or process, find the best solution to
this process within constraints.
Objective Function: indicator of "goodness" of solution, e.g., cost,
yield, profit, etc.
Decision Variables: variables that influence process behavior and
can be adjusted for optimization.
In many cases, this task is done by trial and error (through case
study). Here, we are interested in a systematic approach to this
task - and to make this task as efficient as possible.
Optimization Viewpoints
Optimization Literature
Engineering
1. Edgar, T.F., D.M. Himmelblau, and L. S. Lasdon, Optimization of Chemical
Processes, McGraw-Hill, 2001.
2. Papalambros, P. and D. Wilde, Principles of Optimal Design. Cambridge Press,
1988.
3. Reklaitis, G., A. Ravindran, and K. Ragsdell, Engineering Optimization, Wiley, 1983.
4. Biegler, L. T., I. E. Grossmann and A. Westerberg, Systematic Methods of Chemical
Process Design, Prentice Hall, 1997.
5. Biegler, L. T., Nonlinear Programming: Concepts, Algorithms and Applications to
Chemical Engineering, SIAM, 2010.
Numerical Analysis
1. Dennis, J.E. and R. Schnabel, Numerical Methods of Unconstrained Optimization,
Prentice-Hall, (1983), SIAM (1995)
2. Fletcher, R. Practical Methods of Optimization, Wiley, 1987.
3. Gill, P.E, W. Murray and M. Wright, Practical Optimization, Academic Press, 1981.
4. Nocedal, J. and S. Wright, Numerical Optimization, Springer, 2007
Motivation
Scope of optimization
Provide systematic framework for searching among a specified
space of alternatives to identify an optimal design, i.e., as a
decision-making tool
Premise
Conceptual formulation of optimal product and process design
corresponds to a mathematical programming problem
min f(x, y)
s.t. h(x, y) = 0
     g(x, y) ≤ 0
     x ∈ R^nx, y ∈ {0, 1}^ny
(NLP: continuous variables only; MINLP: continuous and discrete variables)
Optimization in Design, Operations and Control
MILP MINLP Global LP,QP NLP SA/GA
HENS x x x x x x
MENS x x x x x x
Separations x x
Reactors x x x x
Equipment Design x x x
Flowsheeting x x
Scheduling x x x x
Supply Chain x x x
Real-time optimization x x
Linear MPC x
Nonlinear MPC x x
Hybrid x x
==> L/D = CT/CS
Note:
- What if L cannot be eliminated in (1) explicitly? (strange shape)
- What if D cannot be extracted from (2)? (cost correlation implicit)
Unconstrained Multivariable Optimization
Nonsmooth Functions
- Direct Search Methods
- Statistical/Random Methods
Smooth Functions
- 1st Order Methods
- Newton Type Methods
- Conjugate Gradients
Local vs. Global Solutions
Convexity Definitions
A set (region) X is convex, if and only if it satisfies:
    αy + (1-α)z ∈ X
for all α, 0 ≤ α ≤ 1, and all points y and z in X.
f(x) is convex in domain X, if and only if it satisfies:
    f(αy + (1-α)z) ≤ αf(y) + (1-α)f(z)
for any α, 0 ≤ α ≤ 1, at all points y and z in X.
Local minimum: find a point x* for f(x), in the feasible region defined by the
constraint functions, with f(x*) ≤ f(x) for all x satisfying the constraints in
some neighborhood around x* (not for all x ∈ X).
A sufficient condition for a local solution of the NLP to be global is that
f(x) is convex for x ∈ X.
Finding and verifying global solutions will not be considered here.
This requires a more expensive search (e.g. spatial branch and bound).
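The defining inequality above can be probed numerically. The sketch below samples f(αy + (1-α)z) ≤ αf(y) + (1-α)f(z) on a grid of α values; the function names and test points are illustrative assumptions, and a passing check is only evidence of convexity along one segment, not a proof.

```python
import numpy as np

def check_convexity(f, y, z, n_alpha=11):
    """Sample the convexity inequality f(a*y + (1-a)*z) <= a*f(y) + (1-a)*f(z)
    for a in [0, 1]. Returns False at the first violation found."""
    for a in np.linspace(0.0, 1.0, n_alpha):
        if f(a * y + (1 - a) * z) > a * f(y) + (1 - a) * f(z) + 1e-12:
            return False
    return True

f_convex = lambda x: float(x @ x)          # ||x||^2 is convex
f_nonconvex = lambda x: float(-(x @ x))    # -||x||^2 is concave, not convex
y, z = np.array([1.0, 0.0]), np.array([0.0, 2.0])
print(check_convexity(f_convex, y, z))     # True
print(check_convexity(f_nonconvex, y, z))  # False
```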
Linear Algebra - Background
Gradient Vector (∇f(x)):
    ∇f = [∂f/∂x1, ∂f/∂x2, ..., ∂f/∂xn]^T
Hessian Matrix (∇²f(x) - symmetric):
    ∇²f = [∂²f/∂xi∂xj], i, j = 1, ..., n
Note: ∂²f/∂xi∂xj = ∂²f/∂xj∂xi
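These definitions translate directly into central-difference checks. The sketch below (an illustrative example, with an assumed test function) approximates the gradient and Hessian numerically and confirms the symmetry of mixed partials noted above.

```python
import numpy as np

def fd_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient vector."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def fd_hessian(f, x, h=1e-4):
    """Central-difference Hessian; d2f/dxi dxj = d2f/dxj dxi means the
    result should be (numerically) symmetric."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x+ei+ej) - f(x+ei-ej) - f(x-ei+ej) + f(x-ei-ej)) / (4*h*h)
    return H

f = lambda x: x[0]**2 * x[1] + x[1]**3     # assumed test function
x0 = np.array([1.0, 2.0])
print(fd_gradient(f, x0))                  # approx [2*x1*x2, x1^2 + 3*x2^2] = [4, 13]
H = fd_hessian(f, x0)
print(np.allclose(H, H.T, atol=1e-3))      # True: symmetric mixed partials
```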
Linear Algebra - Eigenvalues
Find v and λ where Avi = λi vi, i = 1, n
Note: Av - λv = (A - λI)v = 0 or det(A - λI) = 0
For this relation, λ is an eigenvalue and v is an eigenvector of A.
If A is symmetric, all λi are real:
    λi > 0, i = 1, n:  A is positive definite
    λi < 0, i = 1, n:  A is negative definite
    λi = 0, some i:    A is singular
Quadratic Form can be expressed in Canonical Form (Eigenvalue/Eigenvector):
    x^T A x,  with AV = VΛ
V - eigenvector matrix (n × n)
Λ - eigenvalue (diagonal) matrix, Λ = diag(λi)
If A is symmetric, all λi are real and V can be chosen orthonormal (V^-1 = V^T).
Thus, A = VΛV^-1 = VΛV^T
For the Quadratic Function: Q(x) = a^T x + 1/2 x^T A x
Define: z = V^T x, so that
    Q(Vz) = (a^T V)z + 1/2 z^T (V^T A V)z = (a^T V)z + 1/2 z^T Λ z
Minimum occurs (if λi > 0) at:
    x = -A^-1 a, or
    x = Vz = -V(Λ^-1 V^T a)
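Both routes to the minimizer can be verified numerically. A minimal sketch, using the A and a from the Eigenvalue Example that follows:

```python
import numpy as np

a = np.array([1.0, 1.0])
A = np.array([[2.0, 1.0], [1.0, 2.0]])

# Eigendecomposition: A = V diag(lam) V^T with orthonormal V
lam, V = np.linalg.eigh(A)
print(lam)                            # [1., 3.] -> A is positive definite

# Minimum of Q(x) = a^T x + 1/2 x^T A x, two equivalent ways
x_direct = -np.linalg.solve(A, a)     # x = -A^{-1} a
x_eig = -V @ ((V.T @ a) / lam)        # x = -V Lam^{-1} V^T a
print(x_direct)                       # [-1/3, -1/3]
print(np.allclose(x_direct, x_eig))   # True
```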
[Figure: contours of Q(x) in the x- and z-coordinates; axes scaled by (λ1)^-1/2 and (λ2)^-1/2]
Zero Curvature
Singular Hessian
One eigenvalue is zero, the other is strictly positive or negative
A is positive semidefinite or negative semidefinite
There is a ridge of stationary points (minima or maxima)
Indefinite Curvature
Indefinite Hessian
One eigenvalue is positive, the other is negative
Stationary point is a saddle point
A is indefinite
Note: these can also be viewed as two dimensional projections for higher dimensional problems
Eigenvalue Example
    Min Q(x) = [1 1]^T x + 1/2 x^T [2 1; 1 2] x
    AV = VΛ with A = [2 1; 1 2]
    V^T A V = Λ = [1 0; 0 3] with V = [1/√2  1/√2; -1/√2  1/√2]
    z = V^T x = [(x1 - x2)/√2, (x1 + x2)/√2]^T,  x = Vz
    z* = [0, -2/(3√2)]^T,  x* = [-1/3, -1/3]^T
1. Convergence Theory
Global Convergence - will it converge to a local optimum (or stationary
point) from a poor starting point?
Local Convergence Rate - how fast will it converge close to this point?
2. Benchmarks on Large Class of Test Problems
Representative Problem (Hughes, 1981)
Min f(x1, x2) = α exp(-β)
    u = x1 - 0.8
    v = x2 - (a1 + a2 u² (1-u)^1/2 - a3 u)
    α = -b1 + b2 u² (1+u)^1/2 + b3 u
    β = c1 v² (1 - c2 v)/(1 + c3 u²)
    a = [0.3, 0.6, 0.2]
    b = [5, 26, 3]
    c = [40, 1, 10]
    x* = [0.7395, 0.3144],  f(x*) = -5.0893
Three Dimensional Surface and Curvature for Representative Test Problem
Unconstrained Local Minimum - Sufficient Conditions:
    ∇f(x*) = 0
    p^T ∇²f(x*) p > 0 for all p ∈ R^n    (positive definite)
[Figure: contours of f(x) around x*]
Taylor expansion:
    f(x) = f(x*) + ∇f(x*)^T (x - x*) + 1/2 (x - x*)^T ∇²f(x*) (x - x*) + O(||x - x*||³)
Since ∇f(x*) = 0, f(x) is purely quadratic for x close to x*
Newton's Method
Newton's Method - Convergence Path
Starting Points:
[0.8, 0.2] needs steepest descent steps w/ line search up to 'O'; takes 7 iterations to ||∇f(x*)|| ≤ 10^-6
[0.35, 0.65] converges in four iterations with full steps to ||∇f(x*)|| ≤ 10^-6
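The safeguards mentioned here (steepest descent fallback plus line search) can be sketched as follows. This is a minimal illustration on an assumed Rosenbrock test function, not the Hughes problem from the slide, and production codes also modify the Hessian when it is indefinite.

```python
import numpy as np

def newton(f, grad, hess, x0, tol=1e-6, max_iter=100):
    """Newton's method with a backtracking (Armijo) line search."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            return x, k
        d = -np.linalg.solve(hess(x), g)
        if g @ d > 0:              # not a descent direction: fall back to -grad
            d = -g
        alpha = 1.0
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * (g @ d):
            alpha *= 0.5           # backtrack until sufficient decrease
        x = x + alpha * d
    return x, max_iter

f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2),
                           200*(x[1] - x[0]**2)])
hess = lambda x: np.array([[2 - 400*(x[1] - 3*x[0]**2), -400*x[0]],
                           [-400*x[0], 200.0]])
x_star, iters = newton(f, grad, hess, np.array([0.8, 0.2]))
print(np.round(x_star, 6))         # -> [1. 1.]
```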
Quasi-Newton Methods
Motivation:
- Need B^k to be positive definite.
- Avoid calculation of ∇²f.
- Avoid solution of a linear system for d = -(B^k)^-1 ∇f(x^k)
Strategy: define matrix updating formulas that give B^k symmetric, positive
definite and satisfy:
    (B^k+1)(x^k+1 - x^k) = (∇f^k+1 - ∇f^k)    (Secant relation)
DFP Formula (inverse form):
    (B^k+1)^-1 = H^k+1 = H^k + s s^T/(s^T y) - H^k y y^T H^k/(y^T H^k y)
where:
    s = x^k+1 - x^k
    y = ∇f(x^k+1) - ∇f(x^k)
Quasi-Newton Methods
BFGS Formula (Broyden, Fletcher, Goldfarb, Shanno, 1970-71):
    B^k+1 = B^k + y y^T/(s^T y) - B^k s s^T B^k/(s^T B^k s)
    (B^k+1)^-1 = H^k+1 = H^k + [(s - H^k y) s^T + s (s - H^k y)^T]/(y^T s)
                 - [(s - H^k y)^T y] s s^T/(y^T s)²
Notes:
1) Both formulas are derived under similar assumptions and have symmetry.
2) Both have superlinear convergence and terminate in n steps on quadratic
   functions. They are identical if α is minimized exactly (exact line search).
3) BFGS is more stable and performs better than DFP, in general.
4) For n ≤ 100, these are the best methods for general purpose problems if
   second derivatives are not available.
Quasi-Newton Method - BFGS Convergence Path
Starting Point: [0.2, 0.8]
Starting from B⁰ = I, converges in 9 iterations to ||∇f(x*)|| ≤ 10^-6
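A single BFGS update, written directly from the formula above, can be checked against the secant relation and for positive definiteness. The step and gradient-difference vectors below are arbitrary illustrative data.

```python
import numpy as np

def bfgs_update(B, s, y):
    """BFGS update of the Hessian approximation B.
    Skips the update unless s^T y is sufficiently positive - the usual
    safeguard that keeps B positive definite."""
    sy = s @ y
    if sy <= 1e-10 * np.linalg.norm(s) * np.linalg.norm(y):
        return B
    Bs = B @ s
    return B + np.outer(y, y) / sy - np.outer(Bs, Bs) / (s @ Bs)

B = np.eye(2)                          # B^0 = I, as on the slide
s = np.array([0.1, -0.2])              # s = x^{k+1} - x^k
y = np.array([0.3, 0.1])               # y = grad f^{k+1} - grad f^k
B_new = bfgs_update(B, s, y)
print(np.allclose(B_new @ s, y))       # True: secant relation B^{k+1} s = y
print(np.all(np.linalg.eigvalsh(B_new) > 0))  # True: still positive definite
```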
Unconstrained Optimization Codes:
- Harwell (HSL)
- IMSL
- NAg
- Netlib (www.netlib.org): MINPACK, TOMS Algorithms, etc.
These sources contain various methods
Quasi-Newton
Gauss-Newton
Sparse Newton
Conjugate Gradient
Constrained Optimization
(Nonlinear Programming)
Example: What is the smallest box for three round objects?
Variables: A, B, (x1, y1), (x2, y2), (x3, y3)
Fixed Parameters: R1, R2, R3
Objective: Minimize Perimeter = 2(A + B)
Constraints: circles remain in box, can't overlap
Decisions: sides of box, centers of circles
In box:
    x1, y1 ≥ R1;  x1 ≤ B - R1, y1 ≤ A - R1
    x2, y2 ≥ R2;  x2 ≤ B - R2, y2 ≤ A - R2
    x3, y3 ≥ R3;  x3 ≤ B - R3, y3 ≤ A - R3
No overlaps:
    (x1 - x2)² + (y1 - y2)² ≥ (R1 + R2)²
    (x1 - x3)² + (y1 - y3)² ≥ (R1 + R3)²
    (x2 - x3)² + (y2 - y3)² ≥ (R2 + R3)²
x1, x2, x3, y1, y2, y3, A, B ≥ 0
Characterization of Constrained Optima
[Figure: contour plots showing local and global minima for linearly and nonlinearly constrained problems]
Optimal solution for inequality constrained problem
    Min f(x)
    s.t. g(x) ≤ 0
Analogy: ball rolling down valley, pinned by fence
Note: balance of forces (∇f, ∇g1)
Optimality conditions for local optimum
Necessary First Order Karush-Kuhn-Tucker Conditions
Single Variable Example of KKT Conditions - Revisited
    Min -(x)²
    s.t. -a ≤ x ≤ a, a > 0
x* = ±a is seen by inspection
Lagrange function:
    L(x, u) = -x² + u1(x - a) + u2(-a - x)
First Order KKT conditions:
    ∇L(x, u) = -2x + u1 - u2 = 0
    u1(x - a) = 0
    u2(-a - x) = 0
    -a ≤ x ≤ a
    u1, u2 ≥ 0
Role of KKT Multipliers
With g1(x) = -x2 ≤ 0 and g2(x) = x2 - (x1)³ ≤ 0:
Stationarity:
    ∇f(x) + u1 ∇g1(x) + u2 ∇g2(x) = 0
Complementarity:
    -x2 ≤ 0, u1 ≥ 0, u1 x2 = 0
    x2 - (x1)³ ≤ 0, u2 ≥ 0, u2 (x2 - (x1)³) = 0
Special Cases of Nonlinear Programming
Linear Programming:
Min
cTx
x2
s.t.
Ax b
Cx = d, x 0
Functions are all convex global min.
Because of Linearity, can prove solution will
always lie at vertex of feasible region.
x1
Simplex Method
-
Start at vertex
-
Move to adjacent vertex that offers most improvement
-
Continue until no further improvement
Notes:
1)
LP has wide uses in planning, blending and scheduling
2)
Canned programs widely available.
43
Simplex Method Example
    Min -2x1 - 3x2              Min -2x1 - 3x2
    s.t. 2x1 + x2 ≤ 5    ==>    s.t. 2x1 + x2 + x3 = 5
         x1, x2 ≥ 0                  x1, x2, x3 ≥ 0
                             (add slack variable x3)
Now, define f = -2x1 - 3x2  ==>  f + 2x1 + 3x2 = 0
Set x1, x2 = 0, x3 = 5 (x1, x2 nonbasic; x3 basic) and form the tableau:

         x1   x2   x3   f    b
    x3    2    1    1   0    5
    f     2    3    0   1    0

Underlined terms are -(reduced gradients); after pivoting, the nonbasic
variables are (x1, x3) and the basic variable is x2.
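The same small LP can be handed to an off-the-shelf solver (a quick check, using SciPy's `linprog`; the pivoting above predicts x2 enters the basis):

```python
import numpy as np
from scipy.optimize import linprog

# Min -2 x1 - 3 x2   s.t.  2 x1 + x2 <= 5,  x1, x2 >= 0
res = linprog(c=[-2, -3], A_ub=[[2, 1]], b_ub=[5],
              bounds=[(0, None), (0, None)])
print(np.round(res.x, 6))   # -> [0. 5.]  (vertex with x2 basic)
print(res.fun)              # -> -15.0
```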
Quadratic Programming
Problem:
    Min a^T x + 1/2 x^T B x
    s.t. Ax ≤ b
         Cx = d
1) Can be solved using LP-like techniques (Wolfe, 1959):
    Min Σj (zj⁺ + zj⁻)
    s.t. a + Bx + A^T u + C^T v = z⁺ - z⁻
         Ax - b + s = 0
         Cx - d = 0
         u, s, z⁺, z⁻ ≥ 0
    with complicating conditions {uj sj = 0}.
Definitions:
    xi - fraction or amount invested in security i
    ri(t) - (1 + rate of return) for investment i in year t
    μi - average ri(t) over T years, i.e.
         μi = (1/T) Σt=1..T ri(t)
    Max Σi μi xi
    s.t. Σi xi = 1
         xi ≥ 0, etc.
Note: maximizes average return, with no accounting for risk.
Portfolio Planning Problem
Definition of Risk - fluctuation of ri(t) over the investment (or past) time period.
To minimize risk, minimize the variance about the portfolio mean (risk averse).
Variance/Covariance Matrix, S:
    {S}ij = σij² = (1/T) Σt=1..T (ri(t) - μi)(rj(t) - μj)
    Min x^T S x
    s.t. Σi xi = 1
         Σi μi xi ≥ R
         xi ≥ 0, etc.
Example: 3 investments
    μ1 = 1.3 (IBM), μ2 = 1.2 (GM), μ3 = 1.08 (Gold)
    S = [  3    1   -0.5
           1    2    0.4
         -0.5  0.4   1  ]
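The three-investment example can be solved directly. A minimal sketch with SciPy's SLSQP; the required return R = 1.15 is an assumed value, not from the slide:

```python
import numpy as np
from scipy.optimize import minimize

mu = np.array([1.3, 1.2, 1.08])                  # mean returns (IBM, GM, Gold)
S = np.array([[ 3.0, 1.0, -0.5],
              [ 1.0, 2.0,  0.4],
              [-0.5, 0.4,  1.0]])                # variance/covariance matrix
R = 1.15                                         # required mean return (assumed)

cons = [{'type': 'eq',   'fun': lambda x: x.sum() - 1.0},   # fully invested
        {'type': 'ineq', 'fun': lambda x: mu @ x - R}]      # return target
res = minimize(lambda x: x @ S @ x, x0=np.ones(3) / 3,
               bounds=[(0, None)] * 3, constraints=cons, method='SLSQP')
print(np.round(res.x, 4), round(res.fun, 4))     # minimum-variance portfolio
```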
Portfolio Planning Problem - GAMS
    S O L V E   S U M M A R Y
    **** MODEL STATUS          1 OPTIMAL
    **** OBJECTIVE VALUE       1.2750
    RESOURCE USAGE, LIMIT      1.270      1000.000
    ITERATION COUNT, LIMIT     1          1000
    BDM - LP  VERSION 1.01
    A. Brooke, A. Drud, and A. Meeraus, Analytic Support Unit,
    Development Research Department, World Bank, Washington D.C. 20433, U.S.A.
    Estimated work space needed  --  33 Kb
    Work space allocated         --  231 Kb
    EXIT - - OPTIMAL SOLUTION FOUND.
                    LOWER    LEVEL    UPPER    MARGINAL
    - - - - EQU LP    .        .        .       1.000
    - - - - EQU E2   1.000    1.000    1.000    1.200
Algorithms for Constrained Problems
Classification of Methods:
Reduced Gradient Method with Restoration
(GRG2/CONOPT)
    Min f(x)                         Min f(z)
    s.t. g(x) + s = 0         ==>    s.t. c(z) = 0
         h(x) = 0                         a ≤ z ≤ b
         a ≤ x ≤ b, s ≥ 0
    (add slack variables)
Partition variables into:
    zB - dependent or basic variables
    zN - nonbasic variables, fixed at a bound
    zS - independent or superbasic variables
Modified KKT Conditions:
    ∇f(z) + ∇c(z)λ - νL + νU = 0
    c(z) = 0
    z(i) = zU(i) or z(i) = zL(i), i ∈ N
    νU(i), νL(i) = 0, i ∉ N
Definition of Reduced Gradient
    df/dzS = ∂f/∂zS + (dzB/dzS)^T ∂f/∂zB
Because c(z) = 0, we have:
    dc = (∂c/∂zS)^T dzS + (∂c/∂zB)^T dzB = 0
    dzB/dzS = -(∂c/∂zS)(∂c/∂zB)^-1 = -∇zS c [∇zB c]^-1
This leads to:
    df/dzS = ∇S f(z) - ∇S c [∇B c]^-1 ∇B f(z) = ∇S f(z) + ∇S c(z) λ
By remaining feasible always (c(z) = 0, a ≤ z ≤ b), one can apply an
unconstrained algorithm (quasi-Newton) using df/dzS.
Solve the problem in the reduced space of the zS variables.
Example: let zS = x1, zB = x2
    df/dzS = ∂f/∂zS - ∇zS c [∇zB c]^-1 ∂f/∂zB
    df/dx1 = 2x1 - 3[4]^-1(-2) = 2x1 + 3/2
If ∇c^T is (m × n), then ∇zS c^T is m × (n-m) and ∇zB c^T is (m × m).
(df/dzS) is the change in f along the constraint direction per unit change in zS
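The feasible-path idea can be sketched on an assumed toy problem (not the one on the slide): minimize f(z) = z1² + z2² subject to c(z) = z1 + 2z2 - 4 = 0, with zS = z1 superbasic and zB = z2 basic. Each pass restores feasibility by eliminating zB, then takes a steepest-descent step on the reduced gradient.

```python
import numpy as np

# Minimize f(z) = z1^2 + z2^2  s.t.  c(z) = z1 + 2 z2 - 4 = 0 (assumed example)
def reduced_gradient(z):
    df_dzS, df_dzB = 2 * z[0], 2 * z[1]
    dc_dzS, dc_dzB = 1.0, 2.0
    # df/dzS = grad_S f - grad_S c [grad_B c]^{-1} grad_B f
    return df_dzS - dc_dzS * (1.0 / dc_dzB) * df_dzB

z1 = 0.0
for _ in range(200):
    z2 = (4.0 - z1) / 2.0                  # restoration: solve c(z) = 0 for zB
    z1 -= 0.1 * reduced_gradient(np.array([z1, z2]))
z = np.array([z1, (4.0 - z1) / 2.0])
print(np.round(z, 4))                      # -> [0.8 1.6], the constrained minimum
```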
Gradient Projection Method
(superbasic - nonbasic variable partition)
Piecewise linear path z(α) starting at the reference point z and obtained by
projecting the steepest descent (or any search) direction at z onto the box
region given by the variable bounds.
Reduced Gradient Method with Restoration
[Figures: iterates alternate between optimization steps in the zS space and
restoration steps that return to c(z) = 0 in the zB space]
Reduced Gradient Method without Restoration
[Figure: linearized steps in (zB, zS) without returning to the constraint]
MINOS/Augmented Notes
SQP Chronology
1. Wilson (1963)
   - active set can be determined by solving the QP:
     Min  ∇f(x^k)^T d + 1/2 d^T ∇xx L(x^k, u^k, v^k) d
      d
     s.t. g(x^k) + ∇g(x^k)^T d ≤ 0
          h(x^k) + ∇h(x^k)^T d = 0
2. Han (1976, 1977), Powell (1977, 1978)
   - approximate ∇xxL using a positive definite quasi-Newton update (BFGS)
   - use a line search to converge from poor starting points.
Notes:
- Similar methods were derived using penalty (not Lagrange) functions.
- Method converges quickly; very few function evaluations.
- Not well suited to large problems (full space update used).
  For n > 100, say, use reduced space methods (e.g. MINOS).
Elements of SQP - Search Directions
How do we obtain search directions?
- Form a QP and let the QP determine constraint activity.
- At each iteration k, solve:
    Min  ∇f(x^k)^T d + 1/2 d^T B^k d
     d
    s.t. g(x^k) + ∇g(x^k)^T d ≤ 0
         h(x^k) + ∇h(x^k)^T d = 0
Convergence from poor starting points:
As with Newton's method, choose α (stepsize) to ensure progress toward the
optimum: x^k+1 = x^k + α d.
α is chosen by making sure a merit function is decreased at each iteration.
Exact Penalty Function:
    ψ(x) = f(x) + μ [Σ max(0, gj(x)) + Σ |hj(x)|]
    μ > maxj {|uj|, |vj|}
Augmented Lagrange Function:
    ψ(x) = f(x) + u^T g(x) + v^T h(x) + ρ/2 {Σ (hj(x))² + Σ max(0, gj(x))²}
Basic SQP Algorithm
Example:
    Min x2
    s.t. 1 + x1 - (x2)² ≥ 0
         1 - x1 - (x2)² ≥ 0
         x2 ≥ -1/2
SQP Test Problem
[Figure: feasible region and solution x* in the (x1, x2) plane]
    Min x2
    s.t. -x2 + 2x1² - x1³ ≤ 0
         -x2 + 2(1-x1)² - (1-x1)³ ≤ 0
    x* = [0.5, 0.375]
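This test problem can be reproduced with any SQP-type solver. A quick check using SciPy's SLSQP (an SQP implementation; the starting point [1, 1] is an assumption, not the one from the slide):

```python
import numpy as np
from scipy.optimize import minimize

# Min x2  s.t.  -x2 + 2 x1^2 - x1^3 <= 0,  -x2 + 2(1-x1)^2 - (1-x1)^3 <= 0
# (SciPy's 'ineq' convention is fun(x) >= 0, so the signs are flipped)
cons = [{'type': 'ineq', 'fun': lambda x: x[1] - 2*x[0]**2 + x[0]**3},
        {'type': 'ineq', 'fun': lambda x: x[1] - 2*(1-x[0])**2 + (1-x[0])**3}]
res = minimize(lambda x: x[1], x0=np.array([1.0, 1.0]),
               constraints=cons, method='SLSQP')
print(np.round(res.x, 4))   # both constraints active at the solution
```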
73
1.0
0.8
x2
0.6
0.4
0.2
0.0
0.0 0.2 0.4 x1 0.6 0.8 1.0 1.2
SQP Test Problem - First and Second Iterations
[Figures: QP linearizations and steps toward x* in the (x1, x2) plane]
Barrier Methods for Large-Scale
Nonlinear Programming
Original Formulation:
    min f(x)   s.t. c(x) = 0, x ≥ 0
    x ∈ R^n
(can generalize for a ≤ x ≤ b)
Barrier Approach:
    min φμ(x) = f(x) - μ Σi=1..n ln xi   s.t. c(x) = 0
    x ∈ R^n
As μ → 0, x*(μ) → x*    (Fiacco and McCormick, 1968)
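The barrier path x*(μ) → x* can be seen on the simplest possible problem: min x s.t. x ≥ 0, where the barrier subproblem min x - μ ln x has the closed-form solution x*(μ) = μ. A minimal numerical sketch:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Barrier subproblem for  min x  s.t. x >= 0  (solution x* = 0):
#   min_x  x - mu*ln(x)   over x > 0,  with exact minimizer x*(mu) = mu
for mu in [1.0, 0.1, 0.01, 0.001]:
    res = minimize_scalar(lambda x: x - mu * np.log(x),
                          bounds=(1e-12, 10.0), method='bounded')
    print(mu, round(res.x, 6))   # x*(mu) tracks mu and -> 0 as mu -> 0
```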
Global Convergence of Newton-based Barrier Solvers
Merit Function:
    Exact Penalty: P(x, η) = f(x) + η ||c(x)||
    Augmented Lagrangian: L*(x, λ, η) = f(x) + λ^T c(x) + η ||c(x)||²
Assess Search Direction (e.g., from IPOPT):
    Min x1
    s.t. (x1)² - x2 - 1 = 0
         x1 - x3 - 1/2 = 0
         x2, x3 ≥ 0
A Newton-type line search stalls even though descent directions exist:
    A(x^k)^T dx + c(x^k) = 0
    x^k + dx > 0
Remedies:
    Composite Step Trust Region (Byrd et al.)
    Filter Line Search Methods
Line Search Filter Method
[Figure: filter in the (θ, f) plane, with θ(x) = ||c(x)||; a trial point is
accepted if it improves feasibility θ or the objective f relative to the filter]
Implementation Details (feasibility restoration):
    Min ||c(x)||1 + ||x - x^k||²Q
    s.t. xl ≤ x ≤ xu
Apply Exact Penalty Formulation.
Exploit the same structure/algorithm to reduce infeasibility.
IPOPT Algorithm Features
Hessian Calculation options:
- BFGS (full/LM and reduced space)
- SR1 (full/LM and reduced space)
- Exact full Hessian (direct)
- Exact reduced Hessian (direct)
- Preconditioned CG
Freely Available: CPL License and COIN-OR distribution: http://www.coin-or.org
IPOPT 3.1 recently rewritten in C++
Solved on thousands of test problems and applications
[Figure: histogram of problem sizes solved]
Recommendations for Constrained Optimization
1. Best current algorithms:
   - GRG2/CONOPT
   - MINOS
   - SQP
   - IPOPT
2. GRG2 (or CONOPT) is generally slower, but robust. Use with highly
   nonlinear functions. Solver in Excel!
3. For small problems (n ≤ 100) with nonlinear constraints, use SQP.
4. For large problems (n ≥ 100) with mostly linear constraints, use MINOS.
   ==> Difficulty with many nonlinearities
Small, Nonlinear Problems - SQP solves QPs, not LC-NLPs; fewer function calls.
Large, Mostly Linear Problems - MINOS performs sparse constraint decomposition.
Works efficiently in reduced space if function calls are cheap!
Exploit Both Features - IPOPT takes advantage of few function evaluations and
large-scale linear algebra, but requires exact second derivatives.
GAMS Programs
CONOPT - Generalized Reduced Gradient method with restoration
MINOS - Generalized Reduced Gradient method without restoration
NPSOL - Stanford Systems Optimization Lab
SNOPT - Stanford Systems Optimization Lab (rSQP, discussed later)
IPOPT - barrier NLP, COIN-OR, open source
KNITRO - barrier NLP
MS Excel - Solver uses Generalized Reduced Gradient method with restoration
Rules for Formulating Nonlinear Programs
1) Avoid overflows and undefined terms (do not divide, take logs, etc.)
   e.g.  x + y - ln z = 0  ==>  x + y - u = 0, exp u - z = 0
2) If constraints must always be enforced, make sure they are linear or bounds.
   e.g.  v(xy - z²)^1/2 = 3  ==>  vu = 3, u² - (xy - z²) = 0, u ≥ 0
3) Exploit linear constraints as much as possible, e.g. mass balance:
   xi L + yi V = F zi  ==>  li + vi = fi,  L - Σ li = 0
4) Use bounds and constraints to enforce characteristic solutions.
   e.g.  a ≤ x ≤ b, g(x) ≤ 0, to isolate the correct root of h(x) = 0.
5) Exploit global properties when the possibility exists. Convex (linear
   equations)? Linear Program? Quadratic Program? Geometric Program?
6) Exploit problem structure when possible.
   e.g.  Min [Tx - 3Ty]
         s.t. xT + y - T²y = 5
              4x - 5Ty + Tx = 7
              0 ≤ T ≤ 1
   (If T is fixed, this is an LP)  ==>  put T in an outer optimization loop.
Process Optimization
Problem Definition and Formulation
Decisions
Additional Variables
Hierarchy of Nonlinear Programming Formulations and Model Intrusion
From CLOSED (black box) to OPEN (full space formulation, multi-level parallelism):
- 1981-87: Flowsheet optimization, over 100 variables and constraints
- 1988-98: Static real-time optimization, over 100 000 variables and constraints
- 2000- : Simultaneous dynamic optimization, over 1 000 000 variables and constraints
Flowsheet Optimization Problems - Introduction
Design Specifications: specify # trays and reflux ratio, but would like to
specify overhead composition ==> control loop - solve iteratively
Chronology in Process Optimization
[Figure: modular flowsheet with recycle; unit models h(y) = 0, tear
variables y, objective f(x, y(x))]
Expanded Region with Feasible Path
SQP - Infeasible Path Approach
- solve and optimize simultaneously in x and y
- extended Newton method
Examples
1. Single Unit and Acyclic Optimization
   - Distillation columns & sequences
2. "Conventional" Process Optimization
   - Monochlorobenzene process
   - NH3 synthesis
3. Complicated Recycles & Control Loops
   - Cavett problem
   - Variations of above
Optimization of Monochlorobenzene Process
[Flowsheet: feed at 80°F/37 psia to flash F-1; absorber A-1 (15 trays,
3 theoretical stages, 32 psia, 0.1 lb mol/hr of MCB); distillation D-1
(30 trays, 20 theoretical stages, 270°F); treater T-1 (90°F); heat
exchangers H-1, H-2 (U = 100 Btu/hr-ft²); pump P-1 (120°F); steam at 360°F.
Physical property options: Cavett vapor pressure, Redlich-Kwong vapor
fugacity, corrected liquid fugacity, ideal solution activity coefficient.
Feed flow rates (lb mol/hr): HCl 10, Benzene 40, MCB 50.
Objective: maximize profit.]
SCOPT optimizer results:
    Optimal solution found after 4 iterations
    Kuhn-Tucker error: 0.29616E-05 (allowable: 0.19826E-04)
    Objective function: -0.98259
    Tear variable errors (calculated minus assumed) reduced to ~1E-04 or less
- Results of infeasible path optimization
- Simultaneous optimization and convergence of tear streams.
Ammonia Process Optimization
Optimization Problem:
    Max {Total Profit @ 15% over five years}
    s.t. 10⁵ tons NH3/yr
         Pressure Balance
         No Liquid in Compressors
         1.8 ≤ H2/N2 ≤ 3.5
         Treact ≤ 1000°F
         NH3 purged ≤ 4.5 lb mol/hr
         NH3 Product Purity ≥ 99.9%
         Tear Equations
Performance Characteristics:
    5 SQP iterations, 2.2 base point simulations
    Objective function ($10⁶) improves from 20.659 (starting point) to 24.9286 (optimum)
    Flowsheet is difficult to converge at the starting point
Performance of Algorithms
Constrained NLP algorithms are gradient based (SQP, CONOPT, GRG2, MINOS, etc.)
Global and superlinear convergence theory assumes accurate gradients.
Worst Case Example (Carter, 1991): Newton's method generates an ascent
direction and fails for any stepsize α:
    Min f(x) = x^T A x
    A = [ ε + 1/ε    ε - 1/ε
          ε - 1/ε    ε + 1/ε ],    κ(A) = (1/ε)²
    x0 = [1 1]^T,  ∇f(x0) = 4ε x0
    g(x0) = ∇f(x0) + O(ε)       (approximate gradient)
    d = -A^-1 g(x0)             (can be an ascent direction)
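The failure mode is easy to reproduce numerically. The sketch below uses an assumed O(ε) gradient perturbation (the same size as the exact gradient itself) and shows that the resulting "Newton" direction points uphill:

```python
import numpy as np

eps = 0.01
A = np.array([[eps + 1/eps, eps - 1/eps],
              [eps - 1/eps, eps + 1/eps]])   # condition number (1/eps)^2
x0 = np.array([1.0, 1.0])
g_exact = 2 * A @ x0                         # grad of f = x^T A x, equals [4*eps, 4*eps]
# An O(eps) gradient error, comparable to the gradient itself (assumed perturbation):
g_noisy = g_exact + np.array([-8*eps, -8*eps])
d = -np.linalg.solve(A, g_noisy)             # Newton direction from the noisy gradient
print(d @ g_exact > 0)                       # True: an ASCENT direction, so any line search fails
```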
Implementation of Analytic Derivatives
Module equations: c(v, x, s, p, y) = 0, with parameters p, inputs x, exit
variables s, and outputs y.
Sensitivity equations provide dy/dx, ds/dx, dy/dp, ds/dp.
[Example flowsheet: mixer, ratio, HiP and LoP flash units;
objective Max S3(A)²·S3(B) - S3(A) - S3(C)² + S3(D)³ - (S3(E))^1/2]
[Bar charts: CPU seconds (VS 3200) for GRG, SQP and rSQP with numerical vs.
exact derivatives; exact derivatives are substantially cheaper]
Large-Scale SQP
    Min f(z)                      Min ∇f(z^k)^T d + 1/2 d^T W^k d
    s.t. c(z) = 0          ==>    s.t. c(z^k) + (A^k)^T d = 0
         zL ≤ z ≤ zU                   zL ≤ z^k + d ≤ zU
Characteristics:
- Many equations and variables (≥ 100 000)
- Many bounds and inequalities (≥ 100 000)
+ easy to implement with existing sparse solvers, QP methods and line search techniques
+ exploits 'natural assignment' of dependent and decision variables (some decomposition steps are 'free')
+ does not require second derivatives
- reduced space matrices are dense
- may be dependent on variable partitioning
- can be very expensive for many degrees of freedom
- can be expensive if many QP bounds
Reduced space SQP (rSQP) - Range and Null Space Decomposition
1. Compute d = Y dY + Z dZ, where Q = [Y Z] and
    A^T Q = A^T [Y Z] = [R 0]
2. Partition variables into decisions u and dependents v. Create Y and Z
   with embedded identity matrices (A^T Z = 0, Y^T Z = 0):
    A^T = [∇u c^T  ∇v c^T] = [N  C]
    Z = [    I    ]      Y = [ N^T C^-T ]
        [ -C^-1 N ]          [    I     ]
rSQP Algorithm
[Algorithm flowchart]
rSQP Results: Computational Results for Process Problems
Vasantharajan et al. (1990)
[Table of results]
RTO - Basic Concepts
[Diagram: Plant with disturbances w and measurements y; APC layer receives
setpoints u from RTO; RTO and DR-PE share the steady-state model
c(x, u, p) = 0; DR-PE returns updated parameters p]
On-line optimization (RTO):
- Steady state model for states (x)
- Supply setpoints (u) to APC (control system)
- Model mismatch, measured and unmeasured disturbances (w)
    Min_u F(x, u, w)
    s.t. c(x, u, p, w) = 0
         x ∈ X, u ∈ U
Data Reconciliation & Parameter Identification (DR-PE):
- Estimation problem formulations
- Steady state model
- Maximum likelihood objective functions considered to get parameters (p)
    Min_p Φ(x, y, p, w)
    s.t. c(x, u, p, w) = 0
         x ∈ X, p ∈ P
RTO Characteristics
[Same plant/APC/RTO/DR-PE structure as above]
RTO Consistency (Marlin and coworkers)
RTO Stability (Marlin and coworkers)
RTO Robustness (Marlin and coworkers)
- Eliminate ping-ponging
[Refinery flowsheet: preflash, mixed LPG, C3/C4 splitter, DIB, debutanizer,
main fractionator, reformer, naphtha recycle; products include C3, iC4, nC4,
light naphtha and oil]
Optimization Case Study Characteristics
Model consists of 2836 equality constraints and only ten independent variables.
It is also reasonably sparse and contains 24123 nonzero Jacobian elements.
    P = Σi∈G zi Ci^G + Σi∈E zi Ci^E + Σm=1..NP Σi zi Ci^Pm - U
Cases Considered:
1. Normal Base Case Operation
2. Simulate fouling by reducing the heat exchange coefficients for the debutanizer
3. Simulate fouling by reducing the heat exchange coefficients for splitter
   feed/bottoms exchangers
4. Increase price for propane
5. Increase base price for gasoline together with an increase in the octane credit
Nonlinear Optimization Engines
Evolution of NLP solvers: process optimization for design, control and operations
- 80s: Flowsheet optimization, over 100 variables and constraints
- 90s: Static real-time optimization (RTO), over 100 000 variables and constraints
- 00s: Simultaneous dynamic optimization, over 1 000 000 variables and constraints
Full-space Newton (KKT) step:
    [ W^k    A^k ] [ d  ]      [ ∇f(x^k) ]
    [ A^kT    0  ] [ λ+ ] = -  [ c(x^k)  ]
- work in full space of all variables
- second derivatives useful for objective and constraints
- use specialized large-scale Newton solver
Gasoline Blending
[Diagram: supply tanks (i) feed intermediate tanks (j), which feed final
product tanks (k) through pipelines; flows f and qualities q tracked over
time periods t]
    Max Σt (Σk ck ft,k - Σi ci ft,i)
    s.t. vt+1,j = vt,j + Σi ft,ij - Σk ft,jk
         ft,k - Σj ft,jk = 0
         qt+1,j vt+1,j = qt,j vt,j + Σi qt,i ft,ij - Σk qt,j ft,jk
         qt,k ft,k - Σj qt,j ft,jk = 0
         qk_min ≤ qt,k ≤ qk_max
         vj_min ≤ vt,j ≤ vj_max
f, v - flowrates and tank volumes
q - tank qualities
Small Multi-day Blending Models
Single Qualities
[Network diagrams: Haverly, C. 1978 (HM) with feeds F1-F3, blend pools B1-B2
and products P1-P2; Audet & Hansen 1998 (AHM) with feeds F1-F3, pools B1-B3
and products P1-P2]
Summary of Results - Dolan-Moré plot
[Performance profile comparing IPOPT, LOQO, KNITRO, SNOPT, MINOS and
LANCELOT: fraction of problems solved vs. performance ratio]
[Plots: iterations and normalized CPU time vs. degrees of freedom (0-600) for
LANCELOT, MINOS, SNOPT, KNITRO, LOQO and IPOPT]
Comparison of NLP solvers
(latest Mittelmann study)
Mittelmann NLP benchmark (10-26-2008)
[Performance profile: fraction solved vs. log(2)*minimum CPU time]
    Solver    Limits  Fail
    IPOPT        7      2
    KNITRO       7      0
    LOQO        23      4
    SNOPT       56     11
    CONOPT      55     11
Interesting hybrids:
- FSQP/cFSQP - SQP and constraint elimination
- LANCELOT (Augmented Lagrangian w/ Gradient Projection)
Sensitivity Analysis for Nonlinear Programming
At nominal conditions, p0:
    Min f(x, p0)
    s.t. c(x, p0) = 0
         a(p0) ≤ x ≤ b(p0)
How is the optimum affected at other conditions, p ≠ p0?
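For a problem with no active-set change, the sensitivity of the solution follows from differentiating the KKT conditions with respect to p. A minimal sketch on an assumed parametric QP (min ½xᵀQx s.t. aᵀx = p), where the KKT system is linear and the sensitivity can be checked by finite differences:

```python
import numpy as np

# Parametric QP: min 1/2 x^T Q x  s.t.  a^T x = p   (assumed toy problem)
# KKT system: [Q a; a^T 0][x; lam] = [0; p].
# Differentiating w.r.t. p gives the same matrix with RHS [0; 1].
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
a = np.array([1.0, 1.0])
K = np.block([[Q, a[:, None]], [a[None, :], np.zeros((1, 1))]])

def solve(p):
    sol = np.linalg.solve(K, np.array([0.0, 0.0, p]))
    return sol[:2]                                   # primal part x*(p)

dx_dp = np.linalg.solve(K, np.array([0.0, 0.0, 1.0]))[:2]   # KKT sensitivity
fd = (solve(1.0 + 1e-6) - solve(1.0)) / 1e-6                # finite-difference check
print(np.allclose(dx_dp, fd, atol=1e-5))                    # True
```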
[Figure: contours with minimum x* and a saddle point; directions z1, z2]
IPOPT Factorization Byproducts:
Tools for Postoptimality and Uniqueness
Modify the KKT (full space) matrix if singular:
    [ W^k + δ1 I     A^k  ]
    [    A^kT      -δ2 I  ]
δ1 - correct inertia to guarantee descent direction
δ2 - deal with rank deficient A^k
NLP Sensitivity - Parametric Programming
[Derivation slides: solution triplet, optimality conditions, and sensitivity
of the triplet obtained from the optimality conditions]
NLP Sensitivity Properties (Fiacco, 1983)
Sensitivity for Flash Recycle Optimization
(2 decisions, 7 tear variables)
[Flowsheet: mixer, ratio, HiP and LoP flash units;
objective Max S3(A)²·S3(B) - S3(A) - S3(C)² + S3(D)³ - (S3(E))^1/2]
[Bar chart: objective values from QP1 and QP2 sensitivity estimates vs.
actual reoptimization, for perturbations of 0.001, 0.01 and 0.1]
Multi-Scenario Optimization
1. Design plant to deal with different operating scenarios (over time or with
   uncertainty)
2. Can solve overall problem simultaneously
   - large and expensive
   - polynomial increase with number of cases
   - must be made efficient through specialized decomposition
Design Under Uncertain Model Parameters and Variable Inputs
    Min  E_θ [ P(d, z, y, θ) ]
    d,z
    s.t. h(d, z, y, θ) = 0
         g(d, z, y, θ) ≤ 0
Multi-scenario Models for Uncertainty
[Diagram: common design variables z, d shared across scenario models i, each
with its own θi and outputs yi]
    Min f0(d) + Σj ωj fj(d, z, yj, θj)
    s.t. hj(d, z, yj, θj) = 0
         gj(d, z, yj, θj) ≤ 0
Composite NLP:
    Min Σi ωi (fi(θi, xi) + f0(di)/N)
    s.t. hi(xi, θi) = 0, i = 1, N
         gi(xi, θi) + si = 0, i = 1, N
         0 ≤ si, d - di = 0, i = 1, N
         r(d) ≤ 0
Solving Multi-scenario Problems:
Interior Point Method
    Min f0(d) + Σj ωj fj(d, zj, yj, θj)           Min f0(p) + Σj fj(p, xj)
    s.t. hj(d, zj, yj, θj) = 0             ==>    s.t. cj(p, xj) = 0
         gj(d, zj, yj, θj) + sj = 0, sj ≥ 0            p, xj ≥ 0
Barrier problem:
    Min f0(p) + Σj fj(p, xj) - μ Σj,l ln xj^l - μ Σl ln p^l
    s.t. cj(p, xj) = 0
    As μi → 0, [x(μi), p(μi)] → [x*, p*]
The resulting Newton (KKT) system is block-bordered: each scenario i
contributes a block Ki (its Hessian, barrier terms and Jacobian ∇xi ci) and a
border wi coupling it to the common variables p; the corner block Kp collects
the common-variable terms.
Schur Complement Decomposition Algorithm
Key Steps:
1. (Kp - Σi wi^T Ki^-1 wi) Δup = rp - Σi wi^T Ki^-1 ri
2. Ki Δui = ri - wi Δup
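The two key steps can be sketched on a small dense block-bordered system with randomly generated data (all matrices below are illustrative assumptions). Step 1 forms and solves the Schur complement in the common variables; step 2 back-solves each scenario block independently, which is what makes the approach parallelizable.

```python
import numpy as np

rng = np.random.default_rng(0)
N, ni, npar = 3, 4, 2        # scenarios, per-scenario vars, common vars

# Assemble illustrative blocks K_i (made SPD), borders w_i, corner K_p
Ks = [np.eye(ni) + 0.1 * rng.standard_normal((ni, ni)) for _ in range(N)]
Ks = [K @ K.T + ni * np.eye(ni) for K in Ks]
ws = [rng.standard_normal((ni, npar)) for _ in range(N)]
Kp = 50 * np.eye(npar)
rs = [rng.standard_normal(ni) for _ in range(N)]
rp = rng.standard_normal(npar)

# Step 1: Schur complement in the common variables
Z = Kp - sum(w.T @ np.linalg.solve(K, w) for K, w in zip(Ks, ws))
rhs = rp - sum(w.T @ np.linalg.solve(K, r) for K, w, r in zip(Ks, ws, rs))
du_p = np.linalg.solve(Z, rhs)

# Step 2: per-scenario back-solves (independent, hence parallelizable)
du = [np.linalg.solve(K, r - w @ du_p) for K, w, r in zip(Ks, ws, rs)]

# Verify against a solve of the assembled block-bordered system
K_full = np.zeros((N * ni + npar, N * ni + npar))
for i, (K, w) in enumerate(zip(Ks, ws)):
    K_full[i*ni:(i+1)*ni, i*ni:(i+1)*ni] = K
    K_full[i*ni:(i+1)*ni, N*ni:] = w
    K_full[N*ni:, i*ni:(i+1)*ni] = w.T
K_full[N*ni:, N*ni:] = Kp
full = np.linalg.solve(K_full, np.concatenate(rs + [rp]))
print(np.allclose(np.concatenate(du + [du_p]), full))   # True
```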
Implementation: Internal Decomposition
[Diagram: NLP algorithm with NLP and linear algebra interfaces; the
multi-scenario NLP is mapped to a block-bordered linear solver, or to the
default linear solver]
Composite NLPs - Water Network Base Problem:
- 36,000 variables
- 600 common variables
Testing:
- Vary # of scenarios
- Vary # of common variables
Parallel Schur-Complement Scalability
Multi-scenario optimization: a single optimization over many scenarios,
performed on a parallel cluster
[Speedup plots]
76
Summary and Conclusions
Optimization Algorithms
-Unconstrained Newton and Quasi Newton Methods
-KKT Conditions and Specialized Methods
-Reduced Gradient Methods (GRG2, MINOS)
-Successive Quadratic Programming (SQP)
-Reduced Hessian SQP
-Interior Point NLP (IPOPT)
Further Applications
-Sensitivity Analysis for NLP Solutions
-Multi-Scenario Optimization Problems
153
77