
Other Nonlinear Regression Methods for Algebraic Models

There is a variety of general purpose unconstrained optimization methods that can be used to estimate unknown parameters. These methods are broadly classified into two categories: direct search methods and gradient methods (Edgar and Himmelblau, 1988; Gill et al. 1981; Kowalik and Osborne, 1968; Sargent, 1980; Reklaitis, 1983; Scales, 1985). A brief overview of this relatively vast subject is presented here, and several of these methods are discussed in the following sections. Over the years, many comparisons of the performance of these methods have been carried out and reported in the literature. For example, Box (1966) evaluated eight unconstrained optimization methods using a set of problems with up to twenty variables.

5.1 GRADIENT MINIMIZATION METHODS

The gradient search methods require derivatives of the objective function whereas the direct methods are derivative-free. The derivatives may be available analytically or otherwise they are approximated in some way. It is assumed that the objective function has continuous second derivatives, whether or not these are explicitly available. Gradient methods are still efficient if there are some discontinuities in the derivatives. On the other hand, direct search techniques, which use function values, are more efficient for highly discontinuous functions. The basic problem is to search for the parameter vector k that minimizes S(k) by following an iterative scheme, i.e.,


Minimize

S(k) = \sum_{i=1}^{N} \left[ y_i - f(x_i, k) \right]^{T} Q_i \left[ y_i - f(x_i, k) \right]    (5.1)

where k = [k_1, k_2, \ldots, k_p]^T is the p-dimensional vector of parameters, e_i = [e_1, e_2, \ldots, e_m]_i^T = [y_i - f(x_i, k)] is the m-dimensional vector of residuals, and Q_i is a user-specified positive definite weighting matrix.
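To make the notation concrete, here is a minimal sketch in Python of this weighted least squares objective; the model function f, the data arrays, and the weighting matrices are hypothetical placeholders to be supplied for the problem at hand.

```python
import numpy as np

def ls_objective(k, x_data, y_data, Q, f):
    """Weighted LS objective of Equation 5.1:
    S(k) = sum_i [y_i - f(x_i, k)]^T Q_i [y_i - f(x_i, k)]."""
    S = 0.0
    for x_i, y_i, Q_i in zip(x_data, y_data, Q):
        e_i = y_i - f(x_i, k)      # m-dimensional residual vector e_i
        S += e_i @ Q_i @ e_i       # quadratic form with weighting matrix Q_i
    return S
```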
The need to utilize an iterative scheme stems from the fact that it is usually impossible to find the exact solution of the equation that gives the stationary points of S(k) (Peressini et al. 1988),

\nabla S(k) = 0    (5.2a)

where the operator \nabla = \left[ \frac{\partial}{\partial k_1}, \frac{\partial}{\partial k_2}, \ldots, \frac{\partial}{\partial k_p} \right]^T is applied to the scalar function S(k), yielding the column vector

\nabla S(k) = \left[ \frac{\partial S(k)}{\partial k_1}, \frac{\partial S(k)}{\partial k_2}, \ldots, \frac{\partial S(k)}{\partial k_p} \right]^T    (5.2b)

Vector \nabla S(k) contains the first partial derivatives of the objective function S(k) with respect to k and is often called the gradient vector. For simplicity, we denote it as g(k) in this chapter. In order to find the solution, one starts from an initial guess for the parameters, k^{(0)} = [k_1^{(0)}, k_2^{(0)}, \ldots, k_p^{(0)}]^T. There is no general rule to follow in order to obtain an initial guess; however, some heuristic rules can be used and they are discussed in Chapter 8. At the start of the jth iteration we denote by k^{(j)} the current estimate of the parameters. The jth iteration consists of the computation of a search vector \Delta k^{(j+1)} from which we obtain the new estimate k^{(j+1)} according to the following equation

k^{(j+1)} = k^{(j)} + \mu^{(j)} \Delta k^{(j+1)}    (5.3)


where \mu^{(j)} is the step-size, also known as the damping or relaxation factor. It is obtained by univariate minimization or from prior knowledge based upon the theory of the method (Edgar and Himmelblau, 1988). As seen from Equation 5.3, our main concern is to determine the search vector \Delta k^{(j+1)}. Different solution methods to the minimization problem arise depending on the method chosen to calculate the search vector. The iterative procedure stops when the convergence criterion for termination is satisfied. When the unknown parameters are of the same order of magnitude, a typical test for convergence is \|\Delta k^{(j+1)}\| \leq TOL where TOL is a user-specified tolerance. A more general convergence criterion is
\sum_{i=1}^{p} \left| \frac{\Delta k_i^{(j+1)}}{k_i^{(j+1)}} \right| \leq 10^{-\mathrm{NSIG}}    (5.4)

where p is the number of parameters and NSIG is the number of desired significant digits in the parameter values. It is assumed that no parameter converges to zero. The minimization method must be computationally efficient and robust (Edgar and Himmelblau, 1988). Robustness refers to the ability to arrive at a solution. Computational efficiency is important since iterative procedures are employed. The speed with which convergence to the optimal parameter values, k*, is reached is defined by the asymptotic rate of convergence (Scales, 1985). An algorithm is considered to have a \theta th-order rate of convergence when \theta is the largest integer for which the following limit exists.

0 \leq \lim_{j \to \infty} \frac{\left\| k^{(j+1)} - k^{*} \right\|}{\left\| k^{(j)} - k^{*} \right\|^{\theta}} < \infty    (5.5)

In the above equation, the norm \|\cdot\| is usually the Euclidean norm. We have a linear convergence rate when \theta is equal to 1. A superlinear convergence rate refers to the case where \theta = 1 and the limit is equal to zero. When \theta = 2 the convergence rate is called quadratic. In general, the value of \theta depends on the algorithm while the value of the limit depends upon the function that is being minimized.
5.1.1 Steepest Descent Method

In this method the search vector is the negative of the gradient of the objective function and is given by the next equation

\Delta k^{(j+1)} = -\nabla S(k^{(j)})    (5.6a)

Based on Equation 5.1, the search vector is related to the residuals e_i = [y_i - f(x_i, k)] as follows

\Delta k^{(j+1)} = -2 \sum_{i=1}^{N} \left( \nabla e_i^{T} \right) Q_i e_i = 2 \sum_{i=1}^{N} \left( \nabla f_i^{T} \right) Q_i e_i    (5.6b)

where

\nabla e^{T} = \begin{bmatrix}
\frac{\partial e_1}{\partial k_1} & \frac{\partial e_2}{\partial k_1} & \cdots & \frac{\partial e_m}{\partial k_1} \\
\frac{\partial e_1}{\partial k_2} & \frac{\partial e_2}{\partial k_2} & \cdots & \frac{\partial e_m}{\partial k_2} \\
\vdots & \vdots & & \vdots \\
\frac{\partial e_1}{\partial k_p} & \frac{\partial e_2}{\partial k_p} & \cdots & \frac{\partial e_m}{\partial k_p}
\end{bmatrix}    (5.7)

and

\nabla f^{T} = \begin{bmatrix}
\frac{\partial f_1}{\partial k_1} & \frac{\partial f_2}{\partial k_1} & \cdots & \frac{\partial f_m}{\partial k_1} \\
\frac{\partial f_1}{\partial k_2} & \frac{\partial f_2}{\partial k_2} & \cdots & \frac{\partial f_m}{\partial k_2} \\
\vdots & \vdots & & \vdots \\
\frac{\partial f_1}{\partial k_p} & \frac{\partial f_2}{\partial k_p} & \cdots & \frac{\partial f_m}{\partial k_p}
\end{bmatrix}    (5.8)

As seen from the above equations, the (m×p) matrix (\nabla e^T)^T is the Jacobian matrix, J, of the vector function e, and the (m×p) matrix (\nabla f^T)^T is the Jacobian matrix, G, of the vector function f(x, k). The srth element of the Jacobian matrix J is given by

J_{sr} = \frac{\partial e_s}{\partial k_r} ; \quad s = 1,2,\ldots,m, \quad r = 1,2,\ldots,p    (5.9a)


Similarly, the srth element of the Jacobian matrix G is given by

G_{sr} = \frac{\partial f_s}{\partial k_r} ; \quad s = 1,2,\ldots,m, \quad r = 1,2,\ldots,p    (5.9b)

The rate of convergence of the Steepest Descent method is first order. The basic difficulty with steepest descent is that the method is too sensitive to the scaling of S(k), so that convergence is very slow and oscillations in the k-space can easily occur. In general, a well-scaled problem is one in which similar changes in the variables lead to similar changes in the objective function (Kowalik and Osborne, 1968). For these reasons, steepest descent/ascent is not a viable method for the general purpose minimization of nonlinear functions. It is of interest only for historical and theoretical reasons.

Algorithm - Implementation Steps

1. Input the initial guess for the parameters, k^{(0)}, and NSIG or TOL.
2. Specify the weighting matrices Q_i for i = 1,2,...,N.
3. For j = 0,1,2,..., repeat.
4. Compute \Delta k^{(j+1)} using Equation 5.6.
5. Determine \mu^{(j)} using the bisection rule and obtain k^{(j+1)} = k^{(j)} + \mu^{(j)} \Delta k^{(j+1)}.
6. Continue until the maximum number of iterations is reached or convergence is achieved (i.e., \sum_{i=1}^{p} \left| \Delta k_i^{(j+1)} / k_i^{(j+1)} \right| \leq 10^{-\mathrm{NSIG}} or \|\Delta k^{(j+1)}\| \leq TOL).
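To illustrate these steps, here is a minimal sketch in Python of the steepest descent iteration with a bisection rule for the step size. The objective function is passed in (for instance, the weighted LS objective of Equation 5.1 with the data bound to it); the forward-difference gradient and the tolerances used are illustrative assumptions.

```python
import numpy as np

def steepest_descent(objective, k0, nsig=5, max_iter=200, h=1e-6):
    """Steepest descent (Eq. 5.6a) with a bisection rule for mu (Eq. 5.3)."""
    k = np.asarray(k0, dtype=float)
    for _ in range(max_iter):
        S0 = objective(k)
        # forward-difference approximation of the gradient g(k)
        g = np.array([(objective(k + h * e) - S0) / h for e in np.eye(k.size)])
        dk = -g                                  # search vector, Eq. 5.6a
        mu = 1.0
        # bisection rule: halve the step size until the objective decreases
        while objective(k + mu * dk) >= S0 and mu > 1e-12:
            mu *= 0.5
        k_new = k + mu * dk                      # Eq. 5.3
        # convergence test of Eq. 5.4 (assumes no parameter converges to zero)
        if np.sum(np.abs((k_new - k) / k_new)) <= 10.0 ** (-nsig):
            return k_new
        k = k_new
    return k
```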

5.1.2 Newton's Method

By this method the step-size parameter \mu is taken equal to 1 and the search vector is obtained from

\Delta k^{(j+1)} = -\left[ \nabla^{2} S(k^{(j)}) \right]^{-1} \nabla S(k^{(j)})    (5.10)

where \nabla^{2} S(k) is the Hessian matrix of S(k) evaluated at k^{(j)}. It is denoted as H(k) and thus Equation 5.10 can be written as

\Delta k^{(j+1)} = -H^{-1}(k^{(j)})\, g(k^{(j)})    (5.11)

The above formula is obtained by differentiating the quadratic approximation of S(k) with respect to each of the components of k and equating the resulting expression to zero (Edgar and Himmelblau, 1988; Gill et al. 1981; Scales, 1985). It should be noted that in practice there is no need to obtain the inverse of the Hessian matrix because it is better to solve the following linear system of equations (Peressini et al. 1988)

\left[ \nabla^{2} S(k^{(j)}) \right] \Delta k^{(j+1)} = -\nabla S(k^{(j)})    (5.12a)

or equivalently

H(k^{(j)})\, \Delta k^{(j+1)} = -g(k^{(j)})    (5.12b)

As seen by comparing Equations 5.6 and 5.12, the steepest-descent method arises from Newton's method if we assume that the Hessian matrix of S(k) is approximated by the identity matrix. Newton's method is not a satisfactory general-purpose algorithm for function minimization, even when a stepping parameter \mu is introduced. Fortunately, it can be modified to provide extremely reliable algorithms with the same asymptotic rate of convergence. There is an extensive literature on Newton's method and the main points of interest are summarized below (Edgar and Himmelblau, 1988; Gill et al. 1981; Peressini et al. 1988; Scales, 1985):

(i) It is the most rapidly convergent method when the Hessian matrix of S(k) is available.
(ii) There is no guarantee that it will converge to a minimum from an arbitrary starting point.
(iii) Problems arise when the Hessian matrix is indefinite or singular.
(iv) The method requires analytical first and second order derivatives, which may not be practical to obtain. In that case, finite difference techniques may be employed.

The ratio of the largest to the smallest eigenvalue of the Hessian matrix at the minimum is defined as the condition number. For most algorithms the larger the condition number, the larger the limit in Equation 5.5 and the more difficult it is for the minimization to converge (Scales, 1985). One approach to solve the linear Equations 5.12 is the method of Gill and Murray that uses the Cholesky factorization of H as in the following (Gill and Murray, 1974; Scales, 1985):

H = L\,D\,L^{T}    (5.13)

In the above equation, D is a diagonal matrix and L is a lower triangular matrix with diagonal elements of unity. As shown previously, the negative of the gradient of the objective function is related to the residuals by Equation 5.6. Therefore, the gradient of the objective function is given by
g(k) = \nabla S(k) = 2 \sum_{i=1}^{N} \left( \nabla e_i^{T} \right) Q_i e_i    (5.14)

where e_i = [y_i - f(x_i, k)] and \nabla e_i^{T} = (J^T)_i is the transpose of the Jacobian matrix of the vector function e_i. The srth element of this Jacobian was defined by Equation 5.9a. Equation 5.14 can now be written in an expanded form as follows

"Qll Q21

Ql2 Q22

Qlm" Q2m

~el
e

(5.15)
Qml
i

5e, okp

5e2 okp

5em okp

Qm2

Qmm

After completing the matrix multiplication operations, we obtain

g(k) = \nabla S(k) = 2 \sum_{i=1}^{N}
\begin{bmatrix}
\sum_{l=1}^{m} \sum_{r=1}^{m} \frac{\partial e_l}{\partial k_1} Q_{lr}\, e_r \\
\sum_{l=1}^{m} \sum_{r=1}^{m} \frac{\partial e_l}{\partial k_2} Q_{lr}\, e_r \\
\vdots \\
\sum_{l=1}^{m} \sum_{r=1}^{m} \frac{\partial e_l}{\partial k_p} Q_{lr}\, e_r
\end{bmatrix}_i    (5.16)

Thus, the sth element of the gradient vector g(k) of the objective function S(k) is given by the following equation

g_s(k) = 2 \sum_{i=1}^{N} \left[ \sum_{l=1}^{m} \sum_{r=1}^{m} \frac{\partial e_l}{\partial k_s} Q_{lr}\, e_r \right]_i ; \quad s = 1,2,\ldots,p    (5.17)

We are now able to obtain the Hessian matrix of the objective function S(k), which is denoted by H and is given by the following equation

H = \nabla \left[ \nabla S(k) \right]^{T} =
\begin{bmatrix}
\frac{\partial^2 S(k)}{\partial k_1^2} & \frac{\partial^2 S(k)}{\partial k_1 \partial k_2} & \cdots & \frac{\partial^2 S(k)}{\partial k_1 \partial k_p} \\
\frac{\partial^2 S(k)}{\partial k_2 \partial k_1} & \frac{\partial^2 S(k)}{\partial k_2^2} & \cdots & \frac{\partial^2 S(k)}{\partial k_2 \partial k_p} \\
\vdots & \vdots & & \vdots \\
\frac{\partial^2 S(k)}{\partial k_p \partial k_1} & \frac{\partial^2 S(k)}{\partial k_p \partial k_2} & \cdots & \frac{\partial^2 S(k)}{\partial k_p^2}
\end{bmatrix}
=
\begin{bmatrix}
\frac{\partial g_1}{\partial k_1} & \frac{\partial g_2}{\partial k_1} & \cdots & \frac{\partial g_p}{\partial k_1} \\
\vdots & \vdots & & \vdots \\
\frac{\partial g_1}{\partial k_p} & \frac{\partial g_2}{\partial k_p} & \cdots & \frac{\partial g_p}{\partial k_p}
\end{bmatrix}    (5.18)

where we use the notation g_s = \partial S(k) / \partial k_s.

Thus, the suth element of the Hessian matrix is defined by

H_{su} = \frac{\partial g_s}{\partial k_u} ; \quad s = 1,2,\ldots,p \text{ and } u = 1,2,\ldots,p    (5.19)

and this element can be calculated by taking into account Equation 5.16 as follows

H_{su} = 2 \sum_{i=1}^{N} \left[ \sum_{l=1}^{m} \sum_{r=1}^{m} \frac{\partial e_l}{\partial k_s}\, Q_{lr}\, \frac{\partial e_r}{\partial k_u} \right]_i + 2 \sum_{i=1}^{N} \left[ \sum_{l=1}^{m} \sum_{r=1}^{m} \frac{\partial^2 e_l}{\partial k_s \partial k_u}\, Q_{lr}\, e_r \right]_i    (5.20)

where s = 1,2,...,p and u = 1,2,...,p. The Gauss-Newton method arises when the second order terms on the right hand side of Equation 5.20 are ignored. As seen, the Hessian matrix used in Equation 5.11 then contains only first derivatives of the model equations f(x,k). Leaving out the terms containing the second derivatives may be justified by the fact that these terms contain the residuals e_r as factors, and these residuals are expected to be small quantities. The Gauss-Newton method is directly related to Newton's method. The main difference between the two is that Newton's method requires the computation of second order derivatives as they arise from the direct differentiation of the objective function with respect to k. These second order terms are avoided when the Gauss-Newton method is used since the model equations are first linearized and then substituted into the objective function. This constitutes a key advantage of the Gauss-Newton method compared to Newton's method, while it too exhibits quadratic convergence.
Algorithm - Implementation Steps

1. Input the initial guess for the parameters, k^{(0)}, and NSIG or TOL.
2. Specify the weighting matrices Q_i for i = 1,2,...,N.
3. For j = 0,1,2,..., repeat.
4. Compute \Delta k^{(j+1)} by solving Equation 5.12b.
5. Determine \mu^{(j)} using the bisection rule and obtain k^{(j+1)} = k^{(j)} + \mu^{(j)} \Delta k^{(j+1)}.
6. Continue until the maximum number of iterations is reached or convergence is achieved (i.e., \sum_{i=1}^{p} \left| \Delta k_i^{(j+1)} / k_i^{(j+1)} \right| \leq 10^{-\mathrm{NSIG}} or \|\Delta k^{(j+1)}\| \leq TOL).

According to Scales (1985) the best way to solve Equation 5.12b is by performing a Cholesky factorization of the Hessian matrix. One may also use Gauss-Jordan elimination (Press et al., 1992). An excellent user-oriented presentation of solution methods is provided by Lawson and Hanson (1974). We prefer to perform an eigenvalue decomposition as discussed in Chapter 8.
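As a brief illustration of this step, the sketch below solves Equation 5.12b with a Cholesky factorization using SciPy; the Hessian and gradient values shown are hypothetical inputs for a two-parameter problem.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def newton_step(H, g):
    """Solve H(k) dk = -g(k) (Equation 5.12b) via Cholesky factorization.
    Raises LinAlgError if H is not positive definite, signalling that a
    modified Newton scheme (Section 5.1.3) is needed."""
    c, low = cho_factor(H)
    return cho_solve((c, low), -g)

H = np.array([[4.0, 1.0],
              [1.0, 3.0]])    # hypothetical Hessian at the current iterate
g = np.array([1.0, 2.0])      # hypothetical gradient at the current iterate
dk = newton_step(H, g)        # search vector Delta k
```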

5.1.3 Modified Newton's Method

Modified Newton methods attempt to alleviate the deficiencies of Newton's method (Bard, 1970). The basic problem arises if the Hessian matrix, G, is not positive definite. That can be checked by examining whether all the eigenvalues of G are positive numbers. If any of the eigenvalues are not positive, then a procedure proposed by Marquardt (1963), based on earlier work by Levenberg (1944), can be followed. A positive value \gamma is added to all the eigenvalues such that the resulting positive quantities \lambda_i + \gamma, i = 1,2,...,p, are the eigenvalues of a positive definite matrix H_{LM} given by

H_{LM} = G + \gamma I    (5.21)

where I is the identity matrix.

Algorithm - Implementation Steps

1. Input the initial guess for the parameters, k^{(0)}, \gamma, and NSIG or TOL.
2. Specify the weighting matrices Q_i for i = 1,2,...,N.
3. For j = 0,1,2,..., repeat.
4. Compute \Delta k^{(j+1)} by solving Equation 5.12b, but with the Hessian matrix H(k) replaced by H_{LM} given by Equation 5.21.
5. Determine \mu^{(j)} using the bisection rule and obtain k^{(j+1)} = k^{(j)} + \mu^{(j)} \Delta k^{(j+1)}.
6. Continue until the maximum number of iterations is reached or convergence is achieved (i.e., \sum_{i=1}^{p} \left| \Delta k_i^{(j+1)} / k_i^{(j+1)} \right| \leq 10^{-\mathrm{NSIG}} or \|\Delta k^{(j+1)}\| \leq TOL).
The Gill-Murray modified Newton's method uses a Cholesky factorization of the Hessian matrix (Gill and Murray, 1974). The method is described in detail by Scales (1985).
5.1.4 Conjugate Gradient Methods
Modified Newton methods require calculation of second derivatives. There might be cases where these derivatives are not available analytically. One may then calculate them by finite differences (Edgar and Himmelblau, 1988; Gill et al. 1981; Press et al. 1992). The latter, however, requires a considerable number of
gradient evaluations if the number of parameters, p, is large. In addition, finite difference approximations of derivatives are prone to truncation and round-off errors (Bard, 1974; Edgar and Himmelblau, 1988; Gill et al. 1981). Conjugate gradient-type methods form a class of minimization procedures that accomplish two objectives:

(a) There is no need for calculation of second order derivatives.
(b) They have relatively small computer storage requirements.

Thus, these methods are suitable for problems with a very large number of parameters. They are essential in circumstances when methods based on matrix factorization are not viable because the relevant matrix is too large or too dense (Gill et al. 1981). Two versions of the method have been formulated (Scales, 1986):

(a) the Fletcher-Reeves version;
(b) the Polak-Ribiere version.

Scales (1986) recommends the Polak-Ribiere version because it has slightly better convergence properties. Scales also gives an algorithm that is used for both methods, which differ only in the formula for updating the search vector. It is noted that the Rosenbrock function, given by the next equation, has been used to test the performance of various algorithms including modified Newton's and conjugate gradient methods (Scales, 1986)
f(x) = 100\,(x_1^2 - x_2)^2 + (1 - x_1)^2    (5.22)
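As an illustration, the sketch below minimizes the Rosenbrock function of Equation 5.22 with SciPy's nonlinear conjugate gradient routine (a Polak-Ribiere-type implementation); the starting point is the customary test value.

```python
import numpy as np
from scipy.optimize import minimize

def rosenbrock(x):
    """Rosenbrock test function, Equation 5.22."""
    return 100.0 * (x[0] ** 2 - x[1]) ** 2 + (1.0 - x[0]) ** 2

# The minimum is at (1, 1); (-1.2, 1.0) is the customary starting point.
result = minimize(rosenbrock, x0=np.array([-1.2, 1.0]), method="CG")
print(result.x)   # approximately [1. 1.]
```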

5.1.5 Quasi-Newton or Variable Metric or Secant Methods

These methods utilize only values of the objective function, S(k), and values of the first derivatives of the objective function. Thus, they avoid calculation of the elements of the (p×p) Hessian matrix. The quasi-Newton methods rely on formulas that approximate the Hessian and its inverse. Two algorithms have been developed:

(a) the Davidon-Fletcher-Powell formula (DFP);
(b) the Broyden-Fletcher-Goldfarb-Shanno formula (BFGS).
The DFP and BFGS methods exhibit superlinear convergence on suitably smooth functions. They are in general more rapidly convergent, robust and economical than conjugate gradient methods. However, they require much more storage and are not suitable for large problems i.e., problems with many parameters. Their storage requirements are equivalent to Newton's method. The BFGS method is considered to be superior to DFP in most cases because (a) it is less prone to loss of positive definiteness or to singularity problems
through round-off errors and (b) it has better theoretical convergence properties (Scales, 1985; Gill et al. 1981; Edgar and Himmelblau, 1988).

Algorithms are not given here because they are readily available elsewhere (Gill and Murray, 1972, 1975; Goldfarb, 1976; Scales, 1985; Edgar and Himmelblau, 1988; Gill et al. 1981).
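For completeness, a brief usage sketch assuming SciPy, whose minimize routine provides a BFGS implementation (DFP is generally not offered in modern libraries); the Rosenbrock function of Equation 5.22 is used again as the test problem.

```python
import numpy as np
from scipy.optimize import minimize

def rosenbrock(x):
    return 100.0 * (x[0] ** 2 - x[1]) ** 2 + (1.0 - x[0]) ** 2

# BFGS builds an approximation to the inverse Hessian from successive
# gradient differences, so second derivatives are never computed.
result = minimize(rosenbrock, x0=np.array([-1.2, 1.0]), method="BFGS")
print(result.x, result.nit)
```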

5.2 DIRECT SEARCH OR DERIVATIVE FREE METHODS

Direct search methods use only function evaluations. They search for the minimum of an objective function without calculating derivatives analytically or numerically. Direct methods are based upon heuristic rules which make no a priori assumptions about the objective function. They tend to have much poorer convergence rates than gradient methods when applied to smooth functions. Several authors claim that direct search methods are not as efficient and robust as the indirect or gradient search methods (Bard, 1974; Edgar and Himmelblau, 1988; Scales, 1986). However, in many instances direct search methods have proved to be robust and reliable, particularly for systems that exhibit local minima or have complex nonlinear constraints (Wang and Luus, 1978). The Simplex algorithm and Powell's method are examples of derivative-free methods (Edgar and Himmelblau, 1988; Seber and Wild, 1989; Powell, 1965). In this chapter only two algorithms will be presented: (1) the LJ optimization procedure and (2) the Simplex method. The well known golden section and Fibonacci methods for minimizing a function along a line will not be presented. Kowalik and Osborne (1968) and Press et al. (1992) among others discuss these methods in detail.

In an effort to address the problem of "combinatorial explosion" in optimization, several "new" global optimization methods have been introduced. These methods include: (i) neural networks (Bishop, 1995), (ii) genetic algorithms (Holland, 1975), (iii) simulated annealing techniques (Kirkpatrick et al., 1983; Cerny, 1985; Otten and Ginneken, 1989), (iv) target analysis (Glover, 1986) and (v) threshold accepting (Dueck and Scheuer, 1990), to name a few. These methods have attracted significant attention as being the most appropriate ones for large scale optimization problems whose objective functions exhibit a plethora of local optima. The simulated annealing technique is probably the most popular one. It tries to mimic the physical process of annealing whereby a material starts in a melted state and its temperature is gradually lowered until it reaches its minimum energy state. In the physical system the temperature should not be rapidly lowered because a sub-optimal structure may be formed in the crystallized system and lead to quenching. In an analogous fashion, we consider the minimization of the objective function in a series of steps. A slow reduction in temperature corresponds to allowing non-improving steps to be taken with a certain probability, which is higher at the beginning of the algorithm when the temperature is high.

Simulated annealing is essentially a probabilistic hill climbing algorithm and hence the method has the capability to move away from a local minimum. The probability used in simulated annealing algorithms is the Gibbs-Boltzmann distribution encountered in statistical mechanics. One of its characteristics is that for very high temperatures each state has almost an equal chance of being chosen to be the current state. For low temperatures only states with low energies have a high probability of becoming the current state. In practice, simulated annealing is implemented using the Metropolis et al. (1953) algorithm. Simulated annealing has solved the famous travelling salesman problem of finding the shortest itinerary for a salesman who visits N cities. The method has also been successfully used to determine the arrangement of several hundred thousand circuit elements on a small silicon wafer by minimizing the interference between their connecting wires.

Usually the space over which the objective function is minimized is not defined as the p-dimensional space of p continuously variable parameters. Instead, it is a discrete configuration space of very high dimensionality. In general the number of elements in the configuration space is exceptionally large, so that they cannot be fully explored within a reasonable computation time.

For parameter estimation purposes, simulated annealing can be implemented by discretizing the parameter space. Alternatively, we can specify minimum and maximum values for each unknown parameter, and by using a random number uniformly distributed in the range [0,1], we can specify randomly the potential parameter values as

k_i = k_{min,i} + R\,[k_{max,i} - k_{min,i}] ; \quad i = 1,\ldots,p    (5.23)

where R is a random number. Another interesting implementation of simulated annealing for continuous minimization (like a typical parameter estimation problem) utilizes a modification of the downhill simplex method. Press et al. (1992) provide a brief overview of simulated annealing techniques accompanied by listings of computer programs that cover all the above cases. A detailed presentation of the simulated annealing techniques can be found in The Annealing Algorithm by Otten and Ginneken (1989).
5.2.1 LJ Optimization Procedure

One of the most reliable direct search methods is the LJ optimization procedure (Luus and Jaakola, 1973). This procedure uses random search points and systematic contraction of the search region. The method is easy to program and handles the problem of multiple optima with high reliability (Wang and Luus, 1977, 1978). An important advantage of the method is its ability to handle multiple nonlinear constraints.


The adaptation of the original LJ optimization procedure to parameter estimation problems for algebraic equation models is given next.
(i) Choose an initial guess for the p-dimensional unknown parameter vector, k^{(0)}; the region contraction coefficient, \delta (typically \delta = 0.95 is used); the number of random evaluations of the objective function, N_R (typically N_R = 100 is used) within an iteration; the maximum number of iterations, j_{max} (typically j_{max} = 200 is used); and an initial search region, r^{(0)} (a typical choice is r^{(0)} = k_{max} - k_{min}).

(ii) Set the iteration index j = 1, k^{(j-1)} = k^{(0)} and r^{(j-1)} = r^{(0)}.

(iii) Generate or read from a file N_R × p random numbers (R_{ni}) uniformly distributed in [-0.5, 0.5].

(iv) For n = 1,2,...,N_R, generate the corresponding random trial parameter vectors from

k_n = k^{(j-1)} + R_n\, r^{(j-1)}    (5.24)

where R_n = diag(R_{n1}, R_{n2}, \ldots, R_{np}).

(v) Find the parameter vector among the N_R trial ones that minimizes the LS objective function

S(k_n) = \sum_{i=1}^{N} \left[ y_i - f(x_i, k_n) \right]^{T} Q_i \left[ y_i - f(x_i, k_n) \right]    (5.25)

(vi) Keep the best trial parameter vector, k*, up to now and the corresponding minimum value of the objective function, S*.

(vii) Set k^{(j)} = k* and compute the search region for the next iteration as

r^{(j)} = \delta \times r^{(j-1)}    (5.26)

(viii) If j < j_{max}, increment j by 1 and go to Step (iii); else STOP.
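The following is a minimal sketch of this procedure in Python under the typical defaults quoted above; the objective passed in would be the LS objective of Equation 5.25 for the model at hand.

```python
import numpy as np

rng = np.random.default_rng(0)

def lj_optimize(objective, k0, r0, delta=0.95, n_r=100, j_max=200):
    """LJ (Luus-Jaakola) random search with systematic region contraction."""
    k_best = np.asarray(k0, dtype=float)    # step (ii)
    s_best = objective(k_best)
    r = np.asarray(r0, dtype=float)
    for _ in range(j_max):
        # steps (iii)-(iv): N_R random trial vectors inside the current region
        R = rng.uniform(-0.5, 0.5, size=(n_r, k_best.size))
        trials = k_best + R * r             # Eq. 5.24
        # steps (v)-(vi): keep the best parameter vector found so far
        for k_n in trials:
            s_n = objective(k_n)
            if s_n < s_best:
                k_best, s_best = k_n, s_n
        r = delta * r                       # step (vii), Eq. 5.26
    return k_best, s_best
```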

Given the fact that in parameter estimation we normally have a relatively smooth LS objective function, we do not need to be exceptionally concerned about local optima (although this may not be the case for ill-conditioned estimation problems). This is particularly true if we have a good idea of the range where the parameter values should be. As a result, it may be more efficient to use a value for N_R that is a function of the number of unknown parameters. For example, we may consider

N_R = 50 + 10p    (5.27)


Typical values would be N_R = 60 when p = 1, N_R = 110 when p = 5 and N_R = 160 when p = 10. At the same time we may wish to consider a slower contraction of the search region as the dimensionality of the parameter space increases. For example, we could use a constant reduction of the volume (say 10%) of the search region rather than a constant reduction in the search region of each parameter. Namely, we could use

\delta = (0.90)^{1/p}    (5.28)

Typical values would be \delta = 0.90 when p = 1, \delta = 0.949 when p = 2, \delta = 0.974 when p = 4, \delta = 0.987 when p = 8 and \delta = 0.993 when p = 16.

Since we have a minimization problem, significant computational savings can be realized by noting in the implementation of the LJ optimization procedure that, for each trial parameter vector, we do not need to complete the summation in Equation 5.25: once the LS objective function exceeds the smallest value found up to that point (S*), a new trial parameter vector can be selected. Finally, we may wish to consider a multi-pass approach whereby the search region for each unknown parameter is determined by the maximum change of the parameter during the last pass (Luus, 1998).

5.2.2 Simplex Method

The "Sequential Simplex" or simply Simplex method relies on geometry to create a heuristic rule for finding the minimum of a function. It is noted that the Simplex method of linear programming is a different method. Kowalik and Osborn (1968) define simplex as following
A set of N+1 points in the N-dimensional space forms a simplex. When the points are equidistant the simplex is said to be regular.

For a function of N variables one uses a simplex with N+1 vertices and evaluates the function to be minimized at these vertices. Thus, for a function of two variables an equilateral triangle is used, whereas for a function of three variables a regular tetrahedron is used.

Edgar and Himmelblau (1988) demonstrate the use of the method for a function of two variables. Nelder and Mead (1965) presented the method for a function of N variables as a flow diagram. They demonstrated its use by applying
it to minimize Rosenbrock's function (Equation 5.22) as well as to the following functions:


Powell's quartic function

f(x) = (x_1 + 10 x_2)^2 + 5 (x_3 - x_4)^2 + (x_2 - 2 x_3)^4 + 10 (x_1 - x_4)^4    (5.29)

Fletcher and Powell's helical valley function

f(x) = 100 \left[ x_3 - 10\,\theta(x_1, x_2) \right]^2 + \left[ \sqrt{x_1^2 + x_2^2} - 1 \right]^2 + x_3^2    (5.30)

where

2\pi\,\theta(x_1, x_2) = \begin{cases} \arctan(x_2 / x_1) & ; \; x_1 > 0 \\ \pi + \arctan(x_2 / x_1) & ; \; x_1 < 0 \end{cases}    (5.31)

In general, for a function of N variables the Simplex method proceeds as follows:

Step 1. Form an initial simplex, e.g., an equilateral triangle for a function of two variables.

Step 2. Evaluate the function at each of the vertices.

Step 3. Reject the vertex where the function has the largest value. This point is replaced by another one that is found in the direction away from the rejected vertex and through the centroid of the simplex. The distance from the rejected vertex is always constant at each search step. In the case of a function of two variables, the direction is from the rejected vertex through the middle of the side of the triangle that is opposite to this point. The new point together with the previous two points defines a new equilateral triangle.

Step 4. Proceed until a simplex that encloses the minimum is found. Stop when the difference between two consecutive function evaluations is less than a preset value (tolerance).
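As a brief usage sketch, assuming SciPy's implementation of the Nelder-Mead simplex method, applied once more to the Rosenbrock function of Equation 5.22; a restart from the reported minimum is included, in line with the recommendation cited below.

```python
import numpy as np
from scipy.optimize import minimize

def rosenbrock(x):
    return 100.0 * (x[0] ** 2 - x[1]) ** 2 + (1.0 - x[0]) ** 2

# Nelder-Mead uses only function values -- no derivatives are required.
first = minimize(rosenbrock, x0=np.array([-1.2, 1.0]), method="Nelder-Mead")
# Restart from the claimed minimum to guard against premature convergence.
second = minimize(rosenbrock, x0=first.x, method="Nelder-Mead")
print(second.x)   # approximately [1. 1.]
```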

It is noted that Press et al. (1992) give a subroutine that implements the simplex method of Nelder and Mead. They also recommend restarting the minimization routine at a point where it claims to have found a minimum.

The Simplex optimization method can also be used in the search for optimal experimental conditions (Walters et al. 1991). A starting simplex is usually formed from existing experimental information. Subsequently, the response that plays the
role of the objective function is evaluated and a new vertex is found to replace the worst one. The experimenter then performs an experiment at this new vertex to determine the response. A new vertex is then found as before. Thus, one sequentially forms a new simplex and stops when the response remains practically the same. At that point the experimenter may switch to a factorial experimental design to further optimize the experimental conditions.

For example, Kurniawan (1998) investigated the in-situ electrochemical brightening of thermo-mechanical pulp using sodium carbonate as the brightening chemical. The electrochemical brightening process was optimized by performing a simplex optimization. In particular, she performed two sequential simplex optimizations. The objective of the first was to maximize the brightness gain and minimize the yellowness (or maximize the absolute yellowness gain), whereas that of the second was to maximize the brightness gain only. Four factors were considered: current (Amp), anode area (cm²), temperature (K) and pH. Thus, the simplex was a pentahedron. Kurniawan noticed that the first vertex was the same in both optimizations. This was due to the fact that in both cases the worst vertex was the same. Kurniawan also noticed that the search for the optimal conditions was more effective when two responses were optimized. Finally, she noticed that for the Simplex method to perform well, the initial vertices should define extreme ranges of the factors.

5.3 EXERCISES

You are asked to estimate the unknown parameters in the examples given in Chapter 4 by employing methods presented in this chapter.

