
Principles and Applications of Non-Linear Least Squares: An Introduction for Physical Scientists using Excel's Solver



Les Kirkup, Department of Applied Physics, Faculty of Science, University of
Technology, Sydney, New South Wales 2007, Australia.
email: Les.Kirkup@uts.edu.au

Version date: October 2003


Preamble
Least squares is an extremely powerful technique for fitting equations to data and is
carried out in laboratories every day. Routines for calculating parameter estimates
using linear least squares are most common, and many inexpensive pocket calculators
are able to do this. As we move away from fitting the familiar equation, y = a + bx to
data, we usually need to employ computer based programs such as spreadsheets, or
specialised statistical packages to do the number crunching. In situations where an
equation is complex, we may need to use non-linear least squares to fit the equation to
experimental or observational data.

Non-linear least squares is treated in this document with a focus on how Excel's
Solver utility may be employed to perform this task. Though I had originally intended
to concentrate more or less exclusively on using Solver to carry out non-linear least
squares (due to the general availability of Excel and the fact that I'd already written a
text discussing data analysis using Excel!), several other related topics emerged,
including model identification, Monte Carlo simulations and uncertainty propagation.
I have included something about those topics in this document. In addition, I have
tried to include helpful worked examples to illustrate the techniques discussed.

I hope the document serves its purpose (I had senior undergraduates and graduates in
the physical sciences in mind when I wrote it) and I would appreciate any comments
as to what might have been included (or discarded).
CONTENTS
Section 1: Introduction 5
1.1 Reasons for fitting equations to data 7
Section 2: Linear Least squares 8
2.1 Standard errors in best estimates 11
Section 3: Extensions of the linear least squares technique 13
3.1 Using Excel to solve linear least squares problems 13
3.2 Limitations of linear least squares 14
Section 4: Excel's Solver add-in 17
4.1 Example of use of Solver 17
4.2 Limitations of Solver 24
4.3 Spreadsheet for the determination of standard errors in parameter estimates 26
4.4 Confidence intervals for parameter estimates 28
Section 5: More on fitting using non-linear least squares 30
5.1 Local Minima in SSR 30
5.2 Starting values 32
5.3 Starting values by curve stripping 33
5.4 Effect of instrument resolution and noise on best estimates 35
5.4.1 Adding normally distributed noise to data using Excel's Random Number Generator 37
5.4.2 Fitting an equation to noisy data 38
5.4.3 Relationship between sampling density and parameter estimates 39
Section 6: Linear least squares meets non-linear least squares 42
Section 7: Weighted non-linear least squares 46
7.1 Weighted fitting using Solver 46
7.2 Example of weighted fitting using Solver 47
7.2.1 Best estimates of parameters using Solver 49
7.2.2 Determining the D matrix 50
7.2.3 The weight matrix, W 51
7.2.4 Calculation of (DᵀWD)⁻¹ 52
7.2.5 Bringing it all together 52
Section 8: Uncertainty propagation, least squares estimates and calibration 54
8.1: Example of propagation of uncertainties involving parameter estimates 55
8.2 Uncertainties in derived quantities incorporating least squares estimates 59
8.3: Example of propagation of uncertainties in derived quantities 60
8.4: Uncertainty propagation and nonlinear least squares 61
8.4.1: Example of uncertainty propagation in parameter estimates obtained by nonlinear least
squares 62
Section 9: More on Solver 67
9.1 Solver Options 67
9.2 Solver Results 70
Section 10: Modelling and Model Identification 71
10.1 Physical Modelling 71
10.2 Data driven approach to discovering relationships 72
10.3 Other forms of modelling 73
10.4 Competing models 73
10.5 Statistical Measures of Goodness of Fit 74
10.5.1 Adjusted Coefficient of Multiple Determination 74
10.5.2 Akaike's Information Criterion (AIC) 75
10.5.3 Example 76
Section 11: Monte Carlo simulations and least squares 78
11.1 Using Excel's Random Number Generator 79
11.2 Monte Carlo simulation and non-linear least squares 82
11.3 Adding heteroscedastic noise using Excel's Random Number Generator 86
Section 12: Review 90
Acknowledgements 90
Problems 91
References 101

Section 1: Introduction
Scientists in all areas of the physical sciences search for defensible models that
describe the way nature works. As a part of that search they often investigate the
relationship between physical variables. As examples, they might want to know how
the:

• electrical resistance of a superconductor depends on the temperature of the
superconductor.
• width of an absorption peak in liquid chromatography depends on the flow of
the mobile phase through a packed column.
• electrical permittivity of a solid depends on the moisture content in the solid.
• output voltage from a conductivity sensor depends on the electrical
conductivity of the liquid in which the sensor is immersed.

A model that explains or describes the relationship between physical variables may be
devised from first principles, or it may represent a new development of an established
model. Whatever the situation, once a model has been devised, it is prudent to
compare it to real data obtained by experiment. One reason for doing this is to
establish whether predictions of the model are consistent with experimental data.

Consider a specific example in which nuclear radiation passes through material of
thickness, x. The relationship between the intensity, I, of the radiation and x can be
written,

I = I₀ exp(-μx) + B   (1.1)

I₀ is the intensity recorded in the absence of the material when the background
radiation is negligible, μ is the absorption coefficient of the material and B is the
background intensity.

The appropriateness (or otherwise) of equation 1.1 may be investigated for a
particular material by considering radiation intensity versus material thickness data, as
shown in figure 1.1.

















[Graph: intensity (counts), 0 to 1200, versus thickness (cm), 0.0 to 2.0]

Figure 1.1: Intensity versus thickness data.

If equation 1.1 fairly describes the relationship between intensity and thickness, we
should be able to find values for I₀, μ and B such that the line generated by
equation 1.1, when x varies between x = 0.0 and x = 2.0, fits the data shown in
figure 1.1 (i.e. passes close to the data points). We could begin by making an
intelligent guess at values for I₀, μ and B. Figure 1.2 shows the outcome of one
attempt at guessing values for I₀, μ and B.

[Graph: the data of figure 1.1 with a line generated using equation 1.1, where
I₀ = 800, μ = 1 and B = 10]

Figure 1.2: Line drawn through intensity versus thickness data using equation 1.1.

It would have been fortuitous had the guesses for I₀, μ and B given in figure 1.2
produced a line that passed close to the data. We could try other values for I₀, μ and B
and, through a process of trial and error, improve the fit of the line to the data.
However, it must be admitted that this is an inefficient way to fit any equation to data
and that guesswork must give way to a better approach. This is the main consideration
of this document.

1.1 Reasons for fitting equations to data
It is possible to fit almost any equation to any data. However, a compelling reason for
fitting an equation in the physical sciences is that it provides for an insightful
interpretation of physical or chemical processes or phenomena. In particular, the
fitting of an equation can assist in validating or refuting a theoretical model and allow
for the determination of physically meaningful parameters[1].

As an example, the parameters in equation 1.1 have physical meaning. In particular,
μ in equation 1.1 is a quantity that characterises radiation absorption by a material. The
applicability of equation 1.1 to a particular material is likely to have been studied by
other workers. Therefore a value for μ determined through analysing the data in
figure 1.1 may be compared to that reported by others.

There are situations in which an equation is fitted to data for the purpose of calibration
and no attempt is made to relate parameters in the equation to physical constants. For
example, the concentration of a particular chemical species might be determined using
Atomic Absorption Spectroscopy (AAS). An instrument is calibrated by measuring
the absorption of known concentrations of the species. A graph of absorption, y,
versus concentration, x, is plotted. The next step is to fit an equation to the data. Using
the equation, it is possible to determine species concentration from measurements
made of absorption.


[1] This issue is taken up again in section 10.
Section 2: Linear Least squares
Often in an experiment there is a known, expected or proposed relationship between
variables measured during the experiment. In perhaps the most common situation, the
relationship between the dependent (or response) variable, y, and the independent (or
predictor) variable, x, may be expressed as,

y = a + bx (2.1)

Equation 2.1 is the equation of a straight line with intercept, a, and slope, b.

In principle, we should be able to find the intercept and slope by drawing a straight
line through the points. In practice, the intercept and slope cannot be known exactly,
as this would require that we eliminate (or correct for) all sources of random and
systematic error in the data. This is not possible. If it were possible to eliminate all
sources of error, and assuming the relationship between x and y is linear, we could
write the exact relationship between x and y as,

y = α + βx   (2.2)

where α is the true intercept and β is the true slope. α and β are often referred to as
parameters[2] and, through applying techniques based on sound statistical principles, it
is possible to establish best estimates of those parameters. We will represent the best
estimates of α and β by the symbols a and b respectively[3].

A powerful and widely used technique for establishing best estimates of parameters[4]
is that of least squares. The technique[5] is versatile and allows parameters to be
estimated when the relationship between x and y is more complex than that given by
equation 2.1. For example, a, b and c in equations 2.3 to 2.5 may be determined
using the technique of least squares.

y = a + b/x + cx   (2.3)

y = a + bx + cz  (here both x and z are independent variables)   (2.4)

y = a + b[1 - exp(cx)]   (2.5)

In this discussion of least squares, the following assumptions are made:

1) There are no errors in the x values.
2) Errors in the y values are normally distributed with a mean of zero and a
constant variance. Constant variance errors are sometimes referred to as
homoscedastic errors.
3) Errors in the y values are uncorrelated, so that, for example, the error in the ith
y value is not correlated to the error in the (i+1)th y value.

[2] Sometimes referred to as population parameters or regression coefficients.
[3] In some texts, best estimates of α and β are written as α̂ and β̂ respectively.
[4] Refer to chapters 6 and 7 of Kirkup (2002) for more details.
[5] The technique is also widely referred to as regression.

The ith observed y value is written as yᵢ and the ith value of x as xᵢ. The ith predicted y
value found using the equation of the line is written as ŷᵢ, such that[6],

ŷᵢ = a + bxᵢ   (2.6)

The least squares technique of fitting equations to data requires the calculation of
(yᵢ - ŷᵢ)². We sum (yᵢ - ŷᵢ)² from i = 1 to i = n, where n is the number of data
points. The summation is written[7],

SSR = Σᵢ₌₁ⁿ (yᵢ - ŷᵢ)²   (2.7)

SSR is the Sum of Squares of the Residuals[8]. Strictly, equation 2.7 applies to fitting by
unweighted least squares. Weighted least squares is considered in section 7.

The next stage is to find values of a and b which minimise SSR in equation 2.7. This
is the key step in any least squares analysis, as the values of a and b that minimise SSR
are regarded as the best estimates obtainable of the parameters in an equation[9]. Best
estimates could be found by trial and error, or by a systematic numerical search
using a computer. When a straight line is fitted to data, an equation for the best line
can be found analytically by partially differentiating SSR with respect to a and b in
turn, then setting the resulting equations equal to zero. The simultaneous equations
obtained by this process are solved for a and b to give,


a = (Σxᵢ² Σyᵢ - Σxᵢ Σxᵢyᵢ) / (n Σxᵢ² - (Σxᵢ)²)   (2.8)

and,

b = (n Σxᵢyᵢ - Σxᵢ Σyᵢ) / (n Σxᵢ² - (Σxᵢ)²)   (2.9)

An elegant approach to determining a and b employs matrices. An added advantage of
the matrix approach is that it may be conveniently extended to situations in which
more complex equations are fitted to experimental data.

The equations to be solved for a and b can be expressed in matrix form as:

| n    Σxᵢ  | | a |   | Σyᵢ   |
|           | |   | = |       |   (2.10)
| Σxᵢ  Σxᵢ² | | b |   | Σxᵢyᵢ |

[6] ŷᵢ is sometimes referred to as 'y hat'.
[7] In future we assume that all summations are carried out between i = 1 and i = n,
and therefore we omit the limits of the summations.
[8] yᵢ - ŷᵢ is referred to as the ith residual.
[9] The process by which estimates are varied until some condition (such as the
minimisation of SSR) is satisfied is often called optimisation.
Equation 2.10 can be written concisely as,

AB = P   (2.11)

where,

A = | n    Σxᵢ  |     B = | a |     P = | Σyᵢ   |
    | Σxᵢ  Σxᵢ² |         | b |         | Σxᵢyᵢ |

To determine the elements a and b of the matrix B, equation 2.11 is manipulated to give,

B = A⁻¹P   (2.12)

where A⁻¹ is the inverse matrix[10] of the matrix A. Matrix inversion and matrix
multiplication are onerous to perform manually, especially if the matrices are large. The
built-in matrix functions in Excel are well suited to estimating parameters in linear
least squares problems.

Exercise 1
Table 2.1 contains x-y data which are shown plotted in figure 2.1.

Table 2.1: x-y data.
x y
2 70
4 63
6 49
8 42
10 31


[Graph: y, 0 to 80, versus x, 0 to 12]

Figure 2.1: Linearly related x-y data.

Using the data in table 2.1,

i) find best estimates for the intercept, a, and the slope, b, of a straight line fitted
to the data using linear least squares [80.7, -4.95].
ii) draw the line of best fit through the points.
iii) calculate the sum of squares of residuals, SSR [9.9].

[10] A⁻¹ is used in the calculation of the standard errors in parameter estimates and is
sometimes referred to as the error matrix.
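The matrix route of equations 2.10 to 2.12 is straightforward to check numerically. The sketch below (Python with NumPy, outside the Excel workflow this document follows) builds A and P from the data of table 2.1 and solves for a and b:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([70.0, 63.0, 49.0, 42.0, 31.0])
n = len(x)

# Build A and P as in equation 2.10, then solve AB = P (equation 2.11)
A = np.array([[n,       x.sum()],
              [x.sum(), (x**2).sum()]])
P = np.array([y.sum(), (x*y).sum()])
a, b = np.linalg.solve(A, P)        # equivalent to B = A^-1 P (equation 2.12)

y_hat = a + b*x
SSR = ((y - y_hat)**2).sum()
print(a, b, SSR)                    # ~80.7, -4.95, 9.9
```

The same estimates are returned by equations 2.8 and 2.9, or by Excel's LINEST() function.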

2.1 Standard errors in best estimates
In addition to the best estimates, a and b, the standard errors in a and b are required,
as this allows confidence intervals[11] to be quoted for the parameters α and β.

Calculations of a and b depend on the measured y values. As a consequence,
uncertainties in the y values contribute to the uncertainties in a and b. In order to
calculate uncertainties in a and b, the usual starting point is to determine the standard
errors in a and b, written as σ_a and σ_b respectively. σ_a and σ_b are given by[12],

σ_a = σ (Σxᵢ² / Δ)^½   (2.13)

σ_b = σ (n / Δ)^½   (2.14)

where

Δ = n Σxᵢ² - (Σxᵢ)²   (2.15)

and

σ = [Σ(yᵢ - ŷᵢ)² / (n - 2)]^½   (2.16)

Alternatively, σ_a and σ_b may be determined using matrices[13]. The covariance matrix,
V, contains elements which are the variances (as well as the covariances) of the best
estimates a and b. V may be written[14],

V = σ²A⁻¹   (2.17)

A⁻¹ appears in equation 2.12. σ² can be found using equation 2.16.

Standard errors in a and b are written explicitly as,

σ_a = σ [(A⁻¹)₁₁]^½   (2.18)

σ_b = σ [(A⁻¹)₂₂]^½   (2.19)

(A⁻¹)₁₁ and (A⁻¹)₂₂ are the diagonal elements of the A⁻¹ matrix[15].

[11] See Kirkup (2002) p226.
[12] See Bevington and Robinson (1992).
[13] See chapter 5 of Neter et al. (1996).
[14] The covariance matrix is considered in more detail in section 9.

Exercise 2
Using matrices, or otherwise, determine the standard errors in the intercept and slope
of the best straight line through the data given in table 2.1. [1.9, 0.29]

[15] See Williams (1972).
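For readers who want a numerical cross-check of Exercise 2, the sketch below (Python with NumPy) computes the standard errors both from the explicit formulas of equations 2.13 to 2.16 and from the covariance matrix of equations 2.17 to 2.19, confirming that the two routes agree:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([70.0, 63.0, 49.0, 42.0, 31.0])
n = len(x)

A = np.array([[n, x.sum()], [x.sum(), (x**2).sum()]])
P = np.array([y.sum(), (x*y).sum()])
a, b = np.linalg.solve(A, P)

# sigma^2 from equation 2.16, with n - 2 degrees of freedom
sigma2 = ((y - (a + b*x))**2).sum() / (n - 2)

# Explicit formulas (equations 2.13 to 2.15)
Delta = n*(x**2).sum() - x.sum()**2
se_a = np.sqrt(sigma2 * (x**2).sum() / Delta)
se_b = np.sqrt(sigma2 * n / Delta)

# Covariance-matrix route (equation 2.17): V = sigma^2 A^-1
V = sigma2 * np.linalg.inv(A)
assert np.isclose(se_a, np.sqrt(V[0, 0])) and np.isclose(se_b, np.sqrt(V[1, 1]))
print(se_a, se_b)   # ~1.9, 0.29
```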
Section 3: Extensions of the linear least squares technique
The technique of least squares used to fit equations to experimental data can be
extended in several ways:

• Weighting the fit. The assumption that the standard deviation in the y values is the
same for all values of x (a characteristic which is sometimes referred to as
homoscedasticity[16]) may not be valid. When it is not valid, we need to
'weight' the fit, in effect forcing the line closer to those points that are known
to higher precision. Weighted fitting is considered in section 7.

• More complex equations may be fitted to the data. Equations such as
y = a + b/x + cx and y = a + bx + cx² are linear in the parameters and may be
fitted using linear least squares. The added computational complexity, which
can arise when there are more than two parameters to be estimated, favours
fitting by matrix methods. These methods are most conveniently applied using
a computer for matrix manipulation/inversion.

• Equations may be fitted using linear least squares in which the equations have
more than one independent variable. As an example, the equation
y = a + bx + cz may be fitted to data, where x and z are the independent
variables (this is sometimes referred to as multiple regression).
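Fitting an equation that is linear in the parameters amounts to solving a linear system with one design-matrix column per term. A sketch (Python with NumPy; the data and parameter values below are illustrative, not taken from this document):

```python
import numpy as np

# Synthetic, noise-free data generated from y = 1 + 2x + 0.5x^2
# (assumed illustrative values), so the fit should recover them exactly.
x = np.linspace(0.0, 5.0, 11)
y = 1.0 + 2.0*x + 0.5*x**2

# Design matrix: one column per term that multiplies a parameter
X = np.column_stack([np.ones_like(x), x, x**2])
(a, b, c), *_ = np.linalg.lstsq(X, y, rcond=None)
print(a, b, c)   # ~1.0, 2.0, 0.5
```

The same construction handles y = a + b/x + cx or multiple regression (y = a + bx + cz); only the columns of the design matrix change.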

3.1 Using Excel to solve linear least squares problems
Excel is capable of fitting functions to data that are linear in parameters. This may be
achieved by using one of the following features in Excel:

The LINEST() function
The Regression tool in the Analysis ToolPak

Excel has no built-in tool for performing weighted least squares, though a spreadsheet
may be created to perform this procedure[17].

Excel does not provide an easy to use utility for fitting an equation to data requiring
the application of non-linear least squares. However, with the aid of a powerful add-in
called Solver resident in Excel, fitting using non-linear least squares is possible. We
will deal with Solver in sections 4 and 9, but first we consider non-linear least
squares.



[16] The condition where the variance in y values is not constant for all x is referred to as
heteroscedasticity.
[17] See Kirkup (2002), section 6.10.
3.2 Limitations of linear least squares
Quite complex functions can be fitted to data using linear least squares. As examples,

y = a + b ln x + c exp x   (3.1)

y = a + bx + c/x²   (3.2)

The equation to be fitted is inserted into equation 2.7. SSR is partially differentiated
with respect to each parameter estimate in turn. The resulting equations are set equal
to zero and solved to find best estimates of the parameters.

It is worth highlighting that the 'linear' in linear least squares does not mean that a
plot of y versus x will produce a graph containing data which lie along a straight line.
'Linear' refers to the fact that the partial derivatives, ∂SSR/∂a, ∂SSR/∂b etc., as
described in section 2, are linear in the parameter estimates. Using this definition,
equations 3.1 and 3.2 may be fitted to data using linear least squares.

Some relationships between physical variables require transformation before they are
suitable for fitting by linear least squares. As an example, the variation of electrical
resistance, R, with temperature, T, of some semiconductor materials is known to
follow the relationship,

R = R₀ exp(β/T)   (3.3)

where R₀ and β are constants.

Taking natural logarithms of both sides of equation 3.3 and comparing the resulting
equation with y = a + bx, we obtain,

ln R = ln R₀ + β(1/T)   (3.4)
 (y  =   a   +   b x)

Taking the y values to be ln R and the x values to be 1/T, least squares may be used to
find best estimates for ln R₀ (and hence R₀) and β. If the errors in R have constant
variance, then after transformation the errors in ln R do not have constant variance.
In this circumstance weighted fitting is required[18].

Weighted fitting of equations using least squares matters most when the scatter in data
is large. If data show small scatter, then the best estimates found using weighted least
squares are very similar to the best estimates found using unweighted least squares.

[18] See Dietrich (1991) p303.
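The transformation of equations 3.3 and 3.4 is easy to demonstrate numerically. In the sketch below (Python with NumPy), the values R₀ = 50 and β = 3000 K are illustrative assumptions, not values taken from the document:

```python
import numpy as np

# Illustrative (assumed) values: R0 = 50 ohm, beta = 3000 K, equation 3.3
T = np.linspace(250.0, 350.0, 11)        # temperature in kelvin
R = 50.0 * np.exp(3000.0 / T)

# Transform as in equation 3.4: ln R = ln R0 + beta * (1/T),
# i.e. a straight line in ln R versus 1/T
slope, intercept = np.polyfit(1.0/T, np.log(R), 1)
print(np.exp(intercept), slope)          # ~50.0, 3000.0
```

Because the synthetic data here are noise-free, the fit recovers R₀ and β essentially exactly; with real, noisy R values the transformed errors would call for the weighted fitting of section 7.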
Though transforming equations can assist in many situations, there are some
equations that cannot be transformed into a form suitable for fitting by linear least
squares. As examples,

y = a + bx²/(c + x)   (3.5)

y = a + b exp(cx)   (3.6)

y = a + b[1 - exp(cx)]   (3.7)

y = a exp(bx) + c exp(dx)   (3.8)

For equations 3.5 to 3.8 it is not possible to obtain a set of linear equations that may
be solved for best estimates of the parameters. We must therefore resort to another
method of finding best estimates. That method still requires that parameter estimates
are found that minimise SSR.
SSR may be considered to be a continuous function of the parameter estimates. A
surface may be constructed, sometimes referred to as a hypersurface[19], in M-
dimensional space, where M is the number of parameters appearing in the equation to
be fitted to data. The intention is to use non-linear least squares to discover estimates,
a, b, c etc., which yield a minimum in the hypersurface. As with linear least squares,
these estimates are regarded as the best estimates of the parameters in the equation.
Figure 3.1 shows a hypersurface which depends on estimates a and b.

[Surface plot: SSR as a function of a and b, with the minimum in the hypersurface marked]

Figure 3.1: Variation of SSR as a function of parameter estimates, a and b. This
figure is adapted from rcs.chph.ras.ru/nlr.ppt by Alexey Pomerantsev.

[19] See Bevington and Robinson (1992).
Fitting by non-linear least squares begins with reasonable guesses for the best
estimates of the parameters. The objective is to modify the starting values in an
iterative fashion until a minimum is found in SSR. The computational complexity of
the iteration process means that non-linear least squares can only realistically be
carried out using a computer.

There are many documented ways in which the values of a, b, c etc. can be found
which minimise SSR, including the Grid Search (Bevington and Robinson, 1992), the
Gauss-Newton method (Nielsen-Kudsk, 1983) and the Marquardt algorithm (Bates and
Watts, 1988).
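The Grid Search is the simplest of these to sketch: SSR is evaluated at every point of a lattice of candidate parameter values, and the lattice point giving the smallest SSR is taken as the estimate. In the sketch below (Python with NumPy), the data and the 'true' values a = 2, b = 3, c = -0.5 are illustrative assumptions, not taken from this document:

```python
import numpy as np
from itertools import product

# Noise-free synthetic data from y = a + b*exp(c*x), as in equation 3.6,
# with assumed true values a = 2, b = 3, c = -0.5
x = np.linspace(0.0, 6.0, 13)
y = 2.0 + 3.0*np.exp(-0.5*x)

def ssr(a, b, c):
    # sum of squares of residuals for candidate estimates (equation 2.7)
    return np.sum((y - (a + b*np.exp(c*x)))**2)

# Coarse grids that happen to contain the true values
grid_a = np.linspace(1.0, 3.0, 21)
grid_b = np.linspace(2.0, 4.0, 21)
grid_c = np.linspace(-1.0, 0.0, 21)

best = min(product(grid_a, grid_b, grid_c), key=lambda p: ssr(*p))
print(best)   # ~(2.0, 3.0, -0.5)
```

In practice a grid search is refined iteratively (finer grids around the current best point); methods such as Gauss-Newton reach the minimum in far fewer SSR evaluations.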

Non-linear least squares is unnecessary when the derivatives of SSR with respect to
the parameters are linear in the parameters. In this situation linear least squares offers a
more efficient route to determining best estimates of the parameters (and the standard
errors in the best estimates). Nevertheless, a linear equation can be fitted to data using
non-linear least squares. The answers obtained for the best estimates of parameters and the
standard errors in the best estimates should agree, irrespective of whether a linear
equation is fitted using linear or non-linear least squares[20].

[20] We consider this in more detail in section 6.
Section 4: Excel's Solver add-in
Solver, first introduced in 1991, is one of the many 'add-ins' available in Excel[21].
Originally designed for business users, Solver is a powerful and flexible optimisation
tool which is capable of finding (as an example) the best estimates of parameters
using least squares. It does this by iteratively altering the numerical values of variables
contained in the cells of a spreadsheet until SSR is minimised. To solve non-linear
problems, Solver uses Generalized Reduced Gradient (GRG2) code developed at the
University of Texas and Cleveland State University[22]. The features of Solver are best
described by reference to a particular example.

4.1 Example of use of Solver
Consider an experiment in which the rise of air temperature in an enclosure (such as a
room) is measured as a function of time as heat passes through a window into the
enclosure. Table 4.1 contains the raw data. Figure 4.1 displays the same data in
graphical form.

Table 4.1: Variation of air temperature in an enclosure with time.
Time (minutes) Temperature (°C)
2 26.1
4 26.8
6 27.9
8 28.6
10 28.5
12 29.3
14 29.8
16 29.9
18 30.1
20 30.4
22 30.6
24 30.7



[21] See Fylstra et al. (1998).
[22] See Excel's online Help. See also Smith and Lasdon (1992).
[Graph: temperature (°C), 20 to 32, versus time (minutes), 0 to 25]

Figure 4.1: Temperature variation with time inside an enclosure.

Through a consideration of the flow of heat into and out of an enclosure, a
relationship may be derived for the air temperature, T, inside the enclosure as a
function of time, t. The relationship can be expressed,

T = θ + Tₛ[1 - exp(-kt)]   (4.1)

where Tₛ, k and θ are constants. Equation 4.1 may be written in a form consistent with
other equations appearing in this document. Using x and y for the independent and
dependent variables respectively, and a, b and c for the parameter estimates,
equation 4.1 becomes[23],

y = a + b[1 - exp(cx)]   (4.2)

To find best estimates, a, b and c, we proceed as follows:

1. Enter the raw data from table 4.1 into columns A and B of an Excel worksheet as
shown in sheet 4.1.
2. Type =$B$15+$B$16*(1-EXP($B$17*A2)) into cell C2 as shown in sheet 4.1.
Cells B15 to B17 contain the starting values for a, b and c respectively.
3. Use the cursor to highlight cells C2 to C13.
4. Click on the Edit menu. Click on the Fill option, then click on the Down option[24].

[23] Equation 4.2 is of the same form as that fitted to data obtained through fluorescent decay
measurements, where the decay is characterised by a single time constant; see Walsh and Diamond
(1995).
[24] These steps are often abbreviated in Excel texts to Edit, Fill, Down.
Sheet 4.1: Temperature (y) and time (x) data from table 4.1 entered into a
spreadsheet[25].
A B C
1 x (mins) y (°C) ŷ (°C)
2 2 26.1 =$B$15+$B$16*(1-EXP($B$17*A2))
3 4 26.8
4 6 27.9
5 8 28.6
6 10 28.5
7 12 29.3
8 14 29.8
9 16 29.9
10 18 30.1
11 20 30.4
12 22 30.6
13 24 30.7
14
15 a 1
16 b 1
17 c 1

Sheet 4.2 shows the values returned in the C column. As the squares of the residuals
are required, these are calculated in column D.

Sheet 4.2: Calculation of sum of squares of residuals.
A B C D
1 x (mins) y (°C) ŷ (°C) (y - ŷ)² (°C²)
2 2 26.1 -5.38906 991.560654
3 4 26.8 -52.5982 6304.066229
4 6 27.9 -401.429 184323.2129
5 8 28.6 -2978.96 9045405.045
6 10 28.5 -22024.5 486333300.3
7 12 29.3 -162753 26498009287
8 14 29.8 -1202602 1.44632E+12
9 16 29.9 -8886109 7.89635E+13
10 18 30.1 -6.6E+07 4.31124E+15
11 20 30.4 -4.9E+08 2.35385E+17
12 22 30.6 -3.6E+09 1.28516E+19
13 24 30.7 -2.6E+10 7.01674E+20
14 SSR = 7.14765E+20
15 a 1
16 b 1
17 c 1
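The spreadsheet arithmetic of sheet 4.2 can be reproduced outside Excel. The sketch below (Python with NumPy) evaluates the formula in column C with the poor starting values a = b = c = 1 and sums column D:

```python
import numpy as np

x = np.arange(2.0, 26.0, 2.0)                       # time in minutes
y = np.array([26.1, 26.8, 27.9, 28.6, 28.5, 29.3,
              29.8, 29.9, 30.1, 30.4, 30.6, 30.7])  # temperature in deg C

a = b = c = 1.0                                     # poor starting values
y_hat = a + b*(1.0 - np.exp(c*x))                   # column C of sheet 4.2
SSR = np.sum((y - y_hat)**2)                        # cell D14
print(f"{SSR:.5e}")   # ~7.14765e+20, matching sheet 4.2
```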

The sum of the squares of residuals, SSR, is calculated in cell D14 by summing the
contents of cells D2 through to D13. It is clear that the choices of starting values for a,
b and c are poor, as the predicted values, ŷ, in column C of sheet 4.2 bear no
resemblance to the experimental values in column B. As a consequence, SSR is very
large. Choosing good starting values for parameter estimates is often crucial to the
success of fitting equations using non-linear least squares and we will return to this
issue later.

[25] The estimated values of the dependent variable based on an equation like equation 4.2 must be
distinguished from values obtained through experiment. Estimated values are represented by the
symbol ŷ and experimental values by the symbol y.

SSR in cell D14 is reduced by carefully altering the contents of cells B15 through to
B17. Solver is able to adjust the parameter estimates in cells B15 to B17 until the
number in cell D14 is minimised. To accomplish this, choose Tools on Excel's Menu
bar and pull down to Solver. If Solver does not appear, then on the same pull down
menu, select Add-Ins and tick the Solver Add-in box. After a short delay, Solver
should be added to the Tools pull down menu.

Click on Solver. The dialog box shown in figure 4.2 should appear.













Figure 4.2: Solver dialog box with cell references inserted. The callouts on the figure
note that: D14 is our 'target' cell, since we want to minimise the value in cell D14
(Solver is capable of adjusting cell contents such that the value in the target cell is
maximised, minimised or reaches a specified value; for least squares analysis we
require the content of the target cell to be minimised); Excel alters the values in cells
B15 to B17 in order to minimise the value in cell D14; and it is possible to constrain
the values in one or more cells (for example, a parameter estimate can be prevented
from assuming a negative value, if a negative value is considered to be 'unphysical'),
though no constraints are applied in this example.

After entering the information into the dialog box, click on the Solve button. After a
few seconds Solver returns with the dialog box shown in figure 4.3.

Figure 4.3: Solver dialog box indicating that fitting has been completed.

Inspection of cells B15 to B17 in the spreadsheet indicates that Solver has adjusted
the parameters. Sheet 4.3 shows the new parameters, SSR, etc.

Sheet 4.3: Best values for a, b and c returned by Solver when starting values are poor.
A B C D
1 x (mins) y (°C) ŷ (°C) (y - ŷ)² (°C²)
2 2 26.1 24.75011 1.822210907
3 4 26.8 27.99648 1.431570819
4 6 27.9 29.10535 1.452872267
5 8 28.6 29.48411 0.781649268
6 10 28.5 29.61348 1.239842411
7 12 29.3 29.65767 0.127929367
8 14 29.8 29.67277 0.01618844
9 16 29.9 29.67792 0.049318684
10 18 30.1 29.67968 0.176666436
11 20 30.4 29.68028 0.517990468
12 22 30.6 29.68049 0.845498795
13 24 30.7 29.68056 1.039257719
14 SSR = 9.500995581
15 a 15.24587
16 b 14.43473
17 c -0.5371
18

SSR in cell D14 in sheet 4.3 is almost 20 orders of magnitude smaller than that in cell
D14 in sheet 4.2. However, all is not as satisfactory as it might seem. Consider the
best line through the points which utilises the parameter estimates in cells B15
through to B17 of sheet 4.3.


Figure 4.4: Graph of y versus x showing the line based on equation 4.2 where a, b
and c have the values given in sheet 4.3.

A plot of residuals (i.e. a plot of (yᵢ - ŷᵢ) versus xᵢ) is often used as an indicator of the
goodness of fit of an equation to data, with trends in the residuals indicating a poor
fit[26]. However, no plot of residuals is required in this case to reach the conclusion that
the line on the graph in figure 4.4 is not a good fit to the experimental data. Solver has
found a minimum in SSR, but this is a local minimum[27] and the parameter estimates
are of little worth. The source of the problem can be traced to the poorly chosen
starting values (i.e. a = b = c = 1). Working from these initial estimates, Solver has
discovered a minimum in SSR. However, there is another combination of parameter
estimates that will produce an even lower value for SSR.
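The two minima can be compared directly by evaluating SSR at each set of parameter estimates. A short check (Python with NumPy) using the data of table 4.1:

```python
import numpy as np

x = np.arange(2.0, 26.0, 2.0)
y = np.array([26.1, 26.8, 27.9, 28.6, 28.5, 29.3,
              29.8, 29.9, 30.1, 30.4, 30.6, 30.7])

def ssr(a, b, c):
    # SSR for the model y_hat = a + b*(1 - exp(c*x)) of equation 4.2
    return np.sum((y - (a + b*(1.0 - np.exp(c*x))))**2)

print(ssr(15.24587, 14.43473, -0.5371))    # ~9.50, the local minimum of sheet 4.3
print(ssr(24.98118, 6.387988, -0.093668))  # ~0.293, the deeper minimum of sheet 4.4
```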

Methods by which good starting values for parameter estimates may be obtained are
considered in section 5.2. In the example under consideration here, we note (by
reference to equation 4.2) that when x = 0, y = a. Drawing a line 'by eye' through the
data in figure 4.1 indicates that, when x = 0, y ≈ 25.5 °C. Starting values for b and c
may also be established by a similar preliminary analysis of data which we will
consider in section 5.2. Denoting the starting values by a₀, b₀ and c₀, we find[28],

a₀ = 25.5, b₀ = 5.5 and c₀ = -0.12

Inserting these values into sheet 4.2 and running Solver again gives the output shown
in sheet 4.4 and in graphical form in figure 4.5.

[26] See Cleveland (1994) and Kirkup (2002) for a discussion of residuals.
[27] Local minima are discussed in section 5.1.
[28] All parameter estimates in this example have units (for example, the unit of c is min⁻¹, assuming
time is measured in minutes). For convenience, units are omitted until the analysis is complete.

[The line shown in figure 4.4 is ŷ = 15.25 + 14.43[1 - exp(-0.5371x)].]
Sheet 4.4: Best values for a, b and c returned by Solver when starting values for
parameter estimates are good.
A B C D
1 x (mins) y (°C) ŷ (°C) (y - ŷ)² (°C²)
2 2 26.1 26.07247 0.000757691
3 4 26.8 26.97734 0.031447922
4 6 27.9 27.72762 0.029716516
5 8 28.6 28.34972 0.062639751
6 10 28.5 28.86555 0.133625786
7 12 29.3 29.29326 4.54949E-05
8 14 29.8 29.64789 0.023136202
9 16 29.9 29.94195 0.001759666
10 18 30.1 30.18577 0.007356121
11 20 30.4 30.38793 0.00014558
12 22 30.6 30.55556 0.001974583
13 24 30.7 30.69456 2.96361E-05
14 SSR = 0.29263495
15 a 24.98118
16 b 6.387988
17 c -0.093668
18


[Graph: y versus x with the fitted line.]

Figure 4.5: Graph of y versus x showing line and equation of line based on a, b and c in sheet 4.4.

The sum of squares of residuals in cell D14 of sheet 4.4 is less than that in cell D14 of sheet 4.3. This indicates that the parameter estimates obtained using Solver when good starting values are used are rather better than those obtained when the starting values are poorly chosen. In addition, the line fitted to the data in figure 4.5 (where the line is based upon the new best estimates of the parameters) is far superior to the line fitted to the same data shown in figure 4.4. This is further reinforced by the plot of residuals shown in figure 4.6, which exhibits a random scatter about the x axis.

The equation of the fitted line in figure 4.5 is y = 24.98 + 6.388[1 − exp(−0.09367x)].


[Plot: residuals (y − ŷ) versus x.]

Figure 4.6: Plot of residuals based on the data and equation in figure 4.5.

4.2 Limitations of Solver
Solver is able to solve efficiently for the best estimates of parameters in an equation, such as those appearing in equation 4.2. However, Solver does not provide standard errors in the parameter estimates. Standard errors in estimates are extremely important, as without them it is not possible to quote a confidence interval for the estimates, and so we cannot decide if the estimates are good enough for any particular purpose.

If there are three parameters to be estimated, the standard errors in the parameter estimates can be determined with the assistance of the matrix of partial derivatives given by[29],

$$E = \begin{pmatrix}
\sum\left(\frac{\partial y_i}{\partial a}\right)^{2} & \sum\frac{\partial y_i}{\partial a}\frac{\partial y_i}{\partial b} & \sum\frac{\partial y_i}{\partial a}\frac{\partial y_i}{\partial c} \\
\sum\frac{\partial y_i}{\partial b}\frac{\partial y_i}{\partial a} & \sum\left(\frac{\partial y_i}{\partial b}\right)^{2} & \sum\frac{\partial y_i}{\partial b}\frac{\partial y_i}{\partial c} \\
\sum\frac{\partial y_i}{\partial c}\frac{\partial y_i}{\partial a} & \sum\frac{\partial y_i}{\partial c}\frac{\partial y_i}{\partial b} & \sum\left(\frac{\partial y_i}{\partial c}\right)^{2}
\end{pmatrix} \qquad (4.3)$$

The squares of the standard errors in a, b and c are the diagonal elements of the covariance matrix, V, given by equation 2.17. Explicitly[30],

σ_a = σ[(E⁻¹)₁₁]^{1/2}   (4.4)

σ_b = σ[(E⁻¹)₂₂]^{1/2}   (4.5)


[29] Note that this approach can be extended to any number of parameters. See Neter et al. (1996), chapter 13.
[30] Compare these with equations 2.18 and 2.19.
σ_c = σ[(E⁻¹)₃₃]^{1/2}   (4.6)

where[31],

σ = [ (1/(n − 3)) Σ(yᵢ − ŷᵢ)² ]^{1/2}   (4.7)

A convenient way to calculate the elements of the E matrix is to write,

E = DᵀD   (4.8)

where Dᵀ is the transpose of the matrix D, which is given by,

$$D = \begin{pmatrix}
\frac{\partial y_1}{\partial a} & \frac{\partial y_1}{\partial b} & \frac{\partial y_1}{\partial c} \\
\frac{\partial y_2}{\partial a} & \frac{\partial y_2}{\partial b} & \frac{\partial y_2}{\partial c} \\
\vdots & \vdots & \vdots \\
\frac{\partial y_i}{\partial a} & \frac{\partial y_i}{\partial b} & \frac{\partial y_i}{\partial c} \\
\vdots & \vdots & \vdots \\
\frac{\partial y_n}{\partial a} & \frac{\partial y_n}{\partial b} & \frac{\partial y_n}{\partial c}
\end{pmatrix} \qquad (4.9)$$

The partial derivatives in equation 4.9 are evaluated on completion of fitting an equation using Solver, i.e. at values of a, b and c that minimise SSR. It is possible in some situations to determine the partial derivatives analytically. A more flexible approach, and one that is generally more convenient, is to use the method of finite differences to find ∂y₁/∂a, ∂y₂/∂a etc. In general,

$$\left(\frac{\partial y_i}{\partial a}\right)_{b,c,x_i} \approx \frac{y\left[(1+\delta)a,\, b,\, c,\, x_i\right] - y\left[a,\, b,\, c,\, x_i\right]}{(1+\delta)a - a} \qquad (4.10)$$

As double precision arithmetic is used by Excel, the perturbation, δ, in equation 4.10 can be as small as δ = 10⁻⁶ or 10⁻⁷. Similarly, the partial derivatives, ∂yᵢ/∂b and ∂yᵢ/∂c, are approximated using,



$$\left(\frac{\partial y_i}{\partial b}\right)_{a,c,x_i} \approx \frac{y\left[a,\, (1+\delta)b,\, c,\, x_i\right] - y\left[a,\, b,\, c,\, x_i\right]}{(1+\delta)b - b} \qquad (4.11)$$


[31] The n − 3 in the denominator of the term in the square brackets of equation 4.7 appears because the estimate of the population standard deviation in the y values requires that the sum of squares of residuals be divided by the number of degrees of freedom. The number of degrees of freedom is the number of data points, n, minus the number of parameters, p, in the equation. In this example, p = 3.
and,

$$\left(\frac{\partial y_i}{\partial c}\right)_{a,b,x_i} \approx \frac{y\left[a,\, b,\, (1+\delta)c,\, x_i\right] - y\left[a,\, b,\, c,\, x_i\right]}{(1+\delta)c - c} \qquad (4.12)$$
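As a cross-check on equations 4.10 to 4.12, the forward differences can be compared with the analytic derivatives of equation 4.2. The following is a minimal sketch in plain Python; the parameter values are Solver's best estimates from sheet 4.4.

```python
# Sketch: forward-difference partial derivatives (equations 4.10-4.12)
# for the model y = a + b[1 - exp(cx)] of equation 4.2.
import math

a, b, c = 24.98118, 6.387988, -0.093668   # best estimates, sheet 4.4
delta = 1e-6                              # the perturbation of equation 4.10

def model(a, b, c, x):
    return a + b * (1.0 - math.exp(c * x))

def fd_partial_a(x):
    # equation 4.10: perturb a to (1 + delta)a, holding b, c constant
    return (model((1 + delta) * a, b, c, x) - model(a, b, c, x)) / (delta * a)

def fd_partial_b(x):
    # equation 4.11
    return (model(a, (1 + delta) * b, c, x) - model(a, b, c, x)) / (delta * b)

def fd_partial_c(x):
    # equation 4.12
    return (model(a, b, (1 + delta) * c, x) - model(a, b, c, x)) / (delta * c)

# At x = 2 these reproduce row 2 of sheet 4.6:
# dy/da = 1, dy/db ≈ 0.17084, dy/dc ≈ -10.5934
```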

4.3 Spreadsheet for the determination of standard errors in parameter estimates
In an effort to clarify the process of estimating standard errors, we describe a step-by-step approach using an Excel spreadsheet[32].

To find good approximations of the derivatives ∂y₁/∂a, ∂y₂/∂a etc., it is necessary to perturb a slightly (say to 1.000001a) while leaving the parameter estimates b and c at their optimum values. Sheet 4.5 shows the optimum values, as obtained by Solver, for a, b and c in cells G20 to G22. Cell H20 contains the value 1.000001a, cell I21 contains the value 1.000001b, and cell J22 contains the value 1.000001c.

Sheet 4.5: Modification of best estimates of parameters.

     F    G              H               I               J
19        from Solver    b,c constant    a,c constant    a,b constant
20   a    24.98118       24.98120574     24.98118076     24.98118
21   b    6.387988       6.387988103     6.387994491     6.387988
22   c    -0.093668      -0.093668158    -0.093668158    -0.09367

We use the modified parameter estimates to calculate the numerator in equation 4.10. The denominator in equation 4.10 may be determined by entering the formula =$H$20-$G$20 into a cell on the spreadsheet.

The partial derivative (∂y₁/∂a)_{b,c,x₁} is calculated by entering the formula =(H2-C2)/($H$20-$G$20) into cell L2 of sheet 4.6[33]. By using FillDown, the formula may be copied into cells in the L column so that the partial derivative is calculated for every xᵢ. To obtain ∂yᵢ/∂b and ∂yᵢ/∂c, this process is repeated in columns M and N, respectively, of sheet 4.6. The contents of cells L2 to N13 become the elements of the D matrix given by equation 4.9.


[32] It is possible to combine these steps into a macro or Visual Basic program (see Walkenbach, 2001).
[33] The values in the C column of the spreadsheet are shown in sheet 4.4.
Sheet 4.6: Calculation of partial derivatives.

     H             I             J             K   L       M        N
1    ŷ with b,c    ŷ with a,c    ŷ with a,b        dy/da   dy/db    dy/dc
     constant      constant      constant
2    26.07249879   26.0724749    26.07247          1       0.17084  -10.5934
3    26.9773606    26.97733762   26.97734          1       0.31249  -17.5673
4    27.72764019   27.72761795   27.72762          1       0.42994  -21.8493
5    28.34974564   28.34972402   28.34972          1       0.52732  -24.1556
6    28.86557359   28.86555249   28.86555          1       0.60807  -25.0362
7    29.29327999   29.29325932   29.29326          1       0.67503  -24.911
8    29.64791909   29.64789877   29.6479           1       0.73055  -24.0978
9    29.94197336   29.94195334   29.94195          1       0.77658  -22.8355
10   30.18579282   30.18577304   30.18577          1       0.81475  -21.3012
11   30.38795933   30.38793976   30.38794          1       0.84639  -19.6247
12   30.5555887    30.55556929   30.55557          1       0.87264  -17.8993
13   30.69458107   30.69456181   30.69456          1       0.89439  -16.1907

Excel's TRANSPOSE() function is used to transpose the D matrix. We proceed as follows:

• Highlight cells B24 to N26.
• In cell B24 type =TRANSPOSE(L2:N13).
• Press Ctrl Shift Enter to transpose the contents of cells L2 to N13 into cells B24 to N26.
• Multiply Dᵀ by D (using the MMULT() matrix function in Excel) to give E, i.e.,

$$E = D^{T}D = \begin{pmatrix}
12 & 7.65898 & -246.062 \\
7.65898 & 5.49356 & -160.8741 \\
-246.062 & -160.8741 & 5252.64
\end{pmatrix} \qquad (4.13)$$

The MINVERSE() function in Excel is used to find the inverse of E, i.e.,

$$E^{-1} = \begin{pmatrix}
2.239517 & -0.48539 & 0.09005 \\
-0.48539 & 1.870672 & 0.03456 \\
0.09005 & 0.03456 & 0.0054669
\end{pmatrix} \qquad (4.14)$$

Two more steps are required to calculate the standard errors in the parameter estimates. The first is to calculate the square root of each diagonal element of the matrix E⁻¹. The second is to calculate σ using equation 4.7. Using the sum of squares of residuals appearing in cell D14 of sheet 4.4, we obtain,

σ = [ (1/(12 − 3)) × 0.2926 ]^{1/2} = 0.1803

It follows that,
σ_a = σ[(E⁻¹)₁₁]^{1/2} = 0.1803 × (2.240)^{1/2} = 0.270   (4.15)

σ_b = σ[(E⁻¹)₂₂]^{1/2} = 0.1803 × (1.871)^{1/2} = 0.247   (4.16)

σ_c = σ[(E⁻¹)₃₃]^{1/2} = 0.1803 × (0.005467)^{1/2} = 0.0133   (4.17)
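The whole spreadsheet procedure of sections 4.2 and 4.3 condenses to a few matrix operations. The following sketch assumes NumPy is available and uses analytic derivatives of equation 4.2 in place of the finite differences (either works once the fit has converged).

```python
# Sketch: the E = D^T D route to standard errors (equations 4.3-4.8 and
# 4.15-4.17), carried out with NumPy instead of TRANSPOSE/MMULT/MINVERSE.
import numpy as np

x = np.arange(2, 26, 2)
y = np.array([26.1, 26.8, 27.9, 28.6, 28.5, 29.3,
              29.8, 29.9, 30.1, 30.4, 30.6, 30.7])
a, b, c = 24.98118, 6.387988, -0.093668    # best estimates from sheet 4.4

yhat = a + b * (1.0 - np.exp(c * x))

# D matrix of partial derivatives (columns: dy/da, dy/db, dy/dc)
D = np.column_stack([np.ones_like(x, dtype=float),
                     1.0 - np.exp(c * x),
                     -b * x * np.exp(c * x)])
E = D.T @ D                                # equation 4.8
Einv = np.linalg.inv(E)

n, p = len(x), 3
sigma = np.sqrt(np.sum((y - yhat) ** 2) / (n - p))   # equation 4.7
se = sigma * np.sqrt(np.diag(Einv))        # equations 4.4-4.6
```

Running this reproduces equations 4.15 to 4.17 to within rounding: se ≈ (0.270, 0.247, 0.0133).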

4.4 Confidence intervals for parameter estimates
We use parameter estimates and their respective standard errors to quote a confidence interval for each parameter[34]. For the parameters appearing in equation 4.1,

T_s = a ± t_{X%,ν} σ_a   (4.18)

k = b ± t_{X%,ν} σ_b   (4.19)

λ = c ± t_{X%,ν} σ_c   (4.20)

t_{X%,ν} is the critical value of the t distribution for the X% confidence level with ν degrees of freedom. t values are routinely tabulated in statistical texts. In this example ν = n − 3, where n is the number of data points. In table 4.1 there are 12 points, so that ν = 9. If we choose a confidence level of 95% (the commonly chosen level),

t_{95%,9} = 2.262

Restoring the units of measurement and quoting 95% confidence intervals gives,

T_s = (24.98 ± 2.262 × 0.270) °C = (24.98 ± 0.61) °C
k = (6.388 ± 2.262 × 0.247) °C = (6.39 ± 0.56) °C
λ = (−0.09367 ± 2.262 × 0.0133) min⁻¹ = (−0.094 ± 0.030) min⁻¹
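Turning an estimate and its standard error into a confidence interval needs only the tabulated t value. A minimal sketch (the helper name conf_interval is an invention for illustration):

```python
# Sketch: confidence intervals per equations 4.18-4.20, with the t value
# taken from statistical tables (95%, 9 degrees of freedom).
t_95_9 = 2.262

def conf_interval(estimate, std_err, t=t_95_9):
    """Return the (lower, upper) X% confidence interval."""
    half_width = t * std_err
    return estimate - half_width, estimate + half_width

lo, hi = conf_interval(24.98, 0.270)   # interval for T_s, cf. the text
```

The half-width 2.262 × 0.270 ≈ 0.61 °C matches the quoted interval for T_s.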


Exercise 3
The amount of heat entering an enclosure through a window may be reduced by
applying a reflective coating to the window. An experiment is performed to establish
the effect of a reflective coating on the rise in air temperature within the enclosure.
The temperature within the enclosure as a function of time is shown in table 4.2.

Time (minutes)   Temperature (°C)
2 24.9
4 25.3
6 25.4
8 25.8
10 26.0
12 26.3
14 26.4
16 26.6
18 26.5
20 26.8
22 27.0
24 26.9

Table 4.2: Data for exercise 3.

[34] See Kirkup (2002), p. 226.

Fit equation 4.2 to the data in table 4.2. Find a, b, and c and their respective standard errors. Note that good starting values for parameter estimates are required if fitting by non-linear least squares is to be successful.

[a = 24.503 °C, σ_a = 0.128 °C, b = 3.0613 °C, σ_b = 0.227 °C, c = −0.0682 min⁻¹, σ_c = 0.0147 min⁻¹]
Section 5: More on fitting using non-linear least squares
There are several challenges to face when fitting equations to data using non-linear
least squares. These can be summarised as,

1) Choosing an appropriate model to describe the relationship between x and y.
2) Avoiding local minima in SSR.
3) Establishing good starting values prior to fitting by non-linear least squares.

We consider 2) and 3) in this section. Model identification is considered in section 10.

5.1 Local Minima in SSR
When data are noisy, or starting values are far from the best estimates, a non-linear
least squares fitting routine can become trapped in a local minimum.

To illustrate this situation, we draw on the analysis of data appearing in section 4.1.
Equation 4.2 is fitted to the data in table 4.1 using the starting values given in sheet
4.2 and the best estimates, a, b and c are obtained for the parameters. For clarity, the
relationship between only one parameter estimate (c) and SSR is considered. Solver
finds a minimum in SSR when c is about -0.53 and terminates the fitting procedure.
The variation of SSR with c is shown in figure 5.1.

The minimum in SSR in figure 5.1 is referred to as a local minimum as there is
another combination of parameter estimates that will give a lower value for SSR. The
lowest value of SSR obtainable corresponds to a global minimum. It is the global
minimum that we would like to identify in all least squares problems.

[Graph: SSR versus c near the local minimum at c ≈ −0.53.]

Figure 5.1: Variation of SSR with c when a local minimum has been found when
equation 4.2 is fitted to data.

When starting values are used that are closer to the final values[35], the non-linear fitting routine finds parameter estimates that produce a lower final value for SSR. Figure 5.2 shows the variation of SSR with c in the interval (−0.18 < c < −0.04).


[35] See section 4.1.

Figure 5.2: Variation of SSR with c when a global minimum has been found when
equation 4.2 is fitted to data.

A number of indicators can assist in identifying a local minimum, though there is no fool-proof way of deciding whether a local or global minimum has been discovered. A good starting point is to plot the raw data along with the fitted line (as illustrated in figure 4.4). A poor fit of the line to the data could indicate,

• A local minimum has been found.
• An inappropriate model has been fitted to the data.

When a local minimum in SSR is found, the standard errors in the parameter estimates tend to be large. As an example, the best estimates appearing in sheet 4.3 (resulting from being trapped in a local minimum), their respective standard errors and the magnitude of the ratio of these quantities (expressed as a percentage) are,

a = 15.25, σ_a = 9.27, so that |σ_a/a| × 100% ≈ 61%
b = 14.43, σ_b = 9.17, so that |σ_b/b| × 100% ≈ 64%
c = −0.5371, σ_c = 0.284, so that |σ_c/c| × 100% ≈ 53%

When the global minimum in SSR is found (see sheet 4.4), the best estimates of the parameters, standard errors etc. are,

a = 24.98, σ_a = 0.270, so that |σ_a/a| × 100% ≈ 1.1%
b = 6.388, σ_b = 0.247, so that |σ_b/b| × 100% ≈ 3.9%
c = −0.09367, σ_c = 0.0133, so that |σ_c/c| × 100% ≈ 14%

There is merit in fitting the same equation to data several times, each time using
different starting values for the parameter estimates. If, after fitting, there is
consistency between the final values obtained for the best estimates, then it is likely
that the global minimum has been identified.
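One quick way to compare candidate solutions is simply to evaluate SSR at each. The sketch below (plain Python) does this for the two sets of estimates discussed in this section, using the data of table 4.1:

```python
# Sketch: SSR at the local-minimum estimates (sheet 4.3) versus the
# global-minimum estimates (sheet 4.4) for y = a + b[1 - exp(cx)].
import math

x = range(2, 26, 2)
y = [26.1, 26.8, 27.9, 28.6, 28.5, 29.3, 29.8, 29.9, 30.1, 30.4, 30.6, 30.7]

def ssr(a, b, c):
    return sum((yi - (a + b * (1.0 - math.exp(c * xi)))) ** 2
               for xi, yi in zip(x, y))

ssr_local = ssr(15.25, 14.43, -0.5371)       # trapped fit
ssr_global = ssr(24.98118, 6.387988, -0.093668)
# ssr_global ≈ 0.293; ssr_local is many times larger (cf. figure 5.1)
```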

5.2 Starting values
There are no general rules that may be applied in order to determine good starting values[36] for parameter estimates prior to fitting by non-linear least squares. It is correct, but sometimes unhelpful, to remark that familiarity with the relationship being studied can assist greatly in deciding what might be reasonable starting values for parameter estimates.

A useful approach to determining starting values is to begin by plotting the experimental data. Consider the data in figure 5.3, which has a smooth line drawn through the points by eye.


[Graph: y (°C) versus x (minutes), with a smooth line through the data; guide lines mark y ≈ 25.5 °C at x = 0 and y ≈ 31 °C at large x.]

Figure 5.3: Line drawn 'by eye' through the data given in table 4.1.

If the relationship between x and y is given by equation 5.1, then we are able to estimate a and b by considering the data in figure 5.3 and a rough line drawn through the data.

y = a + b[1 − exp(cx)]   (5.1)

Equation 5.1 predicts that y = a when x is equal to zero. From figure 5.3 we see that when x = 0, y ≈ 25.5 °C, so that a ≈ 25.5 °C. When x is large (and assuming c is negative), then y → a + b. Inspection of the graph in figure 5.3 indicates that when x is large, y ≈ 31.0 °C, i.e. a + b ≈ 31.0 °C. It follows that b ≈ 5.5 °C. If we write the starting values for a and b as a₀ and b₀ respectively, then a₀ = 25.5 °C and b₀ = 5.5 °C.

In order to determine a starting value for c, c₀, equation 5.1 is rearranged into the form[37],


[36] Sometimes referred to as initial estimates.
[37] Starting values, a₀ and b₀, are substituted into the equation.
$$\ln\left[1 - \frac{y - a_0}{b_0}\right] = c_0 x \qquad (5.2)$$

Equation 5.2 has the form of an equation of a straight line passing through the origin (i.e. y = bx). It follows that plotting ln[1 − (y − a₀)/b₀] versus x should give a straight line with slope c₀.

[Graph: ln[1 − (y − a₀)/b₀] versus x, with the Trendline fit y = −0.1243x + 0.2461.]

Figure 5.4: Line of best fit used to determine starting value for c.

Figure 5.4 shows a plot of ln[1 − (y − a₀)/b₀] versus x. The line of best fit and the equation of the line have been added using the Trendline option in Excel[38,39]. The slope of the line is approximately −0.12. The starting values may now be stated for this example, i.e.,

a₀ = 25.5 °C, b₀ = 5.5 °C, c₀ = −0.12 min⁻¹

These starting values were used in the successful fit of equation 4.2 to the data given in table 4.1 (the output of the fit is shown in sheet 4.4).
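The graphical procedure above can equally be carried out numerically. The sketch below (plain Python) applies the linearisation of equation 5.2 to the table 4.1 data with a₀ = 25.5 and b₀ = 5.5:

```python
# Sketch: starting value c0 from a straight-line fit of
# ln[1 - (y - a0)/b0] against x (equation 5.2).
import math

x = list(range(2, 26, 2))
y = [26.1, 26.8, 27.9, 28.6, 28.5, 29.3, 29.8, 29.9, 30.1, 30.4, 30.6, 30.7]
a0, b0 = 25.5, 5.5                     # read off the graph as described above

u = [math.log(1.0 - (yi - a0) / b0) for yi in y]

# ordinary least-squares slope of u versus x
n = len(x)
xbar, ubar = sum(x) / n, sum(u) / n
c0 = sum((xi - xbar) * (ui - ubar) for xi, ui in zip(x, u)) \
     / sum((xi - xbar) ** 2 for xi in x)
# c0 ≈ -0.124 min^-1, in line with the Trendline slope in figure 5.4
```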

5.3 Starting values by curve stripping
Establishing starting values in some situations is quite difficult and may require a significant amount of pre-processing of the data. For example, the fitting to data of an equation consisting of a sum of exponential terms, such as,

y = a exp(bx) + c exp(dx)   (5.3)


[38] For details of Trendline see page 222 in Kirkup (2002).
[39] An equation of the form y = a + bx was fitted to the data using Trendline in Excel. Alternatively, we could have fitted y = bx to the data. Either approach would have given an acceptable starting value for c₀.
or[40],

y = a exp(bx) + c exp(dx) + e exp(fx)   (5.4)

is particularly challenging, especially when data are noisy and/or the ratio of the parameters within the exponentials is less than approximately 3 (e.g. when the ratio b/d in equation 5.3 is less than 3)[41]. Fitting of equations such as equation 5.3 and equation 5.4 is quite common; for example, the kinetics of drug transport through the human body is routinely modelled using compartmental analysis. Compartmental analysis attempts to predict concentrations of drugs as a function of time (e.g. in blood or urine). The relationship between concentration and time is often well represented by a sum of exponential terms. In analytical chemistry, excited state lifetime measurements offer a means of identifying components in a mixture. The decay of phosphorescence with time that occurs after illumination of the mixture may be captured. The decay can be represented by a sum of exponential terms. Fitting a sum of exponentials by non-linear least squares allows each component in the mixture to be discriminated[42].

If an equation to be fitted to data consists of a sum of exponential terms, good starting values for parameter estimates are extremely important if local minima in SSR are to be avoided. It is also possible that, if starting values for the parameter estimates are too far from the optimum values, SSR will increase during the iterative process to such an extent that it exceeds the maximum floating point number that a spreadsheet (or other program) can handle. In this situation, fitting is terminated and an error message is returned by the spreadsheet.

Data in figure 5.5 have been gathered in an experiment in which the decay of photo-generated current in the wide band gap semiconductor cadmium sulphide (CdS) is measured as a function of time after photo-excitation of the semiconductor has ceased. There appears to be an exponential decay of the photocurrent with time. Theory indicates[43] that there may be more than one decay mechanism for photoconductivity. That, in turn, suggests that an equation of the form given by equation 5.3 or equation 5.4 is appropriate.

[40] Here we assume b > d > f.
[41] See Kirkup and Sutherland (1988).
[42] See Demas (1983).
[43] See Bube (1960), chapter 6.
[Graph: Photocurrent (arbitrary units) versus Time (ms).]

Figure 5.5: Photocurrent versus time data for cadmium sulphide.

If equation 5.3 is to be fitted to data, how are starting values for parameter estimates established? If b is large (and negative) then the contribution of the first term in equation 5.3 to y is small when x exceeds some value, which we will designate as x′. Equation 5.3 can now be written, for x > x′,

y ≈ c exp(dx)   (5.5)

Equation 5.5 can be linearised by taking natural logarithms of both sides of the equation. The next step is to fit a straight line to the transformed data to find (approximate) values for c and d, which we will designate as c₀ and d₀ respectively. Now we revisit equation 5.3 and write, for x < x′,

y − c₀ exp(d₀x) ≈ a exp(bx)   (5.6)

Transforming equation 5.6 by taking natural logarithms of both sides of the equation
then fitting a straight line to the transformed data will yield approximate values for a
and b which can serve as starting values in a non-linear fit. For a more detailed
discussion of how to determine starting values when an equation to be fitted consists
of a sum of exponential terms, see Kirkup and Sutherland, 1988.
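Curve stripping is easy to express in code. The sketch below works on synthetic noise-free data; the true parameter values and the two cut-offs either side of x′ are illustrative choices, not values taken from the text.

```python
# Sketch of curve stripping for y = a*exp(bx) + c*exp(dx) (equation 5.3).
import math

a_true, b_true, c_true, d_true = 80.0, -0.2, 20.0, -0.02   # invented values
xs = [float(i) for i in range(0, 101, 2)]
ys = [a_true * math.exp(b_true * x) + c_true * math.exp(d_true * x) for x in xs]

def line_fit(px, py):
    """Least-squares slope and intercept of py versus px."""
    n = len(px)
    xb, yb = sum(px) / n, sum(py) / n
    slope = sum((x - xb) * (y - yb) for x, y in zip(px, py)) \
            / sum((x - xb) ** 2 for x in px)
    return slope, yb - slope * xb

# Step 1: for x > x' the fast term is negligible; fit ln y versus x there.
tail = [(x, math.log(y)) for x, y in zip(xs, ys) if x > 50.0]
d0, lnc0 = line_fit([t[0] for t in tail], [t[1] for t in tail])
c0 = math.exp(lnc0)

# Step 2: strip the slow term off and fit the remainder for x < x'.
head = [(x, math.log(y - c0 * math.exp(d0 * x)))
        for x, y in zip(xs, ys) if x < 20.0]
b0, lna0 = line_fit([h[0] for h in head], [h[1] for h in head])
a0 = math.exp(lna0)
# (a0, b0, c0, d0) now serve as starting values for the non-linear fit
```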

5.4 Effect of instrument resolution and noise on best estimates
Errors in the dependent variable lead to uncertainties in parameter estimates[44]. If errors are very large, it may not be possible to establish reasonable parameter estimates. To illustrate the effect of errors on fitting, we consider the outcome of Monte Carlo simulations in which normally distributed noise is added to noise-free data[45]. After noise is added, non-linear least squares is performed to find best estimates of the parameters.


[44] In the case of a model violation (such that the equation fitted to the data is not appropriate) there would be non-zero residuals even if the data were error free. Such non-zero residuals would translate to uncertainties in parameter estimates.
[45] Monte Carlo simulations are dealt with in section 11.
To study the effect of noise on parameter estimates, data are generated for an experiment in which the temperature of water in a vessel is monitored as it cools in a laboratory. The equation relating temperature, T, to time, t, is written,

T = T_∞ + (T_s − T_∞) exp(−kt)   (5.7)

where T_∞ is the temperature at infinite time (which is equal to room temperature), T_s is the starting temperature, and k is the rate constant for cooling.

We choose (arbitrarily),

T_∞ = 26 °C, T_s = 62 °C, k = 0.034 min⁻¹

Noise-free data generated at 5 minute intervals between t = 0 and t = 55 minutes using equation 5.7 are shown in figure 5.6.

[Graph: Temperature (°C) versus Time (minutes).]

Figure 5.6: Noise free data of temperature versus time generated using equation 5.7.

Writing equation 5.7 using our usual convention for variables and parameter estimates gives,

y = a + (b − a) exp(−cx)   (5.8)

The next step is to fit equation 5.8 to the data, using the following conditions:

Starting values: a₀ = 25, b₀ = 60, c₀ = 0.02

The fitting options[46] are selected using the Solver Options dialog box as shown in figure 5.7:

[46] Fitting options are discussed in section 9.1.

Figure 5.7: Solver Options used to fit equation 5.8 to the data in figure 5.6.

Using Solver, the following values were recovered for the best estimates of parameters and the standard errors in the best estimates.

a            σ_a        b            σ_b        c          σ_c        SSR
25.99993547  3.66×10⁻⁵  61.99997243  1.45×10⁻⁵  0.0339998  7.65×10⁻⁸  2.73×10⁻⁹

Table 5.1: Best estimate of parameters and standard errors in parameters.

5.4.1 Adding normally distributed noise to data using Excel's Random Number Generator
To investigate the effect of errors on the fitting of equations to data, normally distributed noise[47] of constant standard deviation (i.e. homoscedastic data) is added to noise-free data[48].

Normally distributed noise can be added to the data by using the Random Number Generation tool in the Analysis ToolPak. The mean and standard deviation of the random numbers are controlled using the dialog box shown in figure 5.8. When adding noise, it is usual to select the mean to be zero. The standard deviation can have any value (the larger the value, the greater the noise). For this example it is convenient to leave the standard deviation at its default value of one.





[47] Also referred to as Gaussian noise.
[48] Heteroscedastic noise can also be added to data with the aid of Excel's Random Number Generation tool (see section 11.3).













Figure 5.8: Normally distributed noise with zero mean and standard deviation of one. Noise is generated using the Random Number Generation tool in Excel's Analysis ToolPak.

The experimental data in column D (i.e. the data with noise added) are obtained by adding the values in column B to those in column C. Figure 5.8 shows the formula entered into cell D2. The next step is to use FillDown to enter the formula into cells D3 to D13. A plot of y values with noise added (as given in column D of figure 5.8) versus x is shown in figure 5.9.
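The noise-adding step of figure 5.8 can be mirrored with Python's standard random module. A sketch (the seed is fixed only so that the run is repeatable):

```python
# Sketch: noise-free data from equation 5.7, plus normally distributed
# noise of mean 0 and standard deviation 1, as in the Random Number
# Generation tool of the Analysis ToolPak.
import math
import random

T_inf, T_s, k = 26.0, 62.0, 0.034            # parameters chosen in the text
times = range(0, 60, 5)                      # t = 0, 5, ..., 55 minutes
clean = [T_inf + (T_s - T_inf) * math.exp(-k * t) for t in times]

random.seed(1)                               # fixed seed for repeatability
noisy = [y + random.gauss(0.0, 1.0) for y in clean]
```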

[Graph: Temperature (°C) versus Time (minutes), now with scatter.]

Figure 5.9: Data in figure 5.6 with addition of normally distributed noise.

5.4.2 Fitting an equation to noisy data
To show the effect errors have on the fitting of equation 5.8 to data, parameter estimates (and standard errors) are compared when temperature data,

• are noise free
• are rounded to the nearest 0.1 °C, but no noise is added
• have noise of standard deviation 0.2 °C added
• have noise of standard deviation 1 °C added
• have noise of standard deviation 5 °C added.

     A        B            C           D
1    x (min)  y (°C)       Noise (°C)  y_exp (°C)
2    0        62           -0.30023    =B2+C2
3    5        56.3719334   -1.27768
4    10       51.62373162  0.244257
5    15       47.61784084  1.276474
6    20       44.23821173  1.19835
7    25       41.38693755  1.733133
8    30       38.98141785  -2.18359
9    35       36.95196551  -0.23418
10   40       35.23978797  1.095023
11   45       33.79528402  -1.0867
12   50       32.57660687  -0.6902
13   55       31.54845183  -1.69043
Temperature data over the period x = 0 to x = 55 minutes were generated in the manner described in section 5.4.1. The starting values for all fits were a₀ = 25, b₀ = 60, c₀ = 0.02. Solver Options were as given in figure 5.7.

Noise       a        σ_a        b         σ_b        c          σ_c        SSR
None        25.9999  3.66×10⁻⁵  61.99997  1.45×10⁻⁵  0.0339998  7.65×10⁻⁸  2.73×10⁻⁹
RD:0.1[49]  25.9870  0.0712     61.9995   0.0282     0.0339850  0.000149   0.01031
0.2         26.4735  0.378      61.9991   0.154      0.0345083  0.000821   0.3070
1.0         19.2820  4.54       61.1136   0.959      0.0245648  0.00487    12.63
5.0         25.4289  6.42       70.1401   4.82       0.0484832  0.0197     278.2

Table 5.1: Best estimates of parameters and standard errors in estimates.

As anticipated, the standard errors in the estimates increase as the noise increases. In order to indicate to what extent the estimates a, b and c differ from the true values, T_∞ = 26 °C, T_s = 62 °C and k = 0.034 min⁻¹ respectively, percentage differences are presented in table 5.2.

Noise    a        |a − T_∞|×100%/T_∞   b        |b − T_s|×100%/T_s   c          |c − k|×100%/k
None     25.9999  0.000385             62.0000  4.84×10⁻⁵            0.0339998  0.000588
RD:0.1   25.9870  0.0500               61.9995  0.000806             0.0339850  0.0441
0.2      26.4735  1.82                 61.9991  0.00145              0.0345083  1.50
1.0      19.2820  25.8                 61.1136  1.43                 0.0245648  27.8
5.0      25.4289  2.20                 70.1401  13.1                 0.0484832  42.6

Table 5.2: (Absolute) percentage difference between parameter estimates and true values.

Note that, on the whole, the percentage difference between the parameter estimates and the true values, as given in table 5.2, increases as the noise increases. However, examination of table 5.2 reveals that for noise of standard deviation 5, the estimate of T_∞ is within 2% of the true value. This should be expected: as the added noise is random, there is a possibility that by chance a good estimate for some parameter will be obtained even when the noise is quite large. However, if we were to repeat the simulation many times we would find that, on average, the percentage difference between the true values of the parameters and the parameter estimates would increase as the noise level increased.

5.4.3 Relationship between sampling density and parameter estimates
When repeat measurements are made of a single quantity (such as the time taken for a ball to free fall through a fixed distance), the standard error in the mean, σ_x̄, of the data is related to the standard deviation, σ, by[50],

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \qquad (5.9)$$

[49] Denotes temperature values rounded to 0.1 °C.
[50] See Kirkup (2002), ch. 1.
Equation 5.9 indicates that σ_x̄ reduces as 1/√n, i.e. if more measurements are made, we profit by a reduction in the standard error of the mean. It is anticipated that in analysis by least squares there is a similar reduction in the standard errors of the parameter estimates as the number of measurements increases[51]. To establish this, consider the analysis of data generated using equation 5.7 (with parameters T_∞ = 26 °C, T_s = 62 °C, k = 0.034 min⁻¹) to which noise of unity standard deviation has been added, for data gathered in the range x = 0 to x = 60 minutes. Data are generated at evenly spaced intervals of time. The number of values was chosen to be n = 9, 16, 25, 33, 41, 49, 61, 91, 121. Equation 5.8 was fitted to the data in order to establish the best estimates, a, b and c, and the standard errors in the best estimates, using (unweighted) non-linear least squares. Squaring the standard errors gives the variances, σ_a², σ_b² and σ_c², in the parameter estimates.
If an equation of the form given in equation 5.9 is valid for the standard errors in the parameter estimates, then plotting σ_a², σ_b² and σ_c² versus 1/n should produce a straight line. Figure 5.10 shows such plots.

[Figure 5.10a: variance of parameter estimate a versus 1/n; fitted line y = 22.942x + 0.826, R² = 0.2831; one outlier circled.]
[Figure 5.10b: variance of parameter estimate b versus 1/n; fitted line y = 5.8982x + 0.1142, R² = 0.892.]


[51] Frenkel (2002) discusses the relationship between the standard errors in the parameter estimates and the number of data, n.


[Figure 5.10c: variance of parameter estimate c versus 1/n; fitted line y = 0.0001x + 2×10⁻⁶, R² = 0.9164.]

Figure 5.10: Variance of parameter estimates as a function of number of data. Each graph shows the equation of the best straight line fitted to the points and the coefficient of determination, R².

With the exception of the circled data point in figure 5.10a, the points on the graphs in figures 5.10a to 5.10c appear to follow a linear relationship, indicating that the variance of the parameter estimates does decrease (at least approximately) as 1/n.
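A quick Monte Carlo check of equation 5.9 itself can be made for the simpler case of the mean of repeated measurements (checking the full least-squares analogue would require a fitting routine inside the loop). A plain Python sketch:

```python
# Sketch: quadrupling the number of measurements should roughly halve the
# standard error of the mean (equation 5.9).
import random
import statistics

random.seed(2)

def std_err_of_mean(n, trials=2000):
    """Standard deviation of the sample mean over many simulated samples."""
    means = [statistics.fmean(random.gauss(0.0, 1.0) for _ in range(n))
             for _ in range(trials)]
    return statistics.stdev(means)

ratio = std_err_of_mean(25) / std_err_of_mean(100)
# expected ratio ≈ sqrt(100/25) = 2
```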
Section 6: Linear least squares meets non-linear least squares
It is possible to use the technique of non-linear least squares to fit linear equations to data. In such circumstances we expect the same values to emerge for the best estimates of the parameters and the standard errors in the estimates, irrespective of whether fitting is carried out by linear or non-linear least squares.

To illustrate this, we consider an example in which the van Deemter equation is fitted to the gas chromatography data in table 6.1[52].

v (ml/min)   H (mm)
3.4          9.59
7.1          5.29
16.1         3.63
20.0         3.42
23.1         3.46
34.4         3.06
40.0         3.25
44.7         3.31
65.9         3.50
78.9         3.86
96.8         4.24
115.4        4.62
120.0        4.67

Table 6.1: Plate height versus flow rate data.

The relationship between plate height, H, and flow rate, v, can be written[53],

H = A + B/v + Cv   (6.1)

where A, B and C are constants. Consistent with our convention of naming variables and parameter estimates, we rewrite equation 6.1 as,

y = a + b/x + cx   (6.2)

a, b, and c are estimates of the constants A, B and C respectively in equation 6.1.

Equation 6.2 may be fitted to the data in table 6.1 using linear least squares. A convenient way to accomplish this is to use the Regression tool in the Analysis ToolPak in Excel[54]. Figure 6.1 shows an Excel spreadsheet containing the data and the output of the Regression tool. To perform (linear) least squares with this tool, we place values of 1/x and x in adjacent columns (these appear in columns B and C of figure 6.1).

[52] See Moody H W (1982).
[53] See Snyder et al. (1997), p. 46.
[54] See Kirkup (2002), p. 373.


Figure 6.1: Fitting equation 6.2 to data using the Regression tool in Excel's Analysis ToolPak.

Equation 6.2 is now fitted to the data in table 6.1 using Solver to perform non-linear
least squares. The approach adopted for determining the best estimates and the
standard errors in the best estimates is as described in sections 4, 4.1 and 4.2.

As anticipated, both linear least squares and non-linear least squares return the same
best estimates for the parameters and standard errors in the best estimates, as can be
seen by inspection of figures 6.1 and 6.2.
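Because equation 6.1 is linear in A, B and C, the Regression-tool fit can be sketched as a single call to a linear least-squares solver. This assumes NumPy; the optimum-flow-rate line is an added illustration using dH/dv = 0, i.e. v_opt = √(B/C).

```python
# Sketch: fitting H = A + B/v + Cv (equation 6.1) to the table 6.1 data
# by linear least squares.
import numpy as np

v = np.array([3.4, 7.1, 16.1, 20.0, 23.1, 34.4, 40.0,
              44.7, 65.9, 78.9, 96.8, 115.4, 120.0])
H = np.array([9.59, 5.29, 3.63, 3.42, 3.46, 3.06, 3.25,
              3.31, 3.50, 3.86, 4.24, 4.62, 4.67])

# design matrix with columns for the A, B/v and Cv terms
X = np.column_stack([np.ones_like(v), 1.0 / v, v])
(A, B, C), *_ = np.linalg.lstsq(X, H, rcond=None)

predicted = X @ np.array([A, B, C])
v_opt = np.sqrt(B / C)    # flow rate minimising plate height (dH/dv = 0)
```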



Figure 6.2: Spreadsheet for fitting equation 6.2 to data in table 6.1 using non-linear least squares. (Callouts on the spreadsheet mark the best estimates of parameters, their standard errors, the standard deviation of the y values, the sum of squares of residuals, and the D, Dᵀ, E = DᵀD and E⁻¹ matrices.)
Exercise 4
The Knox equation is widely used to represent the relationship between the plate
height, H, and the velocity, v, of the mobile phase of a liquid chromatograph (LC)[55].
The relationship may be written,

H = Av^(1/3) + B/v + Cv (6.3)

where A, B and C are constants.

Table 6.2 shows LC data of plate height versus flow velocity published by Katz et al. (1983)[56].

H (cm) v (cm/s)
0.004788 0.03027
0.003704 0.04527
0.003116 0.06507
0.002526 0.10023
0.002292 0.1306
0.002176 0.1653
0.002246 0.2488
0.002360 0.3185
0.002678 0.4792
0.002856 0.6028
Table 6.2: Data from Katz et al. (1983).

Use either linear or non-linear least squares to fit equation 6.3 to the data in table 6.2
and thereby obtain best estimates of A, B and C and the standard errors in the
estimates. [0.002509 cm^(2/3) s^(1/3), 0.0001232 cm²/s, 0.0008720 s, 0.000185 cm^(2/3) s^(1/3), 3.12 × 10⁻⁶ cm²/s, 0.000326 s].

[55] See Kennedy and Knox (1972).
[56] The data were obtained with a benzyl acetate solute and a mobile phase of 4.48% (w/v) ethyl acetate in n-pentane.
Section 7: Weighted non-linear least squares
There are some occasions where the standard deviation of the errors in the y values is
not constant (i.e. the errors exhibit heteroscedasticity). Such a situation may be revealed
by plotting the residuals[57], (y − ŷ), versus x. If errors are heteroscedastic, then weighted
fitting is required. The purpose of weighted fitting is to obtain best estimates of the
parameters by forcing the line close to the data that are known to high precision, while
giving much less weight to those data that exhibit large scatter.

The starting point for weighted fitting using least squares is to define a sum of squares
of residuals that takes into account the standard deviation in the y values. We write,

χ² = Σ [(yᵢ − ŷᵢ)/σᵢ]² (7.1)

We refer to χ² as the weighted sum of squares of residuals[58]. σᵢ is the standard
deviation in the ith y value. The purpose of the weighted fitting is to find best
estimates of parameters that minimise χ² in equation 7.1.

If σᵢ is constant, as it is in unweighted fitting using least squares, equation 7.1 can be
replaced by equation 2.7. In this sense, equation 7.1 can be thought of as the more
general formulation of least squares.

7.1 Weighted fitting using Solver
In order to establish best estimates of parameters using Solver when weighted fitting
is performed, we use an approach similar to that described in section 4. For weighted
fitting, an extra column in the spreadsheet containing the standard deviations σᵢ is
required. It is possible that the absolute values of σᵢ are unknown and that only
relative standard deviations are known. For example, equations 7.2 and 7.3 are
sometimes used when weighted fitting is required,

σᵢ ∝ √yᵢ (7.2)

σᵢ ∝ yᵢ (7.3)

Weighted fitting can be carried out so long as,

- the absolute standard deviations in the y values are known, or
- the relative standard deviations are known.

In order to accomplish weighted non-linear least squares, we proceed as follows:

1) Fit the desired equation to data by calculating χ² as given by equation 7.1. Use
Solver to modify the parameter estimates so that χ² is minimised.
2) Determine the elements in the D matrix, as described in section 4.

[57] See section 6.10 of Kirkup (2002).
[58] Σ[(yᵢ − ŷᵢ)/σᵢ]² follows a chi-squared distribution, hence the use of the symbol χ².
3) Construct the weight matrix, W, in which the diagonal elements of the matrix
contain the weights to be applied to the y values.
4) Calculate the weighted standard deviation, σ_w, where σ_w is given by,

σ_w = [χ²/(n − p)]^(1/2) (7.4)

χ² is given by equation 7.1, n is the number of data points and p is the number
of parameters in the equation to be fitted to the data.

5) Calculate the standard errors of the parameter estimates, given by[59]

σ(B) = σ_w [(DᵀWD)⁻¹]^(1/2) (7.5)

where the square roots of the diagonal elements of (DᵀWD)⁻¹ are used. B is the
matrix containing elements equal to the best estimates of the parameters. σ_w is
the weighted standard deviation, given by equation 7.4.

6) Calculate the confidence interval for each parameter appearing in the equation
at a specified level of confidence (usually 95%).

To illustrate steps 1 to 6, we consider an example of weighted fitting using Solver.

7.2 Example of weighted fitting using Solver
The relationship between the current, I, through a tunnel diode and the voltage, V,
across the diode may be written[60],

I = AV(B − V)² (7.6)

A and B are constants to be estimated using least squares. Table 7.1 shows current-voltage
data for a germanium tunnel diode.

V(mV) I (mA)
10 4.94
20 6.67
30 10.57
40 10.11
50 10.44
60 12.90
70 10.87
80 9.73
90 7.03
100 5.61
110 3.80
120 2.36
Table 7.1: Current-voltage data for a germanium tunnel diode.

[59] See Neter et al. (1996).
[60] See Karlovsky (1962).

Equation 7.6 could be fitted to data using unweighted non-linear least squares (in the
first instance it is usually sensible to use an unweighted fit, as the residuals may show
little evidence of heteroscedasticity, in which case there is little point in performing a more
complex analysis).

In this example we are going to assume that the error in the y quantity is proportional
to the size of the y quantity, i.e. equation 7.3 is valid for these data.

The data in table 7.1 are entered into a spreadsheet as shown in sheet 7.1 and are
plotted in figure 7.1.

Sheet 7.1: Data from table 7.1 entered into a spreadsheet.
A B
1 x(mV) y(mA)
2 10 4.94
3 20 6.67
4 30 10.57
5 40 10.11
6 50 10.44
7 60 12.90
8 70 10.87
9 80 9.73
10 90 7.03
11 100 5.61
12 110 3.80
13 120 2.36
14



















Figure 7.1: Current-voltage data for a germanium tunnel diode (y (mA) plotted against x (mV)).




7.2.1 Best estimates of parameters using Solver
Consistent with symbols used in other analyses in this document, we rewrite equation
7.6 as,

y = ax(b − x)² (7.7)

We can obtain a reasonable value for b, which we will use as a starting value, b₀, by
noting that equation 7.7 predicts that y = 0 when x = b. By inspection of figure 7.1 we
see that when y = 0, x ≈ 130 mV, so that b₀ = 130 mV. Equation 7.7 is rearranged to
give,

a = y/[x(b − x)²] (7.8)

An approximate value for a (which we take to be the starting value, a₀) can be
obtained by choosing any data pair from sheet 7.1 (say, x = 50 mV and
y = 10.44 mA) and substituting these into equation 7.8 along with b₀ = 130 mV. This
gives (to two significant figures) a₀ = 3.3 × 10⁻⁵.
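The starting-value reasoning above takes only a few lines to script. The sketch below (illustrative only) reads b₀ off the x-intercept of figure 7.1 and then solves equation 7.8 for a₀ using one data pair from table 7.1.

```python
# Starting values for y = a*x*(b - x)**2, following section 7.2.1.
b0 = 130.0            # x-intercept read from figure 7.1 (mV)
x1, y1 = 50.0, 10.44  # one data pair from table 7.1

# Equation 7.8: a = y / (x*(b - x)**2)
a0 = y1 / (x1 * (b0 - x1) ** 2)
print(a0)
```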

Sheet 7.2 shows the cells containing the calculated values of current (ŷ) in column C
based on equation 7.7. The parameter estimates are the starting values (3.3 × 10⁻⁵ and
130) in cells D17 and D18. Column D of sheet 7.2 contains the weighted squared
residuals; their sum appears in cell D14.

Sheet 7.2: Fitted values and weighted sum of squares of residuals before optimisation
occurs.
B C D
1 y(mA) ŷ [(y − ŷ)/y]²
2 4.94 4.752 0.001448311
3 6.67 7.986 0.038927822
4 10.57 9.9 0.004017905
5 10.11 10.692 0.003313932
6 10.44 10.56 0.000132118
7 12.90 9.702 0.061457869
8 10.87 8.316 0.055205544
9 9.73 6.6 0.103481567
10 7.03 4.752 0.105001811
11 5.61 2.97 0.221453287
12 3.80 1.452 0.381793906
13 2.36 0.396 0.692562482
14 sum 1.668796555
15
16 solver
17 a 3.30E-05
18 b 130

Running Solver (using the default settings; see section 9.1) gives the output shown in
sheet 7.3.

Sheet 7.3: Fitted values and weighted sum of squares of residuals after optimisation
using Excel's Solver.
B C D
1 y(mA) ŷ [(y − ŷ)/y]²
2 4.94 4.451251736 0.009788509
3 6.67 7.671430611 0.022541876
4 10.57 9.797888132 0.005335934
5 10.11 10.96797581 0.007201911
6 10.44 11.31904514 0.007089594
7 12.90 10.98844764 0.02195801
8 10.87 10.11353481 0.004843048
9 9.73 8.831658167 0.008524277
10 7.03 7.280169211 0.00126636
11 5.61 5.596419449 5.86015E-06
12 3.80 3.917760389 0.000960354
13 2.36 2.381543539 8.33317E-05
14 sum 0.089599066
15
16 from solver
17 a 2.289E-05
18 b 149.4440503
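The optimisation Solver performs in sheet 7.3 can be checked independently. The sketch below is not the algorithm Solver itself uses; it is a damped Gauss-Newton iteration that minimises the weighted χ² of equation 7.1 with σᵢ = yᵢ, starting from a₀ = 3.3 × 10⁻⁵ and b₀ = 130.

```python
# Weighted fit of y = a*x*(b - x)**2 to the tunnel-diode data of table 7.1,
# minimising chi^2 = sum(((y - yhat)/y)**2), i.e. sigma_i = y_i (equation 7.3).
xs = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
ys = [4.94, 6.67, 10.57, 10.11, 10.44, 12.90, 10.87, 9.73, 7.03, 5.61, 3.80, 2.36]

def chi2(a, b):
    return sum(((y - a * x * (b - x) ** 2) / y) ** 2 for x, y in zip(xs, ys))

a, b = 3.3e-5, 130.0  # starting values from section 7.2.1
for _ in range(100):
    # Weighted residuals r_i and Jacobian rows (dr/da, dr/db).
    r, J = [], []
    for x, y in zip(xs, ys):
        f = a * x * (b - x) ** 2
        r.append((y - f) / y)
        J.append((-x * (b - x) ** 2 / y, -2 * a * x * (b - x) / y))
    # Normal equations (J^T J) delta = -J^T r, solved in closed form (2x2).
    g11 = sum(j[0] ** 2 for j in J)
    g12 = sum(j[0] * j[1] for j in J)
    g22 = sum(j[1] ** 2 for j in J)
    h1 = -sum(j[0] * ri for j, ri in zip(J, r))
    h2 = -sum(j[1] * ri for j, ri in zip(J, r))
    det = g11 * g22 - g12 ** 2
    da = (g22 * h1 - g12 * h2) / det
    db = (g11 * h2 - g12 * h1) / det
    # Step halving: only accept steps that do not increase chi^2.
    step, c0 = 1.0, chi2(a, b)
    while chi2(a + step * da, b + step * db) > c0 and step > 1e-8:
        step /= 2
    a, b = a + step * da, b + step * db

print(a, b, chi2(a, b))
```

The minimised χ² agrees with the value 0.0896 reached by Solver in sheet 7.3.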

The weighted standard deviation is calculated using equation 7.4, i.e.

σ_w = [χ²/(n − p)]^(1/2) = [0.08959906/(12 − 2)]^(1/2) = 0.09465678 (7.9)

7.2.2 Determining the D matrix
In order to determine the matrix of partial derivatives, we calculate

(∂ŷᵢ/∂a)_{b,xᵢ} ≈ {ŷᵢ[a(1 + δ), b, xᵢ] − ŷᵢ[a, b, xᵢ]}/[a(1 + δ) − a] (7.10)

and

(∂ŷᵢ/∂b)_{a,xᵢ} ≈ {ŷᵢ[a, b(1 + δ), xᵢ] − ŷᵢ[a, b, xᵢ]}/[b(1 + δ) − b] (7.11)

δ is chosen to be 10⁻⁶ (see section 4.2). Sheet 7.4 shows the values of the partial
derivatives in the D matrix.








Sheet 7.4: Calculation of partial derivatives used in the D matrix.
E F G H I
1 ŷ (b constant) ŷ (a constant) ∂ŷ/∂a ∂ŷ/∂b
2 4.451256188 4.451261277 194446.4316 0.063842869
3 7.671438283 7.671448325 335115.2430 0.118528971
4 9.79789793 9.797912649 428006.4343 0.164058306
5 10.96798677 10.96800576 479120.0055 0.200430873
6 11.31905646 11.31907916 494455.9567 0.227646674
7 10.98845863 10.98848436 480014.2876 0.245705707
8 10.11354493 10.11357286 441794.9986 0.254607974
9 8.831666999 8.831696179 385798.0894 0.254353473
10 7.280176491 7.280205816 318023.5601 0.244942205
11 5.596425045 5.596453279 244471.4107 0.226374169
12 3.917764307 3.917790076 171141.6412 0.198649367
13 2.381545921 2.381567715 104034.2515 0.161767798
14
15
16 b constant a constant
17 2.289E-05 2.289E-05
18 149.4440503 149.4441997

7.2.3 The weight matrix, W
The weight matrix is a square matrix with diagonal elements proportional to 1/σᵢ² and
all other elements equal to zero[61]. In this example, σᵢ is taken to be equal to yᵢ, so that the
diagonal matrix is as given in sheet 7.5.

Sheet 7.5: Weight matrix for tunnel diode analysis (while the weights are shown to
only three decimal places, Excel retains all figures for the calculations).
C D E F G H I J K L M N
24 0.041 0 0 0 0 0 0 0 0 0 0 0
25
0 0.022 0 0 0 0 0 0 0 0 0 0
26
0 0 0.009 0 0 0 0 0 0 0 0 0
27 0 0 0 0.010 0 0 0 0 0 0 0 0
28
0 0 0 0 0.009 0 0 0 0 0 0 0
29 0 0 0 0 0 0.006 0 0 0 0 0 0
30
0 0 0 0 0 0 0.008 0 0 0 0 0
31 0 0 0 0 0 0 0 0.011 0 0 0 0
32
0 0 0 0 0 0 0 0 0.020 0 0 0
33
0 0 0 0 0 0 0 0 0 0.032 0 0
34
0 0 0 0 0 0 0 0 0 0 0.069 0
35
0 0 0 0 0 0 0 0 0 0 0 0.180

[61] For details on the weight matrix, see Neter et al. (1996).

7.2.4 Calculation of (DᵀWD)⁻¹
To obtain standard errors in estimates a and b, we must determine (DᵀWD)⁻¹.
Sheet 7.6 shows the several steps required to determine (DᵀWD)⁻¹. The steps consist
of:
a) Calculation of the matrix WD. The elements of this matrix are shown in
cells C37 to D48. (W is multiplied with D using the MMULT() function in
Excel.)
b) Calculation of the matrix DᵀWD. The elements of this matrix are shown in
cells G37 to H38.
c) Inversion of the matrix DᵀWD. The elements of the inverted matrix are
shown in cells G41 to H42.

Sheet 7.6: Calculation of (DᵀWD)⁻¹.
B C D E F G H
37 WD 7967.94045 0.002616 DᵀWD 2.2728E+10 15410.18847
38 7532.55853 0.002664 15410.1885 0.013460574
39 3830.89566 0.001468
40 4687.5077 0.001961
41 4536.55955 0.002089 (DᵀWD)⁻¹ 1.9662E-10 -0.0002251
42 2884.5279 0.001477 -0.0002251 331.9969496
43 3739.05374 0.002155
44 4075.06361 0.002687
45 6435.00139 0.004956
46 7767.87729 0.007193
47 11851.9142 0.013757
48 18678.9449 0.029045
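Sheets 7.4 to 7.6 can be collected into one short calculation. The sketch below is an independent check (not the spreadsheet itself): it builds D by forward differences, weights it with wᵢ = 1/yᵢ², forms (DᵀWD)⁻¹ for the 2 × 2 case and scales the square roots of its diagonal by σ_w.

```python
import math

# Reproduce sheets 7.4-7.6: standard errors for the weighted tunnel-diode fit.
xs = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
ys = [4.94, 6.67, 10.57, 10.11, 10.44, 12.90, 10.87, 9.73, 7.03, 5.61, 3.80, 2.36]
a, b = 2.289e-5, 149.4440503   # fitted estimates from sheet 7.3
sigma_w = 0.09465678           # weighted standard deviation, equation 7.9
delta = 1e-6

def f(a_, b_, x):
    return a_ * x * (b_ - x) ** 2

# D matrix by forward differences (equations 7.10 and 7.11).
D = [((f(a * (1 + delta), b, x) - f(a, b, x)) / (a * delta),
      (f(a, b * (1 + delta), x) - f(a, b, x)) / (b * delta)) for x in xs]
w = [1.0 / y ** 2 for y in ys]  # diagonal of W, with sigma_i = y_i

# E = D^T W D (2x2), then its inverse.
e11 = sum(wi * d[0] ** 2 for wi, d in zip(w, D))
e12 = sum(wi * d[0] * d[1] for wi, d in zip(w, D))
e22 = sum(wi * d[1] ** 2 for wi, d in zip(w, D))
det = e11 * e22 - e12 ** 2
inv11, inv22 = e22 / det, e11 / det

# Standard errors: sigma_w times sqrt of the diagonal of (D^T W D)^-1.
se_a = sigma_w * math.sqrt(inv11)
se_b = sigma_w * math.sqrt(inv22)
print(se_a, se_b)
```

The printed values agree with the standard errors derived in section 7.2.5.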

7.2.5 Bringing it all together
To calculate the standard errors in a and b, the weighted standard deviation (given by
equation 7.4) is multiplied by the square root of the diagonal elements of the
(DᵀWD)⁻¹ matrix, i.e.

σ_a = σ_w (1.9662 × 10⁻¹⁰)^(1/2) = 0.09465678 × (1.9662 × 10⁻¹⁰)^(1/2) = 1.327 × 10⁻⁶

and

σ_b = σ_w (332.00)^(1/2) = 0.09465678 × (332.00)^(1/2) = 1.725

It follows that the 95% confidence intervals for A and B are,

A = a ± t₉₅%,ν σ_a (7.12)

B = b ± t₉₅%,ν σ_b (7.13)

t₉₅%,ν is the t value corresponding to the 95% level of confidence and ν is the number of
degrees of freedom.

In this example, the number of degrees of freedom, ν = n − p = 12 − 2 = 10. From
statistical tables[62],

t₉₅%,₁₀ = 2.228

It follows that (inserting units),

A = (2.29 ± 0.30) × 10⁻⁵ mA/(mV)³

B = (149.4 ± 3.8) mV

Exercise 5
Equation 7.6 may be transformed into a form suitable for fitting by linear least
squares.

a) Show that equation 7.6 can be rearranged into the form,

(I/V)^(1/2) = A^(1/2)B − A^(1/2)V (7.14)

b) Plot a graph of (I/V)^(1/2) versus V.
c) Use unweighted least squares to obtain best estimates of A and B and standard
errors in the best estimates[63]. [2.30 × 10⁻⁵ mA/(mV)³, 149.8 mV,
2.0 × 10⁻⁶ mA/(mV)³, 1.5 mV]
d) Why is it preferable to use non-linear least squares to estimate parameters
rather than to linearise equation 7.6 and then use linear least squares to
find these estimates?


[62] See, for example, Kirkup (2002) page 385.
[63] Care must be exercised when calculating the uncertainty in the estimate of B, as this requires use of both the slope and the intercept and these are correlated. For more information see Kirkup (2002), page 232.
Section 8: Uncertainty propagation, least squares estimates and calibration
Establishing best estimates of parameters in an equation may be the main purpose of
fitting an equation to experimental data. For example, in an experiment to study the
variation of resistance, R, with time, t, in a photoconductor, the primary purpose of the
fitting may be to obtain best estimates for the parameters A₁, A₂, B₁ and B₂ which
appear in equation 8.1[64], which represents a possible relationship between R and t.

R = A₁ exp(B₁t) + A₂ exp(B₂t) (8.1)

There are situations in which parameter estimates are used to calculate other
quantities of interest. A common example involves gathering x-y data for the purpose
of calibration. Once the best estimates of the parameters in the calibration equation
have been determined, the equation is used to find values of x from measured values
of y.

For example, if the relationship between x and y is

y = a + bx (8.2)

then for a given (mean) value of y, ȳ, the corresponding value of x, x̂, can be
determined. This is done by rearranging equation 8.2 and replacing y by ȳ and x by
x̂, so that

x̂ = (ȳ − a)/b (8.3)

One approach to calculating the standard error in x̂ is to assume that the errors in a, b
and ȳ are uncorrelated. In this situation the standard error, σ_x̂, is given by,

σ_x̂² = (∂x̂/∂a)² σ_a² + (∂x̂/∂b)² σ_b² + (∂x̂/∂ȳ)² σ_ȳ² (8.4)

Unfortunately, errors in the parameter estimates a and b are correlated[65], so it is not
valid to use equation 8.4. To correctly determine σ_x̂, we must account for that
correlation. We begin by determining the covariance matrix, V, given by,

V = σ² A⁻¹ (8.5)

where σ² is the variance in the y values and A⁻¹ is the error matrix, as discussed in
section 2. σ² is found using,
[64] Equation 8.1 represents a possible relationship between R and t (Kirkup L and Cherry I, 1988).
[65] See Salter (2002).

σ² = Σ(yᵢ − ŷᵢ)²/(n − 2) (8.6)

where ŷᵢ = a + bxᵢ, and n is the number of x-y data.

In this example, A⁻¹ is the inverse of the matrix, A, where,

A = ( n    Σxᵢ
      Σxᵢ  Σxᵢ² ) (8.7)

There is an economical way to determine the elements in the matrix, A, which is
especially efficient when using a computer package that allows for matrix
multiplication (such as Excel). A is written as,

A = XᵀX (8.8)

Xᵀ is the transpose of the matrix, X, where X is given by,

X = ( 1  x₁
      1  x₂
      1  x₃
      ⋮  ⋮
      1  xₙ ) (8.9)
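The identity A = XᵀX is simple to verify in code. The sketch below (illustrative) builds X for the temperatures of table 8.1 and confirms that XᵀX reproduces the matrix of sums in equation 8.7.

```python
# Build X (equation 8.9) for the temperatures of table 8.1 and form A = X^T X.
theta = [-20, -10, 0, 10, 20, 30, 40, 50, 60, 70, 80]
X = [[1, t] for t in theta]

# A = X^T X, a 2x2 matrix of sums: [[n, sum(x)], [sum(x), sum(x**2)]]
A = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
print(A)
```

This is the matrix inverted in part e) of the worked example in section 8.1.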

If f is some function of a and b, then[66],

σ_f² = d_fᵀ V d_f (8.10)

where

d_f = ( ∂f/∂a
        ∂f/∂b ) (8.11)

As V = σ² A⁻¹, equation 8.10 can be rewritten as,

σ_f² = σ² d_fᵀ A⁻¹ d_f (8.12)

8.1: Example of propagation of uncertainties involving parameter estimates
Equation 8.12 is applied to data gathered in an experiment which considers the
variation in pressure of a fixed mass and volume of gas as the temperature of the gas
changes. The data are given in table 8.1. We will use the data to estimate the value of
[66] See Salter (2000).
the temperature at which the pressure of the gas is zero (this is termed the absolute
zero of temperature).

Table 8.1: Pressure versus temperature data.
θ (°C) P (kPa)
-20 211
-10 218
0 224
10 238
20 247
30 251
40 259
50 265
60 277
70 288
80 294

Assume that the relation between pressure, P, and temperature, θ, can be written,

P = A + Bθ (8.13)

where A and B are parameters to be estimated using least squares.

We will determine,

a) best estimates for A and B (written as a and b respectively);
b) the standard errors, σ_a and σ_b, in a and b;
c) the intercept, θ̂_INT, of the best line through the data on the temperature axis;
d) the standard error in θ̂_INT, assuming errors in a and b are uncorrelated;
e) the standard error in θ̂_INT, assuming errors in a and b are correlated.

Solution

a)
a and b may be determined in several ways, including using the LINEST() function in
Excel[67]. Applying the LINEST() function to the data in table 8.1 we obtain:

a = 226909 Pa
b = 836.36 Pa/°C

b)
Using LINEST() in Excel to calculate σ_a and σ_b gives,

σ_a = 993.7 Pa
σ_b = 22.80 Pa/°C

[67] See page 228 of Kirkup (2002).
c)
The intercept, θ_INT, on the temperature axis occurs when P = 0. Rearranging equation
8.13 gives,

θ_INT = −A/B (8.14)

The best estimate of θ_INT, written as θ̂_INT, is therefore,

θ̂_INT = −a/b (8.15)

= −226909/836.36 = −271.3 °C
d)
Assuming that errors in a and b are uncorrelated, the usual propagation of
uncertainties equation gives the standard error in θ̂_INT, σ_θINT, as[68],

σ_θINT² = (∂θ_INT/∂a)² σ_a² + (∂θ_INT/∂b)² σ_b² (8.16)

Now,

∂θ_INT/∂a = −1/b (8.17)

and

∂θ_INT/∂b = a/b² (8.18)

It follows that (using equation 8.16),

σ_θINT = [(1/836.36)² × 993.7² + (226909/836.36²)² × 22.80²]^(1/2) = 7.49 °C

It follows that,

θ_INT = (−271.3 ± 7.5) °C
e)
In order to determine σ_θINT when the correlation between a and b is accounted for,
we write (following equation 8.7),

[68] See page 390 of Kirkup (2002).
A = ( n   Σx
      Σx  Σx² ) = ( 11   330
                    330  20900 )

Inverting the matrix A is accomplished using the MINVERSE() function in Excel[69].
This gives,

A⁻¹ = ( 0.172727     −0.00272727
        −0.00272727  9.09091 × 10⁻⁵ ) (8.19)

To determine σ_θINT we use equation 8.12. It is convenient to rewrite equation 8.12 as,

σ_θINT² = σ² d_θINTᵀ A⁻¹ d_θINT (8.20)

Now

σ² = Σ(Pᵢ − P̂ᵢ)²/(n − 2) (8.21)

where

P̂ᵢ = a + bθᵢ (8.22)

Values for a and b appear in part a) of this question. Using those estimates and
equation 8.21, we find,

σ² = 5717171 Pa² (8.23)

From equation 8.11 and equation 8.15, d_θINT is given by,

d_θINT = ( ∂θ̂_INT/∂a
           ∂θ̂_INT/∂b ) = ( −1/b
                            a/b² ) (8.24)

Substituting a and b obtained in part a) of this question gives,

d_θINT = ( −0.0011957
            0.32439 )

Returning to equation 8.20, we have,

σ_θINT² = 5717171 × (−0.0011957  0.32439) A⁻¹ (−0.0011957  0.32439)ᵀ

so that,

σ_θINT² = 68.20 (°C)², or σ_θINT = 8.26 °C

[69] See page 285 in Kirkup (2002).

Now we write:

θ_INT = (−271.3 ± 8.3) °C

This may be compared with θ_INT obtained in part d) of this question when a and b are
assumed to be uncorrelated, i.e.:

θ_INT = (−271.3 ± 7.5) °C

In this instance, failure to account for the correlation between a and b results in an
underestimation of the standard error in θ̂_INT.

8.2 Uncertainties in derived quantities incorporating least squares estimates
Parameter estimates obtained using least squares, as well as other quantities that have
uncertainty, may be brought together to determine a derived quantity. The derived
quantity has an uncertainty which may be calculated. As an example, consider the
calibration line in figure 8.1, which is to be used to determine x̂₀ when y = ȳ₀ (in an
analytical chemistry application, ȳ₀ might represent the mean detector response of an
instrument and x̂₀ is the predicted concentration of the analyte corresponding to that
response).

Figure 8.1: Calibration line fitted to x-y data (the line maps ȳ₀ on the y axis to x̂₀ on the x axis).

Assuming the relationship between x and y in figure 8.1 is linear, then,

ȳ₀ = a + b x̂₀ (8.25)
or,

x̂₀ = (ȳ₀ − a)/b (8.26)

a and b are determined using least squares. As ȳ₀ is not correlated with a or b, we
write,

σ_x̂₀² = (∂x̂₀/∂ȳ₀)² σ_ȳ₀² + σ² d_x̂₀ᵀ A⁻¹ d_x̂₀ (8.27)

From equation 8.26 we have,

∂x̂₀/∂ȳ₀ = 1/b

Also,

σ_ȳ₀² = σ²/m (8.28)

where σ² is given by equation 8.6, and m is the number of repeat measurements made
of the detector response for a particular (unknown) analyte concentration.

8.3: Example of propagation of uncertainties in derived quantities
In section 8.1 we considered data from an experiment in which the variation in
pressure of a fixed mass and volume of gas was measured as the temperature of the
gas changed. We will use those data and the additional information that four repeat
measurements of pressure were made at an unknown temperature, such that,

Mean pressure, P̄₀ = 2.54 × 10⁵ Pa

Adapting equation 8.26, we have,

θ̂₀ = (P̄₀ − a)/b = (2.54 × 10⁵ − 226909)/836.36 = 32.39 °C

Using equation 8.28, and the value of σ² given in equation 8.23, we find[70]

σ_P̄₀² = σ²/m = 5717171/4 = 1.429 × 10⁶ Pa²

Rewriting equation 8.27 in terms of the variables in this question gives,
[70] The assumption made here is that the scatter in the y values remains constant, such that the estimate we make of the standard deviation in the y values during calibration is the same as that of the y values obtained for the unknown x value.


σ_θ̂₀² = (∂θ̂₀/∂P̄₀)² σ_P̄₀² + σ² d_θINTᵀ A⁻¹ d_θINT (8.29)

= (1/836.36)² × 1.429 × 10⁶ + 68.20 = 2.04 + 68.20

so that,

σ_θ̂₀ = 8.38 °C

Finally, we write,

θ₀ = (32.4 ± 8.4) °C
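The calculation in this subsection can be scripted as below. The values of a, b, σ² and the quadratic form σ² dᵀA⁻¹d are hardcoded from section 8.1, so this is an illustrative check rather than a self-contained analysis.

```python
import math

# Estimate the unknown temperature and its standard error (equations 8.26-8.29).
a, b = 226909.0, 836.36    # least-squares estimates from section 8.1 (Pa, Pa/degC)
s2 = 5717171.0             # residual variance, equation 8.23 (Pa^2)
quad = 68.20               # sigma^2 * d^T A^-1 d from equation 8.20 ((degC)^2)
P0_mean, m = 2.54e5, 4     # mean of m repeat pressure readings (Pa)

theta0 = (P0_mean - a) / b                    # equation 8.26 adapted, deg C
var_P0 = s2 / m                               # equation 8.28
var_theta0 = (1.0 / b) ** 2 * var_P0 + quad   # equation 8.29
print(theta0, math.sqrt(var_theta0))
```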

8.4: Uncertainty propagation and non-linear least squares
In general, parameter estimates obtained using non-linear least squares are correlated.
Therefore, for derived quantities which incorporate parameter estimates, the
covariance matrix must be used to establish the standard errors in those quantities.
The first stage, as with any non-linear fitting, is to minimise the sum of squares of
residuals, SSR, as described in sections 3 and 4.

Suppose f is a function of parameter estimates obtained through non-linear least
squares. The variance in f, σ_f², may be written,

σ_f² = σ² d_fᵀ E⁻¹ d_f (8.30)

E⁻¹ is the inverse of the matrix, E, where[71],

E = DᵀD (8.31)

D is given by,

D = ( ∂ŷ₁/∂a  ∂ŷ₁/∂b  ∂ŷ₁/∂c
      ∂ŷ₂/∂a  ∂ŷ₂/∂b  ∂ŷ₂/∂c
      ⋮        ⋮        ⋮
      ∂ŷᵢ/∂a  ∂ŷᵢ/∂b  ∂ŷᵢ/∂c
      ⋮        ⋮        ⋮
      ∂ŷₙ/∂a  ∂ŷₙ/∂b  ∂ŷₙ/∂c ) (8.32)


[71] See section 4.2.
and[72]

d_f = ( ∂f/∂a
        ∂f/∂b
        ∂f/∂c ) (8.33)

8.4.1: Example of uncertainty propagation in parameter estimates obtained by non-linear least squares
In many situations, calibration data exhibit a slight curvature and it is a matter of
debate whether it is appropriate to fit an equation of the form y = a + bx to the data.
As an example, consider the data shown in table 8.2 and also in figure 8.2.

Table 8.2: Area versus concentration data for biochanin.
Conc., x (mg/l)  Area, y (arbitrary units)
0.158 0.121342
0.158 0.121109
0.315 0.403550
0.315 0.415226
0.315 0.399678
0.631 1.839583
0.631 1.835114
0.631 1.835915
1.261 3.840554
1.261 3.846146
1.261 3.825760
2.522 8.523561
2.522 8.539992
2.522 8.485319
5.045 16.80701
5.045 16.69860
5.045 16.68172
10.09 34.06871
10.09 33.91678
10.09 33.70727


Close inspection of the data in figure 8.2 indicates that the relationship between Area
and Concentration is not linear, but shows a slight but definite curvature. There are
many candidates for the function that might be fitted to data, but we must be wary of
using a function with too many adjustable parameters (see section 10). We will fit the
function,


y = A + Bx^C (8.34)


[72] Equations 8.32 and 8.33 are appropriate where there are three best estimates, a, b and c, of the parameters in the equation fitted to data. Both equations may be extended if the number of parameters to be estimated exceeds three.


Figure 8.2: Calibration curve of area (arbitrary units) versus concentration (mg/l) for biochanin.
to the data in table 8.2.

Applying non-linear least squares, the best estimates for A, B and C, represented by a,
b, and c, respectively are,

a = -0.5651
b = 3.581
c = 0.9790

When repeat measurements are made of the area under a chromatogram curve, the
mean area can be determined. Using this mean we may estimate the concentration of
the biochanin. We begin by rearranging equation 8.34, so that,

x = [(y − A)/B]^(1/C) (8.35)

Substituting a, b and c, and ȳ₀, into equation 8.35, gives the estimate of x, x̂₀, as,

x̂₀ = [(ȳ₀ − a)/b]^(1/c) (8.36)

As ȳ₀ is not correlated with a, b or c, we can write,

σ_x̂₀² = (∂x̂₀/∂ȳ₀)² σ_ȳ₀² + σ² d_x̂₀ᵀ E⁻¹ d_x̂₀ (8.37)
where,

d_x̂₀ = ( ∂x̂₀/∂a
          ∂x̂₀/∂b
          ∂x̂₀/∂c ) (8.38)

and,

σ² = Σ(yᵢ − ŷᵢ)²/(n − 3) (8.39)

Partially differentiating x̂₀ in equation 8.36 with respect to a, b, c and ȳ₀ respectively
gives,

∂x̂₀/∂a = −[1/(bc)] [(ȳ₀ − a)/b]^((1/c) − 1) (8.40)

∂x̂₀/∂b = −[1/(bc)] [(ȳ₀ − a)/b]^(1/c) (8.41)

∂x̂₀/∂c = −(1/c²) [(ȳ₀ − a)/b]^(1/c) ln[(ȳ₀ − a)/b] (8.42)

∂x̂₀/∂ȳ₀ = [1/(bc)] [(ȳ₀ − a)/b]^((1/c) − 1) (8.43)

After calibration, the area under the chromatogram curve is measured four times for a
sample of unknown concentration. It is found that,

ȳ₀ = 6.15513 (8.44)

Fitting using non-linear least squares gives,

a = −0.5651
b = 3.581
c = 0.9790

Substituting for a, b, c and ȳ₀ in equation 8.36 gives the estimate of the unknown
concentration, x̂₀, as

x̂₀ = [(ȳ₀ − a)/b]^(1/c) = [(6.15513 + 0.5651)/3.581]^(1/0.9790) = 1.902 mg/l

Substituting for a, b, c and ȳ₀ into equations 8.40 to 8.43, we obtain,

∂x̂₀/∂a = −0.289083879, ∂x̂₀/∂b = −0.54244803, ∂x̂₀/∂c = −1.248914487, ∂x̂₀/∂ȳ₀ = 0.289083879

Sheet 8.1 shows the layout of a spreadsheet used to calculate x̂₀ and σ_x̂₀.

Sheet 8.1: Annotated sheet showing calculation of x̂₀ and σ_x̂₀.

A B C D
44 a -0.56512 σ_a 0.088470
45 b 3.58138 σ_b 0.078995
46 c 0.97901 σ_c 0.009026
47
48 d_x̂₀ -0.28908
49 -0.54245
50 -1.24891
51
52 d_x̂₀ᵀ -0.28908 -0.542448 -1.248914
53
54 V 0.00783 -0.00601 0.000646
55 -0.00601 0.00624 -0.000705
56 0.00065 -0.0007 8.15E-05
57
58 V d_x̂₀ 0.00019
59 -0.00077
60 0.00009
61
62 d_x̂₀ᵀ V d_x̂₀ 0.00025
63
64 ∂x̂₀/∂ȳ₀ 0.28908
65 (∂x̂₀/∂ȳ₀)² 0.08357
66 σ² 0.02985
67 σ_ȳ₀² 0.00746
68 ȳ₀ 6.15513
69 m 4
70 x̂₀ 1.90193
71 σ_x̂₀² 0.00087
72 σ_x̂₀ 0.02948

The annotations on the original sheet identify each block: the best estimates and standard errors of the parameters, d_x̂₀ as defined in equation 8.38, the covariance matrix V = σ²E⁻¹ with σ² from equation 8.39, σ_ȳ₀² = σ²/m, x̂₀ from equation 8.36 and σ_x̂₀² from equation 8.37.
From sheet 8.1, x̂₀ and σ_x̂₀ are found to be:

x̂₀ = 1.90193, σ_x̂₀ = 0.029

which allows us to write: x = (1.902 ± 0.029) mg/l
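The biochanin calculation can be checked as below. The parameter estimates, covariance matrix V and variances are hardcoded from sheet 8.1 (to the rounded precision shown there), so the final figures agree with the sheet only to that precision; this is an illustrative sketch, not the spreadsheet itself.

```python
import math

# Concentration estimate and its standard error (equations 8.36-8.43).
a, b, c = -0.56512, 3.58138, 0.97901    # estimates from sheet 8.1
y0 = 6.15513                            # mean area of 4 repeats (equation 8.44)
V = [[0.00783, -0.00601, 0.000646],     # covariance matrix from sheet 8.1
     [-0.00601, 0.00624, -0.000705],
     [0.00065, -0.0007, 8.15e-05]]
var_y0 = 0.00746                        # sigma^2/m from sheet 8.1

u = (y0 - a) / b
x0 = u ** (1.0 / c)                     # equation 8.36

# Partial derivatives, equations 8.40-8.43.
d = [-(1.0 / (b * c)) * u ** (1.0 / c - 1),           # d x0 / d a
     -(1.0 / (b * c)) * u ** (1.0 / c),               # d x0 / d b
     -(1.0 / c ** 2) * u ** (1.0 / c) * math.log(u)]  # d x0 / d c
dx0_dy0 = (1.0 / (b * c)) * u ** (1.0 / c - 1)

quad = sum(d[i] * V[i][j] * d[j] for i in range(3) for j in range(3))
var_x0 = dx0_dy0 ** 2 * var_y0 + quad   # equation 8.37 with V = sigma^2 E^-1
print(x0, math.sqrt(var_x0))
```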

Exercise 6
The following data were obtained during the calibration of an HPLC system using
Ibuprofen. The area under the chromatograph peak is shown as a function of known
concentrations (expressed in mass/tablet) of Ibuprofen.

Table 8.3: Area under chromatograph peak as a function of concentration of
Ibuprofen.
Mass per tablet (mg/tablet)  Area (arbitrary units)
103.9 265053
103.9 261357
139.3 345915
139.3 345669
180.1 445684
180.1 445753
200.3 494700
200.3 493846
219.9 540221
219.9 539610
278.1 683881
278.1 683991
305.7 755890
305.7 754901

Using the data in table 8.3:

a) Fit an equation of the form y = a + bx^c to the data in table 8.3, where y
corresponds to the area under the chromatograph peak and x corresponds to
Ibuprofen concentration. Determine a, b and c and their respective standard
errors.
b) A sample of Ibuprofen of unknown concentration is injected into the column
of the calibrated HPLC. The mean area of three replicate measurements is
found to be 405623. Use this information to estimate the concentration of
Ibuprofen and the standard error in the estimate of the concentration.
Section 9: More on Solver
Solver was devised primarily for use by the business community and this is reflected
in the features it offers. Solver comprises three optimisation algorithms:

1) For integer problems, Solver uses the Branch and Bound method[73].
2) Where equations are linear, the Simplex method is used for optimisation[74].
3) In the case of non-linear problems, the Generalised Reduced Gradient (GRG)
method is adopted[75].

It is the GRG method that is applied in our analyses; therefore most of this section is
devoted to describing features of Solver that relate to this.

Though optimisation can be carried out successfully with the default settings in the
Solver Options dialog box, Solver possesses several options that can be adjusted by the
user to assist in the optimisation process, and we will describe those next.

The Solver dialog box, as shown in figure 4.2, offers the facility to constrain
parameter estimates. The application of constraints requires careful consideration, as
it is possible that Solver will locate a local minimum rather than a global minimum.
The best estimates returned by Solver need to be compared with physical reality
before being accepted. Consider an example in which a parameter in an equation
represents the speed of sound, v, in air. If, after fitting, the best estimate of v is
-212 m/s, it is fair to question whether this value is reasonable. If it is not, then one
course of action is to try new starting values for the parameter estimates. We could use
the Constraints box in Solver to constrain the estimate of v so that it cannot take on
negative values. This cannot guarantee that a physically meaningful value will be
found for v, only that the value will be non-negative.

9.1 Solver Options
To view the Solver options shown in figure 9.1, it is necessary to click on the Options
button in the Solver dialog box. This dialog box may be used to modify, for example,
the methods by which the optimisation takes place. This, in turn, may provide for a
better fit or reduce the fitting time over that obtained using the default settings.


[73] See Wolsey L A (1998).
[74] See Nocedal J (1999).
[75] See Smith and Lasdon (1992).

Figure 9.1: Solver Options dialog box illustrating default settings.

We now consider some of the options in the Solver Options dialog box.

Max Time: This restricts the total time Solver spends searching for an optimum
solution. Unless there are many data, the default value of 100 s is
usually sufficient. If the maximum time is set too low, such that
Solver has not completed its search, then a message is returned 'The
maximum time limit was reached; continue anyway?'. Clicking on the
Continue button will cause Solver to carry on searching for a solution.
Iterations: This is the maximum number of iterations that Solver will execute
before terminating its search. The default value is 100, but this can be
increased to a limit of 32,767. Solver is likely to find an optimum
solution before reaching such a limit or return a message that an
optimum solution cannot be found. If the number of iterations is set
too low, such that Solver has not completed its search, then a message
will be returned 'The maximum iteration limit was reached; continue
anyway?'. Clicking on the Continue button will cause Solver to carry
on searching for a solution.
Precision and
Tolerance:
These options are applicable to situations in which constraints have
been specified. Specifying constraints is not advised and so we will
not consider these options.
Convergence: As fitting proceeds, Solver compares the most recent solution (for our
application this would be the value of SSR) with previous solutions. If
the fractional reduction in the solution over five iterations is less than
the value in the Convergence box, Solver reports that optimisation is
complete. If this value is made very small (say, 10⁻⁶) Solver will
continue iterating (and hence take longer to complete) than if that
number is larger (say, 10⁻²).
Assume Linear Model: If this box is ticked then Solver uses the Simplex method to obtain
best estimates of parameters. If the model to be fitted to data is linear,
then fitting may be performed using the Regression Tool in the
Analysis ToolPak. This is an attractive alternative as the Regression
Tool returns best estimates, standard errors in estimates, confidence
intervals and the sum of squares of residuals. If the 'Assume Linear
Model' box is ticked, Solver will attempt to establish whether the model is
indeed linear. If Solver determines that the model is non-linear, the
message 'The conditions for Assume Linear Model are not
satisfied' is returned. To continue, it is necessary to return to the Solver Options
dialog box and untick the Assume Linear Model option.
Assume Non-Negative: This constrains all estimates in an equation so that they cannot take on
negative values.
Use Automatic Scaling: In certain problems there may be many orders of magnitude
difference between the data, the parameter estimates and the value in
the target cell. This can lead to rounding problems owing to the finite
precision arithmetic performed by Excel. If the 'Use Automatic
Scaling' box is ticked, then Solver will scale values before carrying
out optimisation (and 'unscale' the solution values before entering
them into the spreadsheet). It is advisable to tick this box for all
problems.
Show Iteration Results: Ticking this box causes Solver to pause after each iteration, allowing
new estimates of parameters and the value in the Target cell to be
viewed. If parameter estimates are used to draw a line of best fit
through the data, then the line will be updated after each iteration.
Updating the fitted line on the graph after each iteration gives a
valuable insight into the progress made by Solver to find best
estimates of the parameters in an equation.
Estimates (Tangent or Quadratic): This determines the method used to find subsequent values of each
parameter estimate at the outset of the search (i.e. either linear or
quadratic extrapolation). Both methods produce the same final results
for the examples described in this document.
Derivatives (Forward or Central): The partial derivatives of the function in the target cell with respect to
the parameter estimates are found by the method of finite differences.
It is possible to perturb the estimates 'forward' from a particular point
(similar to that described in section 4.3) or to perturb the estimates
forward and backward from the point in order to obtain better
estimates of the partial derivatives. Both methods of determining the
partial derivatives produce the same final results for the examples
described in this document.
Search (Newton or Conjugate): Specifies the search algorithm. Reference to the quasi-Newton and
Conjugate search methods used by Excel can be found in Safizadeh
and Signorile (1993) and Perry (1978) respectively. Both methods
produce the same final results for the examples described here.
Load Model and Save Model: We may wish to investigate the effect on optimisation of using a
combination of options, such as Tangent (Estimates), Central
(Derivatives) and Conjugate (Search). It is tedious to record by hand
which fitting conditions have been used. Excel offers the
facility to store the options by clicking on Save Model, followed by
specifying the cells on the spreadsheet where the Model conditions
should be saved. These conditions can be recalled by clicking on
Load Model and indicating the cells which contain the saved
information.

9.2 Solver Results
Once Solver completes optimisation, it displays the Solver Results dialog box shown
in figure 9.2.


Figure 9.2: Solver Results dialog box.

Clicking on OK will retain the solution found by Solver (i.e. the starting parameters are
permanently replaced by the final parameter estimates). At this stage Excel is able to
present three reports: Answer, Sensitivity and Limits. Of the three reports, the
Answer report is the most useful as it gives the starting values of the parameter
estimates and the associated SSR. The report also displays the final parameter
estimates and the final SSR, allowing for easy comparison with the original values. An
Answer report is shown in figure 9.3.


Figure 9.3: Answer report created by Excel.

Section 10: Modelling and Model Identification
There are several types of model that interest physical scientists. Physical and
chemical models are based on the application of physical and chemical principles.
Such principles are expected to have wide applicability and underlie phenomena
observed inside and outside the laboratory. Equations founded on physical and
chemical principles contain parameters that have physical meaning rather than simply
being anonymous constants in an equation. For example, a parameter in an equation
could represent the radius of the Earth, the energy gap of a semiconductor or a rate
constant in a chemical reaction.

There are also essentially statistical models that may, through consideration
of experimental or observational data, assist in identifying the important variables and
lend support to an empirical relationship between variables. A useful empirical
equation is one that successfully describes the trend in the data but is not derived from
a consideration of the fundamental principles underlying the relationship between
variables.

While both types of modelling are useful, most scientists would prefer the insight and
predictive opportunities offered by good physical models to those that have a purely
statistical basis or support.

10.1 Physical Modelling
If a model based on physical and chemical principles is successful, in the sense that
data gathered in experiments are consistent with the predictions of the model, then
this lends support to the validity of the underlying principles.

As an example, a physical principle described by Isaac Newton is that an attractive
force exists between all bodies. That attractive force is termed the gravitational force.
Newton went on to indicate how the gravitational force between two bodies depends
on their respective masses and the separation between the bodies. From this starting
point, it is possible to predict the value of the acceleration of a body when it is allowed to
fall freely above the Earth's surface. It is often the case that approximations are made
so that the problem does not become too complicated⁷⁶. In this example we might
consider the Earth to be⁷⁷:

a) a perfect sphere
b) not rotating
c) of uniform density

Once a prediction has been made as to how the acceleration of a body varies with
distance above the Earth's surface, the next step is to determine by careful
measurement how the acceleration actually depends on distance. If the
approximations given by a), b) and c) above are valid, then the relationship between
the free fall acceleration, g(h), and height, h, can be written:


76. Experienced physical scientists are able to simplify complex situations while retaining the key
principles necessary to understand a particular physical process or phenomenon.
77. If it is found that the data are inconsistent with the simplified theory, the approximations may have to
be revisited and the model revised.
g(h) = g₀/(1 + h/R)²   (10.1)

where g₀ is the acceleration caused by gravity at the Earth's surface (i.e. when h = 0),
and R is the radius of the Earth⁷⁸.

By gathering data of acceleration as a function of height, it should be possible to
confirm or contest the validity of equation 10.1. It is also possible to infer from
equation 10.1 that if the range of h values is too limited (much less than the radius, R)
then the acceleration, g(h), will decrease almost linearly with height⁷⁹. Additionally,
as the radius of the Earth is one of the parameters to be estimated, this can be
compared with the known radius of the Earth as determined by other methods.
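The near-linearity over a limited range of h is easy to verify numerically. Below is a sketch in Python, with illustrative values g₀ = 9.81 m/s² and R = 6.371 × 10⁶ m (not fitted results), comparing equation 10.1 with its first-order binomial expansion g(h) ≈ g₀(1 − 2h/R):

```python
# Compare the exact expression g(h) = g0/(1 + h/R)^2 (equation 10.1)
# with its first-order binomial expansion g(h) ~ g0*(1 - 2h/R).
# g0 and R are illustrative values, not fitted results.
g0 = 9.81      # acceleration at the Earth's surface (m/s^2)
R = 6.371e6    # radius of the Earth (m)

def g_exact(h):
    return g0 / (1.0 + h / R) ** 2

def g_linear(h):
    return g0 * (1.0 - 2.0 * h / R)

for h in (0.0, 5.0e3, 10.0e3, 20.0e3):
    print(h, g_exact(h), g_linear(h), g_exact(h) - g_linear(h))
```

Even at h = 20 km the exact and linearised values differ by only a few times 10⁻⁴ m/s², which is why data confined to this range of heights look linear.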

Applying physical principles in order to establish an equation that successfully relates
the variables is challenging. However, such an equation is often more satisfying and
has wider applicability than an empirical equation.

10.2 Data driven approach to discovering relationships
As an alternative to a physical principles approach to developing a relationship
between physical variables, we could try a data driven approach such that trends
observed in the data suggest a relationship between dependent and independent
variables that might be valid. One weakness of this approach is that, even if the
correct functional relationship between acceleration and height is discovered, we
would be unlikely to recognise that hidden within parameter estimates is an important
physical constant, such as the radius of the Earth.

For example, with respect to the study involving gravity described in section 10.1, we
might carefully gather experimental data of the acceleration of free fall, g(h), for
various heights, h, then plot g(h) versus h in order to discern the type of relationship
between the two variables. Such a plot is shown in figure 10.1 for values of h in the
range 0 to 20 km.


78. See Walker (2002), Chapter 12.
79. This can be shown by doing a binomial expansion of equation 10.1 (see problem 11 at the end of the
article).
[Figure 10.1 is a scatter plot of g (m/s²), between about 9.74 and 9.82, against height (m) from 0 to 20 000.]

Figure 10.1: Variation of acceleration due to gravity with height above the Earth's
surface.

Based on the data appearing in figure 10.1, there is a relationship between g(h) and h,
but owing to the variability within the data and perhaps the limited range over which
the data were gathered, it is difficult to justify fitting an equation other than y = a + bx
to these data.
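By contrast, if we trust the physical model, the binomial expansion of equation 10.1 (footnote 79), g(h) ≈ g₀ − (2g₀/R)h, shows that the radius of the Earth is hidden in the straight-line fit: the intercept a estimates g₀ and the slope b estimates −2g₀/R, so R ≈ −2a/b. A sketch using noise-free synthetic data (g₀ and R values are illustrative):

```python
g0_true = 9.81     # illustrative surface value (m/s^2)
R_true = 6.371e6   # illustrative Earth radius (m)

# Noise-free g(h) values over 0 to 20 km, generated from equation 10.1
hs = [i * 1000.0 for i in range(21)]
gs = [g0_true / (1.0 + h / R_true) ** 2 for h in hs]

# Closed-form unweighted linear least squares for y = a + b*x
n = len(hs)
hbar = sum(hs) / n
gbar = sum(gs) / n
b = (sum((h - hbar) * (g - gbar) for h, g in zip(hs, gs))
     / sum((h - hbar) ** 2 for h in hs))
a = gbar - b * hbar

# If the linear approximation holds, the radius is hidden in the fit:
R_est = -2.0 * a / b
print(a, b, R_est)   # R_est lies within about 1% of R_true
```

The small remaining discrepancy comes from the curvature of equation 10.1 that the straight line ignores; a data-driven analyst, not knowing the model, would see only an anonymous slope.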

10.3 Other forms of modelling
In the physical sciences we are often able to isolate and control important independent
variables in an experiment. For example, past experience may suggest that the
thickness of an aluminium film vacuum deposited onto a glass substrate is affected by
the distance from the aluminium target to the substrate, the deposition time and the
pressure of the gas in the vacuum chamber. Such isolation and control might be
contrasted with situations often encountered in other areas of science (and in other
disciplines, such as the health or medical sciences).

Consider, as an example, the efficacy of a treatment in prolonging the life of a patient
suffering with liver cancer. There may be many variables that affect patient longevity
to be considered including patient age, sex, race, past medical history, family medical
history and socio-economic status. In fact, identifying which are the most important
variables may be the finest achievement of the modelling/data analysis process with
little expectation that a functional relationship other than linear will emerge between
independent and dependent variables.

There are many areas of science in which a certain amount of data mining or
prospecting is required to establish which variables are most important and which
can be safely discarded. Here we will confine our considerations to the analysis of
data which emerge from experiments in which independent variables can be carefully
controlled and measured.

10.4 Competing models
Whether equations relating variables have been developed by first considering
physical principles, past experience, or intelligent guesswork, there are circumstances
in which two or more equations compete to offer the best explanation of the
relationship between the variables. More terms can be added to an equation (including
terms that introduce extra independent variables) until the fit between equation and
data is optimised, as measured by some suitable statistic such as those described in
section 10.5.

Careful experimental design can help to discriminate one equation from
another. For example, if a model predicts a slightly non-linear relationship between
dependent and independent variables, it would be wise to make measurements over as
wide a range of values of the independent variable as possible to expose or exaggerate
that non-linearity. Additionally, if the data show large scatter, there may be merit in
investigating ways by which the noise can be reduced in order to improve the quality
of the data.

In situations in which we need to compare two or more equations, we can appeal
to methods of data analysis to provide us with quantifiable means of distinguishing
between models. It is these methods that we will concentrate upon for the remainder
of this section.

10.5 Statistical Measures of Goodness of Fit
There are several measures that can be used to assist in discriminating statistically
which equation gives the best fit to data, including the Schwarz criterion, Mallows'
Cₚ and the Hannan and Quinn Information Criterion⁸⁰. Here we focus on two criteria,
the Adjusted Coefficient of Multiple Determination, R²_ADJ, and the Akaike
Information Criterion (AIC), as they are quite easy to implement and interpret.

10.5.1 Adjusted Coefficient of Multiple Determination
A measure of how well an equation is able to account for the relationship between the
independent and dependent variable is given by the Coefficient of Multiple
Determination, R², given by,

R² = 1 − Σ(yᵢ − ŷᵢ)²/Σ(yᵢ − ȳ)²   (10.2)

where yᵢ is the ith observed value of y, ŷᵢ is the predicted y value found using the
equation representing the best line through the points and ȳ is the mean of the
observed y values. Note that the numerator in the second term of equation 10.2 is the
sum of squares of residuals, SSR.

As more parameters (or independent variables) are added to the model we would
expect SSR to reduce. As a consequence, R² would tend to unity. If we were to use R²
to help choose between equations, for example between,

y = a + bx   (10.3)
and
y = a + bx + cx²   (10.4)

then equation 10.4 would always be favoured over equation 10.3 owing to the extra
flexibility the x² term provides for the line of best fit to pass close to the data points.

80. See Al-Subaihi (2002).

While the extra term in x² contributes to a reduction in SSR, it is possible that the
reduction is only marginal. It seems reasonable that, while looking for an equation
that reduces SSR, account should also be taken of the number of parameters so as not
to unfairly discriminate against equations with only a small number of parameters.
One such statistic, the Adjusted Coefficient of Multiple Determination, R²_ADJ, is
given by⁸¹,

R²_ADJ = [(n − 1)R² − (M − 1)]/(n − M)   (10.5)

where R² is given by equation 10.2, n is the number of data and M is the number of
parameters in the equation.

When two or more equations are fitted to data, the equation that is favoured is the
one that gives the largest value of R²_ADJ.

10.5.2 Akaike's Information Criterion (AIC)
Another way to compare two (or more) equations fitted to data, where the equations
have different numbers of parameters, is to use Akaike's Information Criterion⁸²
(AIC). This criterion takes into account SSR, but also includes a term proportional to
the number of parameters used. AIC may be written,

AIC = n ln SSR + 2M   (10.6)

where n is the number of data and M is the number of parameters in the equation.

The second term on the right hand side of equation 10.6 can be considered as a
penalty term. If the addition of another parameter in an equation reduces SSR then
the first term on the right hand side of equation 10.6 becomes smaller. However, the
second term on the right hand side increases by two for every extra parameter used. It
follows that a modest decrease in SSR which occurs when an extra term is introduced
into an equation may be more than offset by the increase in the penalty term incurred
by using another parameter. We conclude that, if two or more equations are fitted to
data, then the equation producing the smallest value for AIC is preferred.

Care must be exercised when calculating SSR: if a transformation is required to
facilitate fitting, data must be transformed back to the original units before calculating
SSR, otherwise it is not possible to compare equations using R²_ADJ or AIC.
Additionally, if weighted fitting is to be used, then the same weighting of the data
must be used for all equations fitted to data.


81. See Neter, Kutner, Nachtsheim and Wasserman for a discussion of equation 10.5. R²_ADJ is calculated
by the Regression Tool in the Analysis ToolPak in Excel (see p 373 of Kirkup).
82. See Akaike (1974).
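The statistics just described are simple to compute once an equation has been fitted. The sketch below (in Python; the function name is my own) implements equations 10.2, 10.5 and 10.6 directly:

```python
import math

def fit_statistics(y_obs, y_pred, M):
    """Return SSR, R^2 (equation 10.2), adjusted R^2 (equation 10.5)
    and AIC (equation 10.6) for a fitted equation with M parameters."""
    n = len(y_obs)
    ybar = sum(y_obs) / n
    ssr = sum((yo - yp) ** 2 for yo, yp in zip(y_obs, y_pred))
    sst = sum((yo - ybar) ** 2 for yo in y_obs)
    r2 = 1.0 - ssr / sst
    r2_adj = ((n - 1) * r2 - (M - 1)) / (n - M)
    aic = n * math.log(ssr) + 2 * M
    return ssr, r2, r2_adj, aic

# Small illustration with made-up observed and predicted values:
ssr, r2, r2_adj, aic = fit_statistics([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8], M=2)
print(ssr, r2, r2_adj, aic)
```

When several equations are fitted to the same (untransformed) data, the one returning the largest R²_ADJ, or the smallest AIC, is preferred.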
10.5.3 Example
As part of a study into the behaviour of electrical contacts made to a ceramic
conductor, the data in table 10.1 were obtained for the temperature variation of the
electrical resistance of the contacts.

Table 10.1: Resistance versus temperature for electrical contacts on a ceramic.
T (K)   R (Ω)     T (K)   R (Ω)
50 4.41 190 0.69
60 3.14 200 0.85
70 2.33 210 0.94
80 2.08 220 0.78
90 1.79 230 0.74
100 1.45 240 0.77
110 1.36 250 0.68
120 1.20 260 0.66
130 0.86 270 0.84
140 1.12 280 0.77
150 1.05 290 0.75
160 1.05 300 0.86
170 0.74
180 0.88

These data are shown plotted in figure 10.2.

[Figure 10.2 is a scatter plot of resistance (ohms), 0 to 5, against temperature (K), 0 to 350.]

Figure 10.2: Resistance versus temperature data for electrical contacts made to a
ceramic material.

It is suggested that there are two possible models that can be used to describe the
variation of the contact resistance with temperature.


Model 1
The first model assumes that contacts show semiconducting behaviour, where the
relationship between R and T can be written,

R = A exp(B/T)   (10.7)

where A and B are constants.


Model 2
Another equation proposed to describe the data assumes an exponential decay of
resistance with increasing temperature, of the form,

R = α exp(−βT) + γ   (10.8)

where α, β and γ are constants.

We will use the adjusted coefficient of multiple determination and Akaike's
information criterion to determine whether equation 10.7 or equation 10.8 better fits
the data.

Solution
Both equation 10.7 and equation 10.8 were fitted using non-linear least squares. It is
possible to linearise equation 10.7 by taking logarithms of both sides of the equation,
then performing linear least squares. However, it is more convenient to use the Solver
utility in Excel to perform non-linear least squares, as described in section 4 of this
document.

Summarised in table 10.2 are the results of the fitting. Note that the number of data in
table 10.1, n = 26.

Table 10.2: Parameter estimates and statistics obtained when fitting equations 10.7
and 10.8 to the data in table 10.1.
Parameter estimates        Fitting                  Fitting
and other statistics       R = A exp(B/T)           R = α exp(−βT) + γ
A, σ_A                     0.4849, 0.0175           -
B, σ_B                     111.0, 2.40              -
α, σ_α                     -                        18.91, 2.14
β, σ_β                     -                        0.03391, 0.00196
γ, σ_γ                     -                        0.7974, 0.0313
SSR                        0.2709                   0.3191
AIC                        -29.95                   -23.70
R²                         0.9859                   0.9833
R²_ADJ                     0.9853                   0.9818

Inspection of table 10.2 reveals that the equation R = A exp(B/T) is superior to
R = α exp(−βT) + γ as judged by AIC and R²_ADJ. In this example the SSR is smaller
for equation 10.7 fitted to data compared to equation 10.8. As the number of
parameters in equation 10.7 is less than that in equation 10.8, this alone would have been
enough to encourage us to favour equation 10.7 as the better fit to the data.
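As a check, the statistics in table 10.2 can be reproduced by substituting the fitted parameter estimates back into each model and applying equation 10.6. A Python sketch (parameter values copied from table 10.2, so small rounding discrepancies are to be expected):

```python
import math

# Data from table 10.1
T = [50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180,
     190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300]
R = [4.41, 3.14, 2.33, 2.08, 1.79, 1.45, 1.36, 1.20, 0.86, 1.12, 1.05,
     1.05, 0.74, 0.88, 0.69, 0.85, 0.94, 0.78, 0.74, 0.77, 0.68, 0.66,
     0.84, 0.77, 0.75, 0.86]
n = len(T)

def aic(ssr, M):
    return n * math.log(ssr) + 2 * M    # equation 10.6

# Model 1: R = A exp(B/T), estimates from table 10.2
ssr1 = sum((r - 0.4849 * math.exp(111.0 / t)) ** 2 for t, r in zip(T, R))
# Model 2: R = alpha*exp(-beta*T) + gamma, estimates from table 10.2
ssr2 = sum((r - (18.91 * math.exp(-0.03391 * t) + 0.7974)) ** 2
           for t, r in zip(T, R))

print(ssr1, aic(ssr1, 2))   # compare with SSR and AIC for model 1 in table 10.2
print(ssr2, aic(ssr2, 3))   # compare with SSR and AIC for model 2 in table 10.2
```

Both the smaller SSR and the smaller AIC for model 1 are recovered, confirming the preference for equation 10.7.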
Section 11: Monte Carlo simulations and least squares
How effective is the technique of least squares at providing good estimates of
parameters appearing in an equation fitted to experimental data? This question is both
challenging and important. To begin with, it is not possible to be sure that an equation
fitted to data is appropriate. Additionally, we cannot be sure that the assumptions
usually made when applying the technique of least squares (e.g. that errors in the y
values are normally distributed, with a mean of zero and a constant standard
deviation) are valid.

There is no way to be certain of what the parameters should be that appear in any
equation that is fitted to real data. However, it is possible to contrive a situation
where we do know the underlying relationship between the dependent variable (y) and
independent variable (x) and how errors are distributed.

The starting point is to generate noise free y values in some range of x values. The
next stage is to add noise of known standard deviation with the aid of a random
number generator⁸³.

Data generated in this manner are submitted to a least squares routine which, in turn,
calculates the best estimates of parameters appearing in an equation fitted to the data.
The estimates are compared with the actual parameters, allowing the error in the
estimates⁸⁴ to be determined. Generating and analysing data in this manner is an
example of a Monte Carlo simulation. Such simulations are widely used in science to
imitate situations that are too difficult, costly or time consuming to investigate
through conventional experiments.

The Monte Carlo approach is powerful and versatile. As examples, we may
investigate experimentally:

- the performance of data analysis tools (for example, the speed and accuracy
of rival algorithms for non-linear least squares can be compared).
- the consequence of choosing different sampling regimes (for example, the
distribution of parameter estimates obtained when measurements are made at
evenly spaced intervals of x can be compared with the distribution of
parameter estimates obtained when replicate measurements are made at
extreme values of x).
- the effect of homo- or heteroscedasticity on parameter estimates (for example,
the consequences may be investigated of fitting an equation by unweighted
least squares to data, where the data have been influenced by heteroscedastic
noise).
- the effect of the magnitude of the noise in the data on the standard errors of
the parameter estimates.


83. Or a pseudorandom number generator, as routinely found in statistics and spreadsheet packages.
84. error = true value of parameter − estimated value of parameter.
11.1 Using Excel's Random Number Generator
The Random Number Generator in Excel offers a convenient means of adding
normally distributed noise to otherwise noise free data⁸⁵. The Random Number
Generator is one of the tools in the Analysis ToolPak. The ToolPak is found by going
to the Tools pull down menu on the Menu toolbar and clicking on Data Analysis.

Figure 11.1 shows noise free y values in column B generated using the equation:

y = 3 + 1.5x (11.1)

Normally distributed noise with mean of zero and standard deviation of two is
generated in the C column. In the D column the noise-free y-values are summed with
the noise. The x-values are distributed evenly in the range x = 5 to x = 20.

   A   B              C          D
1  x   y_noise_free   noise      y
2  5   10.5           -0.00145   =B2+C2
3 6 12.0 -3.08168
4 7 13.5 -2.95189
5 8 15.0 0.06251
6 9 16.5 0.674352
7 10 18.0 2.402985
8 11 19.5 2.987836
9 12 21.0 3.183932
10 13 22.5 -1.49775
11 14 24.0 0.441671
12 15 25.5 2.453826
13 16 27.0 -1.03515
14 17 28.5 -1.77266
15 18 30.0 -0.43274
16 19 31.5 1.972357
17 20 33.0 0.010632
Figure 11.1: Normally distributed noise with zero mean and standard deviation of two
added to y values. x values are in the range x = 5 to x = 20, with no replicates.

Figure 11.2 shows a similar range of x values, but in this case eight replicates are
made at x = 5 and another eight at x = 20, with no values between these limits (such
that the number of data in figures 11.1 and 11.2 is the same). Again, normally
distributed noise with mean of zero and standard deviation of two is added to each of
the y-values.

85. The Random Number Generator allows noise with distributions other than normal to be added to
data. We will consider only normally distributed noise.

   K   L              M          N
1  x   y_noise_free   noise      y
2  5   10.5           -0.51868   =L2+M2
3 5 10.5 0.849941
4 5 10.5 0.91623
5 5 10.5 0.153223
6 5 10.5 1.798417
7 5 10.5 -0.67711
8 5 10.5 -2.12328
9 5 10.5 -3.16988
10 20 33.0 -0.11841
11 20 33.0 0.47371
12 20 33.0 -2.95645
13 20 33.0 1.563158
14 20 33.0 2.03966
15 20 33.0 0.874793
16 20 33.0 1.679191
17 20 33.0 -1.71295
Figure 11.2: Normally distributed noise with zero mean and standard deviation of two
added to y values. Data are generated at x = 5 and x = 20. Eight replicate y values are
generated at each x value.

Analysing the data shown in figures 11.1 and 11.2 using unweighted least squares
gives the following estimates for parameters and standard errors in parameters. Note
we refer to the data that are evenly distributed between x = 5 and x = 20, as given in
figure 11.1, as 'Even dist.' and the data consisting of replicates at x = 5 and x = 20 as
'Extreme dist.'. The outcome of analysing using unweighted least squares is shown in
table 11.1.

                a       σ_a      b       σ_b       R²
Even dist.      2.238   1.464    1.578   0.1099    0.9364
Extreme dist.   2.461   0.8297   1.538   0.05692   0.9812
Table 11.1: Parameter estimates and statistics for the data in figures 11.1 and 11.2, found
using unweighted least squares.

Errors in the intercept and slope in table 11.1 are found by subtracting the estimates from
the true values (3 and 1.5 respectively), as shown in table 11.2.

                Error in a   Error in b
Even dist.      0.7615       -0.07801
Extreme dist.   0.5386       -0.03845
Table 11.2: Errors in intercept and slope.

It is possible that the simulated data are unrepresentative of the effect of evenly
distributed data compared to data gathered at extreme x values (as there are only two
sets of data and, by chance, the 'Extreme dist.' could have been favoured over the
'Even dist.'). This is where the power of the Monte Carlo approach emerges. The
simulation may be repeated many times in order to establish whether designing an
experiment with replicate measurements made at extreme x values does consistently
produce parameter estimates with smaller standard errors.
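The repetition is straightforward to automate outside Excel as well. Below is a minimal Monte Carlo sketch in Python, assuming the same true line y = 3 + 1.5x, noise of standard deviation 2, and 16 points per simulated set; it compares the spread of slope estimates for the two sampling schemes:

```python
import random
import statistics

random.seed(1)
A_TRUE, B_TRUE, SIGMA = 3.0, 1.5, 2.0   # true line y = 3 + 1.5x, noise sd of 2

def fitted_slope(xs, ys):
    # closed-form unweighted least-squares slope
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx

x_even = [float(x) for x in range(5, 21)]   # 16 evenly spaced x values (figure 11.1)
x_extreme = [5.0] * 8 + [20.0] * 8          # 8 replicates at each extreme (figure 11.2)

def spread_of_slopes(xs, n_sets=200):
    slopes = []
    for _ in range(n_sets):
        ys = [A_TRUE + B_TRUE * x + random.gauss(0.0, SIGMA) for x in xs]
        slopes.append(fitted_slope(xs, ys))
    return statistics.stdev(slopes)

sd_even = spread_of_slopes(x_even)
sd_extreme = spread_of_slopes(x_extreme)
print(sd_even, sd_extreme)   # the extreme design gives the smaller spread
```

With 200 simulated data sets the difference between the two spreads is unmistakable, which is exactly the behaviour the histograms below display for 50 sets.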

Figures 11.3 and 11.4 show histograms of the estimates of a and b, which were
determined by generating 50 sets of simulated data, based on adding noise of standard
deviation 2 to y values generated using equation 11.1.

[Figure 11.3 is a histogram: frequency against estimate, a, of intercept (0.0 to 7.0), with separate bars for the even dist. and extreme dist. data.]

Figure 11.3: Histogram consisting of 50 estimates of intercept found by fitting the
equation y = a + bx to simulated data.

[Figure 11.4 is a histogram: frequency against estimate, b, of slope (1.2 to 1.8), with separate bars for the even dist. and extreme dist. data.]

Figure 11.4: Histogram consisting of 50 estimates of slope found by fitting the
equation y = a + bx to simulated data.

Figures 11.3 and 11.4 provide convincing evidence of the benefits (as far as reducing
standard errors in parameter estimates is concerned) of designing experiments in
which extreme x values are favoured. This finding has a sound foundation based on
statistical principles. For example, the standard error in the estimate of the slope, σ_b, is
related to the x values, xᵢ, by⁸⁶,

σ_b = σ/[Σ(xᵢ − x̄)²]^½   (11.2)

86. See Devore, 1991.

where x̄ is the mean of the x values and σ is the standard deviation of the
experimental y values, given by,

σ = [Σ(yᵢ − ŷᵢ)²/(n − 2)]^½   (11.3)

where n is the number of data.

Equation 11.2 indicates that, for a fixed σ, σ_b becomes smaller for large deviations of x
from the mean, i.e. for large values of |xᵢ − x̄|.

It is worth emphasising that the reduction of the standard errors and improved R² are
secured at some cost. What if the underlying relationship between x and y is not
linear? Gathering data at two extremes of x has assumed that the data are linearly
related, and there is no way to test the validity of this assumption with data
gathered in this manner.

11.2 Monte Carlo simulation and non-linear least squares
Let us now consider a situation requiring fitting by non-linear least squares. The
equation to be fitted to data is given by,

y = A₁ exp(B₁x) + A₂ exp(B₂x)   (11.4)

We choose (arbitrarily),

A₁ = 50, A₂ = 50, B₁ = -0.025 and B₂ = -0.010

Fifty values of y are generated in the range x = 1 to x = 200. A graph of the noise free
data with y values calculated at equal increments of x beginning at x = 1 is shown in
figure 11.5.

Figure 11.5: Noise free data generated using equation 11.4.

Next, noise is added with mean of zero and a constant standard deviation of unity
(again chosen arbitrarily). The question arises: what values of x should be chosen to
obtain estimates of A₁, A₂ etc. which have the smallest standard errors?

With normally distributed noise of zero mean and standard deviation of unity added to
the y values, the graph looks typically like that shown in figure 11.6.



Figure 11.6: x-y data as shown in figure 11.5 with noise added.
50 replicate data sets were generated with noise added to the y values in figure 11.5.
Upon the generation of each set, an equation of the form,

y = a₁ exp(b₁x) + a₂ exp(b₂x)   (11.5)

where a₁, a₂, b₁ and b₂ are estimates of A₁, A₂, B₁ and B₂ respectively, was fitted to the
data using non-linear least squares. Note that the starting values for the non-linear fit,
which are very important when fitting a function consisting of a sum of exponentials,
were a₁ = 50, a₂ = 50, b₁ = -0.025 and b₂ = -0.010.

Figure 11.7 shows a histogram of the a₁ parameter estimates.

[Figure 11.7 is a histogram: frequency against parameter estimate a₁ (20 to 90).]

Figure 11.7: Distribution of parameter estimate a₁.

Exercise 7
An alternative sampling regime to that used in section 11.2 is to choose smaller
sample intervals in the region where the y values are changing most rapidly with x. A
sampling regime that has this characteristic is given by,

xᵢ = (1/α) ln[(N + 1)/(N − i + 1)]   (11.6)

where N is the total number of data, and α is a constant which is determined by letting
xᵢ equal the maximum x value, when i = N.
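Writing the constant in equation 11.6 as α (fixed by requiring that xᵢ equals the maximum x value when i = N), the sampling points can be generated as follows - a sketch:

```python
import math

def sample_points(N, x_max):
    """x values generated by equation 11.6: sampling is densest
    where a decaying exponential changes most rapidly (small x)."""
    alpha = math.log(N + 1.0) / x_max   # fixed so that the last point equals x_max
    return [math.log((N + 1.0) / (N - i + 1.0)) / alpha for i in range(1, N + 1)]

xs = sample_points(50, 200.0)
print(xs[0], xs[-1])   # first point near x = 1; last point is exactly x_max
```

The spacing between successive points grows with i, so the points crowd towards small x, as intended.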

Repeat the example given in section 11.2 (i.e. use the same starting equation and
distribution of errors) using the new sampling regime described by equation 11.6 and
perform 50 replicates. Plot a histogram of the distribution of the parameter
estimate a₁.

a) Is the standard deviation of the parameter estimates, a₁, less than that obtained
in section 11.2 when the xᵢ values were evenly distributed?
b) Carry out an F test to compare the variances of the distribution of a₁ obtained
using both sampling regimes to establish if the difference in the characteristic
width is statistically significant⁸⁷.

Exercise 8
To determine the wavelength, λ, of an ultrasonic wave, an
experiment is to be performed which exploits the phenomenon of interference of
waves from two sources of ultrasound.


87. See pages 342 to 346, and pages 369 to 371 in Kirkup (2002).

The relationship between the separation, y, between two successive maxima of the
interfering waves and the separation of the sources of the waves, d, is given by:

y = λD/d   (11.7)

where D is a constant.

Equation 11.7 is of the form y = bx, where x ≡ 1/d and b ≡ λD.

How may values of d be chosen so as to minimise the standard error in the slope, b?

Simulation
Two approaches for choosing values of d are to be compared. The first approach
generates y values as d increases from 1 to 20 cm in steps of 1 cm. The other is to
generate y values as the ratio 1/d is increased from 1/20 to 1 (i.e. 0.05 to 1).

Taking λ equal to 0.76 cm and D = 50 cm, generate simulated values of y using
equation 11.7:

a) for d = 1 to 20 cm, in steps of 1 cm.
b) for 1/d = 0.05 cm⁻¹ to 1 cm⁻¹, in steps of 0.05 cm⁻¹.


For data generated by both methods a) and b), use Excel's Random Number Generator
to add normally distributed noise with mean of zero and standard deviation of unity.

Analysis
Use the LINEST() function in Excel to find the slope of the best line through the
origin for the two sets of data generated. Replicate the simulation and least squares
analysis fifty times and construct a histogram showing the distribution of best
estimates of slope based on both distributions of x values.

Questions
a) Is there an obvious difference between the distributions of the parameter estimates based on the two sampling regimes?
b) Support your answer to a) using an F test to compare the variances in the slope.
c) Do you foresee any practical problems when a real experiment is to be carried out using either sampling regime?
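The Monte Carlo comparison described above can be sketched as follows; this is an illustration only, with numpy's random generator standing in for Excel's Random Number Generator and a through-the-origin slope formula standing in for LINEST() (the seed and the helper function names are my choices, not part of the exercise):

```python
import numpy as np

rng = np.random.default_rng(1)      # seeded so the run is reproducible
lam, D = 0.76, 50.0                 # wavelength (cm) and the constant D (cm)
b_true = lam * D                    # equation 11.7 as y = bx with x = 1/d

x_a = 1.0 / np.arange(1.0, 21.0)    # regime a): d = 1..20 cm, so x = 1/d
x_b = np.arange(1, 21) * 0.05       # regime b): 1/d = 0.05..1 in steps of 0.05

def slope_through_origin(x, y):
    """Least-squares slope of y = bx (a line through the origin)."""
    return np.sum(x * y) / np.sum(x * x)

def simulate(x, reps=50):
    """Return `reps` slope estimates from noisy data on the given x grid."""
    y = b_true * x + rng.normal(0.0, 1.0, size=(reps, x.size))
    return np.array([slope_through_origin(x, yi) for yi in y])

b_a, b_b = simulate(x_a), simulate(x_b)
print(f"regime a): mean b = {b_a.mean():.2f}, sd = {b_a.std(ddof=1):.3f}")
print(f"regime b): mean b = {b_b.mean():.2f}, sd = {b_b.std(ddof=1):.3f}")
```

Because regime b) spreads the x = 1/d values evenly, it gives a larger Σx² and hence a smaller standard error in the slope than regime a); an F test on the two sample variances makes the comparison formal.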

11.3 Adding heteroscedastic noise using Excel's Random Number Generator
When the standard deviation of measurement is not constant, but instead depends on the x value, the distribution of errors is said to be heteroscedastic. As far as fitting an equation to data using least squares is concerned, it is necessary to use weighted fitting[88]. Heteroscedasticity may be revealed by plotting the residuals versus x as shown in figure 11.8.












Figure 11.8: Residuals indicating weighted fit is required: The trend of large residual
to small (or small to large) as x increases is a strong indication of heteroscedastic error
distribution.

Though heteroscedasticity may be revealed by a plot of residuals, the nature of the heteroscedasticity is not always clear. For example, when the dominant source of error is instrumental, it is common for the error, e_i, to be proportional to the magnitude of the response, y_i.

We can use a Monte Carlo simulation to study the effect of heteroscedasticity, and to
establish (for example) the consequences of fitting an equation to data with
heteroscedastic errors using both unweighted and weighted least squares.

We begin with an (arbitrary) equation from which we generate noise free data. The
equation is:

y = 2 - 4x (11.8)

Sheet 11.1 shows noise free data generated in the range x = 1 to x = 10, together with noise produced using the Random Number Generator in Excel. In the C column there are normally distributed numbers with mean equal to zero, and standard deviation equal to one. The values in the D column also have a normal distribution, but the standard deviation of the distribution at each x value depends on the magnitude of the y value in column B. Specifically, the standard deviation, σ_i, is given by:

σ_i = 0.1y_i (11.9)

[88] See page 264 of Kirkup (2002).


A B C D E
1 x y_noise_free homo_noise hetero_noise y
2 1 -2 -1.0787 =0.1*B2*C2 =B2+D2
3 2 -6 -0.5726
4 3 -10 1.15598
5 4 -14 -0.0725
6 5 -18 0.67552
7 6 -22 0.44338
8 7 -26 -0.5806
9 8 -30 1.23376
10 9 -34 0.18546
11 10 -38 2.34588
Sheet 11.1: Generating data with heteroscedastic noise. Experimental data appear in
column E.

A B C D E
1 x y_noise_free homo_noise hetero_noise y
2 1 -2 -1.0787 0.21575 -1.78425
3 2 -6 -0.5726 0.34356 -5.65644
4 3 -10 1.15598 -1.15598 -11.156
5 4 -14 -0.0725 0.10146 -13.8985
6 5 -18 0.67552 -1.21594 -19.2159
7 6 -22 0.44338 -0.97544 -22.9754
8 7 -26 -0.5806 1.50944 -24.4906
9 8 -30 1.23376 -3.70128 -33.7013
10 9 -34 0.18546 -0.63055 -34.6306
11 10 -38 2.34588 -8.91434 -46.9143
Sheet 11.2: Completed spreadsheet based on values in sheet 11.1.
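The construction in sheets 11.1 and 11.2 can be mirrored in a few lines of Python, as a sketch (numpy standing in for Excel's Random Number Generator; the seed is arbitrary, so the noise values will differ from those in the sheets):

```python
import numpy as np

rng = np.random.default_rng(0)                    # arbitrary seed
x = np.arange(1, 11)                              # column A: x = 1..10
y_noise_free = 2 - 4 * x                          # column B: equation 11.8
homo_noise = rng.normal(0.0, 1.0, x.size)         # column C: mean 0, sd 1
hetero_noise = 0.1 * y_noise_free * homo_noise    # column D: sd_i = 0.1*y_i (eq. 11.9)
y = y_noise_free + hetero_noise                   # column E: the 'experimental' data

print(" x  y_nf   homo      hetero        y")
for row in zip(x, y_noise_free, homo_noise, hetero_noise, y):
    print("{:2d} {:5d} {:8.4f} {:9.4f} {:9.4f}".format(*row))
```

As in the sheets, the typical size of the added noise grows in proportion to the magnitude of y.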

Figure 11.9 shows a plot of y versus x based on the generated data in sheet 11.2. The line of best fit on the graph was found using Excel's Trendline option and therefore represents an unweighted fit of the equation y = a + bx to the data. Figure 11.10 shows the (unweighted) y residuals plotted versus x. The trend in the residuals indicates that the errors have a heteroscedastic distribution and therefore weighted fitting is required.

Figure 11.9: Plot of x y data as generated by sheet 11.1. The line of best fit (found
using unweighted least squares) is shown on the graph.
The equation of the fitted line, as displayed by Trendline, is y = -4.5894x + 3.7994.


Figure 11.10: Distribution of residuals when an unweighted fit is carried out.

In order to compare unweighted and weighted fitting to heteroscedastic data, fifty sets of heteroscedastic data were generated in the manner described above. An equation of the form:

y = a + bx (11.10)

was fitted to the simulated data.

Unweighted fitting was performed using the LINEST() function in Excel. Weighted fitting was performed with the aid of Solver[89], where the weighting was chosen so that the standard deviation in the ith value was taken to be proportional to y_i, i.e.,

σ_i ∝ y_i (11.11)

Figures 11.11 and 11.12 compare the scatter in the estimated parameters when unweighted and weighted fitting is performed on heteroscedastic data. It is clear from both figures that the weighted fit produces a much narrower distribution of parameter estimates and is therefore preferred over the unweighted fit.


[89] Note that the equation being fitted is linear in the parameters and so fitting can be accomplished using weighted linear least squares. However, as Excel does not possess an option that allows for easy fitting in this manner, it is easier to construct a spreadsheet that minimises (using Solver) the weighted sum of squares of residuals, SSR, where:

SSR = Σ_i [(y_i - ŷ_i)/y_i]²

Figure 11.11: Distribution of the parameter estimate, a, when unweighted and
weighted fitting is carried out on fifty data sets.


Figure 11.12: Distribution of the parameter estimate, b, when unweighted and
weighted fitting is carried out on fifty data sets.

Exercise 9
a) Use equation 11.8 to generate y values for x = 1, 2, 3 etc. up to x = 10.
b) Add normally distributed homoscedastic noise with mean of zero and standard
deviation of unity to the y values generated in part a).
c) Fit equation 11.10 to the data using both unweighted and weighted least squares. For the weighted fit, assume that the relationship for the standard deviation in the y values given by equation 11.11 is valid.
d) Repeat part c) at least 40 times. Construct histograms of the scatter in both a
and b for both weighted and unweighted fitting.
e) Calculate the mean and standard deviation of a and b that you obtained in
part d).
f) Is unweighted fitting by least squares demonstrably better than weighted
fitting in this example?
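The simulation loop of Exercise 9 might be sketched as follows. Here np.polyfit stands in for LINEST() and, with weights w_i = 1/|y_i|, for the Solver minimisation of the footnoted weighted SSR; the seed and the use of polyfit's w argument are choices of convenience, not part of the exercise:

```python
import numpy as np

rng = np.random.default_rng(2)                    # arbitrary seed
x = np.arange(1, 11)
y_true = 2 - 4 * x                                # equation 11.8

a_u, b_u, a_w, b_w = [], [], [], []
for _ in range(50):
    y = y_true + rng.normal(0.0, 1.0, x.size)     # homoscedastic noise, part b)
    bu, au = np.polyfit(x, y, 1)                  # unweighted fit of y = a + bx
    # weights 1/|y_i| make polyfit minimise sum[(y_i - yhat_i)/y_i]^2,
    # i.e. a fit that assumes sd_i proportional to y_i (equation 11.11);
    # this assumes no generated y value is exactly zero
    bw, aw = np.polyfit(x, y, 1, w=1.0 / np.abs(y))
    a_u.append(au); b_u.append(bu); a_w.append(aw); b_w.append(bw)

for name, vals in (("a unweighted", a_u), ("a weighted", a_w),
                   ("b unweighted", b_u), ("b weighted", b_w)):
    v = np.asarray(vals)
    print(f"{name:13s} mean = {v.mean():7.3f}  sd = {v.std(ddof=1):.3f}")
```

Histograms of the four lists then answer parts d) to f) of the exercise.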
Section 12: Review
This document focuses primarily on fitting equations to data using the technique of non-linear least squares. In particular, it has considered the Solver tool packaged with Excel and how it may be employed for non-linear least squares fitting. For completeness, some discussion of linear least squares has been included, along with the circumstances under which linear least squares is no longer viable.

A most important aspect of fitting equations to data is to be able to determine standard
errors in the estimates made of any parameters appearing in an equation. Solver does
not provide standard errors, so this document describes the means by which standard
errors can be calculated using an Excel spreadsheet. An advantage of employing
Excel is that some aspects of fitting by non-linear least squares which are normally
hidden from view when using a conventional computer based statistics package can
be made visible with Excel. I hope this leads to a deeper appreciation of non-linear
least squares than simply entering numbers into a stats package and waiting for the
fruits of the analysis to emerge.

Some general issues relating to fitting by non-linear least squares have been
discussed, such as the existence of local minima in SSR and means by which good
starting values may be established in advance of fitting.

We have also considered briefly how equations fitted to data can be compared in
order to determine which equation is the 'better' in a statistical sense, while at the
same time emphasising that any equation fitted to data should be supported on a
foundation of sound physical and/or chemical principles.

This document is not yet complete. I would like to include something in the future
about identifying and treating outliers as well as points of high leverage.


Acknowledgements
I would like to express my sincere thanks to Dr Mary Mulholland of the Faculty of
Science at UTS and Dr Paul Swift (formerly of the same Faculty) for suggesting
examples from chemistry and physics that may be usefully treated using non-linear
least squares. From Luton University I acknowledge the assistance and
encouragement of Professor David Rawson, Dr Barry Haggert and Dr John Dilleen. I
thank my good friends John Harbottle and Peter Rowley for their excellent hospitality
while I was in the UK in 2002 preparing some of this material.

I also acknowledge a timely communication from Dr Marcel Maeder of Newcastle University (New South Wales) who queried the omission of Excel's Solver from my book. I am grateful to Dr Maeder, as his query provided the spur to create this document.

Finally, I thank the following organisations where parts of this document were
prepared: University of Technology, Sydney, University of Paisley, UK, University of
Luton, UK, and CSIRO, Lindfield, Australia.

Problems
1.
Standard addition analysis is routinely used to establish the composition of a sample. In order to establish the concentration of Fe³⁺ in water, solutions containing known concentrations of Fe³⁺ were added to water samples[90]. The absorbance of each solution, y, was determined for each concentration of added solution, x. The absorbance/concentration data are shown in table P1.

Concentration (ppm), x   Absorbance (arbitrary units), y
0 0.240
5.55 0.437
11.10 0.621
16.65 0.809
22.20 1.009
Table P1: Data for problem 1.

The relationship between absorbance, y, and concentration, x, may be written,

y = B(x - x_C) (P1)

where B is the slope of the line of y versus x, and x_C is the intercept on the x axis, which represents the concentration of Fe³⁺ in the water before additions are made.

Use non-linear least squares to fit equation P1 to the data in table P1. Determine,

a) best estimates of B and x_C [0.03441 ppm⁻¹, -7.009 ppm]
b) standard errors in B and x_C [0.000277 ppm⁻¹, 0.159 ppm].
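As a cross-check outside Solver, the fit of equation P1 can be sketched with scipy.optimize.curve_fit; the starting values p0 are rough guesses read from the data, and curve_fit is simply one of several routines that could be used:

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.array([0.0, 5.55, 11.10, 16.65, 22.20])      # concentration (ppm), table P1
y = np.array([0.240, 0.437, 0.621, 0.809, 1.009])   # absorbance

def model(x, B, xc):
    """Equation P1: y = B(x - xc)."""
    return B * (x - xc)

popt, pcov = curve_fit(model, x, y, p0=(0.03, -5.0))
perr = np.sqrt(np.diag(pcov))                       # standard errors
print(f"B  = {popt[0]:.5f} ppm^-1  (se {perr[0]:.6f})")
print(f"xc = {popt[1]:.3f} ppm     (se {perr[1]:.3f})")
```

The standard errors come from the diagonal of the covariance matrix, which curve_fit scales by the residual variance by default.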

2.
Another way to analyse the data in table P1 is to write,

y = A + Bx (P2)

Here A is the intercept on the y axis at x = 0, and B is the slope. The intercept on the x axis, x_C (found by setting y = 0 in equation P2), is given by,

x_C = -A/B (P3)

Use linear least squares to fit equation P2 to the data in table P1. Determine,

a) best estimates of A, B and x_C [0.2412, 0.03441 ppm⁻¹, -7.009 ppm]
b) standard errors in the best estimates of A, B and x_C [0.00376, 0.000277 ppm⁻¹, 0.159 ppm].

[90] This problem is adapted from Skoog and Leary (1992).
Note that the errors in the best estimates of slope and intercept in equation P2 are correlated and so the normal propagation of uncertainties method is not valid when calculating x_C (see section 8.1).
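The correlated-error propagation referred to here can be written out explicitly. A sketch using the standard linear least-squares formulas (rather than Excel); the covariance term Cov(A, B) = -x̄σ²/Sxx is what the 'normal' propagation method omits:

```python
import numpy as np

x = np.array([0.0, 5.55, 11.10, 16.65, 22.20])      # data of table P1
y = np.array([0.240, 0.437, 0.621, 0.809, 1.009])

n = x.size
Sxx = np.sum((x - x.mean())**2)
B = np.sum((x - x.mean()) * (y - y.mean())) / Sxx   # slope
A = y.mean() - B * x.mean()                         # intercept
sigma2 = np.sum((y - (A + B * x))**2) / (n - 2)     # residual variance

var_A = sigma2 * (1.0 / n + x.mean()**2 / Sxx)
var_B = sigma2 / Sxx
cov_AB = -sigma2 * x.mean() / Sxx                   # non-zero: A and B are correlated

xc = -A / B                                         # equation P3
# full propagation: dxc/dA = -1/B, dxc/dB = A/B^2, keeping the covariance term
var_xc = var_A / B**2 + (A / B**2)**2 * var_B - 2.0 * (A / B**3) * cov_AB
print(f"xc = {xc:.3f} ppm, standard error = {np.sqrt(var_xc):.3f} ppm")
```

Dropping the covariance term would give a standard error of about 0.12 ppm rather than 0.159 ppm, illustrating why the correlation cannot be ignored here.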

3.
In a study of first order kinetics, the volume of titrant required, V(t), to reach the end point of a reaction is measured as a function of time, t. The following data were obtained[91].

t(s) V(t) (ml)
145 4.0
314 7.6
638 12.2
901 15.6
1228 18.6
1691 21.6
2163 24.0
2464 24.8
Table P2: Data for problem 3.

The relationship between V and t can be written,

V(t) = V∞ - (V∞ - V_0)exp(-kt) (P4)

where k is the rate constant; V∞ and V_0 are also constants.

Using non-linear least squares, fit equation P4 to the data in table P2. Determine,

a) best estimates of V∞, V_0 and k [28.22 ml, 0.9906 ml, 0.0008469 s⁻¹]
b) standard errors in the estimates of V∞, V_0 and k [0.377 ml, 0.216 ml, 3.00 × 10⁻⁵ s⁻¹]


















[91] These data were taken from Denton (2000).
4.
Table P3 contains data obtained from a simulation of a chemical reaction in which noise of constant variance has been added to the data[92].

Time,
t, (s)
Concentration,
C, (mol/l)
0 0.01000
20000 0.00862
40000 0.00780
60000 0.00687
80000 0.00648
100000 0.00595
120000 0.00536
140000 0.00507
160000 0.00517
180000 0.00450
200000 0.00482
220000 0.00414
240000 0.00359
260000 0.00354
280000 0.00324
300000 0.00333
320000 0.00309
340000 0.00285
360000 0.00349
380000 0.00273
400000 0.00271
Table P3: Simulated data taken from Zielinski and Allendoerfer (1997).

Assuming that the relationship between concentration, C, and time, t, can be written[93],

C = C_0/(1 + C_0kt) (P5)

where C_0 is the concentration at t = 0 and k is the second order rate constant.

Fit equation P5 to the data in table P3 to obtain best estimates for C_0 and k and standard errors in the best estimates. [0.009852 mol/l, 0.0006622 l/(mol s), 0.00167 mol/l, 1.98 × 10⁻⁵ l/(mol s)]

5.
Table P4 gives the temperature dependence of the energy gap of high purity
crystalline silicon. The variation of energy gap with temperature can be represented
by the equation,


[92] See Zielinski and Allendoerfer (1997).
[93] The assumption is made that a second order kinetics model can represent the reaction.
E_g(T) = E_g(0) - αT²/(β + T) (P6)

T (K)   E_g(T) (eV)
20 1.1696
40 1.1686
60 1.1675
80 1.1657
100 1.1639
120 1.1608
140 1.1579
160 1.1546
180 1.1513
200 1.1474
220 1.1436
240 1.1392
260 1.1346
280 1.1294
300 1.1247
320 1.1196
340 1.1141
360 1.1087
380 1.1028
400 1.0970
420 1.0908
440 1.0849
460 1.0786
480 1.0723
500 1.0660
520 1.0595
Table P4: Energy gap versus temperature data.

where E_g(0) is the energy gap at absolute zero and α and β are constants.

Fit equation P6 to the data in table P4 to find best estimates of E_g(0), α and β as well as standard errors in the estimates. Use starting values of 1.1, 0.0004, and 600 respectively for the estimates of E_g(0), α and β. [1.170 eV, 0.0004832 eV/K, 662 K, 7.8 × 10⁻⁵ eV, 4.7 × 10⁻⁶ eV/K, 11 K]
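A cross-check of this fit outside Solver, using scipy.optimize.curve_fit with the starting values suggested above (curve_fit is an assumption of convenience here, not part of the problem):

```python
import numpy as np
from scipy.optimize import curve_fit

T = np.arange(20, 521, 20)                         # temperature (K), table P4
Eg = np.array([1.1696, 1.1686, 1.1675, 1.1657, 1.1639, 1.1608, 1.1579,
               1.1546, 1.1513, 1.1474, 1.1436, 1.1392, 1.1346, 1.1294,
               1.1247, 1.1196, 1.1141, 1.1087, 1.1028, 1.0970, 1.0908,
               1.0849, 1.0786, 1.0723, 1.0660, 1.0595])   # energy gap (eV)

def varshni(T, Eg0, alpha, beta):
    """Equation P6: Eg(T) = Eg(0) - alpha*T^2/(beta + T)."""
    return Eg0 - alpha * T**2 / (beta + T)

# starting values as suggested in the problem
popt, pcov = curve_fit(varshni, T, Eg, p0=(1.1, 4e-4, 600.0))
perr = np.sqrt(np.diag(pcov))
for name, p, e in zip(("Eg(0)", "alpha", "beta"), popt, perr):
    print(f"{name:6s} = {p:.6g} +/- {e:.2g}")
```

With these starting values the routine converges to essentially the bracketed answers; poor starting values are the usual cause of failure with this kind of fit.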











6.
In an experiment to study phytoestrogens in Soya beans, an HPLC system was
calibrated using known concentrations of the phytoestrogen, biochanin. Table P5
contains data of the area under the chromatograph absorption peak as a function of
biochanin concentration.

Conc. (x) (mg/l)   Area (y) (arbitrary units)
0.158 0.121342
0.158 0.121109
0.315 0.403550
0.315 0.415226
0.315 0.399678
0.631 1.839583
0.631 1.835114
0.631 1.835915
1.261 3.840554
1.261 3.846146
1.261 3.825760
2.522 8.523561
2.522 8.539992
2.522 8.485319
5.045 16.80701
5.045 16.69860
5.045 16.68172
10.09 34.06871
10.09 33.91678
10.09 33.70727
Table P5: HPLC data for biochanin.

A comparison is to be made of two equations fitted to the data in table P5. The
equations are,

y = A + Bx (P7)
and
y = A + Bx^C (P8)

Assuming an unweighted fit is appropriate, fit equations P7 and P8 to the data in
table P5.

For each equation fitted to the data, calculate the,

a) best estimates of parameters [-0.4021, 3.404 (mg/l)⁻¹, -0.5650, 3.581 (mg/l)^-0.979, 0.9790]
b) standard errors in estimates [0.0575, 0.0127 (mg/l)⁻¹, 0.0885, 0.0790 (mg/l)^-0.979, 0.00903]
c) sum of squares of residuals (SSR) [0.6652, 0.5074]
d) Akaike's information criterion [-4.15, -7.57]
e) residuals. Draw a graph of residuals versus concentration.

Which equation better fits the data?
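The AIC values in part d) are consistent with the form AIC = N ln(SSR) + 2P, with N the number of data points and P the number of fitted parameters; AIC appears in several equivalent conventions in the literature, and this is the one that reproduces the bracketed answers. Assuming that form:

```python
import math

def aic(ssr, n, p):
    """AIC = N ln(SSR) + 2P -- the convention that reproduces the answers above."""
    return n * math.log(ssr) + 2 * p

n = 20                                  # number of points in table P5
aic_p7 = aic(0.6652, n, 2)              # equation P7: y = A + Bx     (2 parameters)
aic_p8 = aic(0.5074, n, 3)              # equation P8: y = A + Bx^C   (3 parameters)
print(f"AIC(P7) = {aic_p7:.2f}, AIC(P8) = {aic_p8:.2f}")
print("preferred (lower AIC):", "P8" if aic_p8 < aic_p7 else "P7")
```

The 2P term penalises the extra parameter of P8, yet P8 still achieves the lower AIC here.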



7.
The relationship between critical current, I_c, and temperature, T, for a high temperature superconductor can be written,

I_c = 1.74A(1 - T/T_c)^(1/2) tanh[0.435B(T_c/T)(1 - T/T_c)^(1/2)] (P9)

where A and B are constants and T_c is the critical temperature of the superconductor. For a high temperature superconductor with a T_c equal to 90.1 K, the following data for critical current and temperature were obtained:

T (K) I (mA)
5 5212
10 5373
15 5203
20 4987
25 4686
30 4594
35 4245
40 4091
45 3861
50 3785
55 3533
60 3199
65 2903
70 2611
75 2279
80 1831
85 1098
90 29
Table P6: Critical current versus temperature data for a high temperature
superconductor with critical temperature of 90.1 K.

Fit equation P9 to the data in table P6, to obtain best estimates for the parameters A
and B and standard errors in best estimates. [3199 mA, 11.7, 17.0 mA, 1.79]

8.
A sensor developed to measure the electrical conductivity of salt solutions is calibrated using solutions of sodium chloride of known conductivity, σ. Table P7 contains data of the signal output, V, of the sensor as a function of conductivity.








σ (mS/cm)   V (volts)
1.504 6.77
2.370 7.24
4.088 7.61
7.465 7.92
10.764 8.06
13.987 8.14
14.781 8.15
17.132 8.19
24.658 8.27
31.700 8.31
38.256 8.34
Table P7: Signal output from sensor as a function of electrical conductivity.

Assume that the relationship between V and is,

( ) [ ]

exp 1 + = k V V
s
(P10)

where V_s, k and α are constants.

Use unweighted non-linear least squares to determine best estimates of the constants and standard errors in the best estimates. [8.689 V, 1.460 V, 0.4281, 0.0190 V, 0.00740 V, 0.0108]

9.
In a study of the propagation of an electromagnetic wave through a porous solid, the variation of relative permittivity, ε_r, of the solid was measured as a function of moisture content, φ_w (expressed as a fraction). Table P8 contains the data obtained in the experiment[94].

φ_w   ε_r
0.128 8.52
0.116 7.95
0.100 7.65
0.095 7.55
0.077 7.08
0.065 6.82
0.056 6.55
0.047 6.42
0.035 5.97
0.031 5.81
0.025 5.69
0.022 5.55
0.017 5.38
0.013 5.26
0.004 5.08
Table P8: Variation of relative permittivity with moisture content.

Assume the relationship between ε_r and φ_w can be written,

[94] Francois Malan 2002 (private communication).

ε_r = φ_w²ε_w + 2φ_w(1 - φ_w)(ε_wε_m)^(1/2) + (1 - φ_w)²ε_m (P11)

where,

ε_w is the relative permittivity of water
ε_m is the relative permittivity of the (dry) porous material

Use (unweighted) non-linear least squares to fit equation P11 to the data in table P8 and hence obtain best estimates of ε_w and ε_m and standard errors in the best estimates [55.44, 1.83, 5.067, 0.043]

10.
Unweighted least squares requires the minimisation of SSR given by,

SSR = Σ(y_i - ŷ_i)² (P12)

A technique sometimes adopted when optimising parameters in optical design situations is to minimise S4R, where,

S4R = Σ(y_i - ŷ_i)⁴ (P13)

Perform a Monte Carlo simulation to compare parameter estimates obtained when
equations P12 and P13 are used to fit an equation of the form, y = a + bx to simulated
data. More specifically,

a) Use the function y = 2.1 - 0.4x to generate y values for x = 1, 2, 3 etc. up to x = 20.
b) Add normally distributed noise of mean equal to zero and standard deviation of 0.5 to the values generated in part a).
c) Find best estimates of a and b by minimising SSR and S4R as given by equations P12 and P13. (Suggestion: Solver may be used to minimise SSR and S4R.)
d) Repeat steps b) and c) until 50 sets of parameter estimates have been obtained using equations P12 and P13.
e) Is there any significant difference between the parameter estimates obtained
when minimising SSR and S4R?
f) Is there any significant difference between the variance in the parameter
estimates when minimising SSR and S4R?
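A sketch of parts a) to c) for a single simulated data set, with scipy's Nelder-Mead minimiser standing in for Solver; the seed is arbitrary, and the generating function is taken as y = 2.1 - 0.4x (the minus sign appears to have been lost in reproduction):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)                     # arbitrary seed
x = np.arange(1.0, 21.0)
y = 2.1 - 0.4 * x + rng.normal(0.0, 0.5, x.size)   # parts a) and b)

def ssr(p):
    a, b = p
    return np.sum((y - (a + b * x))**2)            # equation P12

def s4r(p):
    a, b = p
    return np.sum((y - (a + b * x))**4)            # equation P13

fit_ssr = minimize(ssr, x0=(0.0, 0.0), method="Nelder-Mead")
fit_s4r = minimize(s4r, x0=(0.0, 0.0), method="Nelder-Mead")
print("minimising SSR: a = {:.3f}, b = {:.3f}".format(*fit_ssr.x))
print("minimising S4R: a = {:.3f}, b = {:.3f}".format(*fit_s4r.x))
```

Wrapping the above in a loop over fresh noise realisations gives the 50 sets of estimates asked for in part d), from which the questions in parts e) and f) can be addressed.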









11.
In section 10.1, the relationship between free fall acceleration, g(h), and height, h, was written:

g(h) = g_0/(1 + h/R)² (P14)

To study the validity of equation P14, low noise data of free fall acceleration are
gathered over a range of values of height, h.

For h values small compared to the radius of the Earth, the acceleration will decrease almost linearly with height. Applying the binomial expansion to equation P14, we obtain as a first order approximation,

g(h) = g_0(1 - 2h/R) (P15)

Contained in table P9 are data of the variation of acceleration with height above the Earth's surface.

h (km)   g (m/s²)
1000 7.33
2000 5.68
3000 4.53
4000 3.70
5000 3.08
6000 2.60
7000 2.23
8000 1.93
9000 1.69
10000 1.49
Table P9: Variation of acceleration due to gravity with height.

a) Use least squares to fit both equations P14 and P15 to the data in table P9 and determine best estimates for g_0 and R.
b) Calculate standard errors in the best estimates.
c) Calculate and plot the residuals for each equation fitted to the data in table P9.
d) Is equation P15 a reasonable approximation to equation P14 over the range of
h values in table P9?
12.
The electrical resistance, r, of a particular material at a temperature, T, may be described by,

r = A + BT (P16)

or

r = α + βT + γT² (P17)

where A, B, α, β, and γ are constants.


Table P10 shows the variation of the resistance of an alloy with temperature.

Table P10: Resistance versus temperature data for an alloy
r (Ω) 19.5 18.4 20.2 20.1 20.9 20.8 21.2 21.8 21.9 23.6 23.2 23.9 23.2 24.1 24.2 26.3 25.5 26.1 26.3 27.1 28.0
T (K) 150 160 170 180 190 200 210 220 230 240 250 260 270 280 290 300 310 320 330 340 350

Using (unweighted) linear least squares, fit both equations P16 and P17 to the data in table P10 and determine for each equation,

a) estimates for the parameters [12.41 Ω and 4.30 × 10⁻² Ω/K; 14.0 Ω, 3.0 × 10⁻² Ω/K and 2.7 × 10⁻⁵ Ω/K²]
b) the standard error in each estimate [0.49 Ω and 0.19 × 10⁻² Ω/K; 2.2 Ω, 1.8 × 10⁻² Ω/K and 3.6 × 10⁻⁵ Ω/K²]
c) the standard deviation, σ, in each y value [0.5304 Ω, 0.5368 Ω]
d) the sum of squares of the residuals, SSR [5.344 Ω², 5.186 Ω²]
e) Akaike's information criterion, AIC [39.20, 40.57]
References
Akaike H A new look at the statistical model identification (1974) IEEE Transactions
on Automatic Control 19 716-723.

Al-Subaihi A A (2002) Variable Selection in Multivariate Regression using SAS/IML
http://www.jstatsoft.org/v07/i12/mv.pdf

Bard Y Nonlinear Parameter Estimation (1974) Academic Press, London.

Bates D M and Watts D G Nonlinear Regression Analysis and its Applications (1988)
Wiley, New York.

Bevington P R and Robinson D K Data Reduction and Error Analysis for the
Physical Sciences (1992) McGraw-Hill, New York.

Bube R H Photoconductivity of Solids (1960) Wiley New York.

Cleveland W S The Elements of Graphing Data (1994) Hobart Press, New Jersey.

Conway D G and Ragsdale C T Modeling Optimization Problems in the Unstructured
World of Spreadsheets (1997) Omega. Int. J. Mgmt. Sci. 25 313-322.

Demas J N Excited State Lifetime Measurements (1983) Academic Press, New York.

Denton P Analysis of First Order Kinetics Using Microsoft Excel Solver (2000)
Journal of Chemical Education 77, 1524-1525.

Dietrich C R Uncertainty, Calibration and Probability: Statistics of Scientific and Industrial Measurement 2nd edition (1991) Adam Hilger, Bristol.

Frenkel R D Statistical Background to the ISO Guide to the Expression of
Uncertainty in Measurement (2002) CSIRO, Sydney p 43.

Fylstra D, Lasdon L, Watson J and Waren A Design and Use of Microsoft Excel
Solver (1998) Interfaces 28 29-55.

Karlovsky J Simple Method for calculating the Tunneling Current in an Esaki Diode
(1962) Phys. Rev. 127 419.

Katz E, Ogan K L and Scott R P W Peak Dispersion and Mobile Phase Velocity in
Liquid Chromatography: The Pertinent Relationship for Porous Silica (1983)
J. Chromatogr. 270 51-75.

Kennedy G J and Knox J H Performance of packings in high performance liquid
chromatography. 1. Porous and surface layer supports (1972) J. Chromatogr. Sci. 10
549-556.

Kirkup L Data Analysis with Excel: An Introduction for Physical Scientists (2002)
Cambridge University Press, Cambridge.

102
Kirkup L and Cherry I Temperature Dependence of Photoconductive Decay in
Sintered Cadmium Sulphide (1988) Eur. J. Phys. 9 64-68.

Kirkup L and Sutherland J Curve Stripping and Non-Linear Fitting of
Polyexponential Functions to Data using a Microcomputer (1988) Comp. in Phys. 2
64-68.

Moody H W The Evaluation of the Parameters in the Van Deemter Equation (1982)
Journal of Chemical Education 59, 290-291.

Neter J, Kutner M J, Nachtsheim C J and Wasserman W Applied Linear Regression
Models (1996) Times Mirror Higher Education Group Inc.

Nielsen-Kudsk F A Microcomputer Program in Basic for Iterative, Non-Linear Data-
Fitting to Pharmacokinetic Functions (1983) Int. J. Bio-Med. Comput. 14 95-107.

Nocedal J Numerical optimization (1999) Springer: New York.

Perry A A Modified Conjugate Gradient Algorithm (1978) Operation Research 26
1073-1078.

Safizadeh M and Signorile R Optimization of Simulation via Quasi-Newton Methods
(1994) ORSA J. Comput. 6 398-408.

Salter C Error Analysis using the Variance-Covariance Matrix (2000) Journal of
Chemical Education 77, 1239-1243.

Skoog D A and Leary J J Principles of Instrumental Analysis 4th edition (1992) Harcourt Brace: Fort Worth.

Smith S and Lasdon L Solving Large Sparse Nonlinear Programs Using GRG (1992)
Journal on Computing 4 2-16.

Snyder L R, Kirkland J J and Glajch J L Practical HPLC Method Development 2nd edition (1997) Wiley: New York.

Walker J S Physics (2002) Prentice Hall, New Jersey.

Walkenbach J Excel 2002 Power Programming with VBA (2001) M&T Books, New
York.

Walsh S and Diamond D Non-linear Curve Fitting Using Microsoft Excel Solver
(1995) Talanta 42 561-572.

Williams I P Matrices for Scientists (1972) Hutchinson University Library, London.

Wolsey, L A Integer Programming (1998) Wiley, New York.

Zielinski T J and Allendoerfer R D Least Squares Fitting of Nonlinear Data in the
Undergraduate Laboratory (1997) Journal of Chemical Education 74 1001-1007.
