Static Hand Out

1/15/2013
Lecture Note:
Statistics for Analytical Chemistry (Chem 222)
Applications of Analytical Chemistry

Industrial Processes: analysis for quality control, and reverse engineering (i.e. finding out what your competitors are doing).

Environmental Analysis: familiar to those who attended the second-year Environmental Chemistry modules. A very wide range of problems and types of analyte.

Regulatory Agencies: dealing with many problems from the first two.

Academic and Industrial Synthetic Chemistry: of great interest to many of my colleagues. I will not be dealing with this type of problem.

GIRMA SELALE

Recommended textbooks:
Statistics for Analytical Chemistry, J.C. Miller and J.N. Miller, Second Edition, 1992, Ellis Horwood Limited
Fundamentals of Analytical Chemistry, Skoog, West and Holler, 7th Ed., 1996 (Saunders College Publishing)

The General Analytical Problem

Select sample
Extract analyte(s) from matrix
Separate analytes
Detect, identify and quantify analytes
Determine reliability and significance of results

Errors in Chemical Analysis

Impossible to eliminate errors. How reliable are our data? Data of unknown quality are useless!
Carry out replicate measurements
Analyse accurately known standards
Perform statistical tests on data

Mean
Defined as follows:

$$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$$

where $x_i$ = individual values of x and N = number of replicate measurements.

Median
The middle result when data are arranged in order of size (for even numbers, the mean of the middle two). The median can be preferred when there is an outlier, i.e. one reading very different from the rest. The median is less affected by an outlier than is the mean.

Illustration of Mean and Median

Results of 6 determinations of the Fe(III) content of a solution known to contain 20 ppm (a standard solution).
Note: The mean value is 19.78 ppm (i.e. 19.8 ppm); the median value is 19.7 ppm.
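A minimal sketch of the two definitions in Python. The six replicate values are not listed in this handout; the numbers below are assumed, illustrative values chosen only because they are consistent with the quoted mean (19.78 ppm) and median (19.7 ppm).

```python
from statistics import mean, median

# ASSUMED illustrative Fe(III) replicates in ppm (not given in the handout).
fe_ppm = [19.4, 19.5, 19.6, 19.8, 20.1, 20.3]

x_bar = mean(fe_ppm)      # sum of the values divided by N
x_med = median(fe_ppm)    # even N: mean of the two middle values

print(f"mean   = {x_bar:.2f} ppm")
print(f"median = {x_med:.1f} ppm")
```

With an even number of results, `median` averages the two middle values, as described above.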
Precision
Relates to reproducibility of results. How similar are values obtained in exactly the same way? Useful for measuring this: deviation from the mean:

$$d_i = |x_i - \bar{x}|$$

Accuracy
Measurement of agreement between experimental mean and true value (which may not be known!). Measures of accuracy:

Absolute error: $E = x_i - x_t$ (where $x_t$ = true or accepted value)

Relative error:

$$E_r = \frac{x_i - x_t}{x_t} \times 100\%$$

(latter is more useful in practice)
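As a small worked example, the two error measures can be applied to the Fe(III) illustration, using the quoted mean (19.78 ppm) and the known content (20 ppm). Applying the formulas to the mean rather than to a single reading is a choice made for this sketch.

```python
x_measured = 19.78   # ppm, mean quoted in the Fe(III) illustration
x_true = 20.00       # ppm, known content of the standard solution

abs_error = x_measured - x_true          # E = x - x_t
rel_error = 100 * abs_error / x_true     # E_r as a percentage

print(f"E   = {abs_error:+.2f} ppm")
print(f"E_r = {rel_error:+.1f} %")
```

The negative sign shows that the result is low relative to the accepted value.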
Illustrating the difference between accuracy and precision
Using a pattern of darts on a dartboard.
[Figure: four dartboards, labelled: low accuracy, low precision; low accuracy, high precision; high accuracy, low precision; high accuracy, high precision]

Some analytical data illustrating accuracy and precision
This figure summarizes the results for determining nitrogen in two pure compounds, benzyl isothiourea hydrochloride and nicotinic acid.
[Figure: results obtained by four analysts]
Analyst 1: precise, accurate
Analyst 2: imprecise, accurate
Analyst 3: precise, inaccurate
Analyst 4: imprecise, inaccurate

Types of Error in Experimental Data

Three types:
(1) Random (indeterminate) Error: Data scattered approx. symmetrically about a mean value. Affects precision - dealt with statistically (see later).
(2) Systematic (determinate) Error: Several possible sources - later. Readings all too high or too low. Affects accuracy.
(3) Gross Errors: Usually obvious - give outlier readings. Detectable by carrying out sufficient replicate measurements.

Sources of Systematic Error

1. Instrument Error: Need frequent calibration - both for apparatus such as volumetric flasks, burettes etc., and for electronic devices such as spectrometers.
2. Method Error: Due to inadequacies in physical or chemical behaviour of reagents or reactions (e.g. slow or incomplete reactions). Example from earlier overhead - nicotinic acid does not react completely under normal Kjeldahl conditions for nitrogen determination.
3. Personal Error: e.g. insensitivity to colour changes; tendency to estimate scale readings to improve precision; preconceived idea of "true" value.

Systematic errors can be

constant (e.g. error in burette reading - less important for larger values of reading) or
proportional (e.g. presence of given proportion of interfering impurity in sample; equally significant for all values of measurement)

Minimise instrument errors by careful recalibration and good maintenance of equipment.
Minimise personal errors by care and self-discipline.
Method errors - most difficult. "True" value may not be known. Three approaches to minimise:
analysis of certified standards
use 2 or more independent methods
analysis of blanks

Statistical Treatment of Random Errors

There are always a large number of small, random errors in making any measurement. These can be small changes in temperature or pressure, random responses of electronic detectors (noise), etc. Suppose there are 4 small random errors possible. Assume all are equally likely, and that each causes an error of ±U in the reading. Possible combinations of errors are shown on the next slide:
Combination of Random Errors

Total error   No.   Relative frequency   Combinations of errors
+4U           1     1/16 = 0.0625        +U+U+U+U
+2U           4     4/16 = 0.250         -U+U+U+U  +U-U+U+U  +U+U-U+U  +U+U+U-U
0             6     6/16 = 0.375         -U-U+U+U  -U+U-U+U  -U+U+U-U  +U-U-U+U  +U-U+U-U  +U+U-U-U
-2U           4     4/16 = 0.250         +U-U-U-U  -U+U-U-U  -U-U+U-U  -U-U-U+U
-4U           1     1/16 = 0.0625        -U-U-U-U

The next overhead shows this in graphical form.

Frequency Distribution for Measurements Containing Random Errors
[Figure: frequency distributions for 4 random uncertainties, 10 random uncertainties, and a very large number of random uncertainties]
This is a Gaussian or normal error curve. Symmetrical about the mean.
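The table of combinations can be reproduced by brute-force enumeration. This is a short sketch, assuming each of the 4 errors is independently +U or -U with equal probability, as stated above; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

n_errors = 4
# Each error contributes +1 or -1 (in units of U); tally the total for every combination.
counts = Counter(sum(signs) for signs in product((+1, -1), repeat=n_errors))
total = 2 ** n_errors

for k in sorted(counts, reverse=True):
    print(f"{k:+d}U  {counts[k]:>2}  {counts[k]}/{total} = {counts[k] / total:.4f}")
```

Raising `n_errors` to 10 or more shows the distribution approaching the bell shape of the Gaussian curve.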
Replicate Data on the Calibration of a 10 ml Pipette

No.   Vol, ml     No.   Vol, ml     No.   Vol, ml
1     9.988       18    9.975       35    9.976
2     9.973       19    9.980       36    9.990
3     9.986       20    9.994       37    9.988
4     9.980       21    9.992       38    9.971
5     9.975       22    9.984       39    9.986
6     9.982       23    9.981       40    9.978
7     9.986       24    9.987       41    9.986
8     9.982       25    9.978       42    9.982
9     9.981       26    9.983       43    9.977
10    9.990       27    9.982       44    9.977
11    9.980       28    9.991       45    9.986
12    9.989       29    9.981       46    9.978
13    9.978       30    9.969       47    9.983
14    9.971       31    9.985       48    9.980
15    9.982       32    9.977       49    9.983
16    9.983       33    9.976       50    9.979
17    9.988       34    9.983

Mean volume          9.982 ml
Median volume        9.982 ml
Spread               0.025 ml
Standard deviation   0.0056 ml

Calibration data in graphical form
[Figure: A = histogram of experimental results; B = Gaussian curve with the same mean value, the same precision (see later) and the same area under the curve as for the histogram.]
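The summary statistics for the pipette data can be checked with Python's standard library. This is a sketch only; the 50 volumes are copied from the table above, and `stdev` uses the N - 1 (sample) formula introduced later in these notes.

```python
from statistics import mean, median, stdev

volumes_ml = [
    9.988, 9.973, 9.986, 9.980, 9.975, 9.982, 9.986, 9.982, 9.981, 9.990,
    9.980, 9.989, 9.978, 9.971, 9.982, 9.983, 9.988, 9.975, 9.980, 9.994,
    9.992, 9.984, 9.981, 9.987, 9.978, 9.983, 9.982, 9.991, 9.981, 9.969,
    9.985, 9.977, 9.976, 9.983, 9.976, 9.990, 9.988, 9.971, 9.986, 9.978,
    9.986, 9.982, 9.977, 9.977, 9.986, 9.978, 9.983, 9.980, 9.983, 9.979,
]

x_bar = mean(volumes_ml)
x_med = median(volumes_ml)
spread = max(volumes_ml) - min(volumes_ml)   # highest minus lowest reading
s = stdev(volumes_ml)                        # sample standard deviation (N - 1)

print(f"mean = {x_bar:.3f} ml, median = {x_med:.3f} ml")
print(f"spread = {spread:.3f} ml, s = {s:.4f} ml")
```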
SAMPLE = finite number of observations
POPULATION = total (infinite) number of observations

Properties of Gaussian curve defined in terms of population. Then see where modifications needed for small samples of data.

Main properties of Gaussian curve:

Population mean ($\mu$): defined as earlier ($N \to \infty$). In absence of systematic error, $\mu$ is the true value (maximum on Gaussian curve).
Remember, sample mean ($\bar{x}$) defined for small values of N.
(Sample mean $\approx$ population mean when $N \geq 20$)

Population Standard Deviation ($\sigma$) - defined on next overhead

$\sigma$: measure of precision of a population of data, given by:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

where $\mu$ = population mean; N is very large.

The equation for a Gaussian curve is defined in terms of $\mu$ and $\sigma$, as follows:

$$y = \frac{e^{-(x - \mu)^2 / 2\sigma^2}}{\sigma\sqrt{2\pi}}$$

[Figure: two Gaussian curves with two different standard deviations, $\sigma_A$ and $\sigma_B$ ($= 2\sigma_A$)]

General Gaussian curve plotted in units of z, where $z = (x - \mu)/\sigma$, i.e. deviation from the mean of a datum in units of standard deviation. Plot can be used for data with given value of mean, and any standard deviation.

Area under a Gaussian Curve
From equation above, and illustrated by the previous curves, 68.3% of the data lie within $\pm\sigma$ of the mean ($\mu$), i.e. 68.3% of the area under the curve lies between $\pm\sigma$ of $\mu$. Similarly, 95.5% of the area lies between $\pm 2\sigma$, and 99.7% between $\pm 3\sigma$.

There are 68.3 chances in 100 that for a single datum the random error in the measurement will not exceed $\pm\sigma$. The chances are 95.5 in 100 that the error will not exceed $\pm 2\sigma$.
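The quoted areas follow from integrating the Gaussian equation between -z and +z, which gives erf(z/√2). A brief sketch using only the standard library (the helper name `area_within` is illustrative):

```python
import math

def area_within(z):
    """Fraction of the area under a Gaussian lying between -z and +z standard deviations."""
    return math.erf(z / math.sqrt(2))

for z in (1, 2, 3):
    print(f"within ±{z}σ: {100 * area_within(z):.2f} %")
```

The value for ±2σ is 95.45%, which these notes quote as 95.5%.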
Sample Standard Deviation, s

The equation for $\sigma$ must be modified for small samples of data, i.e. small N:

$$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$$

Two differences cf. equation for $\sigma$:
1. Use sample mean instead of population mean.
2. Use degrees of freedom, N - 1, instead of N. Reason is that in working out the mean, the sum of the differences from the mean must be zero. If N - 1 values are known, the last value is defined. Thus only N - 1 degrees of freedom. For large values of N, used in calculating $\sigma$, N and N - 1 are effectively equal.

Alternative Expression for s (suitable for calculators)

$$s = \sqrt{\frac{\sum_{i=1}^{N} x_i^2 - \dfrac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N - 1}}$$

Note: NEVER round off figures before the end of the calculation.
Standard Deviation of a Sample
Reproducibility of a method for determining the % of selenium in foods. 9 measurements were made on a single batch of brown rice.

Sample   Selenium content (µg/g) ($x_i$)   $x_i^2$
1        0.07                              0.0049
2        0.07                              0.0049
3        0.08                              0.0064
4        0.07                              0.0049
5        0.07                              0.0049
6        0.08                              0.0064
7        0.08                              0.0064
8        0.09                              0.0081
9        0.08                              0.0064
         $\sum x_i$ = 0.69                 $\sum x_i^2$ = 0.0533

Mean = $\sum x_i / N$ = 0.077 µg/g
$(\sum x_i)^2 / N$ = 0.4761/9 = 0.0529

Standard deviation:

$$s = \sqrt{\frac{0.0533 - 0.0529}{9 - 1}} = 0.00707106 = 0.007$$

Coefficient of variation = 9.2%
Concentration = 0.077 ± 0.007 µg/g

Standard Error of a Mean

The standard deviation relates to the probable error in a single measurement. If we take a series of N measurements, the probable error of the mean is less than the probable error of any one measurement.
The standard error of the mean, $s_m$, is defined as follows:

$$s_m = \frac{s}{\sqrt{N}}$$
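The selenium calculation can be reproduced with both expressions for s. This is a sketch; the nine values are those in the table above, and the variable names are illustrative.

```python
import math
from statistics import stdev

se = [0.07, 0.07, 0.08, 0.07, 0.07, 0.08, 0.08, 0.09, 0.08]   # selenium results from the table
n = len(se)

s_def = stdev(se)                                    # definition: deviations from the mean, N - 1
sum_x = sum(se)
sum_x2 = sum(x * x for x in se)
s_calc = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))   # calculator form

cv = 100 * s_def / (sum_x / n)                       # relative standard deviation, %
s_m = s_def / math.sqrt(n)                           # standard error of the mean

print(f"s = {s_def:.5f}, calculator form = {s_calc:.5f}")
print(f"CV = {cv:.1f} %, s_m = {s_m:.4f}")
```

Both forms agree; the calculator form subtracts two nearly equal numbers, which is why figures must not be rounded before the end.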
Pooled Data
To achieve a value of s which is a good approximation to $\sigma$, i.e. $N \geq 20$, it is sometimes necessary to pool data from a number of sets of measurements (all taken in the same way). Suppose that there are t small sets of data, comprising $N_1, N_2, \dots, N_t$ measurements. The equation for the resultant sample standard deviation is:

$$s_{pooled} = \sqrt{\frac{\sum_{i=1}^{N_1} (x_i - \bar{x}_1)^2 + \sum_{i=1}^{N_2} (x_i - \bar{x}_2)^2 + \sum_{i=1}^{N_3} (x_i - \bar{x}_3)^2 + \dots}{N_1 + N_2 + N_3 + \dots - t}}$$

(Note: one degree of freedom is lost for each set of data)

Pooled Standard Deviation
Analysis of 6 bottles of wine for residual sugar.

Bottle   Sugar % (w/v)   No. of obs.   Deviations from mean
1        0.94            3             0.05, 0.10, 0.08
2        1.08            4             0.06, 0.05, 0.09, 0.06
3        1.20            5             0.05, 0.12, 0.07, 0.00, 0.08
4        0.67            4             0.05, 0.10, 0.06, 0.09
5        0.83            3             0.07, 0.09, 0.10
6        0.76            4             0.06, 0.12, 0.04, 0.03

$$s_1 = \sqrt{\frac{(0.05)^2 + (0.10)^2 + (0.08)^2}{2}} = \sqrt{\frac{0.0189}{2}} = 0.0972 = 0.097$$

and similarly for all $s_n$.

Set n    $\sum (x_i - \bar{x})^2$   $s_n$
1        0.0189                     0.097
2        0.0178                     0.077
3        0.0282                     0.084
4        0.0242                     0.090
5        0.0230                     0.107
6        0.0205                     0.083
Total    0.1326

$$s_{pooled} = \sqrt{\frac{0.1326}{23 - 6}} = 0.088\%$$
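The pooled calculation for the wine data can be sketched as below. The deviations are copied from the table above; the dictionary layout and names are illustrative choices.

```python
import math

# Deviations from each bottle's own mean, as tabulated (sugar % w/v).
deviations = {
    1: [0.05, 0.10, 0.08],
    2: [0.06, 0.05, 0.09, 0.06],
    3: [0.05, 0.12, 0.07, 0.00, 0.08],
    4: [0.05, 0.10, 0.06, 0.09],
    5: [0.07, 0.09, 0.10],
    6: [0.06, 0.12, 0.04, 0.03],
}

sum_sq = sum(d * d for devs in deviations.values() for d in devs)
n_total = sum(len(devs) for devs in deviations.values())
n_sets = len(deviations)                    # one degree of freedom lost per set

s_pooled = math.sqrt(sum_sq / (n_total - n_sets))
print(f"sum of squares = {sum_sq:.4f}, N = {n_total}, s_pooled = {s_pooled:.3f} %")
```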
Two alternative methods for measuring the precision of a set of results:

VARIANCE:
This is the square of the standard deviation:

$$s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$$

COEFFICIENT OF VARIATION (CV) (or RELATIVE STANDARD DEVIATION):
Divide the standard deviation by the mean value and express as a percentage:

$$CV = \left(\frac{s}{\bar{x}}\right) \times 100\%$$

Use of Statistics in Data Evaluation

How can we relate the observed mean value ($\bar{x}$) to the true mean ($\mu$)? The latter can never be known exactly.
The range of uncertainty depends on how closely s corresponds to $\sigma$.
We can calculate the limits (above and below) around $\bar{x}$ within which $\mu$ must lie, with a given degree of probability.

Define some terms:

CONFIDENCE LIMITS
interval around the mean that probably contains $\mu$.
CONFIDENCE INTERVAL
the magnitude of the confidence limits
CONFIDENCE LEVEL
fixes the level of probability that the mean is within the confidence limits

Examples later. First assume that the known s is a good approximation to $\sigma$.
Percentages of area under Gaussian curves between certain limits of z ($= (x - \mu)/\sigma$)

50% of area lies between $\pm 0.67\sigma$
80% of area lies between $\pm 1.29\sigma$
90% of area lies between $\pm 1.64\sigma$
95% of area lies between $\pm 1.96\sigma$
99% of area lies between $\pm 2.58\sigma$

What this means, for example, is that 80 times out of 100 the true mean will lie between $\pm 1.29\sigma$ of any measurement we make. Thus, at a confidence level of 80%, the confidence limits are $\pm 1.29\sigma$.

For a single measurement: CL for $\mu = x \pm z\sigma$ (values of z on next overhead)
For the sample mean of N measurements ($\bar{x}$), the equivalent expression is:

$$\text{CL for } \mu = \bar{x} \pm \frac{z\sigma}{\sqrt{N}}$$

Values of z for determining Confidence Limits

Confidence level, %   z
50                    0.67
68                    1.0
80                    1.29
90                    1.64
95                    1.96
96                    2.00
99                    2.58
99.7                  3.00
99.9                  3.29

Note: these figures assume that an excellent approximation to the real standard deviation is known.
Confidence Limits when $\sigma$ is known

Atomic absorption analysis for copper concentration in aircraft engine oil gave a value of 8.53 µg Cu/ml. Pooled results of many analyses showed $s \to \sigma$ = 0.32 µg Cu/ml. Calculate 90% and 99% confidence limits if the above result were based on (a) 1, (b) 4, (c) 16 measurements.

(a) 90% CL = $8.53 \pm \dfrac{(1.64)(0.32)}{\sqrt{1}} = 8.53 \pm 0.52$ µg/ml, i.e. 8.5 ± 0.5 µg/ml
    99% CL = $8.53 \pm \dfrac{(2.58)(0.32)}{\sqrt{1}} = 8.53 \pm 0.83$ µg/ml, i.e. 8.5 ± 0.8 µg/ml

(b) 90% CL = $8.53 \pm \dfrac{(1.64)(0.32)}{\sqrt{4}} = 8.53 \pm 0.26$ µg/ml, i.e. 8.5 ± 0.3 µg/ml
    99% CL = $8.53 \pm \dfrac{(2.58)(0.32)}{\sqrt{4}} = 8.53 \pm 0.41$ µg/ml, i.e. 8.5 ± 0.4 µg/ml

(c) 90% CL = $8.53 \pm \dfrac{(1.64)(0.32)}{\sqrt{16}} = 8.53 \pm 0.13$ µg/ml, i.e. 8.5 ± 0.1 µg/ml
    99% CL = $8.53 \pm \dfrac{(2.58)(0.32)}{\sqrt{16}} = 8.53 \pm 0.21$ µg/ml, i.e. 8.5 ± 0.2 µg/ml

If we have no information on $\sigma$, and only have a value for s, the confidence interval is larger, i.e. there is a greater uncertainty. Instead of z, it is necessary to use the parameter t, defined as follows:

$t = (x - \mu)/s$, i.e. just like z, but using s instead of $\sigma$.

By analogy we have:

$$\text{CL for } \mu = \bar{x} \pm \frac{ts}{\sqrt{N}}$$

(where $\bar{x}$ = sample mean for N measurements)

The calculated values of t are given on the next overhead.
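The copper example can be tabulated in a few lines. This is a sketch; the z values are taken from the table of z values in these notes, and the function name `z_half_width` is illustrative.

```python
import math

Z = {90: 1.64, 99: 2.58}        # z values from the confidence-level table

def z_half_width(z, sigma, n):
    """Half-width of the confidence limits for a mean of n measurements, sigma known."""
    return z * sigma / math.sqrt(n)

x_bar, sigma = 8.53, 0.32       # µg Cu/ml, from the worked example
results = {}
for n in (1, 4, 16):
    for level, z in Z.items():
        results[(n, level)] = z_half_width(z, sigma, n)
        print(f"N = {n:>2}, {level}% CL = {x_bar} ± {results[(n, level)]:.2f} µg/ml")
```

Quadrupling the number of measurements halves the width of the limits, which is the 1/√N dependence in the formula.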
Values of t for various levels of probability

Degrees of freedom (N-1)   80%    90%    95%    99%
1                          3.08   6.31   12.7   63.7
2                          1.89   2.92   4.30   9.92
3                          1.64   2.35   3.18   5.84
4                          1.53   2.13   2.78   4.60
5                          1.48   2.02   2.57   4.03
6                          1.44   1.94   2.45   3.71
7                          1.42   1.90   2.36   3.50
8                          1.40   1.86   2.31   3.36
9                          1.38   1.83   2.26   3.25
19                         1.33   1.73   2.10   2.88
59                         1.30   1.67   2.00   2.66
∞                          1.29   1.64   1.96   2.58

Note:
(1) As (N-1) → ∞, so t → z
(2) For all values of (N-1) < ∞, t > z, i.e. greater uncertainty

Confidence Limits where $\sigma$ is not known

Analysis of an insecticide gave the following values for % of the chemical lindane: 7.47, 6.98, 7.27. Calculate the CL for the mean value at the 90% confidence level.

$x_i$ (%)   $x_i^2$
7.47        55.8009
6.98        48.7204
7.27        52.8529
$\sum x_i$ = 21.72   $\sum x_i^2$ = 157.3742

$$\bar{x} = \frac{\sum x_i}{N} = \frac{21.72}{3} = 7.24$$

$$s = \sqrt{\frac{\sum x_i^2 - \dfrac{(\sum x_i)^2}{N}}{N - 1}} = \sqrt{\frac{157.3742 - \dfrac{(21.72)^2}{3}}{2}} = 0.246 = 0.25\%$$

$$90\%\ \text{CL} = \bar{x} \pm \frac{ts}{\sqrt{N}} = 7.24 \pm \frac{(2.92)(0.25)}{\sqrt{3}} = 7.24 \pm 0.42\%$$

If repeated analyses showed that $s \to \sigma$ = 0.28%:

$$90\%\ \text{CL} = \bar{x} \pm \frac{z\sigma}{\sqrt{N}} = 7.24 \pm \frac{(1.64)(0.28)}{\sqrt{3}} = 7.24 \pm 0.27\%$$
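The lindane example can be checked as follows. This is a sketch; t = 2.92 (2 degrees of freedom, 90%) and z = 1.64 are read from the tables in these notes rather than computed, and s is carried unrounded, which gives the same limits to two decimal places.

```python
import math
from statistics import mean, stdev

lindane = [7.47, 6.98, 7.27]    # % lindane, from the worked example
n = len(lindane)

x_bar = mean(lindane)
s = stdev(lindane)              # sample standard deviation, N - 1 = 2 degrees of freedom

t_90 = 2.92                     # table value for 2 degrees of freedom, 90% level
hw_t = t_90 * s / math.sqrt(n)  # sigma unknown: use t and s

z_90, sigma = 1.64, 0.28        # sigma known from repeated analyses
hw_z = z_90 * sigma / math.sqrt(n)

print(f"mean = {x_bar:.2f} %, s = {s:.3f} %")
print(f"90% CL (t) = {x_bar:.2f} ± {hw_t:.2f} %")
print(f"90% CL (z) = {x_bar:.2f} ± {hw_z:.2f} %")
```

The limits based on t are wider because s from only three results is a poor estimate of σ.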
Testing a Hypothesis
Carry out measurements on an accurately known standard. Experimental value is different from the true value. Is the difference due to a systematic error (bias) in the method - or simply to random error?

Assume that there is no bias (NULL HYPOTHESIS), and calculate the probability that the experimental error is due to random errors.

[Figure: (A) the curve for the true value ($\mu_A = x_t$) and (B) the experimental curve ($\mu_B$)]

Bias = $\mu_B - \mu_A = \mu_B - x_t$.

Test for bias by comparing $\bar{x} - x_t$ with the difference caused by random error.

Remember confidence limit for $\mu$ (assumed to be $x_t$, i.e. assume no bias) is given by:

$$\text{CL for } \mu = \bar{x} \pm \frac{ts}{\sqrt{N}}$$

At the desired confidence level, random errors can lead to:

$$\bar{x} - x_t = \pm\frac{ts}{\sqrt{N}}$$

If $|\bar{x} - x_t| > \dfrac{ts}{\sqrt{N}}$, then at the desired confidence level bias (systematic error) is likely (and vice versa).
Detection of Systematic Error (Bias)

A standard material known to contain 38.9% Hg was analysed by atomic absorption spectroscopy. The results were 38.9%, 37.4% and 37.1%. At the 95% confidence level, is there any evidence for a systematic error in the method?

$\bar{x} = 37.8\%$, so $\bar{x} - x_t = -1.1\%$
$\sum x_i = 113.4$, $\sum x_i^2 = 4288.38$

$$s = \sqrt{\frac{4288.38 - \dfrac{(113.4)^2}{3}}{2}} = 0.964\%$$

Assume null hypothesis (no bias). Only reject this if

$$|\bar{x} - x_t| > \frac{ts}{\sqrt{N}}$$

But t (from Table) = 4.30, s (calc. above) = 0.964% and N = 3:

$$\frac{ts}{\sqrt{N}} = \frac{4.30 \times 0.964}{\sqrt{3}} = 2.39\%$$

so $|\bar{x} - x_t| < \dfrac{ts}{\sqrt{N}}$.

Therefore the null hypothesis is maintained, and there is no evidence for systematic error at the 95% confidence level.

Are two sets of measurements significantly different?

Suppose two samples are analysed under identical conditions.
Sample 1: $\bar{x}_1$ from $N_1$ replicate analyses
Sample 2: $\bar{x}_2$ from $N_2$ replicate analyses
Are these significantly different? Using the definition of pooled standard deviation, the equation on the last overhead can be re-arranged:

$$\bar{x}_1 - \bar{x}_2 = \pm t\, s_{pooled} \sqrt{\frac{N_1 + N_2}{N_1 N_2}}$$

Only if the difference between the two samples is greater than the term on the right-hand side can we assume a real difference between the samples.
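Both threshold tests can be sketched as small functions. The function names are illustrative. The mercury numbers come from the worked example above. The two-sample call uses the values quoted in the boron example of this handout (t = 3.36, pooled s = 0.267, 5 replicates each, means 28.0 and 26.25). The t values are read from the t table, not computed.

```python
import math
from statistics import mean, stdev

def one_sample_threshold(t, s, n):
    """Largest |mean - true value| attributable to random error: t*s/sqrt(N)."""
    return t * s / math.sqrt(n)

def two_sample_threshold(t, s_pooled, n1, n2):
    """Largest |mean1 - mean2| attributable to random error."""
    return t * s_pooled * math.sqrt((n1 + n2) / (n1 * n2))

# Mercury standard: 95% level, 2 degrees of freedom, t = 4.30
hg = [38.9, 37.4, 37.1]
x_t = 38.9
hg_diff = abs(mean(hg) - x_t)
hg_limit = one_sample_threshold(4.30, stdev(hg), len(hg))
bias_indicated = hg_diff > hg_limit

# Boron by two methods: 99% level, 8 degrees of freedom, t = 3.36
boron_diff = abs(28.0 - 26.25)
boron_limit = two_sample_threshold(3.36, 0.267, 5, 5)
methods_differ = boron_diff > boron_limit

print(f"Hg: |diff| = {hg_diff:.1f} %, limit = {hg_limit:.2f} %, bias indicated: {bias_indicated}")
print(f"B:  |diff| = {boron_diff:.2f}, limit = {boron_limit:.2f}, significant: {methods_differ}")
```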
Test for significant difference between two sets of data
Two different methods for the analysis of boron in plant samples gave the following results (µg/g) by spectrophotometry and by fluorimetry, each based on 5 replicate measurements. At the 99% confidence level, are the mean values significantly different?

Calculate $s_{pooled}$ = 0.267. There are 8 degrees of freedom, therefore (Table) t = 3.36 (99% level). Level for rejecting null hypothesis is

$$\pm t\, s_{pooled} \sqrt{\frac{N_1 + N_2}{N_1 N_2}}, \text{ i.e. } \pm (3.36)(0.267)\sqrt{\frac{10}{25}}$$

i.e. ±0.5674, or ±0.57 µg/g.

But $\bar{x}_1 - \bar{x}_2 = 28.0 - 26.25 = 1.75$ µg/g,

i.e. $|\bar{x}_1 - \bar{x}_2| > t\, s_{pooled} \sqrt{\dfrac{N_1 + N_2}{N_1 N_2}}$

Therefore, at this confidence level, there is a significant difference, and there must be a systematic error in at least one of the methods of analysis.

Detection of Gross Errors
A set of results may contain an outlying result - out of line with the others. Should it be retained or rejected? There is no universal criterion for deciding this. One rule that can give guidance is the Q test. Consider a set of results. The parameter $Q_{exp}$ is defined as follows:

$$Q_{exp} = \frac{|x_q - x_n|}{w}$$

where $x_q$ = questionable result, $x_n$ = nearest neighbour, w = spread of entire set.

$Q_{exp}$ is then compared to a set of values $Q_{crit}$:

$Q_{crit}$ (reject if $Q_{exp}$ > $Q_{crit}$)

No. of observations   90%     95%     99% confidence level
3                     0.941   0.970   0.994
4                     0.765   0.829   0.926
5                     0.642   0.710   0.821
6                     0.560   0.625   0.740
7                     0.507   0.568   0.680
8                     0.468   0.526   0.634
9                     0.437   0.493   0.598
10                    0.412   0.466   0.568

Rejection of outlier recommended if $Q_{exp}$ > $Q_{crit}$ for the desired confidence level.

Note:
1. The higher the confidence level, the less likely is rejection to be recommended.
2. Rejection of outliers can have a marked effect on mean and standard deviation, esp. when there are only a few data points. Always try to obtain more data.
3. If outliers are to be retained, it is often better to report the median value rather than the mean.

Q Test for Rejection of Outliers
The following values were obtained for the concentration of nitrite ions in a sample of river water: 0.403, 0.410, 0.401, 0.380 mg/l. Should the last reading be rejected?

$$Q_{exp} = \frac{|0.380 - 0.401|}{(0.410 - 0.380)} = 0.7$$

But $Q_{crit}$ = 0.829 (at 95% level) for 4 values.
Therefore, $Q_{exp}$ < $Q_{crit}$, and we cannot reject the suspect value.

Suppose 3 further measurements are taken, giving total values of: 0.403, 0.410, 0.401, 0.380, 0.400, 0.413, 0.411 mg/l. Should 0.380 still be retained?

$$Q_{exp} = \frac{|0.380 - 0.400|}{(0.413 - 0.380)} = 0.606$$

But $Q_{crit}$ = 0.568 (at 95% level) for 7 values.
Therefore, $Q_{exp}$ > $Q_{crit}$, and rejection of 0.380 is recommended.
But note that 5 times in 100 it will be wrong to reject this suspect value!
Also note that if 0.380 is retained, s = 0.011 mg/l, but if it is rejected, s = 0.0056 mg/l, i.e. precision appears to be twice as good, just by rejecting one value.
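The nitrite example can be reproduced with a short Q-test helper. This is a sketch; the function name `q_exp` and the dictionary `Q_CRIT_95` are illustrative, and the critical values are the 95% column of the table above.

```python
from statistics import stdev

# 95% critical values, copied from the Q_crit table (number of observations -> Q_crit).
Q_CRIT_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625, 7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}

def q_exp(values, suspect):
    """Gap between the suspect value and its nearest neighbour, divided by the spread."""
    others = list(values)
    others.remove(suspect)                         # drop one occurrence of the suspect
    nearest = min(others, key=lambda v: abs(v - suspect))
    return abs(suspect - nearest) / (max(values) - min(values))

first = [0.403, 0.410, 0.401, 0.380]               # mg/l
extended = first + [0.400, 0.413, 0.411]

q4 = q_exp(first, 0.380)
q7 = q_exp(extended, 0.380)
reject4 = q4 > Q_CRIT_95[len(first)]
reject7 = q7 > Q_CRIT_95[len(extended)]

s_kept = stdev(extended)
s_dropped = stdev([v for v in extended if v != 0.380])

print(f"4 values: Q = {q4:.3f}, reject: {reject4}")
print(f"7 values: Q = {q7:.3f}, reject: {reject7}")
print(f"s with 0.380 = {s_kept:.3f} mg/l, without = {s_dropped:.4f} mg/l")
```

The same suspect value is retained with 4 results and rejected with 7, which illustrates why more data should be collected before discarding a reading.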
