November 1, 2010
Vladimir Bochko
Contents
Preprocessing Baseline correction. Autoscaling. Savitzky-Golay Smoothing Filters Savitzky-Golay Filters for Derivatives Smoothing Splines Splines for Derivatives
Vladimir Bochko
Preprocessing
Normalizing. It is done by dividing each feature realted to
a sample vector by a constant. The constant value may be dened as the 1-norm or 2-norm of the vector.
Weighting. For weighting any values are used. The
Smoothing lters.
Baseline correction. Reduce unwilling slow variation.
Estimation of baseline and subtruction from spectrum, Derivatives, Multiplicative Scatter Correction MSC.
Vladimir Bochko
Baseline correction
We consider the measured spectrum s. The baseline is
approximated using polynomial [3], for example up to linear term. + + x s=s where x is a wavelength number, denes a shift (offset) in spectrum and desribes a linear change in spectrum.
We are interested in s and the rest terms describe
unwilling variation.
If the baseline model has only shift then after estimating
Vladimir Bochko
Derivatives
If we have only shift s = s + we can use derivative with
Vladimir Bochko
Autoscaling
value by standard deviation. Standard deviation is dened using values of this feature.
Vladimir Bochko
corrupted by random noise. The beginning and the end of the spectra contain mostly noise. The purpose of smooting is to reduce the level of noise keeping spectrum details.
Savitzky-Golay lters were designed to analyze the
absorption peaks in noisy spectra. The absorption peaks suggest which chemical elements are presented in the tested materials.
Filters have several names Savitzky-Golay, least squares
low-pass lters.
Vladimir Bochko
fi =
n=nlef t
cn si+n
where nlef t and nright are the number of points to the left and right of a current point, respectively. cn are weight coefcients.
Vladimir Bochko
higher order. Least-squares tting is used to t the polynomial to the data inside the moving window. The use of the least square t in this way may be computationally demanding. However it is possible to nd the coefcients cn for which least square tting inside a moving window is automatically implemented.
Vladimir Bochko
Vladimir Bochko
underlying function. b) The same signal with additive Gausian random noise. a
1.8 1.6 1.4 1.4 1.2 1 0.8 0.6 0.4 0.4 0.2 0.2 0 1 2 3 4 5 6 0 0 1 2 3 4 5 6 1.2 1 0.8 0.6 2 1.8 1.6
Vladimir Bochko
lter, nlef t = nright = 15, polynomial order 2. The smoothed result is shown by green dots. a
2 1.8 1.6 1.4 1.2 1 0.8 0.6 0.4 0.2 0 0 1 2 3 4 5 a vs. x a vs. x (smooth) 2 1.8 1.6 1.4 1.2 1 0.8 0.6 0.4 0.2 0 0 1 2 3 4 5
b
a vs. x a vs. x (smooth)
Vladimir Bochko
order 2. a) Savitzky-Golay lter, nlef t = nright = 30, polynomial order 2. The smoothed result is shown by green dots. a
2 1.8 1.6 1.4 1.2 1 0.8 0.6 0.4 0.2 0 0 1 2 3 4 5 a vs. x a vs. x (smooth) 2 1.8 1.6 1.4 1.2 1 0.8 0.6 0.4 0.2 0 0 1 2 3 4 5
b
a vs. x a vs. x (smooth)
Vladimir Bochko
This technique may be used with noisy data. The polynomial and least square tting are used at each
point of spectra.
The measured derivative is the derivative of the tted
polynomial.
The Savitzky-Golay lters from libraries can be used in the
derivative mode.
Computation is made in the moving window.
Vladimir Bochko
Smoothing Splines
Splines and wavelets are the other popular techniques
used for smoothing. For example, the smoothing spline may t spectral data. For the next illustration the smoothing splines were used from the functional data analysis library [2]. a) The original spectrum of the gastrointestinal tract of the dog. b) The smoothed spectrum. a
300 250 200 Value 150 100 50 0 400 Value 300 250 200 150 100 50
500
900
1000
400
500
900
1000
Vladimir Bochko
To achieve robust computation a B-spline is used. After obtaining the analytical spline expansion of spectrum the rst and second derivatives can be exactly computed. a) The smoothed spectrum of the gastrointestinal tract of the dog. b) The rst derivative. a
20 300 15 250 Derivative value 200 Value 150 100 50 10 5 0 5 10 15 500 600 700 800 Wavelength, nm 900 1000 20 400 500 600 700 800 Wavelength, nm 900 1000
400
Vladimir Bochko
References
Press W. H. , S.A. Teukolsky S.A., W. T. Vetterling W. T., Flannery B. P.. Numerical Recipes in C++, Cambridge University Press, 2002. Ramsay J. O. and Silverman B. W., Functional Data Analysis. Springer-Verlag, New York, 1997 Beebe K., Pell R. J., Seasholtz M. B., Chemometrics: A Practical guide. John Wiley & Sons, Inc., 1998. SOLO toolbox: http://wiki.eigenvector.com/index.php?title=Main Page
Vladimir Bochko
Questions
Vladimir Bochko