
I. Isotope distribution and Beynon table

Yizeng Liang a,∗, Feng Gan b

a College of Chemistry and Chemical Engineering, Institute of Chemometrics and Intelligent Analytical Instruments, Central South University, Changsha 410083, PR China

b College of Chemistry and Chemical Engineering, Institute of Chemometrics and Chemical Sensing Technology,

Received 7 November 2000; received in revised form 4 April 2001; accepted 11 April 2001

Abstract

A preliminary approach to data mining (DM) of a mass spectral library has been developed in this work. The results obtained from DM showed, for the first time, that the ratios of the isotope peaks to the molecular ion peak obey a logarithmic normal (log-normal) distribution. With the help of statistical inference, guidelines on how to use the Beynon table efficiently are also given, to support decisions on the molecular weight or molecular formula in mass spectral analysis from a statistical point of view. The work indicates that broad studies of DM of chemical databases are essential not only for verifying existing knowledge but also for discovering new laws in chemistry. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: Data mining; Beynon table; Statistical inference; Logarithm normal distribution; Chemometrics

∗ Corresponding author. Tel.: +86-731-882-2841; fax: +86-731-882-5637.
E-mail address: yzliang@cs.hn.cn (Y. Liang).

0003-2670/01/$ – see front matter © 2001 Elsevier Science B.V. All rights reserved.
PII: S0003-2670(01)01073-X

Y. Liang, F. Gan / Analytica Chimica Acta 446 (2001) 115–120

1. Introduction

Chemistry has so far been a strongly experience-dependent scientific discipline. Chemists carry out a great many experiments and measurements every day, and most of the chemical knowledge obtained so far rests on experimental and measurement data, even though theoretical quantum chemistry can explain some aspects of chemistry. However, with the development of chemistry and the information sciences, chemists face a new challenge: the spectacular growth of measurement data, such as spectral databases, that contain a large amount of information on chemical compounds, their molecular structures and their properties. Could we obtain useful knowledge directly from these rapidly growing volumes of chemical data? Modern microcomputers make digitized information easy to capture and fairly inexpensive to store and access, which gives chemists a great opportunity to extract useful chemical knowledge from databases or from huge amounts of chemical data. This is precisely the objective of the new techniques known as knowledge discovery in databases (KDD) or data mining (DM), which have developed quickly in computer science; DM aims to discover something new from the facts recorded in databases or in large collections of data [1–4]. Recently, Buydens and her colleagues gave such an example [5].

As several large-scale databases of mass spectra are now available, it is possible to enlarge and deepen our knowledge in this field. In addition, high-speed computers and statistical methods make the approach easier and more fruitful. The aim of this paper is to show how the DM technique can extract useful chemical knowledge from such databases. First, a comprehensive statistical investigation was conducted; then, based on this information, a preliminary approach to DM of a mass spectral library was developed. The rather astonishing results obtained from DM showed, for the first time, that the ratios of the isotope peaks to the molecular ion peak obey a log-normal distribution. With the help of statistical inference, guidelines on how to use the Beynon table efficiently are also given, to support decisions on the molecular weight or molecular formula in mass spectral analysis from a statistical point of view. The work indicates that broad studies of DM of chemical databases are essential not only for verifying existing knowledge but also for discovering new rules in chemistry.

2. Experimental

A mass spectral library was established by transferring the NIST62 mass spectral library built into the Shimadzu GCMS-QP5000. The library data are stored as binary files and as MATLAB data files. The library contains 61999 mass spectra.

The spectra of compounds containing (C, H), (C, H, O), (C, H, N) or (C, H, O, N) in the NIST62 library were collected group by group. The ratios of the isotopic peaks to the corresponding molecular ion peaks were then calculated, but only the results for (M + 1)/M are presented in this paper. For this purpose, a dedicated MATLAB program was written to collect the molecular ion peaks, denoted (M), and the corresponding isotopic peaks, denoted (M + 1) and (M + 2), and then to calculate the ratios (M + 1)/M and/or (M + 2)/M. The true (reference) values of these ratios were taken from those listed in the book of Beynon [6].

The mass spectrum of hexane was also measured; the hexane used was of analytical grade. The GC–MS experimental conditions were as follows.

Equipment: GC-17A gas chromatograph and QP-5000 mass spectrometer (Shimadzu).

Detection conditions: An OV-17 capillary column (30 m × 0.25 mm i.d.) was used. The column temperature was held at 60 °C for 2 min and then programmed from 60 to 270 °C at 20 °C/min. The inlet temperature was kept at 250 °C. Helium carrier gas was used at a constant flow rate of 1 ml/min.

Mass spectrometer: Electron impact (EI+) mass spectra were recorded at 70 eV ionization energy in full-scan mode over the 40–426 amu mass range at 0.2 s/scan. The ionization source temperature was set at 230 °C. Detected spectra were identified by matching the EI+ spectra against the National Institute of Standards and Technology (NIST) MS database, which contains about 62,000 compounds.

All programs were written in MATLAB 5.0 and run on a PC (CPU 200, 128 MB RAM).

3. Results and discussion

[…] is very important in mass spectral analysis, since with the help of the molecular ion peak one can easily obtain information on the molecular weight as well as the molecular formula. The Beynon table is a common tool for checking the molecular ion peak and the molecular formula in mass spectral analysis; it gives the ratios (M + 1)/M and/or (M + 2)/M as constants derived from the natural isotopic abundance of each element. One can therefore judge whether a peak really is a molecular ion peak by comparing the abundance ratio (M + 1)/M and/or (M + 2)/M calculated from the spectrum with the value listed in the Beynon table. For this reason, most textbooks on mass spectral analysis include a Beynon table for the convenience of users [7,8]. However, when we worked with the table, we found that the rate of agreement between the measured abundance ratio (M + 1)/M and the value listed in the Beynon table is rather low (Table 1). Is there something wrong with the Beynon table, or is something hidden in the facts?

With the help of the NIST62 mass spectral database, in which 62199 mass spectra were collected, a broad survey of the molecular ion peaks and their corresponding isotopic peaks, i.e. the (M + 1) and (M + 2) peaks, was conducted. In total, 18170 mass spectra were investigated.


Table 1

Investigation results for the consistency rate between the measured abundance ratios of isotopic peaks to molecular ion peaks and the values listed in the Beynon table

Range of   Number of  Com. con.  Part. con.  Devia. (+10   Devia. (−10   Devia.     Devia.     Total rate of
molecular  samples    with. (a)  with. (b)   to +50%) (c)  to −50%)      (>+50%)    (<−50%)    consis. (%) (d)
weight
0–50       35         1          14          6             8             3          3          42.86
51–70      130        5          49          22            32            19         3          41.54
71–90      385        24         103         71            67            91         29         32.99
91–110     756        58         277         109           145           118        49         44.31
111–130    1388       80         456         274           275           225        78         38.62
131–150    1979       110        808         348           398           201        114        46.39
151–170    1772       130        744         344           316           148        90         49.32
171–190    2244       110        900         402           450           228        154        45.01
191–210    1740       136        749         370           350           135        100        50.86
211–230    1361       119        586         213           271           106        66         51.80
231–250    1164       135        509         178           205           80         57         55.33

a Com. con. with.: completely consistent with the values listed in the Beynon table (deviation ≤ 5%).
b Part. con. with.: partially consistent with the values listed in the Beynon table (5% < deviation ≤ 10%).
c Devia. (+10 to +50%): deviations in the range +10 to +50%.
d Total rate of consis.: total consistency rate, i.e. the percentage of samples whose deviations lie within ±10%.
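The six deviation columns of Table 1 partition each row (in the first row, 1 + 14 + 6 + 8 + 3 + 3 = 35), and the last column is the share of samples in the first two categories. The sketch below is a hedged Python illustration of that bookkeeping, not the authors' program: the bin edges follow footnotes a–d, only (M + 1) deviations are handled, the column keys are hypothetical, and assigning values that fall exactly on a bin edge to the inner bin is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable

# Column keys, in the order of Table 1 (columns 3-8).
CATEGORIES = (
    "com_con",      # |deviation| <= 5%            (footnote a)
    "part_con",     # 5% < |deviation| <= 10%      (footnote b)
    "dev_+10_+50",  # +10% < deviation <= +50%     (footnote c)
    "dev_-10_-50",  # -50% <= deviation < -10%
    "dev_>+50",     # deviation > +50%
    "dev_<-50",     # deviation < -50%
)


def categorize(deviation: float) -> str:
    """Assign one (M + 1) deviation, in percent, to a Table 1 column.

    Values exactly on a bin edge are assigned to the inner bin (an assumption).
    """
    d = abs(deviation)
    if d <= 5.0:
        return "com_con"
    if d <= 10.0:
        return "part_con"
    if deviation > 0:
        return "dev_+10_+50" if deviation <= 50.0 else "dev_>+50"
    return "dev_-10_-50" if deviation >= -50.0 else "dev_<-50"


def total_rate(n: int, com_con: int, part_con: int) -> float:
    """Last column of Table 1: percentage of samples within +/-10% (footnote d)."""
    return 100.0 * (com_con + part_con) / n


def table_row(deviations: Iterable[float]) -> Dict[str, float]:
    """Counts per column plus the number of samples and the total rate."""
    counts = Counter(categorize(d) for d in deviations)
    n = sum(counts.values())
    row: Dict[str, float] = {c: float(counts[c]) for c in CATEGORIES}
    row["n"] = float(n)
    row["total_rate"] = total_rate(n, counts["com_con"], counts["part_con"]) if n else 0.0
    return row


if __name__ == "__main__":
    # Check against the first row of Table 1 (MW 0-50): 1 + 14 of 35 samples.
    print(round(total_rate(35, 1, 14), 2))  # prints 42.86
```

The same call with the counts of the 71–90 row, `total_rate(385, 24, 103)`, rounds to the tabulated 32.99.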

These 18170 spectra cover 1681 molecular formulae. Excluding 5216 mass spectra that lack either a molecular ion peak or an (M + 1) peak, the abundance ratios (M + 1)/M and/or (M + 2)/M of the remaining 12954 mass spectra were calculated. For compounds with the molecular formula $\mathrm{C}_W\mathrm{H}_X\mathrm{N}_Y\mathrm{O}_Z$, the abundance ratio (M + 1)/M listed in the Beynon table, expressed in percent, can be calculated as

$$\mathrm{Ratio}_{(M+1),\,\mathrm{Beynon}} = 1.08W + 0.016X + 0.38Y + 0.04Z \tag{1}$$

where $\mathrm{Ratio}_{(M+1),\,\mathrm{Beynon}}$ denotes the abundance ratio (M + 1)/M listed in the Beynon table. The deviation of the measured ratio (M + 1)/M from the value listed in the Beynon table can then be estimated using

$$\mathrm{Deviation}\,(\%) = 100 \times \frac{\mathrm{Ratio}_{(M+1),\,\mathrm{measured}} - \mathrm{Ratio}_{(M+1),\,\mathrm{Beynon}}}{\mathrm{Ratio}_{(M+1),\,\mathrm{Beynon}}} \tag{2}$$

The results obtained are shown in Table 1. From the table, one can easily see that simply using the Beynon table to confirm the molecular ion peak (or molecular weight) and the molecular formula might be misleading, since the rate of complete agreement with the values in the Beynon table is only around 5–10% (columns 2 and 3 of Table 1). Note that we call a calculated ratio completely consistent with the value listed in the Beynon table if and only if its deviations are within ±5% for both the (M + 1) and the (M + 2) peaks. The total consistency rate for the (M + 1) peaks alone is roughly half, ranging from about 33% to 55% (last column of Table 1). A large share of the deviations, roughly 33–42% of the samples in each row, falls into the range ±(10–50%) (columns 5 and 6 of Table 1), and some deviations even lie outside the range from −50 to +50%. These results suggest that the measured ratios (M + 1)/M may obey some statistical distribution.

In order to confirm this assumption, a comprehensive investigation over the whole mass spectral database was then conducted, covering all molecules containing (C, H), (C, H, O), (C, H, N) or (C, H, O, N). The database contains 4463 compounds composed only of carbon and hydrogen, and the results for them are shown in Fig. 1. The plot shows that the distribution of the abundance ratios of isotopic peaks to molecular peaks, i.e. (M + 1)/M, looks like a normal distribution with some tailing (top part of Fig. 1). Whether it is approximately normal can be checked with a normal probability plot, on which normally distributed data fall close to a straight line.
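A normal probability plot can be summarized by a single number: the correlation between the sorted data and the corresponding standard normal quantiles, which is close to 1 when the points lie near a straight line. The sketch below computes this correlation before and after a log transformation. It is a hedged illustration, not the authors' procedure: Blom's plotting positions are an assumed choice, the function names are hypothetical, the transform is applied to positive ratios (for example measured/Beynon) rather than to the signed deviations of Eq. (2), and the sample in the usage lines is synthetic rather than taken from the NIST62 data.

```python
import math
import random
from statistics import NormalDist, fmean
from typing import Sequence, Tuple


def probplot_r(data: Sequence[float]) -> float:
    """Correlation between sorted data and standard normal quantiles.

    Uses Blom's plotting positions (i - 3/8) / (n + 1/4); values near 1 mean the
    normal probability plot is close to a straight line.
    """
    xs = sorted(data)
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three observations")
    nd = NormalDist()
    qs = [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx, mq = fmean(xs), fmean(qs)
    sxy = sum((x - mx) * (q - mq) for x, q in zip(xs, qs))
    sxx = sum((x - mx) ** 2 for x in xs)
    sqq = sum((q - mq) ** 2 for q in qs)
    if sxx == 0.0:
        raise ValueError("all observations are identical")
    return sxy / math.sqrt(sxx * sqq)


def compare_raw_and_log(ratios: Sequence[float]) -> Tuple[float, float]:
    """Probability-plot correlation before and after a natural-log transform."""
    if any(r <= 0.0 for r in ratios):
        raise ValueError("the log transform needs strictly positive values")
    return probplot_r(ratios), probplot_r([math.log(r) for r in ratios])


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic log-normal sample standing in for measured/Beynon ratios.
    sample = [rng.lognormvariate(0.0, 0.5) for _ in range(2000)]
    r_raw, r_log = compare_raw_and_log(sample)
    print(f"raw r = {r_raw:.4f}, log r = {r_log:.4f}")
```

On a skewed sample like this one the correlation stays visibly below 1 on the raw scale and moves close to 1 after the log transform, which is the kind of contrast the two lower panels of the figures are meant to show; the exact values depend on the random seed.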


The result of this check is negative: the observations cannot be fitted by a straight line in the normal probability plot (lower left part of Fig. 1). What kind of distribution, then, do the abundance ratios of isotopic peaks to molecular peaks obey? The log-normal distribution was tried, and the results are also shown in Fig. 1 (lower right part). They clearly show that the ratios approximately obey the log-normal distribution. To confirm this result further, we investigated the compounds containing carbon, hydrogen and nitrogen (C, H, N; 4310 molecules), the compounds containing carbon, hydrogen and oxygen (C, H, O; 9833 molecules), and the compounds containing carbon, hydrogen, nitrogen and oxygen (C, H, O, N; 10519 molecules). The results are shown in Figs. 2–4, respectively. These figures show that the abundance ratios of isotopic peaks to molecular peaks do obey the log-normal distribution. In particular, the results in Figs. 3 and 4 show very close agreement between the assumption and the facts. The reason might be that the numbers of (C, H, O) and (C, H, O, N) compounds, 9833 and 10,519 respectively, are much larger than those of (C, H) and (C, H, N) compounds.

Fig. 1. Statistical results for compounds containing carbon and hydrogen. Top: distribution of the deviations of the measured abundance ratios (M + 1)/M from the values in the Beynon table; lower left: normal probability plot of this distribution; lower right: normal probability plot of the distribution after logarithmic transformation.

Fig. 2. Statistical results for compounds containing carbon, hydrogen and nitrogen. Top: distribution of the deviations of the measured abundance ratios (M + 1)/M from the values in the Beynon table; lower left: normal probability plot of this distribution; lower right: normal probability plot of the distribution after logarithmic transformation.

Fig. 3. Statistical results for compounds containing carbon, hydrogen and oxygen. Top: distribution of the deviations of the measured abundance ratios (M + 1)/M from the values in the Beynon table; lower left: normal probability plot of this distribution; lower right: normal probability plot of the distribution after logarithmic transformation.

It is worth noting that how well any such method works is strongly determined by the reproducibility of the intensities (abundances) in mass spectra, and mass spectral intensities have been found to carry strong heteroscedastic noise [9].
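Because the (M + 1)/M ratios turn out to be log-normal rather than constant, a statistical check of a candidate molecular ion can be carried out on the logarithmic scale. The sketch below is one possible way to do this, offered under stated assumptions and not as the authors' guideline: it estimates the mean and standard deviation of ln(measured/Beynon) from reference spectra, then tests whether a new measured ratio falls inside a plain z-interval. The function names, the 95% level and the parameter values in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev
from typing import Sequence, Tuple


def fit_log_ratio(measured: Sequence[float], beynon: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample SD of ln(measured / Beynon) over reference spectra (n >= 2)."""
    logs = [math.log(m / b) for m, b in zip(measured, beynon)]
    return fmean(logs), stdev(logs)


def acceptance_interval(beynon_ratio: float, mu: float, sigma: float,
                        level: float = 0.95) -> Tuple[float, float]:
    """Range of measured (M + 1)/M ratios consistent with a candidate formula.

    A plain z-interval on the log scale; it ignores the uncertainty in mu and sigma.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (beynon_ratio * math.exp(mu - z * sigma),
            beynon_ratio * math.exp(mu + z * sigma))


def is_consistent(measured_ratio: float, beynon_ratio: float,
                  mu: float, sigma: float, level: float = 0.95) -> bool:
    """True if the measured ratio lies inside the acceptance interval."""
    lo, hi = acceptance_interval(beynon_ratio, mu, sigma, level)
    return lo <= measured_ratio <= hi


if __name__ == "__main__":
    mu, sigma = 0.0, 0.2                    # hypothetical log-scale parameters
    beynon_hexane = 1.08 * 6 + 0.016 * 14   # Eq. (1) for C6H14, in percent
    lo, hi = acceptance_interval(beynon_hexane, mu, sigma)
    print(f"95% interval: {lo:.2f}% to {hi:.2f}%")
    print(is_consistent(7.0, beynon_hexane, mu, sigma))
```

Because the interval is built on the log scale, it is asymmetric around the Beynon value on the ratio scale (wider above than below), unlike a fixed ±10% band.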


The influence of heteroscedastic noise on mass spectral intensities should be taken into account when one uses the ratio (M + 1)/M to decide, with the help of the Beynon table, whether an m/e peak is a molecular ion peak. Similarly, Grotch found several decades ago that the abundance values of ion fragments measured by mass spectrometry closely follow a log-normal distribution [10]. This is consistent with what we found in this work. To confirm this conclusion, we further measured a known compound, hexane, on GC–MS. The result, shown in Fig. 5, indicates that the log-normal distribution can be seen even for a single compound. This shows clearly that the ratio (M + 1)/M cannot simply be used as a constant, as is common practice in chemistry; it has to be treated as a random variable obeying a log-normal distribution. Statistical inference is therefore necessary to arrive at a conclusion.

Fig. 4. Statistical results for compounds containing carbon, hydrogen, nitrogen and oxygen. Top: distribution of the deviations of the measured abundance ratios (M + 1)/M from the values in the Beynon table; lower left: normal probability plot of this distribution; lower right: normal probability plot of the distribution after logarithmic transformation.

Fig. 5. Statistical results for hexane measured on GC–MS. Top: distribution of the deviations of the measured abundance ratios (M + 1)/M from the values in the Beynon table; lower left: normal probability plot of this distribution; lower right: normal probability plot of the distribution after logarithmic transformation.

4. Conclusion

The results obtained in this work from DM show, for the first time, that the abundance ratios of the isotope peaks to the molecular ion peak obey a log-normal distribution. This suggests that DM based on efficient computation over large databases is quite necessary, since DM is useful not only for verifying existing knowledge but also for discovering new knowledge in chemistry. Moreover, large databases of high-quality spectra, including infrared, mass, NMR and UV-visible spectra, have become available in recent years. How to use efficiently the chemical information accumulated over a long history of chemical research will be a new challenge for chemists. We believe that comprehensive DM of mass spectral libraries will make it possible to develop a new kind of expert system that is stronger and more effective.

References

[1] U. Fayyad, R. Uthurusamy, Commun. ACM 39 (1996) 27–34.
[2] C. Glymour, D. Madigan, D. Pregibon, P. Smyth, Commun. ACM 39 (1996) 35–41.
[3] W.H. Inmon, Commun. ACM 39 (1996) 49–50.
[4] U. Fayyad, D. Haussler, P. Stolorz, Commun. ACM 39 (1996) 51–57.
[5] L.M.C. Buydens, T.H. Reijmers, M.L.M. Beckers, R. Wehrens, Chemom. Intell. Lab. Syst. 49 (1999) 121–133.
[6] J.H. Beynon, A.E. Williams, Mass and Abundance Tables for Use in Mass Spectrometry, Elsevier, Amsterdam, 1963.


[7] J.R. Chapman, Computerized Mass Spectrometry, Academic Press, London, 1978.
[8] Chen Yaozhu, Organic Analysis, Higher Education Publishing House, Beijing, 1983, p. 694.
[9] O.M. Kvalheim, F. Brakstad, Y.Z. Liang, Anal. Chem. 66 (1994) 43–51.
[10] S.L. Grotch, Am. Soc., Mass Spectrom. 34 (1969) 459–466.