
Bearing defect identification based on acoustic emission signals
Botond Cseke
Faculty of Science
Radboud University Nijmegen
Email: B.Cseke@science.ru.nl
Tom Heskes
Faculty of Science
Radboud University Nijmegen
Email: T.Heskes@science.ru.nl
Abstract: In this paper we classify seeded bearing defects based on acoustic emission data. We use data from recordings of the experiment carried out by Al-Ghamd and Mba [1]. The classification method is based on autoregression model features and acoustic emission features such as root mean square, maximum amplitude and kurtosis value. We use support vector machines and k-nearest neighbor methods as classification tools. Autoregression model features significantly improve the results obtained with acoustic emission features alone.
I. INTRODUCTION
Acoustic emission (AE) signal analysis is a standard tool for monitoring the health state of materials and therefore of various mechanical equipment. Quoting from Ganji [2]:
AE is the monitoring technique which analyses
elastic waves naturally generated above the human
hearing threshold (> 20 kHz). It is associated with
the range of phenomena which generate broadband
activity from the transient release of stored elastic
energy from localized sources. ...
AE has been proven to be useful for condition monitoring of bearing states. Ganji [2] and Ganji and Holsnijders [3] provide an AE-signal-feature based interpretation of lubrication conditions (peak value, root mean square (RMS), kurtosis value, crest factor, form factor, AE count), while Jamaludin and Mba [4], [5] provide an autoregression-parameter based clustering of acoustic emission signatures in the case of slowly rolling bearings. Recently, Al-Ghamd and Mba [1] conducted an experiment for detecting the presence and size of seeded defects in radially loaded bearings. Their analysis was based on measuring signal features such as RMS, kurtosis and maximum amplitude. We briefly describe their experiment in section III.
In this paper we use standard machine learning tools such
as support vector machines (SVM) and the k-nearest neighbor
(kNN) method to analyze and classify features extracted
from the AE signals recorded during the above mentioned
experiments.
Section II describes the feature extraction methods and the
machine learning tools we used. It has two parts: section II-A
presents AE signal features used in [2] and the autoregression
models (AR) while section II-B presents in brief the machine
learning tools and techniques employed in our analysis.
In section III we describe the dataset we worked with and
the experiment we conducted for classifying the AE signatures.
We end with a discussion and conclusion in sections IV and V.
II. FEATURES AND ALGORITHMS
This section is intended to give a brief description of the
framework in which we embedded the problem. We made use
of AE signal characteristics employed in [1]-[5] in order to create a set of AE signal features which can be used in a classification task. We give a brief description of AE signal features, AR models and support vector machines. Readers interested only in the results may skip this section and return to it later if needed.
A. Features
Acoustic emission signal features: In his report, Ganji [2] classifies AE signals into three broad classes:
1. Burst activity: the signal has the form of a sequence of transients, each of which can be roughly described as an exponentially decaying sinusoid. These bursts may overlap and can have varying amplitudes and decay factors. The most common method to detect the arrival of bursts is to set a threshold value and check if and when the signal value exceeds it.
2. Continuous activity: due to the high frequency of bursts
and the wide range of indistinguishable burst character-
istics (amplitude, decay factor) the signal has a random
oscillatory appearance.
3. Mixed mode activity: the burst activity is superimposed
on a continuous activity, meaning that some of the bursts
have distinguishable characteristics.
Because of the enormous amount and redundancy of data
that an AE sensor can provide, most of the monitoring tools
restrict themselves to the measurement of a few relevant
quantities. Empirical studies (see [2]) show that the most
important ones are:
- peak value: maxima of the signal at peaks;
- RMS value;
- kurtosis value: characterization of the signal value distribution by 4th-order statistics;
- crest factor: peak value divided by RMS;
- form factor: RMS value divided by mean value;
- AE count: count of the burst events.
From these we have chosen to measure those that were also measured in the experiment carried out by Al-Ghamd and Mba (see [1]), i.e., maximum amplitude or crest factor, root mean square (we use the term power) and kurtosis value.
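Purely for illustration (the original feature extraction was not necessarily implemented this way), the following Python/NumPy sketch computes these AE features from a one-dimensional array of voltage samples; the exact definitions used in [1], [2] may differ in detail.

```python
import numpy as np
from scipy.stats import kurtosis

def ae_features(x):
    """Basic AE signal features of a 1-D array of voltage samples.

    Illustrative sketch; definitions (e.g. kurtosis normalization,
    burst-count thresholds) may differ from those used in [1], [2].
    """
    rms = np.sqrt(np.mean(x ** 2))            # RMS, called "power" in the text
    peak = np.max(np.abs(x))                  # maximum amplitude
    return {
        "rms": rms,
        "max_amplitude": peak,
        "kurtosis": kurtosis(x, fisher=False),    # 4th-order statistic of the values
        "crest_factor": peak / rms,               # peak value divided by RMS
        "form_factor": rms / np.mean(np.abs(x)),  # RMS divided by mean absolute value
    }
```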
When dealing with time series data, one usually first verifies whether or not the data can be modelled by autoregressive (AR) processes. It turned out that the AE signal recordings of the Al-Ghamd and Mba experiment can be modelled well by AR processes of second order. We give a brief introduction to AR processes and summarize a few important characteristics to be used later in this paper.
AR models for time series modeling: An autoregressive process of order p, abbreviated AR(p), on a discrete time domain is defined by the linear model

y_t = \sum_{j=1}^{p} \phi_j y_{t-j} + \varepsilon_t

where the \varepsilon_t are normally distributed and independent. Usually t starts at 1 and we have to specify the first p values of the process or their distribution. In the following we work with a finite time domain, i.e., t runs through {1, . . . , T}.
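A minimal sketch of this definition, useful for building intuition; the initial p values are simply set to zero here, which is an arbitrary choice on our part, and the function name is ours.

```python
import numpy as np

def simulate_ar(phi, s, T, seed=None):
    """Simulate an AR(p) process y_t = sum_j phi_j * y_{t-j} + eps_t,
    eps_t ~ N(0, s) i.i.d.; the first p values are set to zero (illustrative)."""
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    y = np.zeros(T)
    eps = rng.normal(0.0, np.sqrt(s), size=T)
    for t in range(p, T):
        # y[t-p:t][::-1] = (y_{t-1}, ..., y_{t-p})
        y[t] = phi @ y[t - p:t][::-1] + eps[t]
    return y

# Example: a stationary AR(2) with complex conjugate roots (damped oscillation)
y = simulate_ar(phi=[1.6, -0.85], s=1e-3, T=1000, seed=0)
```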
Using the notation y = (y_{1:T}) and \varepsilon_t \sim N(0, s), we can write the probabilistic model in the form

p(y \mid Y_p, \phi, s) = p(y_{1:p}) \prod_{t=p+1}^{T} N(y_t \mid \phi^T y_{(t-1):(t-p)}, s)

where the parameters of the model are \phi, s and the parameters of the distribution for the first p terms.
For better understanding we can rewrite the model in a vectorized form

p(y \mid Y_p, \phi, s) \propto \exp\left( -\frac{(y - Y_p \phi)^T (y - Y_p \phi)}{2s} \right)

where we have used the notation y = (y_{p+1}, \ldots, y_T)^T and (Y_p)_{i,:} = y_{(p+i-1):i}, i = 1, \ldots, T-p, and considered the first p terms given.
We can perform both maximum likelihood (ML) and Bayesian estimation of the model parameters. The ML method is equivalent to least squares estimation, yielding the parameter estimates

\hat{\phi} = (Y_p^T Y_p)^{-1} Y_p^T y \quad \text{and} \quad \hat{s} = \frac{1}{T-p} (y - Y_p \hat{\phi})^T (y - Y_p \hat{\phi}).
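The ML estimates above amount to an ordinary least squares fit. A minimal sketch, with our own function and variable names, conditioning on the first p values as in the text:

```python
import numpy as np

def fit_ar_ml(y, p=2):
    """Maximum-likelihood (least-squares) fit of an AR(p) model.

    Returns (phi_hat, s_hat) following the formulas in the text:
    phi_hat = (Yp^T Yp)^{-1} Yp^T y,
    s_hat = (y - Yp phi_hat)^T (y - Yp phi_hat) / (T - p)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    # Design matrix: row i contains the p previous values (y_{t-1}, ..., y_{t-p})
    Yp = np.column_stack([y[p - j - 1:T - j - 1] for j in range(p)])
    target = y[p:]
    phi_hat, *_ = np.linalg.lstsq(Yp, target, rcond=None)
    resid = target - Yp @ phi_hat
    s_hat = resid @ resid / (T - p)
    return phi_hat, s_hat
```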
Bayesian estimation is usually performed with the so-called reference or improper prior p(\phi, s) \propto 1/s. Calculating

p(\phi, s \mid y, Y_p) = \frac{p(y \mid Y_p, \phi, s)\, p(\phi, s)}{p(y \mid Y_p)}

one obtains that the posterior marginal of \phi is a multivariate Student-t distribution with T - 2p degrees of freedom,

p(\phi \mid y, Y_p) \propto \left[ 1 + \frac{(\phi - \hat{\phi})^T Y_p^T Y_p (\phi - \hat{\phi})}{(T-p)\, v} \right]^{-(T-p)/2}

which for large T values is roughly N(\phi \mid \hat{\phi}, \hat{s}\,(Y_p^T Y_p)^{-1}). For a more detailed description of parameter estimation in AR models the reader is referred to [6].
In the following we give a short characterization of AR(2) processes in terms of the autoregression parameters, based on [6]. An AR(p) process is stationary if the autoregression polynomial defined by

\Phi(u) = 1 - \sum_{j=1}^{p} \phi_j u^j

has roots with moduli greater than unity; in our case p = 2. For simplicity, by the term autoregression polynomial we will refer to u^p \Phi(1/u). It is easy to see that the roots of the former and the latter are reciprocals of each other. The stationarity condition translated to AR(2) coefficients is as follows: -2 < \phi_1 < 2, \phi_1 < 1 - \phi_2 and \phi_1 > \phi_2 - 1. The roots can be: (1) two real roots if \phi_1^2 + 4\phi_2 \geq 0, or (2) a pair of complex conjugate roots if \phi_1^2 + 4\phi_2 < 0 (for an easy graphical representation see figure 4). In the latter case the model behaves like an exponentially damped cosine wave with phase and amplitude characteristics varying in response to the noise \varepsilon_t. In order to have both stationarity and complex roots, the condition -1 < \phi_2 < -\phi_1^2/4 must be satisfied. One may also verify that the forecast function E[y_{t+k} \mid y_{1:t}] has the form A^k \cos(\omega k + \gamma), where A and \omega are the modulus and phase of these complex conjugate roots; \gamma is a phase translation.
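The amplitude and period features used later in the paper follow directly from these conditions. The sketch below (our own naming) checks stationarity, verifies that the roots of u^2 \Phi(1/u) are complex, and returns their modulus A and period 2\pi/\omega.

```python
import numpy as np

def ar2_root_features(phi1, phi2):
    """Modulus (A) and period (2*pi/omega) of the complex conjugate roots of
    u^2 * Phi(1/u) = u^2 - phi1*u - phi2, for a stationary AR(2) with
    phi1^2 + 4*phi2 < 0. Illustrative sketch of the 'AR roots' features."""
    stationary = (-2 < phi1 < 2) and (phi1 < 1 - phi2) and (phi1 > phi2 - 1)
    disc = phi1 ** 2 + 4 * phi2
    if not stationary or disc >= 0:
        raise ValueError("expects a stationary AR(2) with complex conjugate roots")
    root = complex(phi1 / 2, np.sqrt(-disc) / 2)  # one of the conjugate pair
    amplitude = abs(root)                         # modulus A (< 1 under stationarity)
    period = 2 * np.pi / np.angle(root)           # period in time steps
    return amplitude, period

# Example with coefficient values in the range seen in figure 2
print(ar2_root_features(1.6, -0.85))
```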
B. Algorithms
In this section we show how the probabilistic model and its parameters can be used to characterize time series data. Before defining features related to the AR model described in the previous section, we present in brief the classification tools we used during the data analysis.
Support Vector Machines: Support vector machines (SVM)
as classification tools have been widely used in machine
learning since the mid-nineties and their applications for
different types of problems are still active areas of research.
In the following we shall give a very brief description. For a
comprehensive tutorial interested readers are referred to [7].
SVMs come from an area of machine learning called statistical learning theory (SLT). SLT classification deals with the following task: given a set of data pairs \{(x_i, y_i)\}_{i=1}^{n} with the x_i belonging to some predefined set X and y_i \in \{-1, 1\}, select a class of functions (from X to \{-1, 1\}) and a function from that class for which the error function, defined by the sum of misclassifications and the complexity of the function class, is minimal. In general this procedure is done in two steps. First we choose the class and then we choose the function from that class which produces the smallest misclassification error. Usually X is a Euclidean space and the function class implemented by the SVM is the class of linear separators, i.e.

\{ \mathrm{sign}(w^T x + b) \mid w \in X, b \in \mathbb{R} \}.
If the data is separable, the SVM chooses the linear separator which produces the largest margin: it is equally close to the convex hulls of the two sets, or it has the smallest average distance from the points. Otherwise, if the data is not separable, it optimizes jointly with respect to a large margin and the number of misclassifications.
Finding the optimal hyperplane reduces to a convex quadratic optimization problem. Once the optimum is found, the function value for a new input point x^* is given by

f(x^*) = \mathrm{sign}\left( \sum_{i=1}^{n} y_i \alpha_i x_i^T x^* \right)   (1)

where the \alpha_i are the dual optimal parameters of the problem. In general a high percentage of the \alpha_i are zero, so the function value can be calculated from the x_i points corresponding to non-zero \alpha_i. These vectors are called support vectors.
Another important characteristic of the hyperplane optimization problem is that both the optimization procedure and the calculation of function values involve only the scalar product between elements of X; therefore, instead of the usual Euclidean scalar product, one may use other, non-linear scalar product functions too. Theoretically, this corresponds to mapping the points of X into another space through the eigenfunctions of the new scalar product and performing the linear separation there. The procedure is often called the kernel trick and leads to non-linear separating functions: denoting the above-mentioned new scalar product by K(\cdot, \cdot), we can rewrite equation (1) as

f(x^*) = \mathrm{sign}\left( \sum_{i=1}^{n} y_i \alpha_i K(x_i, x^*) \right).

Since the optimization is still carried out in X and the only thing we need the data for is the calculation of the pairwise scalar products, the algorithm is insensitive to the dimensionality of the input space. Figure 1 visualizes two SVM settings.
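To make the two settings of figure 1 concrete, the following sketch trains a linear and an RBF-kernel SVM on toy data using scikit-learn; the data, parameter values and library choice are ours and only illustrate the mechanics described above, not the software used in this work.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy two-class data standing in for real feature vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = np.where(X[:, 0] + 0.5 * X[:, 1] ** 2 > 0.3, 1, -1)

svm_lin = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
svm_rbf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm_lin.fit(X, y)
svm_rbf.fit(X, y)

# The fitted decision rule corresponds to equation (1); the support vectors
# (points with non-zero alpha_i) are exposed as:
support = svm_rbf.named_steps["svc"].support_vectors_
```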
Fisher kernels for probabilistic models: It often happens that the quality or size of the data does not allow us to use it directly in an SVM. Time series are a good example, because we often have sequences of different sizes or sequences that are not aligned. Suppose we have a probability model for the inputs and we would like to enhance the SVM using information from this model. The SVM requires metric relations between inputs, so our goal is to build such relations based on the probability model. The first thing that naturally comes to mind is the difference in log-likelihood values, but this only tells us about the relation between the samples and the distribution (or its parameters). To capture the relation between the samples, one has to use the gradient space of the distribution w.r.t. the parameters. For a given sample x, the gradient of the log-likelihood \nabla_\theta \log p(x \mid \theta) (\equiv s(x; \theta)) w.r.t. the parameters \theta tells us the direction and scale of the change in parameter space induced by x (in the statistical literature this quantity is called the Fisher score). Therefore, one may argue that if for two samples x and x' the gradients s(x; \theta) and s(x'; \theta) are close to each other, then they generate the same change in parameters and can be assumed similar with regard to that parameter or probabilistic model. Now, taking into account the set of probability models \{p(x \mid \theta) \mid \theta\}, two issues have to be considered: (1) the Newton direction F(\theta)^{-1} s(x; \theta) provides
Fig. 1. An example of a linear SVM on a separable dataset (upper) and a radial basis SVM on a linearly non-separable dataset (lower). The two classes are plotted by ×-s and o-es; the solid curve represents the classification boundary corresponding to the 0-level curve, while the dashed curves represent the -1 and 1 level curves. Contours around the points are proportional to the \alpha values of the points.
a theoretically better motivated measure of change in parameters; (2) the set of probability distributions parameterized by \theta has a local metric defined by F(\theta). Here

F(\theta) = E_\theta\left[ -\frac{\partial^2}{\partial\theta\, \partial\theta^T} \log p(x \mid \theta) \right]

is the Fisher information matrix of the model.
Following this line of arguments, Jaakkola and Haussler [8] propose the scalar product

K(x, x') = s(x; \theta)^T F(\theta)^{-1} s(x'; \theta)

and the easier to calculate substitute K(x, x') = s(x; \theta)^T s(x'; \theta). It is easy to see that these simplify to the usage of the features F(\theta)^{-1/2} s(x; \theta) and s(x; \theta) together with the standard scalar product (from now on we will refer to the former as Fisher features). For a detailed explanation the reader is referred to [8].
With the aid of the Fisher score and Fisher features we can define AR model based features to be used with SVMs. The calculation of the Fisher score and Fisher matrix for AR models is presented in the appendix.
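As an illustration of how these quantities enter the SVM, the sketch below turns per-sample Fisher scores into Fisher features F(\theta)^{-1/2} s(x; \theta) and into the Fisher kernel Gram matrix. It assumes the score vectors and the Fisher matrix have already been computed (for the AR model this is done in the appendix); the function names are ours.

```python
import numpy as np

def fisher_features(scores, F):
    """Map Fisher scores s(x; theta) to Fisher features F^{-1/2} s(x; theta),
    so that the ordinary scalar product between features equals the Fisher
    kernel K(x, x') = s(x)^T F^{-1} s(x'). `scores` is an (n, d) array of
    per-sample score vectors, F the (d, d) Fisher information matrix."""
    # Symmetric inverse square root of F via its eigendecomposition
    w, V = np.linalg.eigh(F)
    F_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    return scores @ F_inv_sqrt

def fisher_kernel(scores_a, scores_b, F):
    """Gram matrix of the Fisher kernel between two sets of score vectors."""
    return scores_a @ np.linalg.solve(F, scores_b.T)
```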
III. EXPERIMENTAL RESULTS
In this section we describe in a nutshell the dataset we were
working on and present the results of our analysis.
A. Description of experiments
Dataset: Our analysis is based on the dataset created by A. M. Al-Ghamd and D. Mba [1]. In that paper the authors investigate the relationship between AE signal RMS, amplitude and kurtosis for a range of defect conditions such as smooth defects, point defects, line defects and rough defects.
The experiment was carried out on a split Cooper 01B40MEX type 01C/40GR bearing with the following parameters: internal bore diameter 40 mm, external diameter 84 mm, roller diameter 12 mm, diameter of roller centers 166 mm and number of rollers 10. There were two measurement devices: an AE sensor and a resonance-type accelerometer. For our analysis we used only the AE signals.
For measuring AE signatures a piezoelectric AE sensor (Physical Acoustics Corporation type WD) with an operating frequency range of 100-1000 kHz was used. The sensor was placed on the bearing housing and its pre-amplification was set to 40 dB. The signal output from the pre-amplifier was connected to a data-acquisition card which provided a sampling rate of up to 10 MHz with 16-bit precision. Anti-aliasing filters (100 kHz-1.2 MHz) were built into the data-acquisition card. The broadband piezoelectric transducer was differentially connected to the pre-amplifier. Sequences of 256000 data points were recorded with sampling rates varying from 2 MHz to 8 MHz, depending on the experiment type. In each experiment around 20 such sequences were recorded.
There were two test programs: (1) AE source identification and defects of varying severity, where five test conditions of varying severity were simulated on the outer race of the test bearing, with the defects positioned at the top-dead-center; (2) defects of varying sizes, where a point defect was increased in length and width in various ways.
In test program (1) there were 5 types of measurements, as follows:
(1) baseline defect-free operating conditions, where the bearing was operated with no defects;
(2) smooth defect, with a surface discontinuity not influencing the average surface roughness;
(3) point defect of size 0.85 mm × 0.85 mm (abbreviated from now on by PD);
(4) line defect of size 5.6 mm × 1.2 mm (abbreviated from now on by LD);
(5) rough defect of size 17.5 mm × 0.9 mm (abbreviated from now on by RD).
There were 4 types of speed conditions: 600 rpm, 1000 rpm, 2000 rpm and 3000 rpm, and 3 types of load conditions: 0.1 kN, 4.43 kN and 8.86 kN.
Fig. 2. A plot of the ML parameter estimates. The axes correspond to the \phi_1, \phi_2 and s (MSE) parameters. Circles, squares and triangles correspond to the PD, LD and RD conditions.
Experiment design: Our analysis was carried out on the data recorded from test program (1). We analyzed defect conditions (3)-(5) and used only 10 data sequences for each combination of defect, speed and load condition. We therefore formulated the problem as a three-class classification problem with a dataset of 360 sequences of length 256000 each.
Following section II-A, we calculated a set of features from each sequence and used them in the subsequent analysis.
AR(2) models seemed to fit the data sequences well (see figure 2), therefore we calculated four sets of AR-related features:
(1) the ML parameter estimates of each sequence;
(2) the Fisher scores of each data sequence based on the AR model;
(3) the Fisher features of each data sequence based on the AR model;
(4) the amplitude and period of the complex conjugate roots of the autoregression polynomial.
In addition we also extracted the AE-related features:
(5) power, or RMS;
(6) kurtosis;
(7) maximum amplitude.
See sections II-A and I for more details about these quantities.
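Putting the pieces together, a hypothetical assembly of the per-sequence feature vectors might look as follows; `sequences` is a placeholder name for the list of 360 AE recordings, and the helper functions are the illustrative sketches given earlier in this paper, not the code actually used in the experiments.

```python
import numpy as np

def build_features(sequences):
    """Stack AR coefficients, root features and log AE features per sequence.
    Relies on fit_ar_ml, ar2_root_features and ae_features sketched above."""
    rows = []
    for x in sequences:
        phi, s = fit_ar_ml(x, p=2)
        amp, period = ar2_root_features(phi[0], phi[1])
        ae = ae_features(x)
        rows.append([phi[0], phi[1], s, amp, period,
                     np.log(ae["rms"]), np.log(ae["kurtosis"]),
                     np.log(ae["max_amplitude"])])
    return np.asarray(rows)
```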
In figure 2 we see the plots of the ML parameter estimates for each observation sequence. As we can see, the values of the mean squared error (MSE) are reasonably small, and the parameters \phi_1 and \phi_2 vary roughly from 1.4 to 1.8 and from -0.95 to -0.75, respectively. According to the conditions in section II-A, and as can be seen in figures 3 and 4, the measurements are well approximated by stationary AR(2) processes and the autoregressive polynomials have complex roots.
Figures 5 and 6 show the Fisher scores and the Fisher features described in section II-B. It seems that these quantities provide a better separation w.r.t. the class attributes, but there is an area of high concentration where all 3 classes overlap. This can
Fig. 3. Absolute value (amplitude) and wavelength (period, in time steps) of the autoregressive polynomial roots. Circles, squares and triangles correspond to the PD, LD and RD conditions.
Fig. 4. Characterization of AR(2) processes in the (\phi_1, \phi_2) plane. Processes with coefficients below the solid line correspond to stationary processes, while the ones within the area separated by the dashed curve correspond to AR(2) processes with complex roots. The patch in the figure represents the ML parameter estimates for the elements of the dataset under consideration.
be due to the fact that all the scores and features are calculated relative to the ML parameter estimates of the whole dataset.
We also measured the AE signal characteristics presented in section II-A. The measurement results for the average signal power are shown in figure 7. We observe that the signal power increases both with defect severity and with speed. For PD and RD it also increases with the load; for LD, however, it shows an interesting behavior: it peaks for the second load condition.
The kurtosis values are plotted in figure 9. They peak roughly at LD high speed and RD low speed, show a slow increase for the PD and LD conditions and a fast decay for the RD conditions. Their changes w.r.t. the load conditions vary.
The measurements for the AE features are similar to the ones presented in Al-Ghamd and Mba, and therefore for more
Fig. 5. Fisher scores (the axes correspond to dL/d\phi_1, dL/d\phi_2 and dL/dv). Circles, squares and triangles correspond to the PD, LD and RD conditions.
Fig. 6. Fisher features (the axes correspond to the \phi_1, \phi_2 and s features). Circles, squares and triangles correspond to the PD, LD and RD conditions.
detailed explanations the reader is referred to [1].
B. Classification Results
Once the feature extraction part of the data analysis procedure was carried out, we used the k-nearest neighbor (kNN) and SVM methods to classify the data. Two types of SVMs were used: (1) with linear scalar product (SVMlin), providing linear separation boundaries; (2) with nonlinear scalar product given by the radial basis function K(x, x'; \sigma) = \exp\left(-\frac{1}{2\sigma^2}\|x - x'\|^2\right) (abbreviated from now on by SVMrbf).
All these methods have some parameters to be tuned: the parameter of kNN is the number of neighbors k and the parameter of SVMlin is the percentage of allowed misclassifications. SVMrbf has two parameters: the percentage of allowed misclassifications and the scalar product parameter \sigma.
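These parameterizations can be reproduced approximately with off-the-shelf tools; the sketch below uses scikit-learn (our choice for illustration, not the software used in this work), where the \nu parameter of a \nu-SVM upper-bounds the fraction of margin errors and thus plays the role of the allowed misclassification percentage, and the RBF width \sigma maps to gamma = 1/(2\sigma^2).

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import NuSVC

# Hypothetical parameter values, not the ones tuned in our experiments.
k = 5
sigma = 1.0
nu = 0.1   # upper bound on the fraction of margin errors

knn = KNeighborsClassifier(n_neighbors=k)
svm_lin = NuSVC(kernel="linear", nu=nu)
svm_rbf = NuSVC(kernel="rbf", nu=nu, gamma=1.0 / (2.0 * sigma ** 2))
```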
Since these methods are designed for dealing with two-class problems only, we employed the one-against-the-rest classification scheme: we used 3 different classifiers of the same type to separate each class from the others. Prediction for a new input
Fig. 7. Logarithm of the power (volts) for each sample. The 360 examples are divided in the following way: every 120 represent a defect condition (PD, LD and RD, in order); within these, every 30 represent a speed condition (600 rpm, 1000 rpm, 2000 rpm and 3000 rpm); within these, every 10 represent a load condition (0.1 kN, 4.43 kN and 8.86 kN). For example, the 10 samples with LD, 2000 rpm and 8.86 kN can be found between positions 200-210.
Fig. 8. Logarithm of the maximum amplitude (volts) for each sample. The same sample identification scheme applies as in figure 7.
is made by voting.
In order to test the methods we used 10 repetitions of 5-fold cross-validation and analyzed the mean value of the classification error. (The n-fold cross-validation method is used both for testing and for model fitting: we split the data set into n folds, fit the model's parameters on the first n-1 folds and test the model's prediction performance on the n-th fold; we then repeat the procedure by circularly permuting the folds. The procedure is finished when we have performed all n possible cases and averaged the classification error.)
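A sketch of this evaluation protocol, again written with scikit-learn purely for illustration; unlike the actual procedure, it keeps the hyperparameters fixed instead of fitting them on the training folds, and X, y and the estimators are placeholder names.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def mean_cv_error(estimator, X, y, repeats=10, folds=5):
    """Mean classification error over `repeats` runs of `folds`-fold CV,
    with one-against-the-rest classification."""
    errors = []
    for r in range(repeats):
        cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=r)
        accuracy = cross_val_score(OneVsRestClassifier(estimator), X, y, cv=cv)
        errors.append(1.0 - accuracy.mean())
    return float(np.mean(errors))

# e.g. mean_cv_error(SVC(kernel="rbf"), X, y)
#      mean_cv_error(KNeighborsClassifier(n_neighbors=5), X, y)
```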
We used two types of settings: (1) each set of features considered alone; (2) the AR-related and AE-related features used together.
The results of the classification task are shown in table I.
Fig. 9. Logarithm of the kurtosis for each sample. The same sample identification scheme applies as in figure 7.
                                      kNN     SVMlin   SVMrbf
log AE features                       0.244   0.481    0.228
AR                                    0.180   0.285    0.168
AR roots                              0.168   0.278    0.162
Fisher score                          0.145   0.431    0.235
Fisher features                       0.118   0.428    0.227
AR and log AE features                0.106   0.204    0.089
AR roots and log AE features          0.093   0.181    0.081
Fisher score and log AE features      0.173   0.266    0.175
Fisher features and log AE features   0.173   0.267    0.161
TABLE I
CLASSIFICATION ERRORS.
IV. DISCUSSION
As we can see in table I, the AR model based features perform better than the AE signal based features. When combined, they produce better results than either set separately. The best performances are achieved with the AR coefficients and with the amplitude and period given by the complex conjugate roots of the autoregression polynomial.
Overall, the plain AR parameters or their corresponding root characteristics seem to yield better classification performance than the Fisher scores and Fisher features. This may be due to the fact that the AR parameters themselves are more homogeneously distributed (compare figure 2 with figures 5 and 6), which makes it easier to separate them. kNN's
performance is less sensitive to inhomogeneity: it takes into
account the k nearest neighbors, no matter how far these are
apart. This might explain why Fisher scores and Fisher features
do much better for kNN than for SVMrbf. Apart from that,
the performance of kNN and SVMrbf is roughly the same.
For calculating the function value for a new input kNN
uses all the data in the dataset (of features), while SVM uses
only a fraction of them (the support vectors, see section II-B).
Because of its good performance but high cost kNN is only
used as a benchmark method. The SVM results are considered
more relevant. As we can see in table I, the best performance achieved with the SVMs corresponds to a classification rate of around 90%.
V. CONCLUSION
In our analysis we focused on the classification of bearing defects based on acoustic emission signals. We brought together the probabilistic-model related features and the AE features and used them jointly to complete the task. We can conclude that using both improves classification performance. Our future goal is to improve performance with the introduction of frequency and burst-form based features and to use methods that are computationally less expensive.
ACKNOWLEDGMENTS
The authors would like to thank Ali Ganji and Bas van der
Vorst for supervising the work and Abdullah M. Al-Ghamd
and David Mba for providing the data.
REFERENCES
[1] A. M. Al-Ghamd and D. Mba, "A comparative experimental study of the use of acoustic emission and vibration analysis for bearing defect identification and estimation of defect size," Mechanical Systems and Signal Processing, vol. 20, pp. 1537-1571, 2006.
[2] A. Ganji, "Acoustic emission to assess bearing lubrication condition: a pre-study," SKF E.R.C., Tech. Rep., 2003.
[3] A. Ganji and J. Holsnijders, "Acoustic emission measurements focused on bearing lubrication," SKF E.R.C., Tech. Rep., 2004.
[4] N. Jamaludin and D. Mba, "Monitoring extremely slowly rolling element bearings: part I," NDT&E International, vol. 35, pp. 349-358, 2002.
[5] N. Jamaludin and D. Mba, "Monitoring extremely slowly rolling element bearings: part II," NDT&E International, vol. 35, pp. 359-366, 2002.
[6] R. Prado and M. West, "Time series modelling, inference and forecasting," 2005, manuscript. (It can be found on M. West's webpage.)
[7] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, 1998.
[8] T. Jaakkola and D. Haussler, "Exploiting generative models in discriminative classifiers," in Proceedings of the 1998 Conference on Advances in Neural Information Processing Systems II, 1999, pp. 487-493.
APPENDIX
FISHER SCORE AND FISHER MATRIX FOR AR(p) MODELS
In the sequel we present the calculation of the Fisher score and Fisher matrix for AR(p) models. For ease of computation, instead of s we use the so-called precision parameter v = \log(1/s) and define

L(\phi, v) = \log p(y \mid Y_p, \phi, v).

The Fisher score is given by

\partial_\phi L(\phi, v) = \exp(v)\, (Y_p^T Y_p)\, (\hat{\phi}(y, Y_p) - \phi)

\partial_v L(\phi, v) = -\frac{1}{2} \exp(v)\, Q(y, \phi; Y_p) + \frac{T}{2}

where Q(y, \phi; Y_p) = (y - Y_p \phi)^T (y - Y_p \phi) is the residual sum of squares,
and the elements of the Fisher matrix are

E\left[ -\frac{\partial^2}{\partial\phi\, \partial\phi^T} L(\phi, v) \right] = \exp(v)\, Y_p^T Y_p

E\left[ -\frac{\partial^2}{\partial\phi\, \partial v} L(\phi, v) \right] = 0

E\left[ -\frac{\partial^2}{\partial v^2} L(\phi, v) \right] = \frac{1}{2}.
We assumed that the sequences in the dataset are indepen-
dently sampled, therefore the Fisher matrix of the model for
the whole dataset is given by the sum of Fisher matrices of
each sample.
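For completeness, a numerical sketch of these formulas (our own naming); it follows the appendix expressions as stated, with Q(y, \phi; Y_p) the residual sum of squares defined above.

```python
import numpy as np

def ar_fisher_score_and_matrix(y, p, phi, v):
    """Fisher score and Fisher information matrix of an AR(p) model at
    parameters (phi, v), with v = log(1/s), following the appendix formulas."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    Yp = np.column_stack([y[p - j - 1:T - j - 1] for j in range(p)])
    target = y[p:]
    phi_hat, *_ = np.linalg.lstsq(Yp, target, rcond=None)  # ML estimate phi_hat(y, Yp)
    resid = target - Yp @ phi
    Q = resid @ resid                                       # residual sum of squares

    score_phi = np.exp(v) * (Yp.T @ Yp) @ (phi_hat - phi)
    score_v = -0.5 * np.exp(v) * Q + T / 2.0
    score = np.append(score_phi, score_v)

    F = np.zeros((p + 1, p + 1))
    F[:p, :p] = np.exp(v) * (Yp.T @ Yp)   # E[-d^2 L / dphi dphi^T]
    F[p, p] = 0.5                          # E[-d^2 L / dv^2]; cross terms are zero
    return score, F
```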