
journal homepage: www.elsevier.com/locate/asoc

Meta-cognitive RBF network and its projection based learning algorithm for classification problems

G. Sateesh Babu, S. Suresh

School of Computer Engineering, Nanyang Technological University, Singapore

ARTICLE INFO

Article history:

Received 2 February 2012

Received in revised form 24 May 2012

Accepted 31 August 2012

Available online 23 September 2012

Keywords:

Meta-cognitive learning

Self-regulatory thresholds

Radial basis function network

Multi-category classication

Projection Based Learning

ABSTRACT

This paper proposes a Meta-cognitive Radial Basis Function Network (McRBFN) and its Projection Based Learning (PBL) algorithm for classification problems in a sequential framework, referred to as PBL-McRBFN. McRBFN is inspired by human meta-cognitive learning principles and has two components, namely the cognitive component and the meta-cognitive component. The cognitive component is a single hidden layer radial basis function network with an evolving architecture. In the cognitive component, the PBL algorithm computes the optimal output weights with the least computational effort by finding the analytical minima of the nonlinear energy function. The meta-cognitive component controls the learning process in the cognitive component by choosing the best learning strategy for the current sample and adapts the learning strategies by implementing self-regulation. In addition, sample overlapping conditions are considered for proper initialization of new hidden neurons, thereby minimizing misclassification. The interaction of the cognitive and meta-cognitive components addresses the what-to-learn, when-to-learn and how-to-learn principles of human learning efficiently. The performance of PBL-McRBFN is evaluated using a set of benchmark classification problems from the UCI machine learning repository and two practical problems, viz., acoustic emission signal classification and mammogram-based cancer classification. The statistical performance evaluation on these problems demonstrates the superior performance of the PBL-McRBFN classifier over results reported in the literature.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Neural networks are powerful tools for efficiently approximating complex nonlinear input–output relationships. Hence, over the last few decades neural networks have been extensively employed to solve real-world classification problems [1]. In a classification problem, the objective is to learn the decision surface that accurately maps an input feature space to an output space of class labels. Several learning algorithms for different neural network architectures have been used in various problems in science, business, industry and medicine, including handwritten character recognition [2], speech recognition [3], biomedical diagnosis [4], prediction of bankruptcy [5], text categorization [6] and information retrieval [7]. Among the various architectures reported in the literature, the Radial Basis Function (RBF) network has been gaining attention due to the localization property of its Gaussian function, and is widely used in classification problems. Significant contributions to RBF learning algorithms for classification problems are broadly

classified into two categories. (a) Batch learning algorithms: gradient descent based learning is used to determine the network parameters [8].

E-mail address: ssundaram@ntu.edu.sg (S. Suresh).
1568-4946/$ – see front matter © 2012 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.asoc.2012.08.047

Here, the complete training data are presented multiple times until the training error is minimized. Alternatively, one

can implement random input-parameter selection with a least-squares solution for the output weights [9,10]. In both cases, the number of Gaussian functions required to approximate the true function is determined heuristically. (b) Sequential learning algorithms: the number of Gaussian neurons required to approximate the input–output relationship is determined automatically [11–15]. Here, the training samples are presented one-by-one and discarded after learning. The Resource Allocation Network (RAN) [11] was the first sequential learning algorithm introduced in the literature. RAN evolves the network architecture required to approximate the true function using a novelty-based neuron growth criterion. The Minimal Resource Allocation Network (MRAN) [12] uses a similar approach, but incorporates an error-based neuron growing/pruning criterion. Hence, MRAN determines a more compact network architecture than the RAN algorithm. The Growing and Pruning Radial Basis Function Network [13] bases the growing/pruning criteria of the network on the significance of a neuron. A sequential learning algorithm using recursive least squares is presented in [14], referred to as the On-line Sequential Extreme Learning Machine (OS-ELM). OS-ELM chooses input weights randomly with a fixed number of hidden neurons and analytically determines the output weights using minimum-norm

least-squares.

Fig. 1. (a) Nelson and Narens' model of meta-cognition and (b) the McRBFN model.

In the case of sparse and imbalanced data sets, the random selection of input weights in OS-ELM affects the performance significantly, as shown in

[16]. In the neuro-fuzzy framework, Evolving Fuzzy Neural Networks (EFuNNs) [17] is a novel sequential learning algorithm. It has been shown in [15] that the aforementioned algorithms work better for function approximation problems than for classification problems. The Sequential Multi-Category Radial Basis Function network (SMC-RBF) [15] uses the within-class similarity measure, the misclassification rate and the prediction error in its neuron growth and parameter update criteria. With SMC-RBF it has been shown that updating the parameters of the nearest neuron in the same class as the current sample improves performance more than updating the nearest neuron of any class.

The aforementioned neural network algorithms use all the samples in the training data set to gain knowledge about the information contained in the samples. In other words, they possess the information-processing abilities of humans, including perception, learning, remembering, judging, and problem-solving, and these abilities are cognitive in nature. However, recent studies on human learning have revealed that learning is most effective when learners adopt self-regulation in the learning process using meta-cognition [18,19]. Meta-cognition means cognition about cognition. In a meta-cognitive framework, human beings think about their cognitive processes, develop new strategies to improve their cognitive skills and evaluate the information contained in their memory. If a radial basis function network analyzes its cognitive process and chooses suitable learning strategies adaptively to improve that process, it is referred to as a Meta-Cognitive Radial Basis Function Network (McRBFN). Such a McRBFN must be capable of deciding what-to-learn, when-to-learn and how-to-learn the decision function from the stream of training data by emulating human self-regulated learning.

The Self-adaptive Resource Allocation Network (SRAN) [20] and the Complex-valued Self-regulating Resource Allocation Network (CSRAN) [21] address the what-to-learn component of meta-cognition by selecting significant samples using the misclassification error and the hinge loss error. It has been shown that selecting appropriate samples for learning and removing repetitive samples helps improve generalization performance. It is therefore evident that emulating the three components of human learning with suitable learning strategies would improve the generalization ability of a neural network. The drawbacks of these algorithms are: (a) the samples for training are selected based on a simple error criterion, which is not sufficient to capture the significance of samples; (b) a new hidden neuron center is allocated independently, and may overlap with already existing neuron centers, leading to misclassification; (c) knowledge gained from past samples is not used; and (d) they use the computationally intensive extended Kalman filter for parameter updates. The Meta-cognitive Neural Network (McNN) [22] and the Meta-cognitive Neuro-Fuzzy Inference System (McFIS) [23] address the first two issues efficiently by using the three components of meta-cognition. However, McNN and McFIS use computationally intensive parameter updates and do not utilize the past knowledge stored in the network. Similar works using meta-cognition in the complex domain are reported in [24,25]. The recently proposed Projection Based Learning in a meta-cognitive radial basis function network [26] addresses the above issues in batch mode, except for proper utilization of the past knowledge stored in the network, and has been applied to solve biomedical problems in [27–29]. In this paper, we propose a meta-cognitive radial basis function network and its fast and efficient projection based sequential learning algorithm.

Several meta-cognition models are available in human psychology, and a brief survey of various meta-cognition models is reported in [30]. Among the various models, the model proposed by Nelson and Narens [31] is simple and clearly highlights the various actions in human meta-cognition, as shown in Fig. 1(a). The model is analogous to meta-cognition in human beings and has two components, the cognitive component and the meta-cognitive component. The information flow from the cognitive component to the meta-cognitive component is considered monitoring, while the information flow in the reverse direction is considered control. The information flowing from the meta-cognitive component to the cognitive component either changes the state of the cognitive component or changes the cognitive component itself. Monitoring informs the meta-cognitive component about the state of the cognitive component, thus continuously updating the meta-cognitive component's model of the cognitive component, including the case of no change in state.

McRBFN is developed based on the Nelson and Narens meta-cognition model [31], as shown in Fig. 1(b). Analogous to that model, McRBFN has two components, namely the cognitive component and the meta-cognitive component. The cognitive component is a single hidden layer radial basis function network with an evolving architecture. It learns from the training data by adding new hidden neurons and updating the output weights of the hidden neurons to approximate the true function. The input weights of the hidden neurons (center and width) are determined from the training data, and the output weights are estimated using the projection based sequential learning algorithm. When a neuron is added to the cognitive component, the input/hidden layer parameters are fixed based on the input of the sample, and the output weights are estimated by minimizing an energy function given by the hinge loss error, as in [32]. The problem of finding the optimal weights is first formulated as a linear programming problem using the principles of minimization and real calculus [33,34]. The Projection Based Learning (PBL) algorithm then converts the linear programming problem into a system of linear equations and provides a solution for the optimal weights, corresponding to the minimum energy point of the energy function. The meta-cognitive component of McRBFN contains a dynamic model of the cognitive component, knowledge measures and self-regulated thresholds. The meta-cognitive component controls the learning process of the cognitive component by choosing one of

the four strategies for each sample in the training data set. When a new training sample is presented, McRBFN measures the knowledge contained in the current training sample with respect to the cognitive component using its

knowledge measures. The predicted class label, the maximum hinge error and the class-wise significance are considered as the knowledge measures of the meta-cognitive component. Class-wise significance is obtained from the spherical potential, which is widely used in kernel methods to determine whether all the data points are enclosed tightly by the Gaussian kernels [35]. Here, the squared distance between the current sample and its hyper-dimensional projection helps in measuring the novelty of the data. Since McRBFN addresses classification problems in this paper, we redefine the spherical potential in a class-wise framework and use it in devising the learning strategies. Using the above-mentioned measures, the meta-cognitive component constructs two sample-based learning strategies and two neuron-based learning strategies. One of these strategies is selected for the current training sample such that the cognitive component learns the true function accurately and achieves better generalization performance. These learning strategies are adapted by the meta-cognitive component using self-regulated thresholds. In addition, the meta-cognitive component identifies overlapping/non-overlapping conditions by measuring the distance from the nearest neuron in the inter/intra-class. The McRBFN using PBL to obtain the network parameters is referred to as the Projection Based Learning algorithm for a Meta-cognitive Radial Basis Function Network (PBL-McRBFN).

The performance of the proposed PBL-McRBFN classifier is evaluated using a set of benchmark binary/multi-category classification problems from the University of California, Irvine (UCI) machine learning repository [36]. We consider five multi-category and five binary classification problems with varying imbalance factors. In all these problems, the performance of PBL-McRBFN is compared against the best performing classifiers available in the literature using class-wise performance measures such as overall/average efficiency and a non-parametric statistical significance test [37]. The non-parametric Friedman test, based on the mean ranking of each algorithm over multiple data sets, indicates the statistical significance of the proposed PBL-McRBFN classifier. Finally, the performance of the PBL-McRBFN classifier is also evaluated using two practical classification problems, viz., acoustic emission signal classification [38] and mammogram classification for breast cancer detection [39]. The results clearly highlight that the PBL-McRBFN classifier provides better generalization performance than the results reported in the literature.

The outline of this paper is as follows: Section 2 describes the meta-cognitive radial basis function network for classification problems. Section 3 presents the performance evaluation of the PBL-McRBFN classifier on a set of benchmark and practical classification problems, and compares it with the best performing classifiers available in the literature. Section 4 summarizes the conclusions from this study.

2. Meta-cognitive radial basis function network for classification problems

In this section, we describe the meta-cognitive radial basis function network for solving classification problems. First, we define the classification problem. Next, we present the meta-cognitive radial basis function network architecture. Finally, we present the sequential learning algorithm and summarize it in pseudo-code form.

2.1. Problem definition

Given a stream of training data samples $\{(\mathbf{x}^1, c^1), \ldots, (\mathbf{x}^t, c^t), \ldots\}$, where $\mathbf{x}^t = [x_1^t, \ldots, x_m^t]^T \in \mathbb{R}^m$ is the $m$-dimensional input of the $t$-th sample and $c^t \in \{1, \ldots, n\}$ is its class label, with $n$ the total number of classes, the coded class labels $\mathbf{y}^t = [y_1^t, \ldots, y_j^t, \ldots, y_n^t]^T \in \mathbb{R}^n$ are given by:

$$y_j^t = \begin{cases} 1 & \text{if } c^t = j \\ -1 & \text{otherwise} \end{cases}, \quad j = 1, \ldots, n \qquad (1)$$
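For concreteness, the ±1 coding of Eq. (1) can be sketched in NumPy as follows (the function name and the 1-based label convention are our illustrative choices, not from the paper):

```python
import numpy as np

def coded_class_label(c_t: int, n: int) -> np.ndarray:
    """Return the coded class label y^t of Eq. (1): +1 at the true
    class index and -1 everywhere else (class labels are 1-based)."""
    y = -np.ones(n)
    y[c_t - 1] = 1.0
    return y
```

For example, `coded_class_label(2, 4)` yields `[-1, 1, -1, -1]`.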

The objective of the McRBFN classifier is to approximate the underlying decision function that maps $\mathbf{x}^t \in \mathbb{R}^m \to \mathbf{y}^t \in \mathbb{R}^n$. McRBFN begins with zero hidden neurons and selects a suitable strategy for each sample to achieve this objective. In the next section, we describe the architecture of McRBFN and discuss each of these learning strategies in detail.

2.2. McRBFN architecture

McRBFN has two components, namely the cognitive component and the meta-cognitive component, as shown in Fig. 2. The cognitive component is a single hidden layer radial basis function network with an evolving architecture, starting from zero hidden neurons. The meta-cognitive component contains a dynamic model of the cognitive component, knowledge measures and self-regulated thresholds. The meta-cognitive component controls the learning process of the cognitive component by choosing one of four strategies for each sample in the training data set. When a new training sample is presented to McRBFN, the meta-cognitive component estimates the knowledge present in the new training sample with respect to the cognitive component. Based on this information, the meta-cognitive component controls the learning process of the cognitive component by selecting a suitable strategy for the current training sample, to properly address what-to-learn, when-to-learn and how-to-learn.

We present a detailed description of the cognitive and meta-cognitive components of McRBFN in the following sections.

2.2.1. Cognitive component of McRBFN

The cognitive component of McRBFN is a single hidden layer feed-forward radial basis function network with linear input and output layers. The neurons in the hidden layer of the cognitive component employ the Gaussian activation function.

Without loss of generality, we assume that McRBFN has built $K$ Gaussian neurons from $t-1$ training samples. For a given input $\mathbf{x}^t$, the predicted output of the $j$-th output neuron ($\hat{y}_j^t$) of McRBFN is

$$\hat{y}_j^t = \sum_{k=1}^{K} w_{kj}\, h_k^t, \quad j = 1, \ldots, n \qquad (2)$$

where $w_{kj}$ is the weight connecting the $k$-th hidden neuron to the $j$-th output neuron, and $h_k^t$ is the response of the $k$-th hidden neuron to the input $\mathbf{x}^t$, given by

$$h_k^t = \exp\left(-\frac{\|\mathbf{x}^t - \boldsymbol{\mu}_k^l\|^2}{(\sigma_k^l)^2}\right) \qquad (3)$$

where $\boldsymbol{\mu}_k^l \in \mathbb{R}^m$ is the center and $\sigma_k^l \in \mathbb{R}^+$ is the width of the $k$-th hidden neuron. Here, the superscript $l$ represents the corresponding class of the hidden neuron.
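The forward pass of Eqs. (2) and (3) can be sketched as follows (a minimal NumPy illustration; the names and array shapes are our assumptions, not the paper's code):

```python
import numpy as np

def rbf_forward(x, centers, widths, W):
    """Cognitive-component forward pass, Eqs. (2)-(3).

    x:       (m,)  input sample
    centers: (K, m) Gaussian centers mu_k
    widths:  (K,)  Gaussian widths sigma_k
    W:       (K, n) output weights w_kj
    Returns the (n,) predicted output y_hat.
    """
    # Eq. (3): h_k = exp(-||x - mu_k||^2 / sigma_k^2)
    h = np.exp(-np.sum((centers - x) ** 2, axis=1) / widths ** 2)
    # Eq. (2): y_hat_j = sum_k w_kj * h_k
    return h @ W
```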

The cognitive component uses the Projection Based Learning (PBL) algorithm for its learning process. The strategy proposed here is similar to the fast learning algorithm for single layer neural networks in [33,34]. The PBL algorithm is described as follows.

Projection Based Learning algorithm: The Projection Based Learning algorithm works on the principle of minimization of an energy function and finds the optimal network output parameters for which the energy function is minimum, i.e., the network achieves the minimum energy point of the energy function.

The considered energy function is the sum of the squared hinge loss errors at the McRBFN output neurons. The energy function for the $i$-th sample is defined as

$$J_i = \frac{1}{2}\sum_{j=1}^{n} \left(e_j^i\right)^2, \quad i = 1, \ldots, t \qquad (4)$$

where the hinge loss error $e_j^i$ is given by

$$e_j^i = \begin{cases} 0 & \text{if } y_j^i\,\hat{y}_j^i > 1 \\ y_j^i - \hat{y}_j^i & \text{otherwise} \end{cases}, \quad j = 1, \ldots, n \qquad (5)$$

When $y_j^i\,\hat{y}_j^i \le 1$, the energy function for the $i$-th sample becomes

$$J_i = \frac{1}{2}\sum_{j=1}^{n} \left(y_j^i - \hat{y}_j^i\right)^2, \quad i = 1, \ldots, t \qquad (6)$$

The total energy over all $t$ training samples is

$$J(\mathbf{W}) = \sum_{i=1}^{t} J_i = \frac{1}{2}\sum_{i=1}^{t}\sum_{j=1}^{n} \left(y_j^i - \sum_{k=1}^{K} w_{kj}\, h_k^i\right)^2 \qquad (7)$$

where $h_k^i$ is the response of the $k$-th hidden neuron to the $i$-th training sample. The optimal output weights $\mathbf{W}^* \in \mathbb{R}^{K \times n}$ are estimated such that the total energy reaches its minimum:

$$\mathbf{W}^* := \arg\min_{\mathbf{W} \in \mathbb{R}^{K \times n}} J(\mathbf{W}) \qquad (8)$$

The optimal $\mathbf{W}^*$ corresponding to the minimum of the energy function ($J(\mathbf{W}^*)$) is obtained by equating the first-order partial derivatives of $J$ with respect to the output weights to zero:

$$\frac{\partial J(\mathbf{W})}{\partial w_{pj}} = 0, \quad p = 1, \ldots, K; \; j = 1, \ldots, n \qquad (9)$$

i.e.,

$$\sum_{i=1}^{t} h_p^i \left(y_j^i - \sum_{k=1}^{K} w_{kj}\, h_k^i\right) = 0, \quad p = 1, \ldots, K; \; j = 1, \ldots, n \qquad (10)$$

Rearranging Eq. (10),

$$\sum_{k=1}^{K}\left(\sum_{i=1}^{t} h_k^i\, h_p^i\right) w_{kj} = \sum_{i=1}^{t} h_p^i\, y_j^i, \quad p = 1, \ldots, K; \; j = 1, \ldots, n \qquad (11)$$

Eq. (11) can be written as a system of linear equations,

$$\mathbf{A}\mathbf{W} = \mathbf{B} \qquad (12)$$

where $\mathbf{A} \in \mathbb{R}^{K \times K}$ is the projection matrix with elements

$$a_{kp} = \sum_{i=1}^{t} h_k^i\, h_p^i, \quad k = 1, \ldots, K; \; p = 1, \ldots, K \qquad (13)$$

and $\mathbf{B} \in \mathbb{R}^{K \times n}$ is the output matrix with elements

$$b_{pj} = \sum_{i=1}^{t} h_p^i\, y_j^i, \quad p = 1, \ldots, K; \; j = 1, \ldots, n \qquad (14)$$

The solution of Eq. (12) gives the optimal output weights $\mathbf{W}^*$. We state the following propositions to find the closed-form solution for this set of linear equations.

Proposition 1. The hidden neuron responses are distinct, i.e., for all $\mathbf{x}^i$, when $k \ne p$, $h_k^i \ne h_p^i$; $k, p = 1, \ldots, K$, $i = 1, \ldots, t$.

Proof. Assume instead that the responses of the $k$-th and $p$-th hidden neurons ($k \ne p$) are equal for all samples. This assumption is valid if and only if

$$\boldsymbol{\mu}_p^l == \boldsymbol{\mu}_k^l \;\text{AND}\; \sigma_p^l == \sigma_k^l \qquad (15)$$

But the pair of vectors $\boldsymbol{\mu}_k^l$ and $\boldsymbol{\mu}_p^l$ are allocated based upon the significant training samples selected for neuron addition, and these significant samples are selected using the neuron growth criterion of Eq. (33). The neuron growth criterion uses the maximum hinge error ($E^t$) and the class-wise significance ($\psi_c$), where $\psi_c$ is defined such that a new neuron is added only when no existing neuron near the current sample produces a significant output for it. Hence no two neuron centers are equal, and therefore the responses of the $k$-th and $p$-th hidden neurons are not equal for all samples. □

Proposition 2. The response of each hidden neuron is non-zero for at least a few samples.

Proof. Assume that the response of the $k$-th hidden neuron is zero for all samples, i.e., $h_k^i = 0 \;\forall\, \mathbf{x}^i$. This is possible if and only if $\mathbf{x}^i \to \infty$, or $\boldsymbol{\mu}_k^l \to \infty$, or $\sigma_k^l \to 0$. The input variables $\mathbf{x}^i$ are normalized within a circle of radius 1 such that $|x_j| < 1$, $j = 1, \ldots, m$. As shown in the overlapping conditions of the growth strategy in Section 2.2.3, hidden neuron centers are allocated based upon selected significant training samples, and widths are determined based upon the inter/intra-class nearest-neuron distances, which are nonzero positive values. Hence, the response of each hidden neuron is non-zero for at least a few samples. □

We state the following theorem using Propositions 1 and 2.

Theorem 1. The projection matrix $\mathbf{A}$ is a positive definite symmetric matrix, and hence it is invertible.

Proof. From the definition of the projection matrix $\mathbf{A}$ given in Eq. (13),

$$A_{pk} = \sum_{i=1}^{t} h_p^i\, h_k^i, \quad p = 1, \ldots, K; \; k = 1, \ldots, K \qquad (16)$$

The diagonal elements of $\mathbf{A}$ are

$$A_{kk} = \sum_{i=1}^{t} h_k^i\, h_k^i, \quad k = 1, \ldots, K \qquad (17)$$

Therefore Eq. (17) can be written as

$$A_{kk} = \sum_{i=1}^{t} |h_k^i|^2 > 0 \qquad (18)$$

By Proposition 2, the response of each hidden neuron is non-zero for at least a few samples, so every diagonal element is positive, i.e., $A_{kk} \in \mathbb{R}^+$.

The off-diagonal elements of the projection matrix $\mathbf{A}$ are

$$A_{kj} = \sum_{i=1}^{t} h_k^i\, h_j^i = A_{jk} \qquad (19)$$

From Eqs. (17) and (19), it can be inferred that the projection matrix $\mathbf{A}$ is a symmetric matrix.

A symmetric matrix is positive definite iff $\mathbf{q}^T \mathbf{A}\mathbf{q} > 0$ for any $\mathbf{q} \ne \mathbf{0}$. Consider the unit basis vector $\mathbf{q}_1 \in \mathbb{R}^{K \times 1}$ such that $q_{11} = 1$ and $q_{12} = \cdots = q_{1K} = 0$, i.e., $\mathbf{q}_1 = [1\ 0\ \cdots\ 0]^T$. Therefore $\mathbf{q}_1^T \mathbf{A}\mathbf{q}_1 = A_{11}$. In Eq. (17) it was shown that $A_{kk} > 0$ for $k = 1, \ldots, K$; therefore $A_{11} > 0$ and $\mathbf{q}_1^T \mathbf{A}\mathbf{q}_1 > 0$. Similarly, for any unit basis vector $\mathbf{q}_k = [0\ \cdots\ 1\ \cdots\ 0]^T$,

$$\mathbf{q}_k^T \mathbf{A}\mathbf{q}_k = A_{kk} > 0, \quad k = 1, \ldots, K \qquad (20)$$

Any non-zero vector $\mathbf{p} \in \mathbb{R}^{K \times 1}$ can be expressed as a linear combination of the basis vectors, i.e., $\mathbf{p} = \mathbf{q}_1 t_1 + \cdots + \mathbf{q}_k t_k + \cdots + \mathbf{q}_K t_K$, where $t_k \in \mathbb{R}$ are the transformation constants. Then

$$\mathbf{p}^T \mathbf{A}\mathbf{p} = \left(\sum_{k=1}^{K} \mathbf{q}_k t_k\right)^T \mathbf{A} \left(\sum_{k=1}^{K} \mathbf{q}_k t_k\right) = \sum_{k=1}^{K} |t_k|^2 A_{kk} \qquad (21)$$

As shown in Eq. (17), $A_{kk} > 0$, and $|t_k|^2 \ge 0$ with at least one $t_k \ne 0$. Hence

$$\mathbf{p}^T \mathbf{A}\mathbf{p} = \sum_{k=1}^{K} |t_k|^2 A_{kk} > 0 \qquad (22)$$

Therefore $\mathbf{A}$ is a positive definite symmetric matrix, and hence it is invertible. □

The second-order derivative of the energy function ($J$) with respect to the output weights is given by

$$\frac{\partial^2 J(\mathbf{W})}{\partial w_{pj}^2} = \sum_{i=1}^{t} h_p^i\, h_p^i = \sum_{i=1}^{t} |h_p^i|^2 > 0 \qquad (23)$$

The following observations can be made from Eq. (23):

1. The function $J$ is a convex function.
2. The output weight $\mathbf{W}^*$ obtained as the solution to the set of linear equations (Eq. (12)) is the weight corresponding to the minimum energy point of the energy function ($J$).

Using Theorem 1, the solution for the system of equations in Eq. (12) can be determined as

$$\mathbf{W}^* = \mathbf{A}^{-1}\mathbf{B} \qquad (24)$$
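If the hidden-neuron responses for all $t$ samples are collected in a matrix $H$, Eqs. (12)–(14) and (24) amount to solving a $K \times K$ linear system. A minimal sketch (our naming, not the paper's code; a numerically safer route than an explicit inverse is a linear solve):

```python
import numpy as np

def pbl_output_weights(H, Y):
    """Projection Based Learning: closed-form output weights.

    H: (t, K) hidden-neuron responses, H[i, k] = h_k^i
    Y: (t, n) coded class labels
    Builds A = H^T H (Eq. 13) and B = H^T Y (Eq. 14), then solves
    A W = B (Eq. 12), i.e., W* = A^{-1} B (Eq. 24).
    """
    A = H.T @ H              # (K, K) projection matrix
    B = H.T @ Y              # (K, n) output matrix
    return np.linalg.solve(A, B)
```

By Theorem 1, $\mathbf{A}$ is symmetric positive definite, so the solve is well posed whenever the propositions' assumptions hold.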

2.2.2. Meta-cognitive component of McRBFN

The meta-cognitive component contains a dynamic model of the cognitive component, knowledge measures and self-regulated thresholds. During the learning process, the meta-cognitive component monitors the cognitive component and updates its dynamic model of it. When a new ($t$-th) training sample is presented to McRBFN, the meta-cognitive component estimates the knowledge present in the new training sample with respect to the cognitive component using its knowledge measures. The meta-cognitive component uses the predicted class label ($\hat{c}^t$), the maximum hinge error ($E^t$), the confidence of the classifier ($\hat{p}(c^t|\mathbf{x}^t)$) and the class-wise significance ($\psi_c$) as the measures of knowledge in the new training sample. The self-regulated thresholds are adapted to capture the knowledge present in the new training sample. Using the knowledge measures and self-regulated thresholds, the meta-cognitive component constructs two sample-based learning strategies and two neuron-based learning strategies. One of these strategies is selected for the new training sample such that the cognitive component learns it accurately and achieves better generalization performance.

The meta-cognitive component measures are defined as follows:

Predicted class label ($\hat{c}^t$): Using the predicted output ($\hat{\mathbf{y}}^t$), the predicted class label can be obtained as

$$\hat{c}^t = \arg\max_{j \in 1, \ldots, n} \hat{y}_j^t \qquad (25)$$

Maximum hinge error ($E^t$): The objective of learning is to minimize the error between the predicted output ($\hat{\mathbf{y}}^t$) and the actual output ($\mathbf{y}^t$). In classification problems, it has been shown in [32,40] that a classifier developed using the hinge loss error estimates the posterior probability more accurately than one developed using the mean square error. Hence, in McRBFN we use the hinge loss error of Eq. (5), and the maximum hinge error is

$$E^t = \max_{j \in 1, 2, \ldots, n} \left|e_j^t\right| \qquad (26)$$

Confidence of classifier ($\hat{p}(c^t|\mathbf{x}^t)$): The confidence level of classification, or predicted posterior probability, is given as

$$\hat{p}(j|\mathbf{x}^t) = \frac{\min\left(1, \max\left(-1, \hat{y}_j^t\right)\right) + 1}{2}, \quad j = c^t \qquad (27)$$
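The three measures of Eqs. (25)–(27) can be sketched together as follows (an illustration only; the helper name and 1-based class labels are our assumptions):

```python
import numpy as np

def knowledge_measures(y_hat, c_t):
    """Meta-cognitive knowledge measures for one sample.

    y_hat: (n,) predicted outputs
    c_t:   true class label (1-based)
    Returns (predicted label c_hat, maximum hinge error E_t,
    confidence p(c_t | x_t)).
    """
    n = y_hat.shape[0]
    c_hat = int(np.argmax(y_hat)) + 1                   # Eq. (25)
    y = -np.ones(n)
    y[c_t - 1] = 1.0                                    # Eq. (1)
    e = np.where(y * y_hat > 1.0, 0.0, y - y_hat)       # Eq. (5)
    E_t = float(np.max(np.abs(e)))                      # Eq. (26)
    p = (min(1.0, max(-1.0, y_hat[c_t - 1])) + 1) / 2   # Eq. (27)
    return c_hat, E_t, p
```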

Class-wise significance ($\psi_c$): The input sample $\mathbf{x}^t$ is mapped onto a hyper-dimensional spherical feature space $S$ using the $K$ Gaussian neurons, i.e., $\mathbf{x}^t \to h(\mathbf{x}^t)$. Therefore, all $h(\mathbf{x}^t)$ lie on a hyper-dimensional sphere, as shown in [41]. The knowledge, or spherical potential, of any sample in the original space is expressed as its squared distance from the center $h_0$ of the hyper-dimensional mapping $S$ [35].

In McRBFN, the centers ($\boldsymbol{\mu}$) and widths ($\sigma$) of the Gaussian neurons describe the feature space $S$. Let the center of the $K$-dimensional feature space be $h_0 = \frac{1}{K}\sum_{k=1}^{K} h(\boldsymbol{\mu}_k)$. The knowledge present in the new data $\mathbf{x}^t$ can be expressed as the potential of the data in the original space, which is the squared distance from the $K$-dimensional feature space to the center $h_0$. The potential ($\psi$) is given as

$$\psi = \|h(\mathbf{x}^t) - h_0\|^2 \qquad (28)$$

Expanding,

$$\psi = h(\mathbf{x}^t, \mathbf{x}^t) - \frac{2}{K}\sum_{k=1}^{K} h(\mathbf{x}^t, \boldsymbol{\mu}_k^l) + \frac{1}{K^2}\sum_{k,r=1}^{K} h(\boldsymbol{\mu}_k^l, \boldsymbol{\mu}_r^l) \qquad (29)$$

From the above equation, we can see that for the Gaussian function the first term ($h(\mathbf{x}^t, \mathbf{x}^t)$) and the last term ($\frac{1}{K^2}\sum_{k,r=1}^{K} h(\boldsymbol{\mu}_k^l, \boldsymbol{\mu}_r^l)$) are constants. Since the potential is a measure of novelty, these constants may be discarded and the potential can be reduced to

$$\psi = \frac{2}{K}\sum_{k=1}^{K} h(\mathbf{x}^t, \boldsymbol{\mu}_k^l) \qquad (30)$$

In classification problems, the class distribution plays a vital role and influences the performance of the classifier significantly [15]. Hence, we measure the spherical potential of the new training sample $\mathbf{x}^t$ belonging to class $c$ with respect to the neurons associated with the same class (i.e., $l = c$). Let $K_c$ be the number of neurons associated with class $c$; then the class-wise spherical potential, or class-wise significance ($\psi_c$), is defined as

$$\psi_c = \frac{1}{K_c}\sum_{k=1}^{K_c} h(\mathbf{x}^t, \boldsymbol{\mu}_k^c) \qquad (31)$$

The spherical potential explicitly indicates the knowledge contained in the sample: a higher value (close to one) indicates that the sample is similar to the existing knowledge in the cognitive component, while a smaller value (close to zero) indicates that the sample is novel.
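A minimal sketch of Eq. (31) (our naming; returning zero for a class with no neurons yet is our illustrative convention for a maximally novel sample):

```python
import numpy as np

def class_wise_significance(x, centers, widths, classes, c):
    """Class-wise spherical potential psi_c, Eq. (31).

    x:       (m,)  current sample
    centers: (K, m) hidden-neuron centers
    widths:  (K,)  hidden-neuron widths
    classes: (K,)  class association l of each hidden neuron
    c:       class label of the current sample
    """
    mask = classes == c
    if not np.any(mask):
        return 0.0  # no neurons for this class yet: maximally novel
    # Gaussian responses of the K_c same-class neurons, Eq. (3)
    h = np.exp(-np.sum((centers[mask] - x) ** 2, axis=1)
               / widths[mask] ** 2)
    return float(np.mean(h))  # (1/K_c) * sum_k h(x, mu_k^c)
```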

2.2.3. Learning strategies

The meta-cognitive component devises its learning strategies using the knowledge measures and the self-regulated thresholds, which directly address the basic principles of self-regulated human learning (i.e., what-to-learn, when-to-learn and how-to-learn). It controls the learning process in the cognitive component by selecting one of the following four strategies for the new training sample.

- Sample delete strategy: If the new training sample contains information similar to the knowledge already present in the cognitive component, delete it from the training data set without using it in the learning process.
- Neuron growth strategy: Use the new training sample to add a new hidden neuron to the cognitive component. During neuron addition, sample overlapping conditions are identified so that the new hidden neuron is allocated appropriately.
- Parameter update strategy: The new training sample is used to update the parameters of the cognitive component; PBL is used for the update.
- Sample reserve strategy: The new training sample contains some information, but it is not significant; such samples may be used at a later stage of the learning process for fine-tuning the parameters of the cognitive component, or discarded without learning.

The principle behind these four learning strategies is described in detail below.

Sample delete strategy: When the predicted class label of the new training sample is the same as the actual class label and the estimated posterior probability is close to 1, the new training sample does not provide additional information to the classifier and can be deleted from the training sequence without being used in the learning process. The sample deletion criterion is given by

$$\hat{c}^t == c^t \;\text{AND}\; \hat{p}(c^t|\mathbf{x}^t) \ge \beta_d \qquad (32)$$

where $\beta_d$ is the deletion threshold, which controls the number of samples participating in the learning process. If one selects $\beta_d$ close to 1, then all the training samples participate in the learning process, which results in over-training with similar samples. Reducing $\beta_d$ below the desired accuracy results in the deletion of too many samples from the training sequence, and the resultant network may not satisfy the desired accuracy. Hence, it is fixed at the expected accuracy level; in our simulation studies, it is selected in the range [0.9, 0.95]. The sample deletion strategy prevents the learning of samples with similar information, and thereby avoids over-training and reduces the computational effort.

Neuron growth strategy: When a new training sample contains significant information and the predicted class label is different from the actual class label, one needs to add a new hidden neuron to represent the knowledge contained in the sample. The neuron growth criterion is given by

$$\left(\hat{c}^t \ne c^t \;\text{OR}\; E^t \ge \beta_a\right) \;\text{AND}\; \psi_c(\mathbf{x}^t) \le \beta_c \qquad (33)$$

where $\beta_c$ is the meta-cognitive knowledge measurement threshold and $\beta_a$ is the self-adaptive meta-cognitive addition threshold. The thresholds $\beta_c$ and $\beta_a$ allow samples with significant knowledge to be learned first, and the other samples to be used later for fine-tuning. If $\beta_c$ is chosen close to zero and the initial value of $\beta_a$ is chosen close to the maximum value of the hinge error, then very few neurons will be added to the network; such a network will not approximate the function properly. If $\beta_c$ is chosen close to one and the initial value of $\beta_a$ is chosen close to the minimum value of the hinge error, then the resultant network may contain many neurons with poor generalization ability. Hence, the meta-cognitive knowledge measurement threshold can be selected in the interval [0.3, 0.7], and the range for the initial value of the self-adaptive meta-cognitive

addition threshold lies between the minimum and maximum values of the hinge error. Whenever a neuron is added, $\beta_a$ is adapted as follows:

$$\beta_a := \delta\beta_a + (1 - \delta)E^t \qquad (34)$$

where $\delta$ is the slope controlling the rate of self-adaptation, selected close to one. The $\beta_a$ adaptation allows McRBFN to add neurons only when the samples presented to the cognitive network contain significant information.
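The interplay of the deletion criterion (32), the growth criterion (33) and the threshold adaptation (34) can be sketched as follows. This is an illustrative simplification: the threshold values are example settings inside the ranges discussed in the text, and the parameter-update trigger shown here is simplified (the paper's update strategy has its own self-adaptive criterion):

```python
def select_strategy(c_hat, c_t, E_t, psi_c, p_conf,
                    beta_d=0.9, beta_c=0.5, beta_a=1.5, delta=0.99):
    """Sketch of the meta-cognitive strategy choice, Eqs. (32)-(34).

    Returns the chosen strategy name and the (possibly adapted)
    addition threshold beta_a.
    """
    # Sample delete strategy, Eq. (32)
    if c_hat == c_t and p_conf >= beta_d:
        return "delete", beta_a
    # Neuron growth strategy, Eq. (33)
    if (c_hat != c_t or E_t >= beta_a) and psi_c <= beta_c:
        # Self-adaptation of the addition threshold, Eq. (34)
        beta_a = delta * beta_a + (1 - delta) * E_t
        return "grow", beta_a
    # Parameter update strategy (simplified trigger, not Eq. from paper)
    if E_t > 0.0:
        return "update", beta_a
    # Otherwise reserve the sample for a later stage
    return "reserve", beta_a
```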

The new training sample may overlap with other classes, or may come from a distinct cluster far away from the nearest neuron of the same class. Therefore, one needs to identify the status of the current sample (overlapping with other classes, or a distinct cluster in the same class) with respect to the existing neurons, and initialize the parameters of the new neuron (K + 1) accordingly. Existing sequential learning algorithms initialize the width based on the distance to the nearest neuron and the output weight as the error on the current sample; the influence of past samples is not considered in the weight initialization, which affects the performance of the classifier significantly. The proposed McRBFN addresses these issues as follows:

- Inter/intra-class nearest-neuron distances from the current sample are used for width determination.
- The knowledge of past samples stored in the network as neuron centers is used to initialize the weights of the new neuron.

Let nrS be the nearest hidden neuron in the intra-class and nrI be the nearest hidden neuron in the inter-class. They are defined as

nrS = arg min_{l = c, k} ||x^t − μ_k^l||;  nrI = arg min_{l ≠ c, k} ||x^t − μ_k^l||    (35)

The distances to nrS and nrI are given as

d_S = ||x^t − μ_{nrS}^c||;  d_I = ||x^t − μ_{nrI}^l||    (36)

Based on these distances, the parameters of the new neuron are initialized under the following overlapping/no-overlapping conditions:

Distinct sample: When a new training sample is far away from both the intra- and inter-class nearest neurons (d_S >> σ_{nrS}^c AND d_I >> σ_{nrI}^l), the sample does not overlap with any class cluster and comes from a distinct cluster. In this case, the new hidden neuron center (c_{K+1}) and width (σ_{K+1}^c) are determined as

c_{K+1} = x^t;  σ_{K+1}^c = κ √((x^t)^T x^t)    (37)

where κ is an overlap factor that determines the overlap of the responses of the hidden units in the input space; it lies in the range [0.5, 1].

No-overlapping: When a new training sample is close to the intra-class nearest neuron, i.e., the intra/inter-class distance ratio d_S/d_I is less than 1, the sample does not overlap with the other classes. In this case, the parameters are determined as

c_{K+1} = x^t;  σ_{K+1}^c = κ ||x^t − μ_{nrS}^c||    (38)

Minimum overlapping with the inter-class: When a new training sample is closer to the inter-class nearest neuron than to the intra-class nearest neuron, i.e., the distance ratio is in the range 1-1.5, the sample has minimum overlap with the other class. In this case, the center of the new hidden neuron is shifted away from the inter-class nearest neuron and towards the intra-class nearest neuron, and is initialized as

c_{K+1} = x^t + ζ (μ_{nrS}^c − μ_{nrI}^l);  σ_{K+1}^c = κ ||c_{K+1} − μ_{nrS}^c||    (39)

where ζ is the center shift factor, which determines how far the center is shifted from the new training sample location; in our simulation studies ζ is fixed to 0.1.

Significant overlapping with the inter-class: When a new training sample is very close to the inter-class nearest neuron, i.e., the distance ratio is more than 1.5, the sample has significant overlap with the other class. In this case, the center of the new hidden neuron is shifted away from the inter-class nearest neuron and is initialized as

c_{K+1} = x^t − ζ (μ_{nrI}^l − x^t);  σ_{K+1}^c = κ ||c_{K+1} − μ_{nrI}^l||    (40)

This initialization helps in minimizing the misclassification of the McRBFN classifier.

When a neuron is added to McRBFN, the output weights are estimated using PBL, based on the knowledge of past samples stored in the network, as follows. The size of the matrix A is increased from K × K to (K + 1) × (K + 1):

A^t_{(K+1)×(K+1)} = [ A^{t−1}_{K×K} + (h^t)^T h^t    a^T_{K+1} ;  a_{K+1}    a_{K+1,K+1} ]    (41)

where h^t is the hidden neurons' response for the new (tth) training sample. In sequential learning, samples are discarded after learning, but the information present in the past samples is stored in the network: the neuron centers provide the distribution of past samples in the feature space. These centers can therefore be used as pseudo-samples to capture the effect of past samples, and the existing hidden neurons are used as pseudo-samples to calculate the a_{K+1} and a_{K+1,K+1} terms. The row vector a_{K+1} ∈ R^{1×K} is assigned as

a_{K+1,p} = Σ_{i=1}^{K+1} h_i^{K+1} h_i^p,  p = 1, …, K,  where h_i^p = exp(−||μ_i − μ_p||² / (σ_p^l)²)    (42)

a_{K+1,K+1} = Σ_{i=1}^{K+1} h_i^{K+1} h_i^{K+1}    (43)
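The center and width initialization of Eqs. (38)-(40), driven by the intra/inter-class distance ratio, can be sketched as follows (a sketch under assumed names; the distinct-sample case of Eq. (37), which also compares the distances against the neuron widths, is omitted for brevity):

```python
import numpy as np

# kappa (overlap factor) and zeta (center shift factor) follow the text;
# the remaining names are illustrative assumptions.
def init_new_neuron(x, mu_intra, mu_inter, kappa=0.7, zeta=0.1):
    d_s = np.linalg.norm(x - mu_intra)   # intra-class distance, Eq. (36)
    d_i = np.linalg.norm(x - mu_inter)   # inter-class distance, Eq. (36)
    ratio = d_s / d_i
    if ratio < 1.0:                       # no overlap, Eq. (38)
        center = x
        width = kappa * np.linalg.norm(x - mu_intra)
    elif ratio <= 1.5:                    # minimum overlap, Eq. (39)
        center = x + zeta * (mu_intra - mu_inter)
        width = kappa * np.linalg.norm(center - mu_intra)
    else:                                 # significant overlap, Eq. (40)
        center = x - zeta * (mu_inter - x)
        width = kappa * np.linalg.norm(center - mu_inter)
    return center, width
```

Shifting the center away from the competing class before fixing the width is what keeps the new Gaussian response from spilling over the inter-class boundary.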

Similarly, the size of the matrix B is increased from K × n to (K + 1) × n:

B^t_{(K+1)×n} = [ B^{t−1}_{K×n} + (h^t)^T (y^t)^T ;  b_{K+1} ]    (44)

where the row vector b_{K+1} is computed using the pseudo-samples as

b_{K+1,j} = Σ_{i=1}^{K+1} h_i^{K+1} ŷ_{ji},  j = 1, …, n    (45)

where ŷ_{ji} is the pseudo-output for the ith neuron (with associated class label l), given as

ŷ_{ji} = 1 if l = j, −1 otherwise,  j = 1, …, n    (46)

Finally, the output weights are estimated as

[ W^t_K ; w^t_{K+1} ] = (A^t_{(K+1)×(K+1)})^{−1} B^t_{(K+1)×n}    (47)

where W^t_K is the output weight matrix for the K existing hidden neurons and w^t_{K+1} is the vector of output weights for the new hidden neuron after learning the tth sample. The inverse in Eq. (47) can be calculated recursively using matrix identities as follows.
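The pseudo-sample computation of Eqs. (42), (43) and (45) can be sketched as follows (names are assumptions; the stored neuron centers, including the new one, stand in for the discarded past samples):

```python
import numpy as np

def gaussian_response(mu_i, mu_p, sigma_p):
    """h_i^p = exp(-||mu_i - mu_p||^2 / sigma_p^2), as in Eq. (42)."""
    return np.exp(-np.sum((mu_i - mu_p) ** 2) / sigma_p ** 2)

def new_neuron_terms(centers, widths, pseudo_targets):
    """centers: (K+1, d) incl. the new neuron; pseudo_targets: (K+1, n)."""
    K1 = len(centers)
    # response of each pseudo-sample (center) through the new neuron
    h_new = np.array([gaussian_response(centers[i], centers[-1], widths[-1])
                      for i in range(K1)])
    # response of each pseudo-sample through every neuron
    H = np.array([[gaussian_response(centers[i], centers[p], widths[p])
                   for p in range(K1)] for i in range(K1)])
    a_row = h_new @ H                 # Eq. (42); last entry equals Eq. (43)
    b_row = h_new @ pseudo_targets    # Eq. (45)
    return a_row, b_row
```

The last element of `a_row` is the sum of squared new-neuron responses, i.e. the a_{K+1,K+1} term of Eq. (43).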

First, the K × K block is updated as

A^t_{K×K} = A^{t−1} + (h^t)^T h^t    (48)

and its inverse is obtained from (A^{t−1})^{−1} using the Sherman-Morrison identity:

(A^t_{K×K})^{−1} = (A^{t−1})^{−1} − [ (A^{t−1})^{−1} (h^t)^T h^t (A^{t−1})^{−1} ] / [ 1 + h^t (A^{t−1})^{−1} (h^t)^T ]    (49)

Applying the block matrix inversion identity to Eq. (47), the resultant equations are

W^t_K = [ I_{K×K} + (1/Δ) (A^t_{K×K})^{−1} a^T_{K+1} a_{K+1} ] Ŵ^t_K − (1/Δ) (A^t_{K×K})^{−1} a^T_{K+1} b_{K+1}    (50)

where the scalar Δ and the intermediate solution Ŵ^t_K are

Δ = a_{K+1,K+1} − a_{K+1} (A^t_{K×K})^{−1} a^T_{K+1}    (51)

Ŵ^t_K = W^{t−1}_K + (A^t_{K×K})^{−1} (h^t)^T [ (y^t)^T − h^t W^{t−1}_K ]    (52)

and the output weights of the new hidden neuron are

w^t_{K+1} = (1/Δ) [ b_{K+1} − a_{K+1} Ŵ^t_K ]    (53)

Parameters update strategy: The current (tth) sample is used to update the output weights of the cognitive component (W_K = [w_1, w_2, …, w_K]^T) if the following criterion is satisfied:

c^t = ĉ^t AND E^t ≥ β_u

where E^t is the hinge loss error for the tth sample obtained from Eq. (5) and β_u is the self-adaptive meta-cognitive parameter update threshold. If β_u is chosen close to 50% of the maximum hinge error, then very few samples will be used for adapting the network parameters, most of the samples will be pushed to the end of the training sequence, and the resultant network will not approximate the function accurately. If a lower value is chosen, then all samples will be used in updating the network parameters without altering the training sequence. Hence, the initial value of the meta-cognitive parameter update threshold is selected in the interval [0.4, 0.7]. The threshold β_u is adapted based on the hinge error as

β_u := δ β_u + (1 − δ) E^t    (54)

where δ is the slope that controls the rate of self-adaptation of the parameter update and is set close to one.

When a sample is used to update the output weight parameters, the PBL algorithm finds the minimum of the energy function, i.e.,

∂J^t(W_K) / ∂w_pj = 0,  p = 1, …, K;  j = 1, …, n    (55)

Equating the first partial derivative to zero and re-arranging Eq. (55), we get

(A^{t−1} + (h^t)^T h^t) W^t_K − (B^{t−1} + (h^t)^T (y^t)^T) = 0    (56)

By substituting B^{t−1} = A^{t−1} W^{t−1}_K and A^{t−1} = A^t − (h^t)^T h^t, Eq. (56) is reduced to

W^t_K = (A^t)^{−1} [ A^t W^{t−1}_K + (h^t)^T ( (y^t)^T − h^t W^{t−1}_K ) ]    (57)

which gives the update equation

W^t_K = W^{t−1}_K + (A^t)^{−1} (h^t)^T [ (y^t)^T − h^t W^{t−1}_K ]    (58)

where (A^t)^{−1} is calculated recursively as in Eq. (49).

Sample reserve strategy: If the new training sample does not satisfy the deletion, the neuron growth, or the cognitive component parameters update criterion, then the current sample is pushed to the rear of the training sequence. Since McRBFN modifies its strategies based on the knowledge in the current sample, these samples may be used at a later stage.

Ideally, the training process stops when no further sample is available in the data stream. In practice, training stops when the set of samples in the reserve remains the same.

2.3. PBL-McRBFN classification algorithm

To summarize, the PBL-McRBFN algorithm is given in pseudo-code form in Pseudocode 1.

Pseudocode 1. PBL-McRBFN classification algorithm.

Input: Training samples (x^t, y^t) presented one-by-one from the data stream.
Output: Decision function that estimates the relationship between the feature space and the class label.
START
  Initialization: Assign the first sample as the first neuron (K = 1).
    The parameters of the neuron are chosen as shown in Eq. (37).
  Start learning for samples t = 2, 3, ...
  DO
    The meta-cognitive component computes the significance of the sample
    with respect to the cognitive component:
      Compute the cognitive component output ŷ^t using Eq. (2).
      Find the predicted class label ĉ^t, the confidence of the classifier
      p̂(c^t|x^t) and the class-wise significance ψ_c using Eqs. (25), (26) and (31).
    Based on the above measures, the meta-cognitive component selects
    one of the following strategies:
    Sample delete strategy:
    IF c^t = ĉ^t AND p̂(c^t|x^t) ≥ β_d THEN
      Delete the sample from the sequence without learning.
    Neuron growth strategy:
    ELSEIF ĉ^t ≠ c^t OR (E^t ≥ β_a AND ψ_c(x^t) ≥ β_c) THEN
      Add a neuron to the network (K = K + 1).
      Choose the parameters of the new hidden neuron using Eqs. (37)-(53).
      Update the self-adaptive meta-cognitive addition threshold
      according to Eq. (34).
    Parameters update strategy:
    ELSEIF c^t = ĉ^t AND E^t ≥ β_u THEN
      Update the parameters of the cognitive component using Eq. (58).
      Update the self-adaptive meta-cognitive update threshold
      according to Eq. (54).
    Sample reserve strategy:
    ELSE
      Push the current sample (x^t, y^t) to the rear end of the sample
      stack; it may be used later to fine-tune the cognitive component
      parameters.
    ENDIF
  ENDDO
END

In PBL-McRBFN, the sample delete strategy addresses the what-to-learn component by deleting insignificant samples from the training data set.
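The core projection-based update of Eqs. (49) and (58) can be sketched with NumPy as follows (a minimal sketch under assumed names; maintaining the inverse recursively avoids a per-sample matrix inversion):

```python
import numpy as np

def pbl_update(W, A_inv, h, y):
    """One PBL step. W: (K, n) weights, A_inv: (K, K), h: (1, K), y: (1, n)."""
    Ah = A_inv @ h.T                                          # (K, 1)
    # Eq. (49): Sherman-Morrison update of the inverse of A^t
    A_inv = A_inv - (Ah @ (h @ A_inv)) / (1.0 + (h @ Ah).item())
    # Eq. (58): correct the weights by the projected residual of the sample
    W = W + A_inv @ h.T @ (y - h @ W)
    return W, A_inv
```

The recursion is exact: after the step, `W` equals the direct least-squares solution (A^t)^{-1} B^t with A^t = A^{t-1} + h^T h and B^t = B^{t-1} + h^T y.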

Table 1
Description of the benchmark data sets selected from the UCI machine learning repository for the performance study.

Data set                      Features  Classes  Samples (train / test)  I.F (train / test)
IS                            19        7        210 / 2100              0 / 0
IRIS                          4         3        45 / 105                0 / 0
WINE                          13        3        60 / 118                0 / 0.29
Vehicle classification (VC)   18        4        424a / 422              0.1 / 0.12
Glass identification (GI)     9         6        109a / 105              0.68 / 0.77
HEART                         13        2        70 / 200                0.14 / 0.1
Liver disorders (LD)          6         2        200 / 145               0.17 / 0.14
PIMA                          8         2        400 / 368               0.22 / 0.39
Breast cancer (BC)            9         2        300 / 383               0.26 / 0.33
Ionosphere (ION)              34        2        100 / 251               0.28 / 0.28

The neuron growth and parameter update strategies address the how-to-learn component, determining how the cognitive component learns from the samples, and the self-adaptive nature of the meta-cognitive thresholds, together with the sample reserve strategy, addresses the when-to-learn component by presenting samples to the learning process according to the knowledge they contain.

3. Performance evaluation of the PBL-McRBFN classifier

The performance of the PBL-McRBFN classifier is evaluated on benchmark multi-category and binary classification problems from the UCI machine learning repository. The performance is compared with the best performing sequential learning algorithm reported in the literature (SRAN) [20], the batch ELM classifier [16], and standard support vector machines [42]. The data sets are chosen with varying sample imbalance, which is measured using the Imbalance Factor (I.F):

I.F = 1 − (n / N) min_{j=1,…,n} N_j    (59)

where N_j is the number of samples belonging to class j and N = Σ_{j=1}^n N_j. The description of these data sets, including the number of input features, the number of classes, the number of training/testing samples and the imbalance factor, is presented in Table 1. From Table 1, it can be observed that the problems chosen for the study include both balanced and unbalanced data sets, and that the imbalance factors vary widely. Finally, the PBL-McRBFN classifier is used to solve two real-world classification problems: the acoustic emission signal processing for health monitoring data set presented in [38] and the mammogram classification for breast cancer detection data set presented in [43].
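The imbalance factor of Eq. (59) reduces to a one-liner over the per-class sample counts (a sketch; the function name is an assumption):

```python
def imbalance_factor(class_counts):
    """I.F = 1 - (n / N) * min_j N_j, as in Eq. (59)."""
    n, N = len(class_counts), sum(class_counts)
    return 1.0 - (n / N) * min(class_counts)

print(imbalance_factor([105, 105]))  # perfectly balanced: I.F close to 0
print(imbalance_factor([70, 35]))    # imbalanced: I.F around 0.33
```

An I.F of 0 means every class holds an equal share of the samples; values approaching 1 indicate that the smallest class is nearly empty.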

All simulations are conducted in the MATLAB 2010 environment on a desktop PC with an Intel Core 2 Duo 2.66 GHz CPU and 3 GB RAM. For the ELM classifier, the number of hidden neurons is obtained using the constructive-destructive procedure presented in [44]. The simulations for batch SVM with Gaussian kernels are carried out using the LIBSVM package in C [45]; the SVM parameters (C, γ) are optimized using a grid search technique. The performance measures used to compare the classifiers are described below.

3.1. Performance measures

Class-wise performance measures (overall/average efficiencies) and a statistical significance test on the performance of multiple classifiers over multiple data sets are used for the performance comparison.

3.1.1. Class-wise measures

The confusion matrix Q is used to obtain the class-level and global performance of the various classifiers. Class-level performance is measured by the percentage classification accuracy (η_j), which is defined as

η_j = (q_jj / N_j) × 100%    (60)

where q_jj is the number of correctly classified samples of class j and N_j is the total number of samples belonging to class j in the training/testing data set. The global measures used in the evaluation are the average per-class classification accuracy (η_a) and the overall classification accuracy (η_o), defined as

η_a = (1/n) Σ_{j=1}^n η_j;  η_o = (Σ_{j=1}^n q_jj / N) × 100%    (61)
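Eqs. (60) and (61) can be sketched directly from a confusion matrix (a sketch; Q[j, k] is assumed to count class-j samples predicted as class k):

```python
import numpy as np

def efficiencies(Q):
    """Per-class (eta_j), average (eta_a) and overall (eta_o) accuracies."""
    Q = np.asarray(Q, dtype=float)
    per_class = 100.0 * np.diag(Q) / Q.sum(axis=1)  # eta_j, Eq. (60)
    eta_a = per_class.mean()                         # average, Eq. (61)
    eta_o = 100.0 * np.trace(Q) / Q.sum()            # overall, Eq. (61)
    return per_class, eta_a, eta_o
```

Under class imbalance η_a and η_o can diverge sharply, which is exactly the effect discussed for the GI data set below.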

3.1.2. Statistical significance test

The classification efficiency by itself is not a conclusive measure of a classifier's performance [37]. Since the developed classifier is compared with multiple classifiers over multiple data sets, the Friedman test followed by the Bonferroni-Dunn test is used to establish the statistical significance of the PBL-McRBFN classifier. A brief description of the conducted tests is given below.

Friedman test: The Friedman test is used to compare multiple classifiers (L) over multiple data sets (M). Let r_i^j be the rank of the jth classifier on the ith data set. Under the null-hypothesis, which states that all the classifiers are equivalent so that their average ranks R_j (R_j = (1/M) Σ_i r_i^j) over all data sets should be equal, the Friedman statistic is given by

χ_F² = [12M / (L(L + 1))] [ Σ_j R_j² − L(L + 1)² / 4 ]    (62)

which is distributed according to the χ² distribution with L − 1 degrees of freedom. (A χ² distribution is the distribution of a sum of squares of L independent standard normal variables.)

Iman and Davenport showed that Friedman's statistic (χ_F²) is overly conservative and derived a better statistic [46], given by

F_F = (M − 1) χ_F² / (M(L − 1) − χ_F²)    (63)

which is distributed according to the F-distribution with L − 1 and (L − 1)(M − 1) degrees of freedom and is the statistic used in this paper. (The F-distribution is defined as the probability distribution of the ratio of two independent χ² distributions, each divided by its respective degrees of freedom.) The aim of the statistical test is to show that the performance of the PBL-McRBFN classifier is substantially different from the other classifiers with a confidence level of 1 − α. If the calculated F_F > F_{α/2,(L−1),(L−1)(M−1)}, the null-hypothesis is rejected. Statistical tables for the critical values can be found in [47].

Post-hoc test: The Bonferroni-Dunn test [48] is a post-hoc test that can be performed after rejection of the null-hypothesis. It is used to compare the PBL-McRBFN classifier against all the other classifiers. The test assumes that the performances of two classifiers are significantly different if the corresponding average ranks differ by at least the Critical Difference (CD): if (R_i − R_j) > CD, then classifier i performs significantly better than classifier j. The critical difference is calculated using

CD = q_α √(L(L + 1) / (6M))    (64)

where the critical values q_α are based on the Studentized range statistic divided by √2, as given in [37].
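The test statistics of Eqs. (62)-(64) can be sketched as follows; fed with the average ranks reported in Table 3 (L = 4, M = 10), this reproduces the paper's values of 16.89, 11.59 and 1.382 (q_0.05 = 2.394 is taken from the Studentized-range tables):

```python
import math

def friedman_chi2(avg_ranks, M):
    """Friedman statistic, Eq. (62)."""
    L = len(avg_ranks)
    return (12.0 * M / (L * (L + 1))) * \
           (sum(r * r for r in avg_ranks) - L * (L + 1) ** 2 / 4.0)

def iman_davenport(chi2_f, L, M):
    """Modified (Iman and Davenport) statistic, Eq. (63)."""
    return (M - 1) * chi2_f / (M * (L - 1) - chi2_f)

def critical_difference(q_alpha, L, M):
    """Bonferroni-Dunn critical difference, Eq. (64)."""
    return q_alpha * math.sqrt(L * (L + 1) / (6.0 * M))

chi2 = friedman_chi2([1.1, 2.6, 3.15, 3.15], M=10)
print(round(chi2, 2), round(iman_davenport(chi2, 4, 10), 2))  # 16.89 11.59
print(round(critical_difference(2.394, 4, 10), 3))            # 1.382
```

Matching the reported statistics is a quick sanity check that the ranks in Table 3 were read off correctly.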

3.2. Performance evaluation on UCI benchmark data sets

The class-wise performance measures (average/overall testing efficiencies), the number of hidden neurons and the number of samples used by the PBL-McRBFN, SRAN, ELM and SVM classifiers are reported in Table 2. Table 2 contains results for both the binary and the multi-category classification data sets from the UCI machine learning repository. From Table 2, we can see that the PBL-McRBFN classifier performs slightly better than the best performing SRAN classifier and significantly better than the ELM and SVM classifiers on all 10 data sets. In addition, the proposed PBL-McRBFN classifier requires fewer samples to learn the decision function and develops a compact neural architecture to achieve better generalization performance.

Well-balanced data sets: On the IS, IRIS and WINE data sets, the generalization performance of PBL-McRBFN is approximately 2% higher than the SRAN classifier and 3-4% higher than the ELM and SVM classifiers. On the IS data set, the proposed PBL-McRBFN uses fewer samples to achieve the 2% improvement over SRAN and the approximately 3-4% improvement over the ELM and SVM classifiers. Similarly, on the IRIS and WINE data sets, PBL-McRBFN uses fewer samples and fewer neurons to achieve better generalization performance. The PBL-McRBFN classifier achieves this through its meta-cognitive learning algorithm, which selects appropriate samples for learning based on the current knowledge and deletes many redundant samples to avoid over-training. For example, on the IS data set, PBL-McRBFN uses only 89 of the 210 training samples to build the best classifier.

To highlight the above-mentioned advantages of the proposed PBL-McRBFN classifier, we conduct a simulation study in which an ELM classifier is trained using only the training samples selected by PBL-McRBFN. On the IS data set, the PBL-McRBFN classifier selects the best 89 samples for training; these samples are used in the batch ELM algorithm, and we refer to the resulting classifier as ELM*. The testing performance of ELM* (which uses the best 89-sample sequence) is better than that of the original ELM classifier developed using all 210 training samples. ELM* also achieves better generalization performance with a smaller number of hidden neurons: ELM* requires only 32 hidden neurons to achieve 92.14% testing efficiency, whereas ELM requires 49 hidden neurons to achieve 90.23%. This study clearly indicates that the sample deletion strategy in PBL-McRBFN helps in achieving better decision-making ability.

Imbalanced data sets: On the VC, GI, HEART, LD, PIMA, BC and ION data sets, the generalization performance of PBL-McRBFN is approximately 2-10% higher than the SRAN classifier and 2-15% higher than the ELM and SVM classifiers. For imbalanced data sets, PBL-McRBFN requires more neurons to approximate the decision surface with a minimal number of samples. The initialization of the centers and widths of new neurons in PBL-McRBFN, together with meta-cognitive learning, helps PBL-McRBFN achieve significantly better generalization performance. For example, on the VC data set the proposed PBL-McRBFN uses fewer samples to achieve better average testing efficiency, with approximately a 2% improvement over the SRAN and ELM classifiers and a 10% improvement over the SVM classifier. The GI data set has an imbalance factor of 0.68 in training and 0.77 in testing. Such high imbalance influences the performance of the SRAN, ELM and SVM classifiers. On the GI data set, the SRAN overall testing efficiency (η_o) is 6% higher than its average testing efficiency (η_a), because the SRAN classifier is not able to accurately capture the knowledge of the classes containing fewer samples. For the proposed PBL-McRBFN classifier, the average testing efficiency (η_a) is 8% higher than the overall testing efficiency (η_o); thus the proposed PBL-McRBFN classifier accurately captures the knowledge of the classes containing fewer samples. On the GI data set, PBL-McRBFN achieves better average testing efficiency: a 12% improvement over SRAN with fewer samples, a 5% improvement over ELM with fewer neurons, and a 15% improvement over SVM with fewer neurons.

Binary data sets: On the HEART and LD data sets, the proposed PBL-McRBFN achieves approximately 2-7% better average testing efficiency than SRAN, ELM and SVM with fewer neurons. On the PIMA and BC data sets, the proposed PBL-McRBFN achieves approximately 1-2% better average testing efficiency than SRAN, ELM and SVM with fewer samples. On the ION data set, the proposed PBL-McRBFN uses fewer samples and fewer neurons to achieve better average testing efficiency: a 5% improvement over SRAN and an 8-9% improvement over ELM and SVM. The overlapping conditions and the class-specific criteria in the learning strategies of PBL-McRBFN help in capturing knowledge accurately for problems with high sample imbalance. From Table 2, we can say that the proposed PBL-McRBFN improves the average/overall efficiency even under high sample imbalance.

3.2.1. Statistical significance analysis

In this section, we highlight the significance of the proposed PBL-McRBFN classifier on multiple data sets using the non-parametric Friedman test followed by the Bonferroni-Dunn test, as described in Section 3.1.2. The Friedman test identifies whether the measured average ranks are significantly different from the mean rank (here 2.5) expected under the null-hypothesis. The Bonferroni-Dunn test highlights the statistical difference in performance between the PBL-McRBFN classifier and the other classifiers. As shown in Table 2, our comparison study uses four classifiers (L = 4) and ten data sets (M = 10).

Non-parametric test using the overall testing efficiency (η_o): The ranks of all four classifiers based on the overall testing efficiency for each data set are provided in Table 3. The Friedman statistic (χ_F² as in Eq. (62)) is 16.89 and the modified (Iman and Davenport) statistic (F_F as in Eq. (63)) is 11.59. For four classifiers and ten data sets, the modified statistic is distributed according to the F-distribution with 3 and 27 degrees of freedom. The critical value for rejecting the null-hypothesis at a significance level of 0.05 is 3.65. Since the modified statistic is greater than the critical value (11.59 > 3.65), we can reject the null-hypothesis. Hence, we can say that the proposed PBL-McRBFN classifier performs better than the existing classifiers on these data sets.

Next, we conduct the Bonferroni-Dunn test to compare the proposed PBL-McRBFN classifier with all the other classifiers. From Eq. (64), the critical difference (CD) is 1.382 at a significance level of 0.05 (q_0.05 = 2.394).

Table 2
Performance comparison of PBL-McRBFN with SRAN, ELM and SVM.

          PBL-McRBFN                      SRAN                        ELM                   SVM
Data set  K    Samples  Testing          K    Samples  Testing       K    Testing          SVs   Testing
               used     η_o     η_a           used     η_o    η_a         η_o    η_a             η_o    η_a
IS        50   89       94.19   94.19    47   113      92.29  92.29   49   90.23  90.23    127   91.38  91.38
IRIS      6    20       98.10   98.10    8    29       96.19  96.19   10   96.19  96.19    13    96.19  96.19
WINE      11   29       98.31   98.69    12   46       96.61  97.19   10   97.46  98.04    36    97.46  98.04
VC        175  318      78.91   79.09    113  437      75.12  76.86   150  77.01  77.59    340   70.62  68.51
GI        71   115      84.76   92.72    59   159      86.21  80.95   80   81.31  87.43    183   70.47  75.61
HEART     20   69       81.50   81.47    28   56       78.50  77.53   36   76.50  75.91    42    75.50  75.10
LD        87   116      73.10   72.63    91   151      66.90  65.78   100  72.41  71.41    141   71.03  70.21
PIMA      100  162      79.62   76.67    97   230      78.53  74.90   100  76.63  75.25    221   77.45  76.43
BC        13   45       97.39   97.85    7    91       96.87  97.26   66   96.35  96.48    24    96.61  97.06
ION       18   58       96.41   96.47    21   86       90.84  91.88   32   89.64  87.52    43    91.24  88.51

Table 3
Ranks based on the overall (η_o) and average (η_a) testing efficiencies.

                    PBL-McRBFN     SRAN          ELM           SVM
Data set            η_o    η_a     η_o    η_a    η_o    η_a    η_o    η_a
IS                  1      1       2      2      4      4      3      3
IRIS                1      1       3      3      3      3      3      3
WINE                1      1       4      4      2.5    2.5    2.5    2.5
VC                  1      1       3      3      2      2      4      4
GI                  2      1       1      3      3      2      4      4
HEART               1      1       2      2      3      3      4      4
LD                  1      1       4      4      2      2      3      3
PIMA                1      1       2      4      4      3      3      2
BC                  1      1       2      2      4      4      3      3
ION                 1      1       3      2      4      4      2      3
Average rank (R_j)  1.1    1       2.6    2.9    3.15   2.95   3.15   3.15

From Table 3, the differences in average rank between the proposed PBL-McRBFN classifier and the other three classifiers are 1.5, 2.05 and 2.05, all greater than the critical difference. Hence, based on the overall testing efficiency, the Bonferroni-Dunn test shows that the proposed PBL-McRBFN classifier is significantly better than the SRAN, ELM and SVM classifiers.

Non-parametric test using the average testing efficiency (η_a): The ranks of all four classifiers based on the average testing efficiency for each data set are also provided in Table 3. The Friedman statistic (χ_F² as in Eq. (62)) is 18.21 and the modified statistic (F_F as in Eq. (63)) is 13.9. Since the modified statistic is greater than the critical value (13.9 > 3.65), we can reject the null-hypothesis. Hence, we can say that the proposed PBL-McRBFN classifier performs better than the other classifiers on these data sets.

From Table 3, we can see that the differences in average rank between the proposed PBL-McRBFN classifier and the other three classifiers are 1.9, 1.95 and 2.15, all greater than the critical difference (1.382). Hence, based on the average testing efficiency, the Bonferroni-Dunn test also shows that the proposed PBL-McRBFN classifier performs better than the other well-known classifiers. Next, we present the performance results of the PBL-McRBFN classifier on two real-world classification problems, viz., the acoustic emission data set for health monitoring presented in [38] and the mammogram classification data set for breast cancer detection presented in [43].

3.3. Acoustic emission signal classification for health monitoring

Acoustic emission signals are stress or pressure waves, picked up by a sensitive transducer, that are produced by the transient energy released during irreversible deformation in a material. These signals are produced by various sources, and the classification/identification of the sources from the acoustic emission signals is of practical importance in health monitoring. The nature of acoustic emission signals in practical situations increases the complexity of the task, and the superficial similarities between the acoustic emission signals produced by different sources increase it further. In this section, we address the classification of such acoustic emission signals using the proposed PBL-McRBFN classifier. The experimental data for burst-type acoustic emission signals from a metallic surface, as given in [38], is considered for our study. Each burst-type acoustic emission signal is characterized by 5 features, and the signals are classified into one of 4 sources, namely the pencil source, the pulse source, the spark source and the noise source. Out of 199 samples, 62 samples are used for training (as highlighted in [38]) and the remaining samples are used for testing the classifier. For details on the characteristics of the input features and the experimental setup, one should refer to [38].

The performance results of the PBL-McRBFN classifier are compared against SRAN, ELM and SVM in Table 4. It can be seen that the PBL-McRBFN classifier uses only 9 significant samples to build the classifier and requires only 5 neurons to achieve an overall testing efficiency of 99.27%. Thus, the PBL-McRBFN classifier performs efficient classification of the acoustic emission signals using a compact network.

Table 4
Performance comparison on the acoustic emission signal problem.

Classifier   Hidden neurons  Samples used  Testing η_o  Testing η_a
PBL-McRBFN   5               9             99.27        98.91
SRAN         10              39            99.27        98.91
ELM          10              62            99.27        98.91
SVM          22a             62            98.54        97.95

3.4. Mammogram classification for breast cancer detection

Mammography is a better means for early diagnosis of breast cancer, as tumors and abnormalities show up in a mammogram well before they can be detected through physical examination. Clinically, the identification of malignant tissue involves detecting abnormal masses or tumors, if any, and then classifying the mass as either malignant or benign, as given in [39]. However, once a tumor is detected, the only method of determining whether it is benign or malignant is a biopsy, an invasive procedure that involves the removal of cells or tissue from the patient. A non-invasive method of identifying the abnormalities in a mammogram can reduce the number of unnecessary biopsies, sparing patients the inconvenience and saving medical costs. In this study, the mammogram database available in [43] has been used. The 9 input features extracted from the mammogram of the identified abnormal mass are used to classify the tumor as either malignant or benign. Here, 97 samples are used to develop the PBL-McRBFN classifier, and its performance is evaluated using the remaining 11 samples. For further details on the input features and the data set, one should refer to [43].

The performance results of the PBL-McRBFN classifier, in comparison with SRAN, ELM and SVM, are presented in Table 5. From the table, it is seen that the PBL-McRBFN classifier performs highly efficient classification, with 100% classification accuracy and a smaller number of hidden neurons. Compared to the SRAN, ELM and SVM classifiers, the performance of PBL-McRBFN is improved considerably.

Table 5
Performance comparison on the mammogram classification problem.

Classifier   Hidden neurons  Samples used  Testing η_o  Testing η_a
PBL-McRBFN   22              60            100          100
SRAN         25              45            90.91        91.67
ELM          30              97            90.91        90.00
SVM          261             97            90.91        91.67

Thus, from the performance studies of PBL-McRBFN against SRAN, ELM and SVM on the chosen benchmark data sets and practical classification problems, it can be observed that the proposed PBL-McRBFN classifier performs better than the other classifiers.

4. Conclusions

In this paper, we have presented a Meta-cognitive Radial Basis Function Network (McRBFN) and its Projection Based Learning (PBL) algorithm for classification problems in a sequential framework. The meta-cognitive component in McRBFN controls the learning of the cognitive component. It adapts the learning process by implementing self-regulation, and hence decides what-to-learn, when-to-learn and how-to-learn efficiently. In addition, the overlapping conditions in the neuron growth strategy help in the proper initialization of new hidden neuron parameters and also minimize the misclassification error. The performance of the proposed PBL-McRBFN classifier has been evaluated using benchmark multi-category and binary classification problems from the UCI machine learning repository, covering a wide range of imbalance factors, and two practical classification problems. The statistical performance comparison with well-known classifiers from the literature clearly indicates the superior performance of the proposed PBL-McRBFN classifier.

Acknowledgements

The authors would like to thank the Nanyang Technological University-Ministry of Defence (NTU-MINDEF), Singapore, for the financial support (Grant number: MINDEF-NTU-JPP/11/02/05) to conduct this study.

References

[1] G.P. Zhang, Neural networks for classification: a survey, IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews 30 (4) (2000) 451-462.
[2] Y. LeCun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, L.D. Jackel, Backpropagation applied to handwritten zip code recognition, Neural Computation 1 (1989) 541-551.
[3] F.F. Li, T.J. Cox, A neural network model for speech intelligibility quantification, Applied Soft Computing 7 (1) (2007) 145-155.
[4] S. Ari, G. Saha, In search of an optimization technique for artificial neural network to classify abnormal heart sounds, Applied Soft Computing 9 (1) (2009) 330-340.
[5] V. Ravi, C. Pramodh, Threshold accepting trained principal component neural network and feature subset selection: application to bankruptcy prediction in banks, Applied Soft Computing 8 (4) (2008) 1539-1548.
[6] M.E. Ruiz, P. Srinivasan, Hierarchical text categorization using neural networks, Information Retrieval 5 (2002) 87-118.
[7] M. Khan, S.W. Khor, Web document clustering using a hybrid neural network, Applied Soft Computing 4 (4) (2004) 423-432.
[8] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representations by back-propagating errors, Nature 323 (1986) 533-536.
[9] G.-B. Huang, Q.Y. Zhu, C.K. Siew, Extreme learning machine: a new learning scheme of feedforward neural networks, in: IEEE International Joint Conference on Neural Networks, Proceedings, vol. 2, 2004, pp. 985-990.
[10] G.-B. Huang, X. Ding, H. Zhou, Optimization method based extreme learning machine for classification, Neurocomputing 74 (1-3) (2010) 155-163.
[11] J.C. Platt, A resource-allocating network for function interpolation, Neural Computation 3 (2) (1991) 213-225.
[12] L. Yingwei, N. Sundararajan, P. Saratchandran, A sequential learning scheme for function approximation using minimal radial basis function neural networks, Neural Computation 9 (2) (1997) 461-478.
[13] G.-B. Huang, P. Saratchandran, N. Sundararajan, An efficient sequential learning algorithm for growing and pruning RBF (GAP-RBF) networks, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 34 (6) (2004) 2284-2292.
[14] N.-Y. Liang, G.-B. Huang, P. Saratchandran, N. Sundararajan, A fast and accurate online sequential learning algorithm for feedforward networks, IEEE Transactions on Neural Networks 17 (6) (2006) 1411-1423.
[15] S. Suresh, N. Sundararajan, P. Saratchandran, A sequential multi-category classifier using radial basis function networks, Neurocomputing 71 (1) (2008) 1345-1358.
[16] S. Suresh, R.V. Babu, H.J. Kim, No-reference image quality assessment using modified extreme learning machine classifier, Applied Soft Computing 9 (2) (2009) 541-552.
[17] N. Kasabov, Evolving fuzzy neural networks for supervised/unsupervised online knowledge-based learning, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 31 (6) (2001) 902-918.
[18] W.P. Rivers, Autonomy at all costs: an ethnography of metacognitive self-assessment and self-management among experienced language learners, The Modern Language Journal 85 (2) (2001) 279-290.
[19] R. Isaacson, F. Fujita, Metacognitive knowledge monitoring and self-regulated learning: academic success and reflections on learning, Journal of the Scholarship of Teaching and Learning 6 (1) (2006) 39-55.
[20] S. Suresh, K. Dong, H.J. Kim, A sequential learning algorithm for self-adaptive resource allocation network classifier, Neurocomputing 73 (16-18) (2010) 3012-3019.
[21] S. Suresh, R. Savitha, N. Sundararajan, A sequential learning algorithm for complex-valued self-regulating resource allocation network-CSRAN, IEEE Transactions on Neural Networks 22 (7) (2011) 1061-1072.
[22] G. Sateesh Babu, S. Suresh, Meta-cognitive neural network for classification problems in a sequential learning framework, Neurocomputing 81 (2012) 86-96.
[23] K. Subramanian, S. Suresh, A meta-cognitive sequential learning algorithm for neuro-fuzzy inference system, Applied Soft Computing 12 (11) (2012) 3603-3614.
[24] R. Savitha, S. Suresh, N. Sundararajan, Metacognitive learning in a fully complex-valued radial basis function neural network, Neural Computation 24 (5) (2012) 1297-1328.
[25] R. Savitha, S. Suresh, N. Sundararajan, A meta-cognitive learning algorithm for a fully complex-valued relaxation network, Neural Networks 32 (2012) 209-218.
[26] G. Sateesh Babu, R. Savitha, S. Suresh, A projection based learning in meta-cognitive radial basis function network for classification problems, in: The 2012 International Joint Conference on Neural Networks (IJCNN), 2012, pp.

29072914.

[27] G. Sateesh Babu, S. Suresh, B.S. Mahanand, Alzheimer's disease detection using a projection based learning meta-cognitive RBF network, in: The 2012 International Joint Conference on Neural Networks (IJCNN), 2012, pp. 408–415.
[28] G. Sateesh Babu, S. Suresh, K. Uma Sangumathi, H. Kim, A projection based learning meta-cognitive RBF network classifier for effective diagnosis of Parkinson's disease, in: J. Wang, G. Yen, M. Polycarpou (Eds.), Advances in Neural Networks – ISNN 2012, vol. 7368 of Lecture Notes in Computer Science, Springer, Berlin/Heidelberg, 2012, pp. 611–620.
[29] G. Sateesh Babu, S. Suresh, Parkinson's disease prediction using gene expression – a projection based learning meta-cognitive neural classifier approach, Expert Systems with Applications (2012), http://dx.doi.org/10.1016/j.eswa.2012.08.070
[30] M.T. Cox, Metacognition in computation: a selected research review, Artificial Intelligence 169 (2) (2005) 104–141.
[31] T.O. Nelson, L. Narens, Metamemory: A Theoretical Framework and New Findings, Allyn and Bacon, Boston, USA, 1992.
[32] S. Suresh, N. Sundararajan, P. Saratchandran, Risk-sensitive loss functions for sparse multi-category classification problems, Information Sciences 178 (12) (2008) 2621–2638.
[33] E. Castillo, O. Fontenla-Romero, B. Guijarro-Berdiñas, A. Alonso-Betanzos, A global optimum approach for one-layer neural networks, Neural Computation 14 (6) (2002) 1429–1449.
[34] E. Castillo, B. Guijarro-Berdiñas, O. Fontenla-Romero, A. Alonso-Betanzos, A very fast learning method for neural networks based on sensitivity analysis, Journal of Machine Learning Research 7 (2006) 1159–1182.
[35] H. Hoffmann, Kernel PCA for novelty detection, Pattern Recognition 40 (3) (2007) 863–874.
[36] C. Blake, C. Merz, UCI repository of machine learning databases, University of California, Irvine, Department of Information and Computer Sciences, 1998, http://archive.ics.uci.edu/ml/
[37] J. Demsar, Statistical comparisons of classifiers over multiple data sets, The Journal of Machine Learning Research 7 (2006) 1–30.
[38] S.N. Omkar, S. Suresh, T.R. Raghavendra, V. Mani, Acoustic emission signal classification using fuzzy C-means clustering, Proceedings of the ICONIP '02, 9th International Conference on Neural Information Processing 4 (2002) 1827–1831.
[39] C. Aize, Q. Song, X. Yang, S. Liu, C. Guo, Mammographic mass detection by vicinal support vector machine, Proceedings of the ICNN '04, International Conference on Neural Networks 3 (2004) 1953–1958.
[40] T. Zhang, Statistical behavior and consistency of classification methods based on convex risk minimization, Annals of Statistics 32 (1) (2004) 56–85.
[41] B. Scholkopf, A.J. Smola, Learning with Kernels, MIT Press, Cambridge, MA, 2002.
[42] C. Cortes, V. Vapnik, Support-vector networks, Machine Learning 20 (3) (1995) 273–297.
[43] J. Suckling, J. Parker, D.R. Dance, S. Astley, I. Hutt, C. Boggis, I. Ricketts, E. Stamatakis, N. Cerneaz, S. Kok, et al., The mammographic image analysis society digital mammogram database, Excerpta Medica International Congress Series 1069 (1994) 375–378.
[44] S. Suresh, S.N. Omkar, V. Mani, T.N.G. Prakash, Lift coefficient prediction at high angle of attack using recurrent neural network, Aerospace Science and Technology 7 (8) (2003) 595–602.
[45] C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, ACM Transactions on Intelligent Systems and Technology 2 (2011) 27:1–27:27, software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
[46] R.L. Iman, J.M. Davenport, Approximations of the critical region of the Friedman statistic, Communications in Statistics (1980) 571–595.
[47] J.H. Zar, Biostatistical Analysis, 4th ed., Prentice-Hall, Englewood Cliffs, New Jersey, 1999.
[48] O.J. Dunn, Multiple comparisons among means, Journal of the American Statistical Association 56 (293) (1961) 52–64.

Mr. Giduthuri Sateesh Babu received the B.Tech degree in electrical and electronics engineering from Jawaharlal Nehru Technological University, India, in 2007, and the M.Tech degree in electrical engineering from the Indian Institute of Technology Delhi, India, in 2009. From 2009 to 2010, he worked as a senior software engineer at the Samsung R&D Centre, India. He is currently a Ph.D. student in the School of Computer Engineering, Nanyang Technological University, Singapore. His research interests include machine learning, cognitive computing, neural networks, control systems, optimization and medical informatics.

Dr. S. Suresh received the B.E. degree in electrical and electronics engineering from Bharathiyar University in 1999, and the M.E. (2001) and Ph.D. (2005) degrees in aerospace engineering from the Indian Institute of Science, India. He was a post-doctoral researcher in the School of Electrical Engineering, Nanyang Technological University, from 2005 to 2007. From 2007 to 2008, he was at INRIA Sophia Antipolis, France, as an ERCIM research fellow. He was a visiting faculty in Industrial Engineering at Korea University for a short period. From January 2009 to December 2009, he was an Assistant Professor in the Department of Electrical Engineering, Indian Institute of Technology Delhi. Since 2010, he has been an Assistant Professor in the School of Computer Engineering, Nanyang Technological University, Singapore. He was awarded best young faculty for the year 2009 by IIT-Delhi. His research interests include flight control, unmanned aerial vehicle design, machine learning, applied game theory, optimization and computer vision.

