
This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the author's institution and sharing with colleagues.

Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited.

In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier's archiving and manuscript policies are encouraged to visit:
http://www.elsevier.com/copyright
Author's personal copy

Expert Systems with Applications 38 (2011) 9305–9312

Contents lists available at ScienceDirect

Expert Systems with Applications


journal homepage: www.elsevier.com/locate/eswa

AdaBoost ensemble for financial distress prediction: An empirical comparison with data from Chinese listed companies

Jie Sun ⇑, Ming-yue Jia, Hui Li
School of Economics and Management, Zhejiang Normal University, P.O. Box 62, YingBinDaDao 688, Jinhua 321004, Zhejiang Province, PR China

Abstract

Due to the important role of financial distress prediction (FDP) for enterprises, it is crucial to improve the accuracy of FDP models. In recent years, classifier ensembles have shown promising advantages over single classifiers, but the study of classifier ensemble methods for FDP is still not comprehensive enough and remains to be further explored. This paper constructs AdaBoost ensembles with single attribute test (SAT) and decision tree (DT) respectively for FDP, and empirically compares them with single DT and support vector machine (SVM). After designing the framework of the AdaBoost ensemble method for FDP, the article describes the AdaBoost algorithm as well as the SAT and DT algorithms in detail, followed by the combination mechanism of multiple classifiers. On the initial sample of 692 Chinese listed companies and 41 financial ratios, 30 holdout experiments are carried out for FDP one year, two years, and three years in advance respectively. In terms of experimental results, AdaBoost ensemble with SAT outperforms AdaBoost ensemble with DT, the single DT classifier and the single SVM classifier. In conclusion, the choice of weak learner is crucial to the performance of an AdaBoost ensemble, and AdaBoost ensemble with SAT is more suitable for FDP of Chinese listed companies.

© 2011 Elsevier Ltd. All rights reserved.

Keywords: Financial distress prediction; AdaBoost ensemble; Single attribute test; Decision tree; Support vector machine

1. Introduction

In recent years, with the outbreak of the global financial crisis, many economic entities have suffered great losses or even gone bankrupt. Such shocks affect financial institutions and all kinds of enterprises themselves, as well as their stakeholders such as stockholders, creditors, auditors, clients, employees, and so on. For this reason, it is necessary and urgent for enterprises to raise their awareness of crisis management. As a useful tool for scientific decision making, financial distress prediction (FDP) plays an important role in the prevention of corporate failure. From this point of view, the accuracy of an FDP model is of crucial importance, and many researchers have focused on how to construct efficient FDP models. To improve model accuracy, classifier ensemble is a promising method for FDP. However, research on ensemble methods for FDP has arisen only recently and needs to be studied more comprehensively. This paper further explores AdaBoost ensemble for FDP.

The rest of the paper is divided into five parts. Section 2 gives a brief literature review. Section 3 states the contribution of this paper. Section 4 presents the methodology, which includes the framework of the AdaBoost ensemble method for FDP, the AdaBoost algorithm, the weak learners of single attribute test and decision tree, and the combination mechanism of multiple classifiers. Section 5 describes the empirical experiment, including the experimental data and an analysis of the experimental results. Section 6 concludes.

⇑ Corresponding author. Tel.: +86 13454942829.
E-mail addresses: sunjiehit@gmail.com (J. Sun), lihuihit@gmail.com (H. Li).
0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2011.01.042

2. Literature review

FDP has been studied for many decades. As early as the 1930s, Fitzpatrick (1932) made a pioneering attempt to compare the financial ratios of successful industrial enterprises with those of failed companies. Beaver (1966) used univariate analysis to predict corporate bankruptcy and found that financial ratios have discriminative ability for corporate failure. From then on, FDP continued to be widely studied due to its important role in decision aiding. Altman (1968) developed the famous Z-score model by using multiple discriminant analysis (MDA), which classifies companies into two categories by a combination of five financial ratios. Ohlson (1980) applied the logistic regression (Logit) model to FDP, which is able to indicate the probability of financial distress.

FDP entered a booming stage in the late 1980s because of the rapid progress of artificial intelligence and data mining techniques. Frydman, Altman, and Kao (1985) carried out FDP through decision tree (DT), which produces rules that are easy to understand. Odom and Sharda (1990) began to construct neural network (NN) models for FDP. Since then, NN has become one of the most widely

used machine learning methods for FDP, and many studies (Carlos, 1996; Fletcher & Goss, 1993; Odom & Sharda, 1990; Pendharkar, 2005; Zhang, Hu, Patuwo, & Indro, 1999) concluded that it outperforms traditional statistical methods. NN has the advantage of strong nonlinear mapping ability, but its black-box property makes the learned knowledge difficult for corporate managers to understand.

Developed on the basis of statistical learning theory, support vector machine (SVM) is a relatively new machine learning technique (Kim, Kim, & Lee, 2002). SVM was applied to bankruptcy prediction by Shin, Lee, and Kim (2005) and Min and Lee (2005) with Korean data, and by Hui and Sun (2006) with data of Chinese listed companies. They all used the radial basis function (RBF) as SVM's kernel function, and supported the conclusion that SVM outperforms MDA, Logit and NN in FDP. Instead of empirical risk minimization, SVM uses the principle of structural risk minimization, which helps prevent SVM from over-fitting. Besides, the problem of local optima is avoided by the SVM algorithm because it solves a convex optimization problem, whose local optimal solution is also the global optimal solution. Finally, SVM can also provide good generalization ability and stable classification performance for relatively small samples. From this point of view, SVM is superior to NN because NN easily over-fits when the sample is not large enough. Recently, Ding, Song, and Zen (2008) and Boyacioglu, Kara, and Baykan (2009) further investigated SVM-based FDP, and affirmed that SVM can serve as a promising FDP model.

Considering the possible limitations of a single classifier, more and more researchers began to pay attention to FDP based on multiple classifier combination, or classifier ensemble. It is expected to reduce the variance of the estimated error and improve the overall recognition performance (Kim, Min, & Han, 2006; Kim et al., 2002; Ruta & Gabrys, 2005). To construct an effective multiple classifier system, diversity is essential, which means the base classifiers to be combined should be different. By now, several methods to produce such diversity have been proposed.

Firstly, diverse base classifiers can be generated by applying different learning algorithms (with heterogeneous model representations) to a single data set. Jo and Han (1996) integrated CBR, NN, and MDA to predict bankruptcy, and concluded that the combined model is superior to each independent one. Sun and Li (2008) proposed an FDP method by weighted majority voting combination of MDA, Logit, DT, NN, SVM, and CBR, and Cho, Kim, and Bae (2009) introduced an integration strategy with subject weight based on NN to combine MDA, Logit, NN and DT for bankruptcy prediction. They concluded that FDP based on a combination of multiple classifiers is superior to a single classifier in accuracy rate or stability to some extent. Li and Sun (2009) put forward a multiple CBR system by majority voting, which inherits the maximum accuracy achieved by its components, improves the minimum accuracy, and achieves better stability. Sun and Li (2009) and Hung and Chen (2009) respectively studied the FDP ensemble method by classifier selection with different inner structures. The former called it serial combination of multiple classifiers and compared it with candidate single classifiers. It was concluded that for the FDP problem with two categories, serial combination does not show much superiority over the best base classifier. The latter called it selective ensemble and compared it with stacking ensemble by voting and weighting, concluding that selective ensemble performs better than stacking ensemble.

Besides, Tsai and Wu (2008) used NN ensemble for bankruptcy prediction, whose diversified base classifiers were constructed on different data sets from three countries. Their experimental results showed that the NN ensemble did not outperform the single best NN classifier, based on which they considered that the proposed multiple classifier system may not be suitable for a binary classification problem such as bankruptcy prediction.

Alternatively, different training data sets can be produced by selecting from the same initial training set according to a certain mechanism, and then used to generate diverse classifiers through the same learning algorithm. Two popular methods for creating such ensembles are bagging and boosting. Alfaro, Gámez, and García (2007) have shown that AdaBoost, one of the most widely used boosting algorithms, decreases the generalization error and improves the accuracy in its application to FDP. Alfaro, García, Gámez, and Elizondo (2008) carried out an empirical comparison for FDP and showed that AdaBoost with DT outperforms NN both in the cross-validation and test set estimation of the classification error. Kim and Kang (2010) established bagging and AdaBoost ensembles with NN and compared them with a single NN classifier. Their results indicated that bagged and boosted NN ensembles consistently improve predictive accuracy.

3. Contribution of this paper

Former research on AdaBoost ensemble for FDP used DT or NN as weak learner, and in both cases the ensemble was compared to a single NN classifier. As the simplest classification method for FDP, univariate analysis, also called single attribute test (SAT) in this paper, needs lower computational cost than almost all other classification methods. In addition and more importantly, SAT itself has genuinely weak learning ability: it produces a moderately accurate but not very strong classifier. As a universal principle, for two things with the same function, the simpler is the better. However, to the best of our knowledge, no literature has provided evidence on whether AdaBoost ensemble with SAT is more suitable for FDP than AdaBoost ensemble with DT. For this reason, this paper constructs AdaBoost ensembles with SAT and DT respectively for FDP, and empirically compares them with single DT and SVM classifiers. The reason why SVM, instead of NN, is chosen for the purpose of comparison is that SVM has been shown to be a prominent single classifier for FDP in many studies, and it is superior to NN in terms of generalization ability, especially when the sample is not so large. Therefore, this study provides further insight into FDP methods based on AdaBoost ensemble, particularly for the situation in which FDP needs to be made with limited samples. It is also believed that such empirical results can provide a useful guideline for the practice of FDP.

4. Methodology

4.1. Framework of AdaBoost ensemble method for FDP

As a commonly used technique for constructing ensemble classifiers, boosting tries to construct a classifier ensemble by developing one classifier at a time incrementally. This means each classifier that joins the ensemble is trained on a data set selectively sampled from a training data set by gradually increasing the likelihood of "difficult" data points at each step. AdaBoost, proposed by Freund and Schapire (1997), is the best-known boosting method. This paper studies the AdaBoost ensemble method for FDP, whose framework is shown in Fig. 1. It first samples a training set from the initial data set according to a uniform distribution ($W_1$), and then adaptively adjusts each example's weight according to whether it is difficult or easy to classify. Namely, the initial training examples which are misclassified by the weak learner trained in the immediately preceding step are regarded as the difficult ones, and their weights are increased. The updated weight distribution is then used to sample another training set from the initial data set, on which another weak learner can be trained. After $T$ such iterations, the AdaBoost ensemble for FDP is composed of $T$ weak learners, whose individual outputs are combined

to produce the final prediction result. The details of the AdaBoost algorithm, the weak learning algorithms and the combination mechanism are stated respectively in the following subsections.

[Fig. 1: flowchart. Initial training dataset $S_n$ → weight distributions $W_1, W_2, \ldots, W_T$ → training datasets 1, 2, ..., T selected from $S_n$ according to $W_1, W_2, \ldots, W_T$ → weak learners 1, 2, ..., T → AdaBoost ensemble. The testing dataset is passed to weak learners 1, 2, ..., T, whose outputs are combined into the prediction result.]

Fig. 1. Framework of AdaBoost ensemble method for FDP.

4.2. AdaBoost

Suppose $S_n = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ is a set of training samples with $y_i \in \{1, -1\}$ ($i = 1, 2, \ldots, n$), i.e. only two classes are considered for simplicity. The weight distribution over these samples at the $t$th boosting iteration is denoted as $W_t = \{w_t^1, w_t^2, \ldots, w_t^n\}$ ($t = 1, 2, \ldots, T$), which is initially set uniformly. That is, the weight $w_t^i$ ($i = 1, 2, \ldots, n$) is given a value of $1/n$ at the first iteration when $t = 1$, and is updated adaptively at later iterations. At iteration $t$, AdaBoost builds a new training data set by sampling from the initial training data set with the weight distribution $W_t$, and calls the WeakLearner to construct a base classifier, represented as $f_t$, on this new training data set. $f_t$ is then applied to classify the samples in the initial data set, and the error of $f_t$, denoted as $E_t$, is calculated as follows:

$$E_t = \sum_{i:\, f_t(x_i) \neq y_i} w_t^i \tag{1}$$

According to the idea that easy samples correctly classified by $f_t$ get lower weights and difficult samples misclassified by $f_t$ get higher weights, the samples' weight distribution is updated as follows:

$$w_{t+1}^{\prime\, i} = w_t^i \times \exp(-\alpha_t \times l_t^i) \quad (i = 1, 2, \ldots, n) \tag{2}$$

In the above formula (2), $\alpha_t$ and $l_t^i$ are calculated as follows:

$$\alpha_t = 0.5 \ln\left(\frac{1 - E_t}{E_t}\right) \tag{3}$$

$$l_t^i = \begin{cases} 1 & \text{if } f_t(x_i) = y_i \\ -1 & \text{if } f_t(x_i) \neq y_i \end{cases} \tag{4}$$

The weights calculated above are then normalized so that they add up to one:

$$w_{t+1}^i = \frac{w_{t+1}^{\prime\, i}}{\sum_{i=1}^{n} w_{t+1}^{\prime\, i}} \quad (i = 1, 2, \ldots, n) \tag{5}$$

When $T$ iterations have been processed, the ensemble is composed of $T$ weak classifiers. The final AdaBoost classification result is made through a combination of their classification results weighted by $\alpha_t$. In detail, the AdaBoost algorithm is listed in Table 1.

Table 1
AdaBoost algorithm.

Input: Initial training set composed of $n$ examples, denoted as $S_n = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$
       Weak learning algorithm, denoted as WeakLearner
       Integer $T$ specifying total number of iterations
Initialize: $w_1^i = 1/n$, i.e. $W_1 = \{w_1^1, w_1^2, \ldots, w_1^n\} = \{1/n, 1/n, \ldots, 1/n\}$
            The ensemble $F = \emptyset$
For $t = 1, 2, \ldots, T$
  1. Take a sample $R_t$ from $S_n$ using distribution $W_t$
  2. Build a classifier $f_t$ using $R_t$ as the training set
  3. Compute: $E_t = \sum_{i:\, f_t(x_i) \neq y_i} w_t^i$ and $\alpha_t = 0.5 \ln\left(\frac{1 - E_t}{E_t}\right)$
  4. Update the weight: $w_{t+1}^i = \mathrm{normalize}(w_t^i \times \exp(-\alpha_t \times l_t^i))$
Output: The ensemble $F = \{f_1, f_2, \ldots, f_T\}$ and $A = \{\alpha_1, \alpha_2, \ldots, \alpha_T\}$

4.3. Weak learners

AdaBoost is a method used to significantly reduce the error of a weak learning algorithm. In theory, the weak learning algorithm can be any algorithm as long as it generates classifiers which need only be a little better than random guessing (Freund & Schapire, 1996). That means the weak learners should not result in over-fitting. In this paper, SAT and DT are respectively used as weak learning algorithms for the following two reasons. Firstly, SAT and DT are both non-parametric learning algorithms which need not search for optimal parameters in the training stage, and thus they learn relatively faster than parametric algorithms. This property makes them especially suitable for AdaBoost ensemble, which requires time-consuming iterations repeated many times. Secondly, AdaBoost prefers weak learning algorithms over strong ones. Namely, AdaBoost can only provide very limited improvement in accuracy for a strong learning algorithm. Though Alfaro et al. (2008) used DT with deep pruning as weak learner for FDP AdaBoost ensemble, this paper also attempts the FDP AdaBoost ensemble with SAT as weak learner, because SAT is surpassed by all other FDP methods in former research and usually yields a moderately accurate but not very strong learner, which just meets the requirement of AdaBoost.

4.3.1. Single attribute test

SAT is the first algorithm proposed for diagnosing corporate financial distress, and it was named univariate discriminant analysis by Beaver (1966). Suppose $X$ is a matrix composed of $m$ rows and $n$ columns. Here, $m$ is the number of attributes and $n$ is the number of training samples. The SAT algorithm applied in this study is listed in Table 2.

4.3.2. Decision tree

DT was first applied to financial distress prediction by Frydman et al. (1985). It is a tree-shaped decision structure learned inductively, by recursively partitioning attribute values, from sample data whose class is already known. In DT, each non-leaf node represents a test of an attribute value, and each leaf node represents a class. Thus, DT can provide easily understandable knowledge and aid decision making for less-experienced users. The basic algorithm of DT is stated in Table 3 (Sun & Li, 2008).

In the basic DT algorithm described in Table 3, IG represents information gain, which is the most widely used criterion for choosing an attribute split. It is calculated as follows.

$S_n$ is a data set consisting of $n$ samples. The label of a sample has two different values, namely $C_1 = 1$ and $C_2 = -1$. If $s_l$ ($l = 1, 2$) is the number of samples of class $C_l$, then the total information entropy needed to classify the given data set is $I(s_1, s_2)$

$$I(s_1, s_2) = -\sum_{l=1}^{2} p_l \log_2(p_l) \tag{6}$$

where $p_l$ expresses the probability that a random sample belongs to the class $C_l$. Its value can be calculated by $p_l = s_l/s$, where $s = s_1 + s_2$ is the total number of samples.

Assume attribute $A$ has $v$ different values $\{a_1, a_2, \ldots, a_v\}$. The data set $S$ can be divided into $v$ subsets, namely $\{S_1, S_2, \ldots, S_v\}$. The subset $S_q$ ($q = 1, 2, \ldots, v$) consists of samples whose attribute values are equal to $a_q$. Assume $s_{lq}$ denotes the number of samples which belong to both subset $S_q$ and class $C_l$. The information entropy which is used to classify the given data set according to the attribute, namely $E(A)$, can be calculated by the following formulas:

$$E(A) = \sum_{q=1}^{v} \frac{s_{1q} + s_{2q}}{s}\, I(s_{1q}, s_{2q}) \tag{7}$$

$$I(s_{1q}, s_{2q}) = -\sum_{l=1}^{2} p_{lq} \log_2(p_{lq}) \tag{8}$$

$$p_{lq} = \frac{s_{lq}}{s_{1q} + s_{2q}} \quad (l = 1, 2) \tag{9}$$

In this way, the information entropy gained by attribute $A$ is $IG(A)$:

$$IG(A) = I(s_1, s_2) - E(A) \tag{10}$$

The decision tree obtained by the above basic algorithm is often large and prone to over-fitting, and pruning is a useful way to avoid over-fitting.

Table 2
Single attribute test algorithm.

Input: Matrix composed of the attribute values of all training samples, denoted as $X_{m \times n}$.
       Label values set of all training samples, denoted as $Y_n$.
1. For each attribute $k \in [1, 2, \ldots, m]$, do the following:
   (1) Let $F_n$ represent the $k$th row in the matrix $X_{m \times n}$
   (2) Let Z = unique_ascending($F_n$)
   (3) Let Size_Z = length(Z) − 1
   (4) For each $j \in [1, 2, \ldots, \text{Size\_Z}]$, do the following steps:
       a) Let $f_j = [Z_j + Z_{(j+1)}]/2$
       b) Compute the classification error, denoted as $err_j$, on the assumption that $f_j$ is the partitioning point for attribute $k$
   (5) Let $Err_k$ represent the minimal value in $err = [err_1, err_2, \ldots, err_{\text{Size\_Z}}]$, i.e. $Err_k = \min_{j=1}^{\text{Size\_Z}}(err_j)$, and let $PP_k$ represent the best partitioning point for attribute $k$ corresponding to $Err_k$, i.e. $PP_k = f_{j^*}$, in which $j^*$ represents the position index in array $err$ corresponding to the minimal value
2. Let $b = PP_{k^*}$ if $Err_{k^*} = \min_{k=1}^{m}(Err_k)$. It means attribute $k^*$ is selected to construct the single attribute model for classification
3. If attribute $k^*$ is a positive indicator, then the single attribute discriminant model is represented as $y = \mathrm{sgn}(x_{k^*} - b)$. Else, $y = \mathrm{sgn}(-x_{k^*} + b)$
Output: The single attribute discriminant model

Table 3
Basic algorithm of decision tree.

Input: Training sample data (on the assumption that all attributes should be discretized).
       Candidate attribute set, denoted as attribute_list.
Algorithm: Gen_decision_tree
1. Create a node, which can be denoted as N;
2. If all samples of the node belong to the same class, then return N as a leaf node which is denoted as class C;
3. If is_empty(attribute_list) then return N as a leaf node which is denoted as the class having the most samples in node N;
4. Choose the attribute whose IG value is the biggest in attribute_list, and then denote it as test_attribute;
5. Label the node N with test_attribute;
6. According to the condition of test_attribute = $a_i$, produce a branch from the node N, where $S_i$ is a sample set that meets the condition;
7. If is_empty($S_i$) Then denote the corresponding leaf as the class that has the largest number of samples in node N
   Else express the corresponding leaf node as the class that is recursively returned by Gen_decision_tree($S_i$, attribute_list - test_attribute).
Output: decision tree.

4.4. Combination mechanism

The output of the AdaBoost algorithm in Table 1 is the ensemble composed of $T$ base classifiers, denoted as $F = \{f_1, f_2, \ldots, f_T\}$, and their corresponding combination weight set, denoted as $A = \{\alpha_1, \alpha_2, \ldots, \alpha_T\}$. The combination weight $\alpha_t$ is calculated from the error of base classifier $f_t$ according to formula (3), in which the lower $E_t$ (error of base classifier $f_t$) is, the higher $\alpha_t$ (weight of base classifier $f_t$) gets. As the most widely used combination mechanism, the majority voting scheme is adopted to combine the prediction results of the $T$ base classifiers. In detail, weighted majority voting, instead of simple majority voting, is applied to take into account the different role each base classifier plays in the combination process that produces the final result. For the FDP problem with two categories denoted NM (normal) and FD (financial distress), given an example $x_i$ to be predicted by the ensemble, the combination mechanism of weighted majority voting can be described as follows:

$$F(x_i) = \begin{cases} \mathrm{NM} & \text{if } P(F(x_i) = \mathrm{NM}) > P(F(x_i) = \mathrm{FD}) \\ \mathrm{FD} & \text{if } P(F(x_i) = \mathrm{NM}) \leq P(F(x_i) = \mathrm{FD}) \end{cases} \tag{11}$$

$$P(F(x_i) = \mathrm{NM}) = \sum_{t:\, f_t(x_i) = \mathrm{NM}} \alpha_t \tag{12}$$

$$P(F(x_i) = \mathrm{FD}) = \sum_{t:\, f_t(x_i) = \mathrm{FD}} \alpha_t \tag{13}$$

5. Empirical experiment

5.1. Data description

A financially distressed company is defined as one that has had negative net profit in two consecutive years, or whose net capital per share is lower than the face value per share because of one year's substantial loss. These are also the main criteria by which the Chinese Stock Exchange specially treats Chinese listed companies due to abnormal financial status (Sun & Li, 2011) (see footnote 1). Healthy companies are chosen from those that have never been specially treated, by a matching method considering both industry and asset size. According to the information of Chinese listed companies from 2000 to 2008, the initial samples consist of 692 companies in total, listed on the Shenzhen Stock Exchange and the Shanghai Stock Exchange. Therefore, the experimental data are collected from real-world information on Chinese listed companies, and our study focuses on financial distress from the perspective of profitability.

Footnote 1: Commonly, Chinese listed companies will be specially treated (ST) if: (1) a company has had negative net profit in two consecutive years or (2) a firm's net capital per share is lower than its face value. This study chooses samples according to the above ST criteria. If a company is specially treated because (1) the firm purposely publishes financial statements with serious falsehood and misstatement or (2) other abnormal incidents described in the Chinese Stock Listing Exchange Rule appear, it is excluded.

Suppose the year of financial distress is represented as year (t − 0). Similarly, year (t − 1), (t − 2) and (t − 3) respectively mean one year, two years and three years before the financial distress year. For comprehensive generalization, our study attempts to

predict financial distress of year (t − 0) respectively according to financial ratio information of year (t − 1), (t − 2) and (t − 3). That is, financial distress is predicted one year in advance, two years in advance and three years in advance respectively.

Forty-one financial ratios are utilized as input variables, which cover profitability, activity, solvency, growth, risk level, per share ratios, and cash flow ratios, as listed in Table 4 (Sun and Li, 2011). Hence, these explanatory variables can provide a comprehensive indication of a firm's financial and operational state. Since SAT and DT are applied as weak learners in the study, no other feature selection methods are needed, because these weak learner algorithms perform feature selection by themselves.

5.2. Experimental design

To obtain comparable experimental results, the same FDP problem is solved by four different classification methods, i.e. AdaBoost ensemble with SAT (represented as SA), AdaBoost ensemble with DT (represented as DTA), single classifier on DT and single classifier on SVM. Thirty holdout tests are carried out to estimate the prediction accuracy more objectively. Each time, the total initial sample composed of 692 Chinese listed companies is divided into two subsets, namely a training set and a testing set. The former takes two thirds (2/3) of the sample and the latter the remaining one third (1/3). For each classification method, the above process of dividing the sample into training set and testing set is repeated 30 times, so that 30 estimated errors are ultimately obtained for the purpose of statistical analysis. This makes the comparison among different classification methods more reliable.

The individual DT classifier is pruned to the depth at which the pruned tree has the lowest 10-fold cross validation error. Therefore, the size of the individual tree is limited to avoid over-fitting. The SVM algorithm with RBF kernel function is applied to build the single SVM classifier, since RBF SVM has been shown to be effective for FDP in former research (Hui & Sun, 2006; Min & Lee, 2005; Shin et al., 2005). Because the tuning parameter C and the kernel parameter γ are crucial to RBF SVM's classification performance, a grid search is used to find optimal parameter values by the criterion of 10-fold cross validation error.

5.3. Experimental results and analysis

The testing errors of the 30 holdout tests for FDP are listed in Table 5. For direct comparison among different FDP methods, the mean values of the 30 holdout testing errors are also given in the last row of Table 5. As can be seen, no matter whether FDP is made one year, two years or three years in advance, SA (AdaBoost ensemble with single attribute test) outperforms the other three methods in terms of mean testing error. That is, over the 30 holdout tests, SA has the lowest mean testing errors of 2.78%, 12.81% and 27.51% respectively at year (t − 1), (t − 2) and (t − 3). However, DTA does not always obtain lower mean testing errors than the single classifiers of DT and SVM from year (t − 1) to (t − 3). In detail, the mean testing error of DTA at year (t − 2) is 13.04%, which is lower than both DT (14.26%) and SVM (13.43%), but the mean testing error of DTA is higher than DT at year (t − 1) and higher than SVM at year (t − 3). For the two single classifiers, SVM evidently outperforms DT at year (t − 2) and (t − 3), but performs a little worse than DT at year (t − 1). The possible reason for this phenomenon is that more linearity may exist between explanatory variables and output label at year (t − 1) than at year (t − 2) and (t − 3), whereas the RBF SVM applied in the experiment is more suitable for non-linear problems.

For clearer illustration, testing error curves on the 30 holdout experiments are graphed in Figs. 2–4 respectively at year (t − 1), (t − 2) and (t − 3).

As can be seen, the testing error curves of SA are at a relatively lower position for all three years. Furthermore, the testing error curves of SA fluctuate within smaller ranges than those of DTA, DT and SVM, indicating that the FDP method based on SA is more stable than the other three. Such an advantage of SA over the other three methods is obvious from the shape of the testing error curves. When FDP is made at year (t − 1), RBF SVM tends to have the highest testing error and the largest fluctuation range, for the possible reason mentioned above. But when FDP is made at year (t − 2) or (t − 3), the points on DT's testing error curve tend to be in the higher position more frequently. Therefore, FDP methods based on AdaBoost ensemble are superior to the single classifiers of DT or SVM as a whole.
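The repeated holdout protocol of Section 5.2 (30 random splits, two thirds for training and one third for testing, one testing error per split) can be illustrated with a minimal Python sketch. The function names `repeated_holdout` and `fit_majority`, the fixed random seed, the unstratified shuffling and the toy data are assumptions made for illustration only; the paper does not specify how the random splits were drawn, and the majority-label learner merely stands in for SA, DTA, DT or SVM.

```python
import random
from statistics import mean, stdev

def repeated_holdout(samples, labels, fit, n_repeats=30, train_fraction=2/3, seed=0):
    """Return one testing error (in %) per random train/test split.

    `fit(train_x, train_y)` must return a function that maps one sample to a label.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    n = len(samples)
    n_train = round(n * train_fraction)
    errors = []
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)  # plain random split; no stratification is assumed
        train, test = idx[:n_train], idx[n_train:]
        predict = fit([samples[i] for i in train], [labels[i] for i in train])
        wrong = sum(predict(samples[i]) != labels[i] for i in test)
        errors.append(100.0 * wrong / len(test))
    return errors

def fit_majority(train_x, train_y):
    """Hypothetical placeholder learner: always predicts the majority training label."""
    majority = max(set(train_y), key=train_y.count)
    return lambda x: majority

# Toy data only (not the 692-company sample): 20 samples of class 1, 10 of class -1.
toy_x = [[float(i)] for i in range(30)]
toy_y = [1 if i < 20 else -1 for i in range(30)]
errors = repeated_holdout(toy_x, toy_y, fit_majority)
print(f"mean testing error = {mean(errors):.2f}%, std = {stdev(errors):.2f}%")
```

Reporting the standard deviation next to the mean gives a rough numeric counterpart to the stability comparison that the paper reads off the testing error curves.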

Table 4
Financial ratios used as explanatory variables.

Category: Variables
Profitability: Gross income/operating revenue; Net profit/operating revenue; Earning before interest and tax/total assets; Net profit/total assets; Net profit/current assets; Net profit/fixed assets; Profit margin; Net profit/equity; Return on invested capital
Activity: Account receivables turnover; Inventory turnover; Account payable turnover; Working capital turnover; Current assets turnover; Fixed assets turnover; Long-term assets turnover; Total assets turnover; Net assets turnover
Solvency: Current ratio; Quick ratio; Working capital ratio; Asset-liability ratio; Equity/debt ratio; Current assets/total assets; Fixed assets/total assets; Equity/fixed assets; Current liability/total liabilities; Debt/tangible assets ratio; Liabilities/market value of equity
Growth ratios: Growth rate of prime operating revenue; Rate of capital preservation and appreciation; Growth rate of total assets; Growth rate of net profit
Risk level: Coefficient of financial leverage; Coefficient of operating leverage
Per share ratios: Operating revenue per share; Earning per share; Net assets per share
Cash flow ratios: Cash flow/current liabilities ratio; Cash rate of prime operating revenue; Net operating cash flow per share; Net cash flow per share; Net operating cash flow/net profit ratio

Table 5
Holdout testing errors for FDP.

id  Year (t − 1): SA, DTA, DT, SVM | Year (t − 2): SA, DTA, DT, SVM | Year (t − 3): SA, DTA, DT, SVM
    SA (%) DTA (%) DT (%) SVM (%) SA (%) DTA (%) DT (%) SVM (%) SA (%) DTA (%) DT (%) SVM (%)
1 3.48 5.65 5.22 6.96 10.00 12.17 14.78 12.61 23.36 22.90 23.36 23.83
2 2.17 2.61 2.17 2.61 10.00 13.04 12.61 12.61 29.44 34.11 29.44 30.37
3 3.91 4.35 5.22 5.65 11.74 16.09 14.35 14.78 32.71 30.84 28.50 28.97
4 3.04 4.78 4.78 8.26 14.78 10.87 13.91 15.65 26.17 28.50 33.64 25.23
5 1.74 3.91 2.61 2.61 13.91 13.04 14.35 15.22 28.97 30.84 30.84 29.91
6 1.74 2.17 1.74 2.61 11.30 12.61 15.22 15.65 24.77 26.17 27.10 23.83
7 1.74 1.74 2.61 2.61 12.61 14.35 13.91 11.74 28.50 28.50 28.50 27.57
8 3.04 3.91 3.91 3.48 13.48 11.30 12.17 13.04 26.64 29.91 29.91 31.78
9 1.30 2.61 2.17 2.17 14.78 13.04 13.91 13.91 27.57 33.18 30.37 31.31
10 1.74 2.17 1.74 2.17 9.57 15.65 12.61 14.78 29.91 28.97 30.37 30.37
11 3.04 3.48 3.48 3.91 13.48 16.52 14.78 15.22 25.23 33.18 28.50 27.57
12 3.48 5.22 4.78 5.22 13.04 11.30 14.35 12.61 29.91 29.91 29.91 32.24
13 3.48 3.48 3.48 4.35 11.30 13.04 14.35 11.74 28.50 28.97 30.37 29.91
14 2.61 3.04 3.48 3.04 14.78 13.91 15.65 12.17 28.97 29.44 31.78 28.97
15 3.91 5.65 3.48 5.65 14.35 12.17 15.22 12.61 32.24 29.44 31.78 29.91
16 2.61 5.22 5.65 3.91 13.48 12.61 16.96 13.48 24.30 23.83 25.70 30.84
17 3.48 3.04 3.91 7.39 9.57 8.26 14.35 11.30 30.37 28.50 28.50 27.10
18 1.74 2.61 3.04 3.04 15.22 13.91 16.09 11.74 29.44 36.92 27.57 28.04
19 3.91 4.35 4.78 4.35 14.78 13.48 15.22 16.96 26.17 28.97 24.30 25.23
20 2.61 3.04 4.35 3.48 12.61 10.43 12.61 12.17 29.44 29.91 28.50 29.91
21 2.61 3.48 2.61 3.91 15.65 12.61 15.22 15.22 19.63 22.90 21.96 23.83
22 3.04 5.22 5.22 3.48 13.04 9.57 12.61 11.30 25.70 24.77 24.30 26.17
23 2.61 3.48 3.04 3.04 12.61 13.48 14.78 14.35 27.57 25.70 35.98 28.04
24 1.74 0.87 3.04 3.04 11.30 12.61 11.74 12.17 25.23 26.64 26.64 26.64
25 3.91 3.48 3.48 3.48 12.17 13.48 14.35 12.61 29.44 29.44 30.37 26.17
26 3.04 2.61 3.04 3.04 12.61 12.61 10.87 11.30 26.64 26.17 26.64 28.97
27 2.61 3.48 2.61 5.22 12.17 13.04 14.78 13.48 28.04 28.04 32.24 28.04
28 3.91 3.04 2.61 2.61 16.52 16.09 16.52 16.09 26.17 30.84 28.04 27.10
29 3.91 4.35 3.48 3.91 13.48 16.52 17.83 13.91 27.57 30.37 35.51 29.44
30 1.30 3.04 2.17 3.04 10.00 13.48 11.74 12.61 26.64 27.10 31.31 27.10
Mean 2.78 3.54 3.46 3.94 12.81 13.04 14.26 13.43 27.51 28.83 29.07 28.15

Fig. 2. Testing error curve on 30 times of holdout experiments at year (t − 1). [Figure: holdout testing error (%) against the number of holdout testing; curves for PA, DTA, DT and SVM.]

Fig. 4. Testing error curve on 30 times of holdout experiments at year (t − 3). [Figure: holdout testing error (%) against the number of holdout testing; curves for PA, DTA, DT and SVM.]

For convincing support of the above comparison, the statistical analysis of left-tailed T test for mean comparison is carried out, and the results corresponding to year (t − 1), (t − 2) and (t − 3) are listed in Tables 6–8, respectively.
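As a rough check of how such a left-tailed mean comparison works, the sketch below recomputes the statistic for SA against DTA at year (t − 1) from the 30 holdout errors in Table 5. It assumes the test is paired across the 30 holdout splits, which the paper does not state explicitly; under that assumption the statistic comes out at about −4.42, whose magnitude agrees with the 4.419 reported in Table 6. The critical values are standard one-tailed table values for 29 degrees of freedom, and the function name is hypothetical.

```python
import math
from statistics import mean, stdev

# Holdout testing errors (%) at year (t - 1), copied from Table 5.
sa = [3.48, 2.17, 3.91, 3.04, 1.74, 1.74, 1.74, 3.04, 1.30, 1.74,
      3.04, 3.48, 3.48, 2.61, 3.91, 2.61, 3.48, 1.74, 3.91, 2.61,
      2.61, 3.04, 2.61, 1.74, 3.91, 3.04, 2.61, 3.91, 3.91, 1.30]
dta = [5.65, 2.61, 4.35, 4.78, 3.91, 2.17, 1.74, 3.91, 2.61, 2.17,
       3.48, 5.22, 3.48, 3.04, 5.65, 5.22, 3.04, 2.61, 4.35, 3.04,
       3.48, 5.22, 3.48, 0.87, 3.48, 2.61, 3.48, 3.04, 4.35, 3.04]

def paired_t(a, b):
    """Paired t statistic: mean difference over its standard error."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))

t_stat = paired_t(sa, dta)

# One-tailed critical values of Student's t with 29 degrees of freedom.
T_CRIT_29 = {0.01: 2.462, 0.05: 1.699}

# Left-tailed test: reject "SA is not better" when t falls below -critical.
significant_1pct = t_stat < -T_CRIT_29[0.01]
print(round(t_stat, 3), significant_1pct)
```

A negative statistic below the negated critical value means the first method has a significantly lower mean testing error than the second at that level.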
Fig. 3. Testing error curve on 30 times of holdout experiments at year (t − 2). [Figure: holdout testing error (%) against the number of holdout testing; curves for PA, DTA, DT and SVM.]

From Table 6 at year (t − 1), the testing error of SA is lower than that of DTA, DT, and SVM, all at the significance level of 1%. DTA is more accurate than SVM at the significance level of 5%, but there exists no statistically significant difference between DTA and DT. For the possible reason mentioned above, DT performs better than SVM at the significance level of 5% at year (t − 1).

From Table 7 at year (t − 2), SA has lower testing error than DT and SVM at the significance levels of 1% and 5%, respectively. Though the mean testing error of SA is lower than that of DTA as shown in Table 5, the difference is not statistically significant. DTA is more accurate than DT at the significance level of 1%. However, there exists no statistically significant difference between DTA and SVM according to the result
of one-tailed T testing, although Table 5 indicates that the mean testing error of DTA is lower than that of SVM at year (t − 2). With more non-linearity in the dataset at year (t − 2), SVM predicts financial distress better than DT at the significance level of 1%.

From Table 8 at year (t − 3), SA is superior to DTA and DT, both at the significance level of 1%, and to SVM at the significance level of 10%. Though the mean testing error of DTA is lower than that of DT and higher than that of SVM in Table 5, such differences are not statistically significant. Similar to year (t − 2), SVM is more suitable for FDP than DT at year (t − 3), at the significance level of 10%.

Above all, when SAT is utilized as the weak learner, AdaBoost ensemble always has relatively lower error and is particularly suitable for the problem of FDP. When DT is used as the weak learner, the advantage of AdaBoost ensemble over a single classifier is not stable enough. Therefore, AdaBoost ensemble with SAT outperforms AdaBoost ensemble with DT in application to FDP according to our experimental results. This may be attributed to the reason that the DT classifier trained in the experiment is not weak enough, so that the advantage of the AdaBoost algorithm is not brought into play well. Hence, the choice of weak learner is crucial to the performance of AdaBoost ensemble. For FDP, AdaBoost ensemble with SAT is a better choice because of its acceptable prediction accuracy as well as its relatively low computational cost.

Table 6
Left-tailed T testing results at year (t − 1).

        SA          DTA         DT          SVM
SA      –           −4.419      −4.095      −4.910
                    0.000***    0.000***    0.000***
DTA     4.419       –           0.473       −1.742
        1.000                   0.680       0.046**
DT      4.095       −0.473      –           −2.118
        1.000       0.320                   0.021**
SVM     4.910       1.742       2.118       –
        1.000       0.954       0.979

** Significance level of 5%.
*** Significance level of 1%.

Table 7
Left-tailed T testing results at year (t − 2).

        SA          DTA         DT          SVM
SA      –           −0.515      −4.385      −1.733
                    0.305       0.000***    0.047**
DTA     0.515       –           −3.288      −1.127
        0.695                   0.001***    0.135
DT      4.385       3.288       –           2.510
        1.000       0.999                   0.991
SVM     1.733       1.127       −2.510      –
        0.953       0.865       0.009***

** Significance level of 5%.
*** Significance level of 1%.

Table 8
Left-tailed T testing results at year (t − 3).

        SA          DTA         DT          SVM
SA      –           −2.675      −2.898      −1.477
                    0.006***    0.004***    0.075*
DTA     2.675       –           −0.349      1.266
        0.994                   0.365       0.892
DT      2.898       0.349       –           1.622
        0.996       0.635                   0.942
SVM     1.477       −1.266      −1.622      –
        0.925       0.108       0.058*

* Significance level of 10%.
*** Significance level of 1%.

6. Conclusion

FDP plays an important role in the prevention of corporate failure, which makes the accuracy of FDP models a wide concern among FDP researchers. Though former research has comprehensively investigated different single classifiers for FDP, FDP based on classifier ensemble arose only in recent years and has a good prospect of application. This paper further explores AdaBoost ensemble for FDP and makes an empirical comparison. After designing the framework of the AdaBoost ensemble method for FDP, it describes the algorithms of AdaBoost as well as SAT and DT, and uses weighted majority voting as the combination mechanism. On the sample from Chinese listed companies, 30 holdout experiments are carried out for each of the four FDP methods: AdaBoost ensemble with SAT, AdaBoost ensemble with DT, single DT and single SVM. Experimental results show that AdaBoost ensemble with SAT outperforms the other three methods with statistical significance and is especially suitable for FDP of Chinese listed companies. It is also confirmed that the choice of weak learner algorithm does affect the FDP performance of AdaBoost ensemble, because the DT weak learner is inferior to the SAT weak learner in our experiments with Chinese listed companies. FDP experiments are carried out one year, two years and three years in advance, which makes the above conclusion more comprehensive. Therefore, this study contributes incremental evidence for FDP research based on AdaBoost and can guide the real world practice of FDP to some extent. However, this study also has the limitation that the experimental data sets are collected only from Chinese listed companies, and further investigation can be done based on other countries' real world data sets in future study.

Acknowledgments

This research is supported by the National Natural Science Foundation of China (Grant No. 70801054) and the Zhejiang Provincial Natural Science Foundation of China (Grant No. Y6090392).

References

Alfaro, E., Gámez, M., & García, N. (2007). Multiclass corporate failure prediction by AdaBoost.M1. Advanced Economic Research, 13, 301–312.
Alfaro, E., García, N., Gámez, M., & Elizondo, D. (2008). Bankruptcy forecasting: An empirical comparison of AdaBoost and neural networks. Decision Support Systems, 45, 110–122.
Altman, E. I. (1968). Financial ratios discriminant analysis and the prediction of corporate bankruptcy. Journal of Finance, 23, 589–609.
Beaver, W. (1966). Financial ratios as predictors of failure. Journal of Accounting Research, 4, 71–111.
Boyacioglu, M., Kara, Y., & Baykan, O. (2009). Predicting bank financial failures using neural networks, support vector machines and multivariate statistical methods: A comparative analysis in the sample of savings deposit insurance fund transferred banks in Turkey. Expert Systems with Applications, 36, 3355–3366.
Carlos, S.-C. (1996). Self organizing neural networks for financial diagnosis. Decision Support Systems, 17, 227–238.
Cho, S., Kim, J., & Bae, J. K. (2009). An integrative model with subject weight based on neural network learning for bankruptcy prediction. Expert Systems with Applications, 36, 403–410.
Ding, Y., Song, X., & Zen, Y. (2008). Forecasting financial condition of Chinese listed companies based on support vector machine. Expert Systems with Applications, 34, 3081–3089.
Fitzpatrick, P. (1932). A comparison of ratios of successful industrial enterprises with those of failed companies. Certified Public Accountant, 10, 598–605; 11, 656–662; 12, 727–731.
Frydman, H., Altman, E. I., & Kao, D.-L. (1985). Introducing recursive partitioning for financial classification: The case of financial distress. Journal of Finance, 40(1), 269–291.

Fletcher, D., & Goss, E. (1993). Forecasting with neural networks: An application using bankruptcy data. Information and Management, 24, 159–167.
Freund, Y., & Schapire, R. E. (1996). Experiments with a new boosting algorithm. In Proceedings of the 13th international conference on machine learning (pp. 148–156). San Francisco: Morgan Kaufmann.
Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139.
Hung, C., & Chen, J. (2009). A selective ensemble based on expected probabilities for bankruptcy prediction. Expert Systems with Applications, 36, 5297–5303.
Hui, X.-F., & Sun, J. (2006). An application of support vector machine to companies' financial distress prediction. Lecture Notes in Artificial Intelligence, 3885, 274–282.
Jo, H., & Han, I. (1996). Integration of case-based forecasting, neural network, and discriminant analysis for bankruptcy prediction. Expert Systems with Applications, 11(4), 415–422.
Kim, E., Kim, W., & Lee, Y. (2002). Combination of multiple classifiers for the customer's purchase behavior prediction. Decision Support Systems, 34, 167–175.
Kim, M.-J., Min, S.-H., & Han, I. (2006). An evolutionary approach to the combination of multiple classifiers to predict a stock price index. Expert Systems with Applications, 31, 241–247.
Kim, M.-J., & Kang, D.-K. (2010). Ensemble with neural networks for bankruptcy prediction. Expert Systems with Applications, 37(4), 3373–3379.
Li, H., & Sun, J. (2009). Majority voting combination of multiple case-based reasoning for financial distress prediction. Expert Systems with Applications, 36, 4363–4373.
Min, J. H., & Lee, Y.-C. (2005). Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters. Expert Systems with Applications, 28(4), 128–134.
Ruta, D., & Gabrys, B. (2005). Classifier selection for majority voting. Information Fusion, 6, 63–81.
Odom, M., & Sharda, R. (1990). A neural networks model for bankruptcy prediction. Proceedings of the IEEE International Conference on Neural Network, 2, 163–168.
Ohlson, J. (1980). Financial ratios and probabilistic prediction of bankruptcy. Journal of Accounting Research, 18, 109–131.
Pendharkar, P. C. (2005). A threshold varying artificial neural network approach for classification and its application to bankruptcy prediction problem. Computers & Operations Research, 32, 2561–2582.
Sun, J., & Li, H. (2008). Listed companies' financial distress prediction based on weighted majority voting combination of multiple classifiers. Expert Systems with Applications, 35, 818–827.
Sun, J., & Li, H. (2008). Data mining method for listed companies' financial distress prediction. Knowledge-Based Systems, 21(1), 1–5.
Sun, J., & Li, H. (2009). Financial distress prediction based on serial combination of multiple classifiers. Expert Systems with Applications, 36, 8659–8666.
Sun, J., & Li, H. (2011). Dynamic financial distress prediction using instance selection for the disposal of concept drift. Expert Systems with Applications, 38, 2566–2576.
Shin, K.-S., Lee, T. S., & Kim, H.-J. (2005). An application of support vector machines in bankruptcy prediction model. Expert Systems with Applications, 28(1), 127–135.
Tsai, C.-F., & Wu, J.-W. (2008). Using neural network ensembles for bankruptcy prediction and credit scoring. Expert Systems with Applications, 34, 2639–2649.
Zhang, G., Hu, M. Y., Patuwo, B. E., & Indro, D. C. (1999). Artificial neural networks in bankruptcy prediction: General framework and cross-validation analysis. European Journal of Operational Research, 116, 16–32.