Data Mining For Financial Statement Analysis

An Analysis Model of Financial Statements
Based on Data Mining

Li Yanhong, Liu Peng, Qin Zheng
The authors are with the School of Information Management & Engineering, Shanghai University of Finance & Economics, Shanghai 200433, P.R. China (email: {lyhong, liupeng, qinzheng}@mail.shufe.edu.cn).
Abstract-This paper builds an analysis model of financial statements based on data mining methods. Clustering, association rules and decision trees are combined to analyze existing financial statements step by step in increasing depth, and an annual assets structure statement is produced along the way. The data used for the research comes from financial statements of electronic product corporations published on the Internet. The paper establishes and implements an integrated data mining model for the electronic product industry. Finally, some meaningful conclusions are drawn, which can help decision makers and investors in this industry analyze the financial situation of a corporation and make better investment decisions, budgets or management plans.
Index Terms-Association rules, Clustering, Decision tree, Data mining, Financial analysis
I. INTRODUCTION
A. Data mining and OLAP
Data mining is a new cross-disciplinary field that originated at the end of the twentieth century. It has been used in many areas, such as customer relationship management, investment decisions, banking risk evaluation and stock price analysis in financial markets, but seldom in financial analysis. This paper uses common data mining methods to analyze financial data from financial statements and builds a practical financial statement analysis model, to help corporate financial analysts, investors and decision makers better understand the financial situation of a corporation or an industry and make sound decisions.
Data mining, abbreviated DM, has a broad and a narrow sense. In the broad sense, it is the process of finding hidden, unknown and useful knowledge or information in large data sets. In the narrow sense, it is one critical step of knowledge discovery, the procedure of extracting useful patterns or building models [1]. The main functions of data mining are classification, clustering, association analysis, sequence analysis, prediction, forecasting and so on.
OLAP (On-Line Analytical Processing) supports decision making, query and reporting requirements in a multidimensional setting; its technical core is the concept of dimension. OLAP can also be regarded as a collection of multidimensional data analysis tools. DM emphasizes the depth and degree of automation of data analysis. First-generation research focused on the categories of mining patterns and on algorithm efficiency and achieved great progress, but the algorithms face an overly large search space, which produces many useless patterns. To address this and let the two techniques benefit each other, [9] presented OLAP Mining, abbreviated OLAM, which performs data mining at different levels of abstraction on databases or data warehouses. The search space can thus be limited or controlled, and users can choose data mining functions interactively. At the same time, the data mining results can be verified with the OLAP tool, so the efficiency, flexibility and intelligence of data analysis are improved.
B. The existing financial analysis methods
Financial analysis seeks the economic meaning of accounting data in order to understand the operating performance and financial position of a company, which helps investors and creditors make decisions. The accounting system reflects economic activities only selectively and recognizes an activity relatively late; accounting rules are imperfect; and managers are free to choose among accounting methods. For these reasons, distortions inevitably exist in financial reports. Although auditing can improve the situation to some extent, auditors cannot absolutely guarantee the truthfulness and correctness of financial statements, which makes financial analysis all the more important. The common methods of financial analysis are comparative analysis, trend analysis, factor analysis and ratio analysis. The four methods overlap to some degree, and ratio analysis is widely used in practice.
C. How data mining is applied in the financial field
Traditional financial analysis approaches simply apply statistics to a small amount of financial data. They make it difficult to analyze the thousands of companies in an industry and do little to uncover the deep information hidden in large quantities of financial data, so the advantages of the data mining approach should be fully exploited here.
At present, few researchers in China apply data mining technology to the financial field: fewer than 10 articles related to financial analysis appeared in the core journals of China between 2000 and 2005. The main reason is that most financial analysts know little about data mining methods and technology and prefer the basic, traditional financial analysis methods they have mastered, while data mining researchers know little about finance.
The application of data mining to the financial field in China currently focuses on financial analysis, financial management and decision making. Applications to financial analysis fall into two groups. The first classifies the companies in an industry: [2] uses fuzzy clustering to analyze the ten largest listed iron companies according to new key financial indexes; [3] uses the data mining tool Clementine to analyze the financial status of Chinese listed companies and ranks and clusters them by factor analysis. The second studies financial forecasting: [4] takes Chinese listed companies as the research object and uses discriminant analysis, logistic regression, neural networks and their hybrid model to predict financial position; [5] covers the technical foundation and system framework of a dynamic financial forecasting system with embedded data mining technology; [6] discusses ideas for applying decision trees and neural networks to financial prediction. For applications to financial management and decision making, [7] studies a real-time financial control system based on data mining; [8] addresses the needs and processes of applying data mining and OLAP to financial decision making.
II. MODEL FRAMEWORK
To analyze the financial data of companies thoroughly, we use the OLAP tool to build hypercubes according to our needs and examine the data along multiple dimensions. The simplest choice is a three-dimensional cube in which the X axis denotes basic financial indexes, the Y axis denotes companies and the Z axis denotes time. We then slice and dice the cube. To slice, we can, for example, fix a point on the time dimension and analyze the differences in M indexes among N companies, comparing their operating performance or clustering the companies. We can also slice on the index dimension, that is, choose one index, see how it changes over time for the N companies, and compare one aspect of their operating status. To dice, we can choose one company and one index, predict the trend of that index over time for that company, and support investment decisions [2].
Listed companies publish quarterly or annual financial statements, which are accurate, authoritative and easy to obtain because they pass rigorous audit procedures and are disclosed to the public. Our research is therefore based on the public financial statements of listed companies.
The basic idea of the financial statement analysis model based on data mining presented here is as follows. Starting from balance sheet data, we select data with the slice and dice functions of OLAP, process it by the ratio analysis of traditional financial analysis, and build a practical and meaningful annual assets structure statement. On this statement we then apply clustering, association rules and decision trees to analyze, step by step and in increasing depth, the financial position of the companies in an industry, and finally draw some meaningful conclusions.
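The slice and dice operations on the index, company and time cube can be illustrated with a minimal Python sketch. It is not the Analysis Services cube used in the paper: the cube is a plain dictionary, the index names, company codes, periods and values are made up for illustration, and "dice" follows this paper's usage (fixing one company and one index) rather than the more general sub-cube selection found in OLAP tools.

```python
from itertools import product

# Hypothetical dimension members (not the paper's data).
indexes = ["current_ratio", "quick_ratio", "total_assets_turnover"]
companies = ["A1", "A2", "A3", "A4"]
periods = ["2004-12-31", "2005-12-31"]

# cube[(index, company, period)] -> value; the values are placeholders.
cube = {
    key: round(1.0 + 0.1 * n, 2)
    for n, key in enumerate(product(indexes, companies, periods))
}

def slice_by_time(cube, period):
    """Fix one period: M indexes x N companies."""
    return {(i, c): v for (i, c, p), v in cube.items() if p == period}

def slice_by_index(cube, index):
    """Fix one index: N companies x all periods."""
    return {(c, p): v for (i, c, p), v in cube.items() if i == index}

def dice(cube, index, company):
    """Fix one index and one company: the series of values over time."""
    return {p: v for (i, c, p), v in cube.items()
            if i == index and c == company}

year_end = slice_by_time(cube, "2005-12-31")
trend = dice(cube, "current_ratio", "A1")
```

The time slice is the table that the later clustering step works on, while the diced series is what a trend forecast for a single company would consume.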
III. EXAMPLE
A. Data preparation and selection
The data in the paper comes from the board of Netease Finances-Investment Service (http://quote.stock.163.com/), which provides a variety of detailed financial statements, including balance sheets, statements of cash flows and statements of profit. We first chose the industry of interest: the paper took the balance sheet data of more than 100 companies in the electronic product industry on April 15, 2005 as the research foundation and compiled a Summary Balance Sheet of the Industry of Electronic Product. We then imported the data into a SQL Server database and, using its OLAP tool, Analysis Services, built a three-dimensional cube with time, company and index dimensions. Dec 31, 2005 was selected on the time dimension, and this slice is used in the following analysis. To demonstrate the model easily, we randomly chose 15 companies from the electronic product industry, as follows:
A1 600057 (AMOISONIC ELECTRONICS CO., LTD)
A2 600060 (HISENSE ELECTRIC CO., LTD)
A3 600076 (WEIFANG BEIDA JADE BIRD HUAGUANG TECHNOLOGY CO., LTD)
A4 600089 (TEBIAN ELECTRIC APPARATUS STOCK CO., LTD)
A5 600112 (GUIZHOU CHANGZHENG ELECTRICAL APPARATUS CO., LTD)
A6 600139 (MIANYANG GUANGYAO NEW MATERIAL CO., LTD)
A7 600169 (TAIYUAN HEAVY INDUSTRY CO., LTD)
A8 600171 (SHANGHAI BELLING CORP., LTD)
A9 600183 (GUANGDONG SHENGYI SCI.TECH CO., LTD)
A10 600192 (LANZHOU GREAT WALL ELECTRICAL CO., LTD)
A11 600206 (GRINM SEMICONDUCTOR MATERIALS CO., LTD)
A12 600207 (HENAN ANCAI HI-TECH CO., LTD)
A13 600237 (ANHUI TONGFENG ELECTRONICS CO., LTD)
A14 600330 (ZHEJIANG TIANTONG ELECTRONIC CO., LTD)
A15 600550 (BAODING TIANWEI BAOBIAN ELECTRIC CO., LTD)
For each company above, A1 to A15 is the serial number, followed by the stock code and the company name.
The selection procedure was independent and random, free of subjective factors. The selected data describes the 15 electronic product companies on Dec 31, 2005; simple statistical calculation on it gave us basic relationships and differences among the indexes.
B. Data processing
The index values span a wide range and, moreover, are fixed at one point in time: they merely reflect the basic financial values of a company for one period and carry no further meaning. Analysis based directly on them may not yield satisfactory conclusions. The solution is to standardize the data, that is, to transform the original data into a specified interval by some method, for example into [-1, 1]. The common standardization methods are decimal scaling, min-max standardization and standard deviation standardization, but all of them compress initially wide-ranging data into a very small interval in a rather subjective way, or cause the standardized values to converge unintentionally. To avoid this problem, we use ratio analysis, which is common in financial analysis, as our standardization method; it yields more meaningful data. The result is the annual assets structure statement. Ratio analysis removes the effect of company scale and makes the profit and risk of different companies comparable.
In the figure that depicts the data of the annual assets structure statement, the dimension labeled with letters is the index dimension, the dimension labeled with numbers is the company dimension, and the vertical dimension gives the value of each index for each company. The figure shows that the index values differ only within a narrow range, apart from a few outliers, so little or no standardization is needed during data pre-processing. Transforming basic financial data into ratios can therefore eliminate or reduce data processing work at the beginning of data mining.
The annual assets structure statement includes ratio indicators of solvency and operating capacity: the current ratio and quick ratio reflect short-term solvency, the asset-liability ratio shows long-term solvency, and receivables turnover, inventory turnover and total assets turnover express operating capacity.
These indexes reveal the status of companies and help improve the quality and efficiency of data mining.
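How ratio analysis removes the effect of company scale can be seen in a small Python sketch. It uses the usual textbook definitions of the three balance-sheet ratios named above (the paper does not spell out its formulas, so these definitions are an assumption), and the two balance sheets are invented: the second is the first multiplied by 100.

```python
from dataclasses import dataclass

@dataclass
class BalanceSheet:
    """Hypothetical subset of balance sheet items, in any currency unit."""
    current_assets: float
    inventory: float
    current_liabilities: float
    total_assets: float
    total_liabilities: float

def structure_ratios(bs):
    """Textbook definitions (assumed, not taken from the paper)."""
    return {
        # Short-term solvency.
        "current_ratio": bs.current_assets / bs.current_liabilities,
        "quick_ratio": (bs.current_assets - bs.inventory) / bs.current_liabilities,
        # Long-term solvency.
        "asset_liability_ratio": bs.total_liabilities / bs.total_assets,
    }

# Two made-up companies that differ only in scale.
small = BalanceSheet(200.0, 50.0, 100.0, 500.0, 250.0)
large = BalanceSheet(20000.0, 5000.0, 10000.0, 50000.0, 25000.0)

ratios_small = structure_ratios(small)
ratios_large = structure_ratios(large)
```

Both companies obtain the same three ratios although their raw figures differ by a factor of 100, which is why the ratio table needs little further standardization. The turnover ratios are left out because they also need income statement items.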
C. Financial analysis based on data mining methods
There are many data mining methods, each with its own focus, advantages and shortcomings. No single method is best for a given data set, so we apply several methods in sequence, going deeper at each step, to extract the information and rules hidden in the data; in addition, the confidence and validity of the results are compared and verified across the methods. The combined use of clustering, association rules and decision trees is demonstrated below.
1) Clustering analysis
Clustering analysis divides the sample objects into groups according to some measure so that samples within a group are similar and samples in different groups are dissimilar. With it we classify the electronic product companies into better financial position (Level A), normal financial position (Level B) and bad financial position (Level C). The idea and steps of the clustering analysis are as follows.
> Calculate the average value of each index over all companies in the data set. All the indexes are positively correlated with company financial status.
> For each index and each company, if the company's value is greater than the average, the company has a good status on this index; we call it a good index of the company and mark it.
> Count the number of good indexes each company has among the 18 indexes. If the number is greater than or equal to 10, the company is assigned to Level A; if it is greater than or equal to 5 and smaller than 10, to Level B; if it is smaller than 5, to Level C.
The final clustering result is as follows: the five companies A1, A2, A4, A8 and A12 are at Level A; the six companies A3, A9, A10, A11, A13 and A14 are at Level B; the four companies A5, A6, A7 and A15 are at Level C. The method is simple and easy to understand. Having classified the companies by financial status, we next use association rules analysis to find the factors that affect financial position.
2) Association rules analysis
In the training data set, we take better company financial position as the objective. For frequent item sets, we require support greater than 50% and confidence greater than 40%. To make the association rules algorithm more convenient to apply, we first round the data in the annual assets structure statement; the rounded statement is omitted here. The steps of the association analysis are as follows.
> Produce the frequent item set. Choose the companies at Level A in the training data set and their index values to form a matrix, with indexes as columns and index values as rows; this constructs the candidate set. Next, calculate the support of each candidate and discard the indexes whose support is below 50%. Then combine each retained index with the objective property to form the new 2-dimensional candidate set, calculate its support, discard the indexes whose support is below 50%, and repeat the procedure. Finally a frequent item set is obtained.
> Get the rules of higher confidence. Filter the frequent item set to obtain a smaller frequent item set.
> Produce association rules. Produce the association rules whose confidence is greater than 40% from the frequent item set.
> Finally obtain the strong association rules.
The result shows that the critical factors affecting company financial status are unappropriated profit, current ratio, quick ratio and total assets turnover, each lying within a certain interval. Investors, decision makers and financial managers can thus understand financial position better and more deeply on a quantitative basis and work more efficiently. In the same way, the strong association rules tables of the companies at Level B or C can be built. Based on the above result, we identify the index characteristics of companies with better financial position by using a decision tree.
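The level assignment of 1) and the support and confidence measures of 2) can be sketched together in Python. This is a simplified illustration, not the procedure actually run for the paper: the four-company table is invented, the thresholds are scaled down to 3 and 2 because the toy table has 4 indexes instead of 18, and the items are "index above average" flags rather than the rounded value intervals used in the paper.

```python
from statistics import mean

def assign_levels(table, a_min, b_min):
    """table maps company -> {index: value}; returns (levels, good indexes)."""
    index_names = list(next(iter(table.values())))
    averages = {i: mean(row[i] for row in table.values()) for i in index_names}
    # A "good index" is one whose value exceeds the average over all companies.
    good = {c: {i for i in index_names if row[i] > averages[i]}
            for c, row in table.items()}
    levels = {c: "A" if len(g) >= a_min else "B" if len(g) >= b_min else "C"
              for c, g in good.items()}
    return levels, good

def support(itemset, transactions):
    """Share of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Made-up ratio table (hypothetical companies C1..C4).
table = {
    "C1": {"current_ratio": 2.0, "quick_ratio": 1.5,
           "total_assets_turnover": 0.9, "unappropriated_profit": 0.6},
    "C2": {"current_ratio": 1.8, "quick_ratio": 1.2,
           "total_assets_turnover": 0.8, "unappropriated_profit": 0.1},
    "C3": {"current_ratio": 1.0, "quick_ratio": 1.1,
           "total_assets_turnover": 0.5, "unappropriated_profit": 0.4},
    "C4": {"current_ratio": 0.8, "quick_ratio": 0.4,
           "total_assets_turnover": 0.4, "unappropriated_profit": 0.1},
}

levels, good = assign_levels(table, a_min=3, b_min=2)

# One transaction per company: its good indexes plus its level as an item.
transactions = [good[c] | {"level=" + levels[c]} for c in table]
rule_support = support({"current_ratio", "level=A"}, transactions)
rule_confidence = confidence({"quick_ratio"}, {"level=A"}, transactions)
```

With the paper's data, the same two measures would be compared against the 50% support and 40% confidence thresholds to keep or discard a candidate.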
3) Decision tree analysis
First, we take the indexes in the strong association rules table as the main classification criteria of the decision tree and build a simplified annual assets structure statement (omitted) that includes only unappropriated profit, current ratio, quick ratio and total assets turnover. We do so because these indexes strongly affect company financial position; based on them, the branches of the decision tree are clearer and simpler, which in effect prunes the decision tree in advance.
Next we build the decision tree model. We choose company financial position as the prediction target, take C4.5 as the basic algorithm, and use the improved algorithm presented in [10] to construct the decision tree classification model. The algorithm selects the attribute to test by the gain criterion, which is based on the concept of entropy in information theory. Entropy is then used to define the information gain ratio, and the information gain of each index is calculated; the index with the highest information gain has the strongest discriminating power [11]. We calculate the information gain of the four attributes in the simplified annual assets structure statement with the information gain formula and compare them. The attribute with the highest information gain becomes the first classification node, the one in second place becomes the next classification node, and so on until the decision tree is built. The steps of the calculation are as follows.
> Calculate the expected information needed to classify a sample, that is, the entropy. With 5, 6 and 4 of the 15 companies at Levels A, B and C:
Entropy (I) = -(5/15) log2 (5/15) - (6/15) log2 (6/15) - (4/15) log2 (4/15) = 1.57
> Calculate the information entropy and information gain of each attribute.
Entropy (I, Unappropriated profit per share) = 1.04, Gain (X1) = 1.57 - 1.04 = 0.53
Entropy (I, Quick ratio) = 1.2, Gain (X2) = 1.57 - 1.2 = 0.37
Entropy (I, Total assets turnover) = 1.12, Gain (X3) = 1.57 - 1.12 = 0.45
Entropy (I, Current assets ratio) = 1.08, Gain (X4) = 1.57 - 1.08 = 0.49
Ranked by information gain, X1 > X4 > X3 > X2, so the decision tree algorithm chooses unappropriated profit per share, current assets ratio, total assets turnover and quick ratio in turn as classification nodes, classifies the data set by the information gain criterion, and finally builds the decision tree model and obtains the characteristics of company financial status. The calculated characteristic probabilities show that companies covered by the strong association rules have a high probability of better financial position, so the results here support the association rules very well. Based on these results we can also estimate the share of companies with better financial status in the whole industry, which helps investors forecast and analyze the future.
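The entropy and information gain calculation can be checked with a short Python sketch. The class counts (5, 6 and 4 companies at Levels A, B and C) and the four conditional entropies are taken from the text; the helper names and the "high"/"low" attribute are hypothetical, and the sketch covers plain information gain only, not the full C4.5 gain ratio or the improved algorithm of [10].

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())

def information_gain(values, labels):
    """Entropy of labels minus the weighted entropy after splitting on values."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Level counts from the clustering step: 5 at A, 6 at B, 4 at C.
labels = ["A"] * 5 + ["B"] * 6 + ["C"] * 4
base = entropy(labels)  # rounds to 1.57, the value used in the text

# Hypothetical attribute that is "high" exactly for the Level A companies.
example_gain = information_gain(["high"] * 5 + ["low"] * 10, labels)

# Conditional entropies reported in the text; a lower value means a higher gain.
reported = {"X1": 1.04, "X2": 1.2, "X3": 1.12, "X4": 1.08}
ranking = sorted(reported, key=reported.get)
```

Sorting by conditional entropy reproduces the node order X1, X4, X3, X2 given above, since the gain is the same base entropy minus each conditional entropy.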
IV. CONCLUSIONS
The paper takes the financial data of electronic product companies published on the Internet as its research foundation, applies ratio analysis from traditional financial analysis, and builds a practical annual assets structure statement from the original balance sheets. On this basis it analyzes the financial positions of companies in the electronic product industry and draws conclusions that help investors and decision makers make sound decisions. Because of limited time and resources, the work here is only preliminary. Next we will verify the model on larger data sets, check the conclusions drawn here and improve the model further. The model is general-purpose; it can be applied to the financial analysis of other industries or subjects, and we hope it will strengthen existing financial analysis methods. In addition, the paper has tried combining several data mining algorithms, which helps analyze problems in greater depth.
REFERENCES
[1] Huang Jiejun, Pan Heping, Wan Youyong. Research on applications of data mining technology. Computer Engineering and Application, 2003, No. 39, pp. 45-48.
[2] Li Jianfeng, Li Yijun, Qi Wei, Jin Shiwei. The application of data mining in analyzing accountant in the company. Computer Engineering and Application, 2005, No. 2, pp. 217-219.
[3] Lin Weilin, Lin You. Applying data mining into analysis of financial position for listed companies. Market Weekly, 2004, No. 10, pp. 98-99.
[4] Liu Min, Luo Hui. A prediction analysis of financial distress for listed companies based on data mining approach. Application of Statistics and Management, 2004, Vol. 23, No. 3, pp. 51-56.
[5] Yao Kaohua, Jiang Yanhui. Research on company dynamic financial forecast. Journal of Xiangtan Normal University (Nature Science Edition), 2005, Vol. 27, No. 3, pp. 29-32.
[6] Li Ailing, Shen Xianzhang, Li Yuzhou. Research of data mining in financial forecast. Journal of Anyang Normal Institute, 2005, No. 2, pp. 129-131.
[7] Liu Shengping, Zhang Qiluan. Finance real-time controlling system based on data mining technology. Finance and Accounting Monthly, 2004, No. 3, pp. 31-32.
[8] Yang Chunhua. Applying data mining, OLAP into financial decision. Communication in Finance and Accounting, 2002, No. 10, pp. 39-40.
[9] Shi Lei. Research on explored data mining model. Henan Science, 2000, Vol. 18, No. 1, pp. 45-48.
[10] Ji Zhenming, Tao Shiqun. A data mining model for the important consumers losing in telecom industries. Computer Engineering and Application, 2004, No. 23, pp. 169-171.
[11] Zhou Shixiong, Han Yongsheng. Research on application of data mining technology in product similarity. Computer Engineering and Application, 2005, No. 1, pp. 207-20.