Data Mining: Foundations and Intelligent Paradigms
Intelligent Systems Reference Library, Volume 25
Editors-in-Chief
Vol. 1. Christine L. Mumford and Lakhmi C. Jain (Eds.)
Computational Intelligence: Collaboration, Fusion and Emergence, 2009
ISBN 978-3-642-01798-8

Vol. 2. Yuehui Chen and Ajith Abraham
Tree-Structure Based Hybrid Computational Intelligence, 2009
ISBN 978-3-642-04738-1

Vol. 3. Anthony Finn and Steve Scheding
Developments and Challenges for Autonomous Unmanned Vehicles, 2010
ISBN 978-3-642-10703-0

Vol. 4. Lakhmi C. Jain and Chee Peng Lim (Eds.)
Handbook on Decision Making: Techniques and Applications, 2010
ISBN 978-3-642-13638-2

Vol. 5. George A. Anastassiou
Intelligent Mathematics: Computational Analysis, 2010
ISBN 978-3-642-17097-3

Vol. 6. Ludmila Dymowa
Soft Computing in Economics and Finance, 2011
ISBN 978-3-642-17718-7

Vol. 7. Gerasimos G. Rigatos
Modelling and Control for Intelligent Industrial Systems, 2011
ISBN 978-3-642-17874-0

Vol. 8. Edward H.Y. Lim, James N.K. Liu, and Raymond S.T. Lee
Knowledge Seeker – Ontology Modelling for Information Search and Management, 2011
ISBN 978-3-642-17915-0

Vol. 9. Menahem Friedman and Abraham Kandel
Calculus Light, 2011
ISBN 978-3-642-17847-4

Vol. 10. Andreas Tolk and Lakhmi C. Jain
Intelligence-Based Systems Engineering, 2011
ISBN 978-3-642-17930-3

Vol. 11. Samuli Niiranen and Andre Ribeiro (Eds.)
Information Processing and Biological Systems, 2011
ISBN 978-3-642-19620-1

Vol. 12. Florin Gorunescu
Data Mining, 2011
ISBN 978-3-642-19720-8

Vol. 13. Witold Pedrycz and Shyi-Ming Chen (Eds.)
Granular Computing and Intelligent Systems, 2011
ISBN 978-3-642-19819-9

Vol. 14. George A. Anastassiou and Oktay Duman
Towards Intelligent Modeling: Statistical Approximation Theory, 2011
ISBN 978-3-642-19825-0

Vol. 15. Antonino Freno and Edmondo Trentin
Hybrid Random Fields, 2011
ISBN 978-3-642-20307-7

Vol. 16. Alexiei Dingli
Knowledge Annotation: Making Implicit Knowledge Explicit, 2011
ISBN 978-3-642-20322-0

Vol. 17. Crina Grosan and Ajith Abraham
Intelligent Systems, 2011
ISBN 978-3-642-21003-7

Vol. 18. Achim Zielesny
From Curve Fitting to Machine Learning, 2011
ISBN 978-3-642-21279-6

Vol. 19. George A. Anastassiou
Intelligent Systems: Approximation by Artificial Neural Networks, 2011
ISBN 978-3-642-21430-1

Vol. 20. Lech Polkowski
Approximate Reasoning by Parts, 2011
ISBN 978-3-642-22278-8

Vol. 21. Igor Chikalov
Average Time Complexity of Decision Trees, 2011
ISBN 978-3-642-22660-1

Vol. 22. Przemyslaw Różewski, Emma Kusztina, Ryszard Tadeusiewicz, and Oleg Zaikin
Intelligent Open Learning Systems, 2011
ISBN 978-3-642-22666-3

Vol. 23. Dawn E. Holmes and Lakhmi C. Jain (Eds.)
Data Mining: Foundations and Intelligent Paradigms, 2012
ISBN 978-3-642-23165-0

Vol. 24. Dawn E. Holmes and Lakhmi C. Jain (Eds.)
Data Mining: Foundations and Intelligent Paradigms, 2012
ISBN 978-3-642-23240-4

Vol. 25. Dawn E. Holmes and Lakhmi C. Jain (Eds.)
Data Mining: Foundations and Intelligent Paradigms, 2012
ISBN 978-3-642-23150-6
Dawn E. Holmes and Lakhmi C. Jain (Eds.)
Prof. Dawn E. Holmes
Department of Statistics and Applied Probability
University of California,
Santa Barbara,
CA 93106
USA
E-mail: holmes@pstat.ucsb.edu

Prof. Lakhmi C. Jain
Professor of Knowledge-Based Engineering
University of South Australia
Adelaide
Mawson Lakes, SA 5095
Australia
E-mail: Lakhmi.jain@unisa.edu.au
DOI 10.1007/978-3-642-23151-3
This work is subject to copyright. All rights are reserved, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilm or in any other way,
and storage in data banks. Duplication of this publication or parts thereof is permitted
only under the provisions of the German Copyright Law of September 9, 1965, in
its current version, and permission for use must always be obtained from Springer.
Violations are liable to prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are
exempt from the relevant protective laws and regulations and therefore free for general
use.
Typeset & Cover Design: Scientific Publishing Services Pvt. Ltd., Chennai, India.
Printed on acid-free paper
springer.com
Preface
There are many invaluable books available on data mining theory and applications.
However, in compiling a volume titled “DATA MINING: Foundations and Intelligent
Paradigms: Volume 3: Medical, Health, Social, Biological and other Applications”, we
wish to introduce some of the latest developments to a broad audience of both
specialists and non-specialists in this field.
The term ‘data mining’ was introduced in the 1990s to describe an emerging field
based on classical statistics, artificial intelligence and machine learning. By combining
techniques from these areas, and developing new ones, researchers are able to
analyze large datasets innovatively and productively. Patterns found in these datasets are
subsequently analyzed with a view to acquiring new knowledge. These techniques
have been applied in a broad range of medical, health, social and biological areas.
In compiling this volume we have sought to present innovative research from
prestigious contributors in the field of data mining. Each chapter is self-contained and
is described briefly in Chapter 1.
This book will prove valuable to theoreticians as well as application
scientists/engineers in the area of Data Mining. Postgraduate students will also find
this a useful sourcebook since it shows the direction of current research.
We have been fortunate in attracting top-class researchers as contributors and wish
to offer our thanks for their support in this project. We also acknowledge the expertise
and time of the reviewers. Finally, we wish to thank Springer for their support.
Chapter 1
Advances in Intelligent Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Dawn E. Holmes, Jeffrey W. Tweedale, Lakhmi C. Jain
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Medical Influences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
3 Health Influences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
4 Social Influences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
4.1 Information Discovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
4.2 On-Line Communities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
5 Biological Influences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
5.1 Biological Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
5.2 Estimations in Gene Expression . . . . . . . . . . . . . . . . . . . . . . 4
6 Chapters Included in the Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Chapter 2
Temporal Pattern Mining for Medical Applications . . . . . . . . . . . . . 9
Giulia Bruno, Paolo Garza
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2 Types of Temporal Data in Medical Domain . . . . . . . . . . . . . . . . . 10
3 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4 Temporal Pattern Mining Algorithms . . . . . . . . . . . . . . . . . . . . . . . 11
4.1 Temporal Pattern Mining from a Set of Sequences . . . . . . 12
4.2 Temporal Pattern Mining from a Single Sequence . . . . . . 14
5 Medical Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Chapter 3
BioKeySpotter: An Unsupervised Keyphrase Extraction
Technique in the Biomedical Full-Text Collection . . . . . . . . . . . . . . . 19
Min Song, Prat Tanapaisankit
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
VIII Contents
Chapter 4
Mining Health Claims Data for Assessing Patient Risk . . . . . . . . . . 29
Ian Duncan
1 What Is Health Risk? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2 Traditional Models for Assessing Health Risk . . . . . . . . . . . . . . . . 33
3 Risk Factor-Based Risk Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4 Data Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.1 Enrollment Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.2 Claims and Coding Systems . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3 Interpretation of Claims Codes . . . . . . . . . . . . . . . . . . . . . . . 49
5 Clinical Identification Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
6 Sensitivity-Specificity Trade-Off . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
6.1 Constructing an Identification Algorithm . . . . . . . . . . . . . . 56
6.2 Sources of Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
7 Construction and Use of Grouper Models . . . . . . . . . . . . . . . . . . . . 58
7.1 Drug Grouper Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
7.2 Drug-Based Risk Adjustment Models . . . . . . . . . . . . . . . . . 61
8 Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Chapter 5
Mining Biological Networks for Similar Patterns . . . . . . . . . . . . . . . . 63
Ferhat Ay, Günhan Gülsoy, Tamer Kahveci
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
2 Metabolic Network Alignment with One-to-One Mappings . . . . . 67
2.1 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
2.2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.3 Pairwise Similarity of Entities . . . . . . . . . . . . . . . . . . . . . . . 70
2.4 Similarity of Topologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
2.5 Combining Homology and Topology . . . . . . . . . . . . . . . . . . 76
2.6 Extracting the Mapping of Entities . . . . . . . . . . . . . . . . . . 78
2.7 Similarity Score of Networks . . . . . . . . . . . . . . . . . . . . . . . . 79
2.8 Complexity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3 Metabolic Network Alignment with One-to-Many Mappings . . . 80
3.1 Homological Similarity of Subnetworks . . . . . . . . . . . . . . . . 82
3.2 Topological Similarity of Subnetworks . . . . . . . . . . . . . . . . . 83
Chapter 6
Estimation of Distribution Algorithms in Gene Expression
Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Elham Salehi, Robin Gras
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
2 Estimation of Distribution Algorithms . . . . . . . . . . . . . . . . . . . . 102
2.1 Model Building in EDA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
2.2 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
2.3 Models with Independent Variables . . . . . . . . . . . . . . . . . . . 104
2.4 Models with Pair Wise Dependencies . . . . . . . . . . . . . . . . . 105
2.5 Models with Multiple Dependencies . . . . . . . . . . . . . . . . . . . 106
3 Application of EDA in Gene Expression Data Analysis . . . . . . . . 108
3.1 State-of-Art of the Application of EDAs in Gene
Expression Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
Chapter 7
Gene Function Prediction and Functional Network: The Role
of Gene Ontology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
Erliang Zeng, Chris Ding, Kalai Mathee, Lisa Schneper, Giri Narasimhan
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
1.1 Gene Function Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
1.2 Functional Gene Network Generation . . . . . . . . . . . . . . . . . 127
1.3 Related Work and Limitations . . . . . . . . . . . . . . . . . . . . . . . 128
2 GO-Based Gene Similarity Measures . . . . . . . . . . . . . . . . . . . . . . . . 129
3 Estimating Support for PPI Data with Applications to
Function Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
3.1 Mixture Model of PPI Data . . . . . . . . . . . . . . . . . . . . . . . . . 132
3.2 Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
3.3 Function Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
3.4 Evaluating the Function Prediction . . . . . . . . . . . . . . . . . . . 135
3.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
3.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
Chapter 8
Mining Multiple Biological Data for Reconstructing Signal
Transduction Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
Thanh-Phuong Nguyen, Tu-Bao Ho
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
2.1 Signal Transduction Network . . . . . . . . . . . . . . . . . . . . . . . . 164
2.2 Protein-Protein Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . 166
3 Constructing Signal Transduction Networks Using Multiple
Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
3.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
3.2 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
3.3 Clustering and Protein-Protein Interaction Networks . . . . 169
3.4 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
4 Some Results of Yeast STN Reconstruction . . . . . . . . . . . . . . . . . . 178
5 Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
Chapter 9
Mining Epistatic Interactions from High-Dimensional
Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
Xia Jiang, Shyam Visweswaran, Richard E. Neapolitan
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
2.1 Epistasis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
2.2 Detecting Epistasis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
2.3 High-Dimensional Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . 190
2.4 Barriers to Learning Epistasis . . . . . . . . . . . . . . . . . . . . . . . . 191
2.5 MDR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
2.6 Bayesian Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
3 Discovering Epistasis Using Bayesian Networks . . . . . . . . . . . . . . . 196
3.1 A Bayesian Network Model for Epistatic Interactions . . . 196
3.2 The BNMBL Score . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
Chapter 10
Knowledge Discovery in Adversarial Settings . . . . . . . . . . . . . . . . . . . 211
D.B. Skillicorn
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
2 Characteristics of Adversarial Modelling . . . . . . . . . . . . . . . . . . . . . 214
3 Technical Implications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
Chapter 11
Analysis and Mining of Online Communities of Internet
Forum Users . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
Mikolaj Morzy
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
1.1 What Is Web 2.0? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
1.2 New Forms of Participation — Push or Pull? . . . . . . . . . . 228
1.3 Internet Forums as New Forms of Conversation . . . . . . . . 229
2 Social-Driven Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
2.1 What Are Social-Driven Data? . . . . . . . . . . . . . . . . . . . . . . . 231
2.2 Data from Internet Forums . . . . . . . . . . . . . . . . . . . . . . . . . . 234
3 Internet Forums . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
3.1 Crawling Internet Forums . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
3.2 Statistical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
3.3 Index Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
3.4 Network Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
4 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
Chapter 12
Data Mining for Information Literacy . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
Bettina Berendt
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
2.1 Information Literacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
2.2 Critical Literacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
2.3 Educational Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
3 Towards Critical Data Literacy: A Frame for Analysis and
Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
Chapter 13
Rule Extraction from Neural Networks and Support Vector
Machines for Credit Scoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
Rudy Setiono, Bart Baesens, David Martens
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
2 Re-RX: Recursive Rule Extraction from Neural Networks . . . . . . 300
2.1 Multilayer Perceptron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
2.2 Finding Optimal Network Structure by Pruning . . . . . . . . 303
2.3 Recursive Rule Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
2.4 Applying Re-RX for Credit Scoring . . . . . . . . . . . . . . . . . . . 306
3 ALBA: Rule Extraction from Support Vector Machines . . . . . . . 311
3.1 Support Vector Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
3.2 ALBA: Active Learning Based Approach to SVM Rule
Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
3.3 Applying ALBA for Credit Scoring . . . . . . . . . . . . . . . . . . . 316
4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
Chapter 14
Using Self-Organizing Map for Data Mining: A Synthesis with
Accounting Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
Andriy Andreev, Argyris Argyrou
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
2 Data Pre-processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
2.1 Types of Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
2.2 Distance Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
Chapter 15
Applying Data Mining Techniques to Assess Steel Plant
Operation Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
Khan Muhammad Badruddin, Isao Yagi, Takao Terano
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
2 Brief Description of EAF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
2.1 Performance Evaluation Criteria . . . . . . . . . . . . . . . . . . . . . 346
2.2 Innovations in Electric Arc Furnaces . . . . . . . . . . . . . . . . . . 346
2.3 Details of the Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
2.4 Understanding SCIPs and Stages of a Heat . . . . . . . . . . . . 349
3 Problem Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
4 Data Mining Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
4.1 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
4.2 Data Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
4.3 Attribute Pruning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
4.4 The Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
4.5 Data Mining Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
5.1 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358
6 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360