
Machine Learning with WEKA

Eibe Frank
Department of Computer Science,
University of Waikato, New Zealand

■ WEKA: A Machine Learning Toolkit
■ The Explorer
  • Classification and Regression
  • Clustering
  • Association Rules
  • Attribute Selection
  • Data Visualization
■ The Experimenter
■ The Knowledge Flow GUI
■ Conclusions
WEKA: the bird

Copyright: Martin Kramer (mkramer@wxs.nl)


12/08/21 University of Waikato 2
WEKA: the software
■ Machine learning/data mining software written in Java (distributed under the GNU General Public License)
■ Used for research, education, and applications
■ Complements “Data Mining” by Witten & Frank
■ Main features:
  • Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
  • Graphical user interfaces (incl. data visualization)
  • Environment for comparing learning algorithms

WEKA: versions
■ There are several versions of WEKA:
  • WEKA 3.0: “book version” compatible with description in data mining book
  • WEKA 3.2: “GUI version” adds graphical user interfaces (book version is command-line only)
  • WEKA 3.3: “development version” with lots of improvements
■ This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)

WEKA only deals with “flat” files
@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
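
The ARFF layout above can be read with a few lines of code. This is a minimal sketch, not WEKA's own loader: the name `parse_arff` is hypothetical, the sample is a shortened version of the file above, and only `numeric` and nominal `{...}` attributes with `?` for missing values are assumed.

```python
def parse_arff(text):
    """Parse a small ARFF document into (relation, attributes, rows).

    Hypothetical helper for illustration; handles only numeric and
    nominal attributes, with '?' marking a missing value.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (name, kind), value in zip(attributes, values):
                if value == "?":
                    row.append(None)            # missing value
                elif kind == "numeric":
                    row.append(float(value))
                else:
                    row.append(value)           # nominal label
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                # Nominal attribute: keep the list of allowed labels.
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


SAMPLE = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute cholesterol numeric
@attribute class { present, not_present}
@data
63,male,233,not_present
38,female,?,not_present
"""

relation, attributes, rows = parse_arff(SAMPLE)
```

Each row comes back as a flat list, one value per attribute, which is what "flat file" means here: no nested or relational structure.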
Explorer: pre-processing the data
■ Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
■ Data can also be read from a URL or from an SQL database (using JDBC)
■ Pre-processing tools in WEKA are called “filters”
■ WEKA contains filters for:
  • Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
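
To make two of the listed filter types concrete, here is a minimal sketch of normalization and equal-width discretization on one numeric attribute. The function names are hypothetical and this is not WEKA's filter code; WEKA's own filters offer more options than shown here.

```python
def normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant attribute: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, bins):
    """Equal-width discretization: map each value to a bin index 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would land in bin `bins`, so clamp it into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


cholesterol = [233, 286, 229]         # cholesterol values from the heart-disease example
scaled = normalize(cholesterol)
binned = discretize(cholesterol, 3)
```

Discretization turns a numeric attribute into a nominal one, which is what schemes that work only with discrete data need as input.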

Explorer: building “classifiers”
■ Classifiers in WEKA are models for predicting nominal or numeric quantities
■ Implemented learning schemes include:
  • Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
■ “Meta”-classifiers include:
  • Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …

Explorer: clustering data
■ WEKA contains “clusterers” for finding groups of similar instances in a dataset
■ Implemented schemes are:
  • k-Means, EM, Cobweb, X-means, FarthestFirst
■ Clusters can be visualized and compared to “true” clusters (if given)
■ Evaluation based on log-likelihood if clustering scheme produces a probability distribution
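
As an illustration of the first scheme in the list, the following is a minimal k-means sketch. It is not WEKA's implementation: the function name is hypothetical, the toy points are invented, and the centroids are simply initialized to the first k points so that the run is deterministic.

```python
def k_means(points, k, iterations=100):
    """Plain k-means on tuples of floats; centroids start at the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:   # nothing moved: converged
            break
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                    # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


# Toy 2-d data, invented for illustration: two well-separated groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (8.0, 9.0), (2.0, 1.0), (9.0, 8.0)]
centroids, assignment = k_means(points, k=2)
```

If "true" cluster labels are available, the returned `assignment` list is what would be compared against them.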

12/08/21 University of Waikato 92


12/08/21 University of Waikato 93
12/08/21 University of Waikato 94
12/08/21 University of Waikato 95
12/08/21 University of Waikato 96
12/08/21 University of Waikato 97
12/08/21 University of Waikato 98
12/08/21 University of Waikato 99
12/08/21 University of Waikato 100
12/08/21 University of Waikato 101
12/08/21 University of Waikato 102
12/08/21 University of Waikato 103
12/08/21 University of Waikato 104
12/08/21 University of Waikato 105
12/08/21 University of Waikato 106
12/08/21 University of Waikato 107
Explorer: finding associations
■ WEKA contains an implementation of the Apriori algorithm for learning association rules
  • Works only with discrete data
■ Can identify statistical dependencies between groups of attributes:
  • milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)
■ Apriori can compute all rules that have a given minimum support and exceed a given confidence
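
The two numbers attached to a rule can be computed directly. This sketch scores a single rule only; it is not the Apriori search itself, and the function name and the toy baskets are invented for illustration.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent => consequent.

    Support is the number of transactions containing both sides, i.e. an
    absolute count like the "support 2000" in the example rule.
    Confidence is that count divided by the number of transactions
    containing the antecedent.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    both = sum(1 for t in transactions if antecedent | consequent <= set(t))
    lhs = sum(1 for t in transactions if antecedent <= set(t))
    confidence = both / lhs if lhs else 0.0
    return both, confidence


# Toy market-basket data, invented for illustration.
baskets = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "butter", "bread", "eggs"},
    {"bread", "eggs"},
]
support, confidence = rule_stats(baskets, {"milk", "butter"}, {"bread", "eggs"})
```

Here three of the four baskets with milk and butter also contain bread and eggs, so the rule has support 3 and confidence 0.75. A rule miner keeps only the rules whose values pass the chosen thresholds.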

Explorer: attribute selection
■ Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
■ Attribute selection methods contain two parts:
  • A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  • An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
■ Very flexible: WEKA allows (almost) arbitrary combinations of these two
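
One combination of the two parts is easy to sketch: information gain as the evaluation method and ranking as the search method. The function names and the toy data below are invented for illustration, and only nominal attributes are assumed; this is not WEKA's code.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(column, labels):
    """Reduction in class entropy after splitting on a nominal attribute."""
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


# Toy data, invented for illustration: two nominal attributes and a class.
data = {
    "exercise_induced_angina": ["no", "yes", "yes", "no"],
    "sex": ["male", "male", "female", "female"],
}
labels = ["not_present", "present", "present", "not_present"]

# "Ranking" search: score every attribute on its own and sort by score.
ranking = sorted(data, key=lambda name: information_gain(data[name], labels), reverse=True)
```

Swapping `information_gain` for another scoring function changes the evaluation method without touching the search, which is the flexibility described above.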

Explorer: data visualization
■ Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem
■ WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
  • To do: rotating 3-d visualizations (Xgobi-style)
■ Color-coded class values
■ “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
■ “Zoom-in” function
Performing experiments
■ Experimenter makes it easy to compare the performance of different learning schemes
■ For classification and regression problems
■ Results can be written into file or database
■ Evaluation options: cross-validation, learning curve, hold-out
■ Can also iterate over different parameter settings
■ Significance-testing built in!
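
Two of the ingredients above, cross-validation splits and a significance statistic, can be sketched as follows. The function names and the per-fold accuracies are invented for illustration, and the statistic is a plain paired t statistic, which is an assumption here and not necessarily the test the Experimenter applies.

```python
from math import sqrt
from statistics import mean, stdev


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k-fold cross-validation over n instances."""
    folds = [list(range(i, n, k)) for i in range(k)]   # round-robin assignment
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


def paired_t_statistic(scores_a, scores_b):
    """t statistic of the per-fold differences between two schemes.

    Assumes the differences are not all identical (non-zero deviation).
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


splits = list(k_fold_indices(6, 3))

# Per-fold accuracies for two schemes, invented for illustration.
t = paired_t_statistic([0.90, 0.85, 0.88, 0.92], [0.80, 0.80, 0.78, 0.82])
```

Every instance appears in exactly one test fold, so each scheme is scored on all of the data without ever being tested on an instance it was trained on.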

The Knowledge Flow GUI
■ New graphical user interface for WEKA
■ Java-Beans-based interface for setting up and running machine learning experiments
■ Data sources, classifiers, etc. are beans and can be connected graphically
■ Data “flows” through components: e.g., “data source” -> “filter” -> “classifier” -> “evaluator”
■ Layouts can be saved and loaded again later

Conclusion: try it yourself!
■ WEKA is available at http://www.cs.waikato.ac.nz/ml/weka
  • Also has a list of projects based on WEKA
■ WEKA contributors:
  Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang
