Machine Learning with WEKA
Eibe Frank
Department of Computer Science, University of Waikato, New Zealand

• WEKA: A Machine Learning Toolkit
• The Explorer
  • Classification and Regression
  • Clustering
  • Association Rules
  • Attribute Selection
  • Data Visualization
• The Experimenter
• The Knowledge Flow GUI
• Conclusions
WEKA: the bird
Copyright: Martin Kramer ([email protected])
2/22/2011
University of Waikato
WEKA: the software
• Machine learning/data mining software written in Java (distributed under the GNU General Public License)
• Used for research, education, and applications
• Complements “Data Mining” by Witten & Frank
• Main features:
  • Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
  • Graphical user interfaces (incl. data visualization)
  • Environment for comparing learning algorithms
WEKA: versions
• There are several versions of WEKA:
  • WEKA 3.0: “book version” compatible with description in data mining book
  • WEKA 3.2: “GUI version” adds graphical user interfaces (book version is command-line only)
  • WEKA 3.3: “development version” with lots of improvements
• This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)
WEKA only deals with “flat” files

@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
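To make the structure of the flat file above concrete, here is a minimal sketch of reading it in plain Python. It is not WEKA's own loader: the function name parse_arff is hypothetical, and the sketch assumes only the features visible on the slide (numeric and nominal attributes, comma-separated rows, "?" for a missing value). It ignores quoting, sparse data, and other attribute types.

```python
SAMPLE = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""


def parse_arff(text):
    """Parse a small ARFF document into (relation, attributes, rows).

    Hypothetical helper for illustration only; handles just the subset
    of the format shown on the slide.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = {}
            for (name, kind), value in zip(attributes, values):
                if value == "?":              # "?" marks a missing value
                    row[name] = None
                elif kind == "numeric":
                    row[name] = float(value)
                else:                         # nominal: keep the label
                    row[name] = value
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                # Nominal attribute: store the list of allowed labels.
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE)
```

Each row becomes a dictionary keyed by attribute name, with the missing cholesterol value in the last row stored as None.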
Explorer: pre-processing the data
• Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
• Data can also be read from a URL or from an SQL database (using JDBC)
• Pre-processing tools in WEKA are called “filters”
• WEKA contains filters for:
  • Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
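As a rough illustration of what two of these filters do to a numeric attribute, the sketch below rescales values to [0, 1] and bins them into equal-width intervals. The function names are hypothetical and this is not WEKA's filter code; equal-width binning is assumed here as one simple discretization strategy among several.

```python
def normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant attribute: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize_equal_width(values, bins):
    """Map each numeric value to the index of an equal-width bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would land in bin `bins`, so clamp it to the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


# The "age" column from the heart-disease sample shown earlier.
ages = [63, 67, 67, 38]
normalized_ages = normalize(ages)
age_bins = discretize_equal_width(ages, bins=3)
```

After discretization the attribute can be treated as nominal, which matters for schemes that only accept discrete data.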
Explorer: building “classifiers”
• Classifiers in WEKA are models for predicting nominal or numeric quantities
• Implemented learning schemes include:
  • Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
• “Meta”-classifiers include:
  • Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
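To show what predicting a nominal quantity looks like in the simplest case, here is a sketch of an instance-based (k-nearest-neighbour) classifier, one of the scheme families listed above. It is plain Python rather than WEKA's implementation; the function name, the training points, and the labels are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest
    training instances (Euclidean distance)."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training set: (numeric feature vector, nominal class label).
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]

near_a = knn_predict(train, (1.1, 1.0))
near_b = knn_predict(train, (5.1, 5.0))
```

An instance-based learner stores the training data and defers all work to prediction time, which is why there is no separate training step in this sketch.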
Explorer: clustering data
• WEKA contains “clusterers” for finding groups of similar instances in a dataset
• Implemented schemes are:
  • k-Means, EM, Cobweb, X-means, FarthestFirst
• Clusters can be visualized and compared to “true” clusters (if given)
• Evaluation based on log-likelihood if clustering scheme produces a probability distribution
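The first scheme in the list, k-Means, can be sketched in a few lines. This is a plain-Python illustration, not WEKA's clusterer: the function name and data points are hypothetical, and the initial centroids are passed in explicitly so that the run is deterministic (real implementations typically choose them at random).

```python
import math


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""

    def nearest(p, cs):
        return min(range(len(cs)), key=lambda i: math.dist(p, cs[i]))

    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)   # keep an empty cluster's centroid
        if new_centroids == centroids:      # converged: nothing moved
            break
        centroids = new_centroids
    assignments = [nearest(p, centroids) for p in points]
    return centroids, assignments


# Hypothetical 2-d data with two visually obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, assignments = k_means(points, [points[0], points[3]])
```

When "true" cluster labels are available, the returned assignments are what would be compared against them.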
Explorer: finding associations
• WEKA contains an implementation of the Apriori algorithm for learning association rules
  • Works only with discrete data
• Can identify statistical dependencies between groups of attributes:
  • milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)
• Apriori can compute all rules that have a given minimum support and exceed a given confidence
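The two numbers attached to a rule can be computed directly from a list of transactions. The sketch below is not the Apriori search itself, only the support and confidence measures it filters on. Support is taken as a count of transactions, matching the "support 2000" example; the function names and the five transactions are invented for illustration.

```python
def support(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing the antecedent that also
    contain the consequent."""
    covered = support(transactions, antecedent)
    if covered == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / covered


# Hypothetical market-basket data.
transactions = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "bread"},
    {"butter", "eggs"},
]

antecedent, consequent = {"milk", "butter"}, {"bread", "eggs"}
rule_support = support(transactions, antecedent | consequent)
rule_confidence = confidence(transactions, antecedent, consequent)
```

Here three transactions contain milk and butter, and two of those also contain bread and eggs, so the rule has support 2 and confidence 2/3.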
Explorer: attribute selection
• Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
• Attribute selection methods contain two parts:
  • A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  • An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
• Very flexible: WEKA allows (almost) arbitrary combinations of these two
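One such combination is a ranking search paired with an information-gain evaluator. The sketch below shows that pairing in plain Python on the nominal attributes of the four heart-disease rows shown earlier. The function names are hypothetical and this is not WEKA's code; with only four instances the scores are purely illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in class entropy obtained by splitting on `attribute`."""
    gain = entropy([r[target] for r in rows])
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain


rows = [
    {"sex": "male", "chest_pain_type": "typ_angina",
     "exercise_induced_angina": "no", "class": "not_present"},
    {"sex": "male", "chest_pain_type": "asympt",
     "exercise_induced_angina": "yes", "class": "present"},
    {"sex": "male", "chest_pain_type": "asympt",
     "exercise_induced_angina": "yes", "class": "present"},
    {"sex": "female", "chest_pain_type": "non_anginal",
     "exercise_induced_angina": "no", "class": "not_present"},
]

candidates = ["sex", "chest_pain_type", "exercise_induced_angina"]
# Evaluation step: score every attribute. Search step: rank by score.
ranking = sorted(
    ((name, information_gain(rows, name, "class")) for name in candidates),
    key=lambda pair: pair[1],
    reverse=True,
)
```

Swapping the evaluator (for example, for a chi-squared score) would leave the ranking step unchanged, which is the kind of flexibility the slide refers to.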
Explorer: data visualization
• Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem
• WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
  • To do: rotating 3-d visualizations (Xgobi-style)
• Color-coded class values
• “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
• “Zoom-in” function
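The idea behind jitter can be sketched as follows: instances that share the same nominal value are plotted at exactly the same coordinate, so a small random offset is added to spread them apart. This is an assumption about the general technique, not a description of WEKA's plotting code; the function name, the integer coding of the attribute, and the offset size are all hypothetical.

```python
import random


def jitter(values, amount, rng):
    """Add uniform random noise in [-amount, amount] to each plot coordinate."""
    return [v + rng.uniform(-amount, amount) for v in values]


# Nominal "sex" attribute coded as plot positions (female=0, male=1):
# three instances sit on the same coordinate and would hide each other.
codes = [1, 1, 1, 0]
rng = random.Random(0)          # fixed seed so the sketch is repeatable
jittered = jitter(codes, 0.2, rng)
```

The offsets affect only where points are drawn; the underlying attribute values are unchanged.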
Conclusion: try it yourself!
• WEKA is available at http://www.cs.waikato.ac.nz/ml/weka
  • Also has a list of projects based on WEKA
• WEKA contributors: Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang