Machine Learning with WEKA



Eibe Frank
Department of Computer Science, University of Waikato, New Zealand

• WEKA: A Machine Learning Toolkit
• The Explorer
    • Classification and Regression
    • Clustering
    • Association Rules
    • Attribute Selection
    • Data Visualization
• The Experimenter
• The Knowledge Flow GUI
• Conclusions

WEKA: the bird

Copyright: Martin Kramer ([email protected]) 2/22/2011

University of Waikato


WEKA: the software

• Machine learning/data mining software written in Java (distributed under the GNU General Public License)
• Used for research, education, and applications
• Complements “Data Mining” by Witten & Frank
• Main features:
    • Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
    • Graphical user interfaces (incl. data visualization)
    • Environment for comparing learning algorithms


WEKA: versions

• There are several versions of WEKA:
    • WEKA 3.0: “book version”, compatible with the description in the data mining book
    • WEKA 3.2: “GUI version”, adds graphical user interfaces (the book version is command-line only)
    • WEKA 3.3: “development version”, with lots of improvements
• This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)


WEKA only deals with “flat” files

@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
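To make the structure of the flat file above concrete, here is a minimal Python sketch that reads the subset of the format shown on the slide: one @relation line, numeric and nominal @attribute lines, and comma-separated @data rows with "?" for a missing value. The function name parse_arff and the SAMPLE constant are hypothetical, and this is not WEKA's own loader; quoting, other attribute types and sparse data are deliberately ignored.

```python
SAMPLE = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""


def parse_arff(text):
    """Parse a small subset of the format: returns (relation, attributes, rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (name, kind), value in zip(attributes, values):
                if value == "?":              # missing value
                    row.append(None)
                elif kind == "numeric":
                    row.append(float(value))
                else:                         # nominal: keep the label as text
                    row.append(value)
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                # nominal attribute: store the list of allowed labels
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE)
```

Each row comes back as a plain list with one entry per attribute, so the missing cholesterol value in the last instance is represented as None.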


Explorer: pre-processing the data

• Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
• Data can also be read from a URL or from an SQL database (using JDBC)
• Pre-processing tools in WEKA are called “filters”
• WEKA contains filters for:
    • Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
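As an illustration of what two of the filter types listed above do to a numeric attribute, the following Python sketch rescales values to [0, 1] and bins them into equal-width intervals. The function names are hypothetical and the code only shows the general idea; it is not WEKA's filter implementation, and it assumes there are no missing values.

```python
def normalize(values):
    """Min-max normalization: map the smallest value to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant attribute: nothing to scale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize_equal_width(values, bins):
    """Equal-width discretization: replace each value by its bin index 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        return [0 for _ in values]
    # the maximum value falls on the upper boundary, so clamp it into the last bin
    return [min(int((v - lo) / width), bins - 1) for v in values]


ages = [63, 67, 67, 38]               # the age column from the example file
scaled = normalize(ages)
binned = discretize_equal_width(ages, 3)
```

With three bins, the three older patients share the top bin and the 38-year-old lands in the bottom one, which turns the numeric attribute into a nominal one with three values.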


Explorer: building “classifiers”

• Classifiers in WEKA are models for predicting nominal or numeric quantities
• Implemented learning schemes include:
    • Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
• “Meta”-classifiers include:
    • Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
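To show what "predicting a nominal quantity" means in the simplest case, here is a hedged Python sketch of an instance-based (1-nearest-neighbour) classifier. The function name is hypothetical, the training data are just the (age, cholesterol) pairs of the three complete instances from the example file, and the features are left unscaled; it is a toy illustration of the idea, not one of WEKA's learning schemes.

```python
import math


def nearest_neighbor_predict(train, query):
    """Return the class label of the training instance closest to `query`.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.dist(features, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# (age, cholesterol) -> class, taken from the complete rows of the example file
train = [
    ((63, 233), "not_present"),
    ((67, 286), "present"),
    ((67, 229), "present"),
]

prediction_a = nearest_neighbor_predict(train, (66, 280))
prediction_b = nearest_neighbor_predict(train, (62, 234))
```

The model here is simply the stored training set: prediction copies the label of the nearest stored instance, which is why such schemes are called instance-based.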


Explorer: clustering data

• WEKA contains “clusterers” for finding groups of similar instances in a dataset
• Implemented schemes are:
    • k-Means, EM, Cobweb, X-means, FarthestFirst
• Clusters can be visualized and compared to “true” clusters (if given)
• Evaluation is based on log-likelihood if the clustering scheme produces a probability distribution
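For readers unfamiliar with the first scheme in the list, the following is a minimal Python sketch of the k-means idea: assign each instance to its nearest centroid, recompute the centroids, and repeat until nothing changes. The function name, the fixed initial centroids and the toy points are assumptions made for the example; WEKA's implementation differs in initialization and in its handling of nominal and missing values.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on numeric tuples, starting from the given centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # assignment step: put every point into the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[idx].append(p)
        # update step: move each centroid to the mean of its members
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(old)   # keep an empty cluster's centroid
        if new_centroids == centroids:      # converged
            break
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, [(1, 1), (8, 8)])
```

On this toy data the two groups are well separated, so the assignment is stable after the first pass and the loop stops on the second.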


Explorer: finding associations

• WEKA contains an implementation of the Apriori algorithm for learning association rules
    • Works only with discrete data
• Can identify statistical dependencies between groups of attributes:
    • milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)
• Apriori can compute all rules that have a given minimum support and exceed a given confidence
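The two measures attached to a rule can be computed directly from a list of transactions. The Python sketch below does only that, for a single rule; it does not implement Apriori's candidate generation and pruning. The function names and the five toy transactions are assumptions for illustration, with support taken as the number of transactions containing all items of the rule, as in the slide's example.

```python
def support(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante else 0.0


transactions = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "eggs"},
    {"bread", "eggs"},
]

rule_support = support(transactions, {"milk", "butter", "bread", "eggs"})
rule_confidence = confidence(transactions, {"milk", "butter"}, {"bread", "eggs"})
```

Here three transactions contain milk and butter, and two of those also contain bread and eggs, so the rule has support 2 and confidence 2/3.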


Explorer: attribute selection

• Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
• Attribute selection methods contain two parts:
    • A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
    • An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
• Very flexible: WEKA allows (almost) arbitrary combinations of these two
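One combination of the two parts is easy to sketch: information gain as the evaluation method and simple ranking as the search method. The Python code below is an illustrative version for nominal attributes only. The function names are hypothetical, the four rows reuse the sex, exercise_induced_angina and class values from the example file, and it does not reproduce WEKA's evaluator classes.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr_index, class_index=-1):
    """Reduction in class entropy obtained by splitting on one nominal attribute."""
    labels = [r[class_index] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[attr_index], []).append(r[class_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


names = ["sex", "exercise_induced_angina"]
rows = [
    ("male", "no", "not_present"),
    ("male", "yes", "present"),
    ("male", "yes", "present"),
    ("female", "no", "not_present"),
]

# "ranking" search: score every attribute on its own and sort by score
gains = {name: information_gain(rows, i) for i, name in enumerate(names)}
ranked = sorted(names, key=lambda n: gains[n], reverse=True)
```

On these four instances exercise_induced_angina separates the classes perfectly and is ranked first; with so little data this says nothing about the real problem.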


Explorer: data visualization

• Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem
• WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
    • To do: rotating 3-d visualizations (Xgobi-style)
• Color-coded class values
• “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
• “Zoom-in” function
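The reason jitter reveals “hidden” points is that instances with identical values are drawn on top of each other; adding a small random offset before plotting spreads them out. The Python sketch below shows that idea for one axis. The function name, the offset size and the fixed seed are assumptions for the example, and it is not how WEKA's visualization panel is implemented.

```python
import random


def jitter(values, amount, seed=0):
    """Add uniform noise in [-amount, +amount] to each plotting coordinate."""
    rng = random.Random(seed)   # fixed seed so the plot is reproducible
    return [v + rng.uniform(-amount, amount) for v in values]


# a nominal attribute encoded as 0/1: five instances, but only two distinct positions
codes = [0, 0, 0, 1, 1]
plotted = jitter(codes, 0.2)
```

Without jitter the five instances would occupy two screen positions; after it, each has its own position while staying close to its original value.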


Conclusion: try it yourself!

• WEKA is available at http://www.cs.waikato.ac.nz/ml/weka
    • The site also has a list of projects based on WEKA
• WEKA contributors: Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang
