Clustering using Unsupervised Regression Trees: CURT

R. Fraiman, B. Ghattas, M. Svarc

Outline
◮ Motivation and context.
◮ Existing work.
◮ Our two-stage method.
◮ Simulations and comparisons.
◮ Application.
◮ Conclusion.


Statistical Learning
◮ Supervised and unsupervised learning (Hastie et al. [2003]) share many features in the algorithms they use to estimate the underlying models.
◮ In supervised learning a target variable $Y$ is available (regression: $Y \in \mathbb{R}$; classification: $Y \in \{1, \dots, J\}$).
◮ Unsupervised learning covers clustering and density estimation; the latter often resolves the former.
◮ Clustering aims to partition the data. Some supervised methods are also based on partitioning the space of explanatory variables (CART, SVM, LDA). Partitioning may be:
  ◮ hierarchical: find successive groups by splitting or joining previously established groups ("top-down", "bottom-up");
  ◮ partitional (non-hierarchical): k-means.

Our objective is to propose a clustering procedure sharing some of the nice properties of CART (Breiman et al. [1984]).


What is CART
Data: $(X, Y) \in \mathbb{R}^p \times \mathcal{C}$, with $X$ the predictor (attributes, features) and $Y \in \mathcal{C}$ the output to predict, where $\mathcal{C} = \mathbb{R}$ (regression) or $\mathcal{C} = \{1, \dots, J\}$ (classification). $(X, Y)$ follows a distribution $D$.
Objective: using the observations $(X_i, Y_i)$ from $D$, construct a predictor $\hat f(X)$ having a low generalization error
$$R(\hat f) = \mathbb{E}\big[L\big(Y, \hat f(X)\big)\big],$$
where $L$ is a loss function: the quadratic error in regression, or the misclassification rate in classification.


The model
Search for a partition of the space $\mathcal{X}$ and assign a value of $Y$ to each class of the partition.
In regression:
$$\mathbb{E}[Y \mid X = x] = \sum_{j=1}^{q} c_j \, \mathbf{1}_{N_j}(x), \qquad \hat c_j = \frac{1}{\mathrm{Card}\{i : x_i \in N_j\}} \sum_{i : x_i \in N_j} Y_i.$$
In classification: $Y$ is discrete with $J$ levels, and $\hat c_j$ is the most frequent class in $N_j(x)$.
General framework: a linear or convex combination of nonlinear functions.
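As a concrete illustration (a minimal sketch, not the authors' code), the leaf estimates $\hat c_j$ are leaf means in regression and leaf majority votes in classification:

```python
import numpy as np

def leaf_values_regression(y, leaf_ids):
    """For each leaf j, c_hat_j = mean of the responses falling in N_j."""
    return {j: y[leaf_ids == j].mean() for j in np.unique(leaf_ids)}

def leaf_values_classification(y, leaf_ids):
    """For each leaf j, c_hat_j = most frequent class among responses in N_j."""
    out = {}
    for j in np.unique(leaf_ids):
        classes, counts = np.unique(y[leaf_ids == j], return_counts=True)
        out[j] = classes[np.argmax(counts)]
    return out
```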


Example
[Figure: a regression tree with splits Wind < 6.6, Solar.R < 153 and Wind < 8.9 and leaf predictions 90.61, 61.81, 33.20 and 19.22, shown with the induced rectangular partition of the (Wind, Solar.R) plane.]
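Read as a set of rules, such a tree is just nested threshold comparisons. A minimal sketch, assuming the nesting suggested by the figure layout (Wind < 6.6 at the root, then Solar.R < 153, then Wind < 8.9); the thresholds and leaf values are those displayed:

```python
def predict_example(wind: float, solar_r: float) -> float:
    """Piecewise-constant prediction encoded by the example tree
    (nesting of the splits is an assumption from the figure layout)."""
    if wind < 6.6:
        return 90.61
    if solar_r < 153:
        return 19.22
    # Solar.R >= 153
    if wind < 8.9:
        return 61.81
    return 33.20
```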


2 stages: Maximal Tree and Pruning
All the observations are in the root node.
Splitting rule: one variable and a threshold. How to choose it? Use the deviance to measure the heterogeneity of a node:
$$R(t) = \sum_{x_n \in t} \big(y_n - \bar y(t)\big)^2.$$


Optimal Splits: minimize the children's deviance
Minimize the total heterogeneity of the new nodes. Let $s$ be a split of the form $x^{(m)} < a$:
$$\Delta R(s, t) = R(t) - \big(R(t_L) + R(t_R)\big) \ge 0,$$
and the optimal split $s^*$ satisfies $\Delta R(s^*, t) = \max_{s \in \Sigma} \Delta R(s, t)$.
In classification,
$$R(t) = -\sum_{j \in J} p_j(t) \log\big(p_j(t)\big),$$
where $p_j(t)$ is the prior probability of class $j$ in $t$.
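For instance, the exhaustive search over variables and thresholds can be sketched as follows (a minimal illustration of the regression criterion, not the CART implementation):

```python
import numpy as np

def deviance(y):
    """R(t) = sum of squared deviations from the node mean."""
    return ((y - y.mean()) ** 2).sum() if len(y) else 0.0

def best_split(X, y):
    """Return (variable index, threshold, gain) maximizing
    Delta R(s, t) = R(t) - R(t_L) - R(t_R) over splits x_m < a."""
    r_t, best = deviance(y), (None, None, 0.0)
    for m in range(X.shape[1]):
        for a in np.unique(X[:, m])[1:]:      # candidate thresholds
            left = X[:, m] < a
            gain = r_t - deviance(y[left]) - deviance(y[~left])
            if gain > best[2]:
                best = (m, a, gain)
    return best
```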


Extensions
◮ Oblique CART, OC1.
◮ Multivariate CART in regression and classification.
◮ Multiple regression within each node.
◮ Bayesian CART.
◮ Bootstrap within nodes (a direct approach to instability).
And, in a sense, CURT.


What is CURT
We introduce a new clustering method based on classification and regression trees. It proceeds in two stages:
Forward: the sample is split recursively into two subsamples, reducing the heterogeneity of the data within the new subsamples.
Backward: the resulting tree is then pruned. As pruning aggregates only adjacent nodes, we add a third stage in which similar clusters are joined even if they do not descend from the same node.


Notations
Let $X \in \mathbb{R}^p$ be a random $p$-dimensional real vector with $\mathbb{E}(\|X\|^2) < \infty$, and let $x$ be one realization of $X$, with coordinates $x(j)$, $j = 1, \dots, p$. The data consist of $n$ independent and identically distributed realizations of $X$.
A subset $t \subset \mathbb{R}^p$ corresponds both to a node of the constructed tree and to a subsample of the data at hand; the root node corresponds to $\mathbb{R}^p$ and thus contains the whole available sample.
Our algorithm is similar in spirit to the one employed by CART, with two main differences:
◮ it handles the unsupervised case;
◮ an alternative pruning is proposed, combined with "joining".


Maximal tree
Let $t$ be a node containing a set of observations from the sample at hand, and let
$$t_l = \{x \in \mathbb{R}^p : x(j) \le a\}, \qquad t_r = \{x \in \mathbb{R}^p : x(j) > a\}.$$
Let $X_t = X \mid \{X \in t\}$, $\alpha_t = P(X \in t)$, and define a heterogeneity measure for $t$ by
$$R(t) = \alpha_t \,\mathrm{trace}\big(\mathrm{Cov}(X_t)\big). \tag{1}$$
$R(t)$ is called the deviance. The best split for $t$ is the couple $(j, a) \in \{1, \dots, p\} \times \mathbb{R}$ maximizing
$$\Delta(t, j, a) = R(t) - R(t_l) - R(t_r).$$
As for the criterion used in CART, it is easy to check that $\Delta(t, j, a) \ge 0$ for all $t, j, a$.
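A minimal sketch of the empirical version of this criterion, estimating $\alpha_t$ by the fraction of sample points falling in $t$:

```python
import numpy as np

def curt_deviance(Xt, n_total):
    """Empirical R(t) = alpha_t * trace(Cov(X_t)),
    with alpha_t estimated by |t| / n."""
    if len(Xt) < 2:
        return 0.0
    cov = np.atleast_2d(np.cov(Xt, rowvar=False))
    return (len(Xt) / n_total) * np.trace(cov)

def curt_best_split(Xt, n_total):
    """Best couple (j, a) maximizing Delta(t, j, a) = R(t) - R(t_l) - R(t_r)."""
    r_t, best = curt_deviance(Xt, n_total), (None, None, 0.0)
    for j in range(Xt.shape[1]):
        for a in np.unique(Xt[:, j])[:-1]:    # thresholds: x(j) <= a
            left = Xt[:, j] <= a
            gain = (r_t - curt_deviance(Xt[left], n_total)
                        - curt_deviance(Xt[~left], n_total))
            if gain > best[2]:
                best = (j, a, gain)
    return best
```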


Recursive partitioning
Beginning from the whole sample, each node is split recursively until one of the following stopping rules is satisfied (a sketch of the recursion is given below):
◮ all the observations within the node are identical;
◮ there are fewer than minsize observations in the node;
◮ the deviance reduction is less than mindev × R(S), where S is the whole sample.
Defaults: minsize = 5, mindev = 0.01. Each node created is assigned a class number. The resulting tree corresponds to a partition of the data set, with a cluster associated to each leaf.
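Putting the split search and the stopping rules together, the forward step can be sketched as follows (reusing curt_best_split and curt_deviance from the previous sketch; parameter names follow the slide):

```python
import numpy as np

def grow(Xt, n_total, r_root, minsize=5, mindev=0.01):
    """Forward step: split recursively until a stopping rule fires.
    Returns the leaves of the maximal tree as a list of point arrays."""
    if len(Xt) < minsize or len(np.unique(Xt, axis=0)) == 1:
        return [Xt]                      # too small, or all observations equal
    j, a, gain = curt_best_split(Xt, n_total)
    if j is None or gain < mindev * r_root:
        return [Xt]                      # negligible deviance reduction
    left = Xt[:, j] <= a
    return (grow(Xt[left], n_total, r_root, minsize, mindev)
            + grow(Xt[~left], n_total, r_root, minsize, mindev))
```

Called as grow(X, len(X), curt_deviance(X, len(X))), where r_root plays the role of R(S).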


Maximal tree
[Figure: maximal tree for model M1 (minsize = 10, minsplit = 25, mindev = 0.01), with splits on x1 and x2; the leaf labels give the number of observations falling in each leaf.]


Pruning
Let $T_0 = T_{\max}$. For any tree $T$ we define its deviance as
$$R(T) = \frac{1}{n} \sum_{t \in \tilde T} R(t),$$
where $\tilde T$ is the set of leaves of $T$. Our objective is to find an optimal subtree of $T_{\max}$ of size $k$.
Let $t_l$ and $t_r$ be a pair of nodes sharing the same direct ascendant. Define
$$W_{lr} = D(X_{t_l}, X_{t_r}) \qquad \text{and} \qquad \Delta_{lr} = \int_0^\delta q_\alpha(W_{lr})\, d\alpha,$$
where $q_\alpha(\cdot)$ stands for the quantile function, $\delta \in [0, 1]$, and $D$ is a dissimilarity measure between two sets, defined below. We prune the tree by replacing a pair of branches $t_i, t_j$ by their ascendant if $\Delta_{ij} < \epsilon_1$, i.e. we replace $t_i$ and $t_j$ by $t_i \cup t_j$ in the partition.


Computing the dissimilarity
Let $n_l$ (resp. $n_r$) be the size of $t_l$ (resp. $t_r$). For all $x_i \in t_l$ and $y_j \in t_r$, consider the two sequences of nearest-neighbor distances to the other node,
$$\tilde d_{l,i} = \min_{y \in t_r} d(x_i, y), \qquad \tilde d_{r,j} = \min_{x \in t_l} d(x, y_j),$$
and their ordered versions, denoted $d_l$ and $d_r$. For $\delta \in [0, 1]$, let
$$\bar d_l^{\,\delta} = \frac{1}{\delta n_l} \sum_{i=1}^{\delta n_l} d_{l,(i)}, \qquad \bar d_r^{\,\delta} = \frac{1}{\delta n_r} \sum_{j=1}^{\delta n_r} d_{r,(j)}.$$
We compute the dissimilarity between $t_l$ and $t_r$ as
$$d^\delta(l, r) = d^\delta(t_l, t_r) = \max\big(\bar d_l^{\,\delta}, \bar d_r^{\,\delta}\big),$$
and at each step of the algorithm the tree is pruned at the ascendant node of $t_l$ and $t_r$ if $d^\delta(l, r) \le \epsilon_1$, where $\epsilon_1 > 0$.
Two parameters: $\delta$ and $\epsilon_1$ ("mindist").
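A minimal sketch of $d^\delta(t_l, t_r)$, assuming Euclidean distance and rounding $\delta n$ up so that at least one term is averaged:

```python
import numpy as np

def d_delta(tl, tr, delta=0.4):
    """d^delta(t_l, t_r): for each point, distance to the nearest point of the
    other node; average the smallest delta-fraction on each side; take the max."""
    dist = np.linalg.norm(tl[:, None, :] - tr[None, :, :], axis=2)
    d_l = np.sort(dist.min(axis=1))        # t_l points -> nearest in t_r
    d_r = np.sort(dist.min(axis=0))        # t_r points -> nearest in t_l
    m_l = max(1, int(np.ceil(delta * len(tl))))
    m_r = max(1, int(np.ceil(delta * len(tr))))
    return max(d_l[:m_l].mean(), d_r[:m_r].mean())
```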


Joining
Aggregate nodes which do not necessarily share the same direct ascendant. For every pair of terminal nodes $t_i$ and $t_j$, let
$$d_{ij} = \frac{R(t_i \cup t_j) - R(t_i) - R(t_j)}{R(t_i \cup t_j)}.$$
Terminal nodes are aggregated successively in increasing order of $d_{ij}$, each step yielding one cluster less.
If $k$ is known, repeat the following step until $m \le k$ (a sketch of this loop follows):
◮ for each pair $(i, j)$, $1 \le i < j \le m$, let $(\tilde i, \tilde j) = \arg\min_{i,j}\{d_{ij}\}$; replace $t_{\tilde i}$ and $t_{\tilde j}$ by their union $t_{\tilde i} \cup t_{\tilde j}$, set $m = m - 1$, and continue.
If $k$ is unknown:
◮ if $d_{\tilde i \tilde j} < \eta$, where $\eta > 0$ is a given constant, replace $t_{\tilde i}$ and $t_{\tilde j}$ by their union $t_{\tilde i} \cup t_{\tilde j}$, and continue until this condition is no longer fulfilled.
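A minimal sketch of the joining loop when k is known, reusing curt_deviance from the forward-step sketch and always merging the pair with the smallest relative deviance increase:

```python
import numpy as np
from itertools import combinations

def join(leaves, n_total, k):
    """Merge terminal nodes (arrays of points) until only k clusters remain."""
    clusters = list(leaves)
    while len(clusters) > k:
        def d_ij(a, b):
            union = np.vstack([clusters[a], clusters[b]])
            r_u = curt_deviance(union, n_total)
            return (r_u - curt_deviance(clusters[a], n_total)
                        - curt_deviance(clusters[b], n_total)) / r_u
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: d_ij(*pair))
        clusters[i] = np.vstack([clusters[i], clusters[j]])
        del clusters[j]
    return clusters
```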


Pruning
[Figure: scatter plots of model M1 in the (x1, x2) plane, after pairwise pruning (left) and after joining (right).]


Pruning
[Figure: scatter plots of model M1 in the (x1, x2) plane, under CART pruning (left) and CART pruning + joining (right).]


CURT and k-means
Consider the case where there are "nice groups" strictly separated. More precisely, let $A_1, \dots, A_k$ be disjoint connected compact sets on $\mathbb{R}^p$ such that $A_i = \overline{A_i^{\circ}}$ for $i = 1, \dots, k$, and let $\{P_i : i = 1, \dots, k\}$ be probability measures on $\mathbb{R}^p$ with supports $\{A_i : i = 1, \dots, k\}$. This is typically the case if we consider a random vector $X^*$ with density $f$ and take $X = X^* \mid \{f > \delta\}$ for a positive level $\delta$, as in several hierarchical clustering procedures.
An admissible family for CURT is a family of sets $A_1, \dots, A_k$ for which there exists another family of disjoint sets $B_1, \dots, B_k$, each built up as the intersection of a finite number of half-spaces delimited by hyperplanes orthogonal to the coordinate axes, satisfying $A_i \subset B_i$.


CURT and k-means (continued)
On the other hand, k-means is defined through the vector of centers $(c_1, \dots, c_k)$ minimizing
$$\mathbb{E}\Big[\min_{j=1,\dots,k} \|X - c_j\|\Big].$$
Associated with each center $c_j$ is the convex polyhedron $S_j$ of all points in $\mathbb{R}^p$ closer to $c_j$ than to any other center, called the Voronoi cell of $c_j$. The sets $S_1, \dots, S_k$ of the partition are the population clusters for k-means; therefore, the population clusters for k-means are defined by exactly $k$ hyperplanes in arbitrary position.
So, an admissible family for k-means is a family of sets $A_1, \dots, A_k$ that can be separated by exactly $k$ hyperplanes. While the hyperplanes for k-means can be in general position, no more than $k$ of them are available. CURT is in this sense much more flexible, since its family of admissible sets is more general: for instance, k-means will necessarily fail to identify nested groups, while CURT will not. Moreover, CURT is less sensitive to small changes in the parameters defining the partition, whereas k-means is sensitive to small changes in the centers $(c_1, \dots, c_k)$.


R(t)
A simple equivalent characterization of the function $R(t)$ is given in the following lemma.
Lemma. Let $t_l$ and $t_r$ be disjoint compact sets on $\mathbb{R}^p$ and denote $\mu_s = \mathbb{E}(X_{t_s})$ for $s = l, r$. If $t = t_l \cup t_r$, we have that
$$R(t) = R(t_l) + R(t_r) + 2\,\frac{\alpha_{t_l}\alpha_{t_r}}{\alpha_t}\,\|\mu_l - \mu_r\|^2. \tag{2}$$


Theorem
Assume that the random vector $X$ has distribution $P$ and a density $f$ such that $\|x\|^2 f(x)$ is bounded. Let $X_1, \dots, X_n$ be iid random vectors with the same distribution as $X$, and denote by $P_n$ the empirical distribution of the sample $X_1, \dots, X_n$. Let $\{t_{1n}, \dots, t_{m_n n}\}$ be the empirical binary partition obtained by the forward empirical algorithm, and $\{t_1, \dots, t_m\}$ the population version. Then $m_n = m$ ultimately, and each pair $(i_{jn}, a_{jn}) \in \{1, \dots, p\} \times \mathbb{R}$ determining the empirical partition converges a.s. to the corresponding population pair $(i_j, a_j) \in \{1, \dots, p\} \times \mathbb{R}$. In particular,
$$\lim_{n \to \infty} \sum_{i=1}^{m} P\big(t_{in}\, \Delta\, t_i\big) = 0,$$
where $\Delta$ stands for the symmetric difference.


Theorem
Let $\{t_{1n}^*, \dots, t_{k_n n}^*\}$ be the final empirical binary partition obtained after the forward and backward empirical algorithms, and $\{t_1^*, \dots, t_k^*\}$ the population version. Under the assumptions of the previous theorem we have that $k_n = k$ ultimately ($k_n = k$ for all $n$ if $k$ is known), and
$$\lim_{n \to \infty} \sum_{i=1}^{k} P\big(t_{in}^*\, \Delta\, t_i^*\big) = 0.$$


k-means(10)
We compare the results with those of k-means. As is well known, the performance of k-means strongly depends on the initial cluster centroids, so many authors suggest considering several random initializations and keeping the partition with the lowest value of the objective function
$$\sum_{i=1}^{n} \sum_{j=1}^{k} \|X_i - c_j\|^2 \, I\{X_i \in G_j\},$$
where $G_j$ is the $j$-th group and $c_j$ the corresponding center. We considered 10 random initializations for each case and denote this version k-means(10).
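As a concrete illustration (an assumption: scikit-learn's KMeans as a stand-in for the implementation actually used), ten random initializations with the best inertia kept:

```python
from sklearn.cluster import KMeans

def kmeans10(X, k):
    """k-means(10): keep the best of 10 random initializations
    (lowest within-cluster sum of squares, i.e. inertia)."""
    km = KMeans(n_clusters=k, n_init=10, init="random").fit(X)
    return km.labels_, km.inertia_
```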


Models for simulation
M1. Three groups in dimension 2. The data are generated from $N(\mu_i, \Sigma)$ distributions with $\mu_1 = (-4, 0)$, $\mu_2 = (0, 0)$, $\mu_3 = (5, 5)$ and
$$\Sigma = \begin{pmatrix} 0.52 & 0.6 \\ 0.6 & 0.72 \end{pmatrix}.$$
M2. Four groups in dimension 2. The data are generated from $N(\mu_i, \Sigma)$, $i = 1, \dots, 4$, with centers $(-1, 0)$, $(1, 0)$, $(0, -1)$, $(0, 1)$ and covariance matrix $\Sigma = \sigma^2 Id$, $\sigma = 0.15$.
M3. Three groups in dimension 10. The data are generated from $N(\mu_i, \Sigma)$ distributions with $\Sigma = \sigma^2 Id$, $\sigma = 0.8$, $\mu_1 = (-2, \dots, -2)$, $\mu_2 = (0, \dots, 0)$, $\mu_3 = (3, \dots, 3)$.
M4. Ten groups in dimension 5. The data are generated from $N(\mu_i, \Sigma)$, $i = 1, \dots, 10$. The first five means $\mu_i$ are the canonical basis vectors $e_1, \dots, e_5$ respectively, while $\mu_{5+i} = -e_i$, $i = 1, \dots, 5$. The covariance matrix is $\Sigma = \sigma^2 Id$, $\sigma = 0.19$.
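For concreteness, a minimal sketch of one replicate of model M2 (the per-group sample size and seed are assumptions, not stated on the slide):

```python
import numpy as np

def simulate_m2(n_per_group=100, sigma=0.15, seed=0):
    """Draw one replicate of model M2: four spherical Gaussian groups."""
    rng = np.random.default_rng(seed)
    centers = np.array([[-1, 0], [1, 0], [0, -1], [0, 1]], dtype=float)
    X = np.vstack([rng.normal(mu, sigma, size=(n_per_group, 2))
                   for mu in centers])
    labels = np.repeat(np.arange(4), n_per_group)
    return X, labels
```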



Figure: Scatter plot corresponding to M1 (left) and M2 (right)



Figure: Two-dimensional projection scatter plots corresponding to M3 (left) and M4 (right)


We perform M = 100 replicates for each model and compare the results with k-means and k-means(10). Throughout the simulations k is assumed to be known. To run CURT we must fix the values of the parameters involved at each stage of the algorithm:
◮ for the maximal tree: minsize = 5, mincut = 10 and mindev = 0.01;
◮ for the pruning: $\epsilon_1 = 0.3, 0.5$ and $\delta = 0.4$;
◮ for the joining stage: k, since it is known.
Since we know the original clusters in the data, we may measure the performance of each method by "the number of misclassified observations". If we denote the original clusters $r = 1, \dots, R$, the predicted ones $s = 1, \dots, S$, and the cross-tabulation frequencies $n_{rs}$, the misclassification error may be expressed as
$$MCE = \frac{1}{n} \sum_{r=1}^{R} \big( n_{r\cdot} - \max_s \{ n_{rs} \} \big),$$
where $n_{r\cdot}$ is the number of observations in cluster $r$.
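A minimal sketch computing this error from the cross-tabulation of true and predicted labels:

```python
import numpy as np

def mce(true_labels, pred_labels):
    """MCE = (1/n) * sum_r (n_r. - max_s n_rs), computed from the
    cross-tabulation of true clusters r and predicted clusters s."""
    r_vals, r_idx = np.unique(true_labels, return_inverse=True)
    s_vals, s_idx = np.unique(pred_labels, return_inverse=True)
    table = np.zeros((len(r_vals), len(s_vals)), dtype=int)
    np.add.at(table, (r_idx, s_idx), 1)
    return (table.sum(axis=1) - table.max(axis=1)).sum() / len(true_labels)
```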


Table: mindist = 0.3. [Table: for each model (M1–M4) and each value of σ, the misclassification error and the number of clusters k obtained by the maximal tree (max), after pruning (prune), after joining (join) and after CART-style pruning (pruneC), together with the errors of k-means (kmean) and k-means(10) (kmean10).]


Table: mindist = 0.4. [Table: same layout as above, with the pruning threshold set to 0.4.]


Table: mindist = 0.5. [Table: same layout as above, with the pruning threshold set to 0.5.]


CART and CURT
$X_1$ and $X_2$ with distributions given by
$$X_1 \sim N(0, 0.03),\; N(2, 0.03),\; N(1, 0.25); \qquad X_2 \sim N(0, 0.25),\; N(1, 0.25),\; N(2.5, 0.03).$$
Figure: CURT partitions (solid lines), CART partitions (dashed lines).


CART and CURT
(a) Tree corresponding to CURT. (b) Tree corresponding to CART.
Figure: In both cases the left branch corresponds to the smaller values of the splitting variable.


(a) Tree structure considering four groups. (b) Tree structure considering five groups.
Figure: In every case the left branch corresponds to the smaller values of the splitting variable.


Group 1: Turkey
Group 2: Greece, Poland, Rumania, Yugoslavia
Group 3: Ireland, Portugal, Spain, Bulgaria, Czechoslovakia, Hungary, USSR
Group 4: Belgium, Denmark, France, W. Germany, E. Germany, Italy, Luxembourg, Netherlands, United Kingdom, Austria, Norway, Sweden, Switzerland
Table: CURT clustering structure for four groups.


Group 1: Turkey, Greece, Poland, Rumania, Yugoslavia
Group 2: Ireland, Portugal, Spain, Bulgaria, Czechoslovakia, Hungary, USSR
Group 3: Belgium, Denmark, Netherlands, United Kingdom, Norway, Sweden
Group 4: W. Germany, E. Germany, Switzerland
Group 5: France, Italy, Luxembourg, Austria
Table: CURT clustering structure for five groups.


Conclusions
◮ There are no restrictions on the dimension of the data.
◮ CURT behaves quite well in the simulated examples we have considered, as well as on a real data example.
◮ The method is consistent under quite general assumptions.
◮ A robust version could be developed by replacing, in the objective function given in (1), $\mathrm{Cov}(X_t)$ by a robust covariance functional $\mathrm{robcov}(X_t)$ (see for instance Maronna et al. [2006], Chapter 6, for a review) and proceeding in the same way.


References
Leo Breiman, Jerome Friedman, Richard A. Olshen, and Charles J. Stone. Classification and Regression Trees. Chapman & Hall/CRC, 1984.
Trevor Hastie, Robert Tibshirani, and Jerome H. Friedman. The Elements of Statistical Learning. Springer, corrected edition, 2003.
Ricardo A. Maronna, R. Douglas Martin, and Victor J. Yohai. Robust Statistics: Theory and Methods. Wiley Series in Probability and Statistics. Wiley, 2006.
