
# Relabel samples in kmeans results according to the order of the centers

I am using kmeans to cluster my data, and I would like to post-process the result.

I want to relabel the samples so that the cluster labels follow the order of the sorted centres. Consider the following example:

```
a = c("a","b","c","d","e","F","i","j","k","l","m","n")
b = c(1,2,3,20,21,21,40,41,42,4,23,50)
mydata = data.frame(id=a,amount=b)
result = kmeans(mydata$amount,3,nstart=10)
```

Here is the result:

```
result$cluster
2 2 2 3 3 3 1 1 1 2 3 1

result$centers
1 43.25
2  2.50
3 21.25

mydata = data.frame(mydata, label = result$cluster)
mydata
   id amount label
1   a      1     2
2   b      2     2
3   c      3     2
4   d     20     3
5   e     21     3
6   F     21     3
7   i     40     1
8   j     41     1
9   k     42     1
10  l      4     2
11  m     23     3
12  n     50     1
```

What I am looking for is sorting the centres and producing the labels accordingly:

```
1  2.50
2 21.25
3 43.25
```

so that the sample labels become:

```
1 1 1 2 2 2 3 3 3 1 2 3
```

and the result should be:

```
   id amount label
1   a      1     1
2   b      2     1
3   c      3     1
4   d     20     2
5   e     21     2
6   F     21     2
7   i     40     3
8   j     41     3
9   k     42     3
10  l      4     1
11  m     23     2
12  n     50     3
```

I think it could be done by sorting the centres and then, for each sample, taking the index of the nearest sorted centre as its new label.
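The idea above can be sketched roughly as follows. This is only a minimal illustration on the one-dimensional toy data from the example; the names `amount`, `sorted_centers`, `dists` and `new_label` are made up for this sketch, and it assumes each sample's nearest centre is the centre of the cluster it was assigned to.

```r
# Toy data from the example above
amount <- c(1, 2, 3, 20, 21, 21, 40, 41, 42, 4, 23, 50)
result <- kmeans(amount, 3, nstart = 10)

# Sort the centres in ascending order (as.vector drops the matrix shape and names)
sorted_centers <- sort(as.vector(result$centers))

# Distance of every sample to every sorted centre (12 x 3 matrix)
dists <- abs(outer(amount, sorted_centers, "-"))

# New label = column index of the nearest sorted centre
new_label <- apply(dists, 1, which.min)
new_label
# with the centres 2.50, 21.25, 43.25 shown above, this gives
# 1 1 1 2 2 2 3 3 3 1 2 3
```

Note that this recomputes distances that kmeans has already used, so it is more work than strictly needed.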

Is there another way to have R do this automatically?

One idea is to create a named vector by matching the sorted centers against the original centers. Then match `mydata$label` against that vector and replace each label with the corresponding name, i.e.
```
i1 <- setNames(match(sort(result$centers), result$centers), rownames(result$centers))
as.numeric(names(i1)[match(mydata$label, i1)])
# [1] 1 1 1 2 2 2 3 3 3 1 2 3
```
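You can use a `for` loop, if you don't mind loops:
```
cls <- result$cluster          # keep the original labels for comparison
ord <- order(result$centers)   # old cluster ids, smallest centre first
for (i in seq_along(ord))      # one pass per cluster, not per sample
  result$cluster[cls == ord[i]] <- i
result$cluster
#[1] 1 1 1 2 2 2 3 3 3 1 2 3
```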
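The same relabelling can probably be written without a loop. The following is a minimal sketch on the example data, not a built-in kmeans option; `amount` and `new_label` are illustrative names. `order()` lists the old cluster ids from the smallest centre to the largest, and `match()` looks up where each sample's old id sits in that list.

```r
# Toy data from the question
amount <- c(1, 2, 3, 20, 21, 21, 40, 41, 42, 4, 23, 50)
result <- kmeans(amount, 3, nstart = 10)

# Old cluster ids, ordered by ascending centre
ord <- order(result$centers)

# Position of each sample's old id in `ord` is its new label
new_label <- match(result$cluster, ord)

# Centres reordered to agree with the new labels
new_centers <- result$centers[ord, , drop = FALSE]
```

After this, cluster 1 has the smallest centre and cluster 3 the largest, whatever numbering kmeans happened to produce.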