
Relabel samples in kmeans results according to the order of the centers

I am using kmeans to cluster my data, and I want to post-process the result.

I want to relabel the samples so that the cluster labels follow the order of the sorted centres. Consider the following example:

    a = c("a","b","c","d","e","F","i","j","k","l","m","n")
    b = c(1,2,3,20,21,21,40,41,42,4,23,50)
    mydata = data.frame(id=a,amount=b)
    result = kmeans(mydata$amount,3,nstart=10)

Here is the result:

    result$cluster
     2 2 2 3 3 3 1 1 1 2 3 1

    result$centers
    1 43.25
    2  2.50
    3 21.25

    mydata = data.frame(mydata, label = result$cluster)
    mydata
       id amount label
    1   a      1     2
    2   b      2     2
    3   c      3     2
    4   d     20     3
    5   e     21     3
    6   F     21     3
    7   i     40     1
    8   j     41     1
    9   k     42     1
    10  l      4     2
    11  m     23     3
    12  n     50     1

What I am looking for is sorting the centres and producing the labels accordingly:

    1  2.50
    2 21.25
    3 43.25

with the sample labels becoming:

1 1 1 2 2 2 3 3 3 1 2 3

and the result should be:

       id amount label
    1   a      1     1
    2   b      2     1
    3   c      3     1
    4   d     20     2
    5   e     21     2
    6   F     21     2
    7   i     40     3
    8   j     41     3
    9   k     42     3
    10  l      4     1
    11  m     23     2
    12  n     50     3

I think it is possible to do this by sorting the centres and then, for each sample, taking the index of the sorted centre at minimum distance from it as the new label.

Is there another way to have R do this automatically?
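A minimal sketch of that nearest-centre idea, assuming the amounts and centres shown in the example output above (hard-coded here so the result does not depend on a fresh kmeans run). The names `sorted_centers`, `dists` and `new_label` are only illustrative:

```r
# Amounts and centres copied from the example above (assumption: these are
# the values kmeans returned; a real run would use result$centers instead)
amount  <- c(1, 2, 3, 20, 21, 21, 40, 41, 42, 4, 23, 50)
centers <- c(43.25, 2.50, 21.25)

# Sort the centres so that position 1 is the smallest centre
sorted_centers <- sort(centers)

# Absolute distance from every sample (rows) to every sorted centre (columns)
dists <- abs(outer(amount, sorted_centers, "-"))

# For each sample, the column index of the nearest sorted centre is the new label
new_label <- apply(dists, 1, which.min)
new_label
# [1] 1 1 1 2 2 2 3 3 3 1 2 3
```

This recomputes the assignment from distances instead of reusing `result$cluster`, so it should agree with the kmeans labels only when every sample already sits with its nearest centre, which is what one would normally expect from a converged fit.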

Answer1:

One idea is to create a named vector by matching your centers with the sorted centers. Then match the vector with mydata$label and replace with the names of the vector, i.e.

    i1 <- setNames(match(sort(result$centers), result$centers), rownames(result$centers))
    as.numeric(names(i1)[match(mydata$label, i1)])
    # [1] 1 1 1 2 2 2 3 3 3 1 2 3

Answer2:

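You can use a for loop, if you don't mind loops. The loop only needs to run once per cluster, not once per sample:

    cls <- result$cluster
    ord <- order(result$centers)   # old labels, from smallest to largest centre
    for (i in seq_along(ord)) result$cluster[cls == ord[i]] <- i
    result$cluster
    # [1] 1 1 1 2 2 2 3 3 3 1 2 3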
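The same relabelling can probably be done without a loop by matching the old labels against `order()` of the centres. The sketch below assumes the cluster vector and centres printed in the question (hard-coded so it is reproducible); `ord`, `relabelled` and `sorted_centers` are illustrative names, not part of the kmeans object:

```r
# Labels and centres copied from the question's output (assumption: in a real
# session these would be result$cluster and result$centers)
cluster <- c(2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 3, 1)
centers <- matrix(c(43.25, 2.50, 21.25), ncol = 1,
                  dimnames = list(c("1", "2", "3"), NULL))

# Old labels listed from the smallest centre to the largest
ord <- order(centers[, 1])

# The position of each old label within 'ord' is its new label
relabelled <- match(cluster, ord)
relabelled
# [1] 1 1 1 2 2 2 3 3 3 1 2 3

# Reorder the centres the same way so labels and centres stay consistent
sorted_centers <- centers[ord, , drop = FALSE]
rownames(sorted_centers) <- as.character(seq_along(ord))
```

Reordering the centres alongside the labels keeps row `k` of the centre matrix pointing at the cluster now labelled `k`.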
