
How to Score a New Data Set

Question:

We have built clustering models in R. We now want to export the model as an equation that can be deployed to cluster new customers. In SAS, the Cluster node provided SAS scoring code into which we only had to plug the new input variables. Is there a way to do that in R? How can we export the cluster equation?

Below is an example using the standard iris dataset:

    irisnew <- iris[, 1:4]       # numeric columns only; kmeans() needs numeric input, so drop the Species factor
    (kc <- kmeans(irisnew, 3))   # kmeans() is in base R's stats package, so library(cluster) is not needed

    K-means clustering with 3 clusters of sizes 62, 38, 50

    Cluster means:
      Sepal.Length Sepal.Width Petal.Length Petal.Width
    1     5.901613    2.748387     4.393548    1.433871
    2     6.850000    3.073684     5.742105    2.071053
    3     5.006000    3.428000     1.462000    0.246000

    Clustering vector:
      [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
     [39] 3 3 3 3 3 3 3 3 3 3 3 3 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
     [77] 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 2 2 2 1 2 2 2 2 2 2 1
    [115] 1 2 2 2 2 1 2 1 2 1 2 2 1 1 2 2 2 2 2 1 2 2 2 2 1 2 2 2 1 2 2 2 1 2 2 1

    Within cluster sum of squares by cluster:
    [1] 39.82097 23.87947 15.15100
     (between_SS / total_SS = 88.4 %)
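A fitted k-means model is fully described by its centroid matrix (kc$centers), so one way to "export" it is to save either the whole kmeans object or just the centroids. The following is a minimal sketch using base R only; the seed, nstart = 10 and the temporary file names are illustrative assumptions, and a different seed may number the clusters differently from the output above.

```r
# Fit k-means on the four numeric iris measurements.
irisnew <- iris[, 1:4]
set.seed(1)                                    # k-means uses random starts
kc <- kmeans(irisnew, centers = 3, nstart = 10)

# Everything needed for scoring: one row per cluster, one column per input.
centers <- kc$centers

# Option 1: keep the full R object for later scoring inside R.
model_file <- tempfile(fileext = ".rds")
saveRDS(kc, model_file)
kc_loaded <- readRDS(model_file)

# Option 2: write only the centroids to CSV for use outside R.
csv_file <- tempfile(fileext = ".csv")
write.csv(centers, csv_file)
centers_csv <- as.matrix(read.csv(csv_file, row.names = 1))
print(centers_csv)
```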

Now that the clusters are defined, I have a new dataset of petal measurements that I need to classify according to the clustering rules above. How do I export the rules to do that? Typically the rules are defined as:

    x = a1 * Sepal.Length + a2 * Sepal.Width + a3 * Petal.Length + a4 * Petal.Width + b

    if x is between z1 and z2 then Cluster1
    else if x is between z3 and z4 then Cluster2
    else if x is between z5 and z6 then Cluster3
    else Cluster4
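K-means in R does not produce a single score x with cut-offs: kmeans() assigns each point to the centroid at the smallest squared Euclidean distance. Expanding that distance shows the rule can still be exported as linear equations of the requested form, one per cluster, where mu_k is row k of kc$centers. The ||x||^2 term is the same for every cluster, so it drops out, and a new flower goes to the cluster with the largest score. The s_3 line below uses the cluster 3 means from the output above.

```latex
\begin{aligned}
\operatorname{cluster}(x)
  &= \arg\min_{k}\, \lVert x - \mu_k \rVert^2
   = \arg\min_{k}\, \bigl( \lVert x \rVert^2 - 2\,\mu_k^{\top} x + \lVert \mu_k \rVert^2 \bigr) \\
  &= \arg\max_{k}\, s_k(x),
  \qquad s_k(x) = \mu_k^{\top} x - \tfrac{1}{2}\lVert \mu_k \rVert^2 \\[4pt]
s_k(x) &= a_{k1}\,\text{Sepal.Length} + a_{k2}\,\text{Sepal.Width}
        + a_{k3}\,\text{Petal.Length} + a_{k4}\,\text{Petal.Width} + b_k,
  \qquad a_{kj} = \mu_{kj},\quad b_k = -\tfrac{1}{2}\textstyle\sum_{j} \mu_{kj}^2 \\[4pt]
s_3(x) &= 5.006\,\text{Sepal.Length} + 3.428\,\text{Sepal.Width}
        + 1.462\,\text{Petal.Length} + 0.246\,\text{Petal.Width} - 19.50459
\end{aligned}
```

If the inputs were scaled before clustering, the same scaling has to be applied to the new data before computing the scores.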

Thanks, Manish

Answer1:

For generic models, use predict(); for a GLM this is predict.glm(glm.model, newdata = newdf).
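A minimal sketch of that call, assuming a logistic regression on the built-in mtcars data (the model formula and the values in newdf are illustrative, not from the question). Calling the generic predict() on a glm fit dispatches to predict.glm(); type = "response" returns probabilities rather than log-odds.

```r
# Logistic regression: am (0 = automatic, 1 = manual) from weight and horsepower.
glm.model <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# New observations need the same predictor column names as the training data.
newdf <- data.frame(wt = c(2.2, 3.8), hp = c(100, 180))

# type = "response" gives probabilities; the default is the linear predictor.
p <- predict(glm.model, newdata = newdf, type = "response")
print(p)
```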

For clustering, see "Simple approach to assigning clusters for new data after k-means clustering" (https://stackoverflow.com/questions/20621250/simple-approach-to-assigning-clusters-for-new-data-after-k-means-clustering).
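A minimal sketch of the nearest-centroid idea, assuming a model fitted with stats::kmeans() on unscaled inputs. assign_cluster() and newpetals are hypothetical names for illustration; each new row is assigned to the centroid with the smallest squared Euclidean distance, the same distance kmeans() minimises.

```r
# Assign each row of newdata to its nearest k-means centroid.
# assign_cluster() is a hypothetical helper, not part of any package.
assign_cluster <- function(centers, newdata) {
  # Take the same columns, in the same order, as were used for clustering.
  x <- as.matrix(newdata[, colnames(centers), drop = FALSE])
  # d2[i, k] = squared distance from new row i to centroid k
  d2 <- matrix(NA_real_, nrow = nrow(x), ncol = nrow(centers))
  for (k in seq_len(nrow(centers))) {
    d2[, k] <- rowSums(sweep(x, 2, centers[k, ], "-")^2)
  }
  # Index of the closest centroid for each row (first one wins on ties)
  apply(d2, 1, which.min)
}

# Fit on the numeric iris columns, as in the question
irisnew <- iris[, 1:4]
set.seed(1)
kc <- kmeans(irisnew, centers = 3, nstart = 10)

# Two hypothetical new flowers to classify
newpetals <- data.frame(Sepal.Length = c(5.0, 6.8), Sepal.Width = c(3.4, 3.0),
                        Petal.Length = c(1.5, 5.7), Petal.Width = c(0.2, 2.1))
new_clusters <- assign_cluster(kc$centers, newpetals)
print(new_clusters)
```

The first example flower lands in the same cluster as the setosa rows at the top of iris. The cluster numbers themselves are arbitrary and can change with the random start, so store kc$centers (or the whole kc object) from the run you want to deploy.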

Recommend

  • How do I get different icons for the two versions of my GPS app generated via gradle build variants?
  • K-means cluster plot [closed]
  • Image Filter which uses the highest occurence of pixel values
  • Converting MATLAB code to Java code
  • Make silhouette plot legible for k-means
  • BigQuery - Clustered tables not reducing query size with multiple keys
  • Inserting special characters
  • Hybrid SOM (with MLP)
  • Choosing number of clusters in k means
  • Truncate dictionary list values
  • How to set spark.driver.memory for Spark/Zeppelin on EMR
  • What is the fastest way to update the whole document (all fields) in MongoDB?
  • Errors building R-packages for conda
  • Android Studio memory usage Ubuntu 16
  • OSX Installing Rsymphony - linking headers and libs
  • NSDate isMemberOfClass: [NSDate class] returns false? [duplicate]
  • What icons required for app to submit to MAC App Store?
  • RAdwords error (“server certificate verification failed”)
  • How to declare a typescript property that implements multiple interfaces
  • Submitting two different forms with an external Submit button not working properly
  • ube error: _mm_aeskeygenassist_si128 intrinsic requires at least -xarch=aes
  • Python Multiple file writing question
  • Dendrogram or Other Plot from Distance Matrix
  • Cluster markers with osmdroid
  • Spring Web Security locks Neo4j embedded database
  • Multiple Layouts Previews for Android in Eclipse
  • What going wrong in using PropertiesConfiguration?
  • C# where to add a method
  • Left fixed columns with table colspan
  • How to remove comma or any characters from Python dataframe column name
  • how do i write assembly code from c#?
  • How to revert to previous XCode version?
  • error importing numpy
  • Very simple C++ DLL that can be called from .net
  • Does CUDA 5 support STL or THRUST inside the device code?
  • When should I choose bucket sort over other sorting algorithms?
  • How do you troubleshoot character encoding problems?
  • Hibernate gives error error as “Access to DialectResolutionInfo cannot be null when 'hibernate.
  • retrieve vertices with no linked edge in arangodb
  • Understanding cpu registers