
Controlling the number of clusters when using the apcluster library in R

Question:

While using apcluster, I ran into a new problem: apcluster selects the number of clusters by itself. Here is my data (in the context of this question, this dput() example doesn't really matter).

mydata=structure(list(AV_COST_SKU_ALLTHETIME = c(4687.25, 4687.25, 7255.083, 7255.083, 4084.524, 4084.524, 4452.692, 4452.692, 13951.25, 13951.25, 6855, 6855, 3943.2, 3943.2, 7261.625, 7261.625, 4082.167), AV_COST_SKU_PER_DELTA = c(0, 0, -1, -1, 0.150505, 0.150505, -1, -1, -1, -1, 0, 0, -0.17534, -0.17534, -0.072866, -0.072866, 0), AV_SKU_LAST_4SEASON = c(0, 0, 0, 0, 1.333, 1.333, 0, 0, 0, 0, 0, 0, 1.667, 1.667, 1.333, 1.333, 0), AV_SKU_PREVIOUS_4SEASON = c(0, 0, 2, 2, 2.308, 2.308, 3.25, 3.25, 1, 1, 0, 0, 2.5, 2.5, 1.333, 1.333, 0), AV_SKU_ALLTHETIME = c(1.333, 1.333, 1.714, 1.714, 2.1, 2.1, 3.25, 3.25, 1.333, 1.333, 1.333, 1.333, 2, 2, 1.333, 1.333, 2), AV_SKU_PER_DELTA = c(0, 0, -1, -1, -0.267784, -0.267784, -1, -1, -1, -1, 0, 0, -0.199904, -0.199904, 0, 0, 0)), .Names = c("AV_COST_SKU_ALLTHETIME", "AV_COST_SKU_PER_DELTA", "AV_SKU_LAST_4SEASON", "AV_SKU_PREVIOUS_4SEASON", "AV_SKU_ALLTHETIME", "AV_SKU_PER_DELTA"), class = "data.frame", row.names = c(NA, -17L))

Suppose the real dataset has 80,000 rows and 40 variables.

My question has two parts.

1. Can I set a maximum number of clusters?

a <- head(mydata, n = 80000L)
# q is only used when p is NA, so q = 0.5 has no effect here
apres <- apclusterL(negDistMat(r = 2), a, frac = 0.2, sweeps = 3, p = -0.2, q = 0.5, lam = 0.5, maxits = 10)

As a result I get 16,000 clusters, which does not suit my further analysis. Can I make it form no more than 40 clusters?

2. Is it possible to make the observations distributed uniformly across the clusters? For example, if I set 5 clusters and have 100 observations, ideally each cluster would contain roughly 15-25 observations, rather than the first cluster holding 90 observations and each of the other four holding only two.

Answer 1:

You can't set a hard limit.

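But by varying the value on the diagonal of the similarity matrix (the preference, apcluster's p argument), you can get more or fewer clusters: lower (more negative) values give fewer clusters, higher values give more.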
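To make the role of the diagonal concrete, here is a minimal from-scratch affinity propagation sketch in base R. The function `ap_sketch` is hypothetical, written only for illustration; it is not apcluster's implementation, and it assumes damping 0.9 (apcluster's default `lam`) and a fixed random seed. It builds S as the negative squared Euclidean distance (what `negDistMat(r = 2)` computes) on the sample data and compares three preferences: the question's p = -0.2, the median off-diagonal similarity, and the minimum one. Because the cost column is in the thousands, similarities are on the order of -1e6, so p = -0.2 is essentially zero by comparison; that is likely why nearly every distinct point ended up as its own exemplar.

```r
# Hypothetical from-scratch affinity propagation sketch (not apcluster's code).
# S: n x n similarity matrix; p: preference written onto the diagonal.
ap_sketch <- function(S, p, maxits = 1000, convits = 100, lam = 0.9) {
  n <- nrow(S)
  diag(S) <- p                      # preferences live on the diagonal
  R <- matrix(0, n, n)              # responsibilities r(i, k)
  A <- matrix(0, n, n)              # availabilities a(i, k)
  rows <- seq_len(n)
  last <- NULL
  stable <- 0
  for (it in seq_len(maxits)) {
    # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
    AS <- A + S
    best <- apply(AS, 1, which.max)
    first <- AS[cbind(rows, best)]
    AS[cbind(rows, best)] <- -Inf
    second <- apply(AS, 1, max)
    Rnew <- S - first               # subtracts first[i] from every entry of row i
    Rnew[cbind(rows, best)] <- S[cbind(rows, best)] - second
    R <- lam * R + (1 - lam) * Rnew # damping
    # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))) for i != k
    # a(k,k) = sum_{i' != k} max(0, r(i',k))
    Rp <- pmax(R, 0)
    diag(Rp) <- diag(R)
    Anew <- matrix(colSums(Rp), n, n, byrow = TRUE) - Rp
    self_avail <- diag(Anew)
    Anew <- pmin(Anew, 0)
    diag(Anew) <- self_avail
    A <- lam * A + (1 - lam) * Anew
    # exemplars: points with a(k,k) + r(k,k) > 0; stop once the set is stable
    ex <- which(diag(A) + diag(R) > 0)
    if (identical(ex, last)) stable <- stable + 1 else { last <- ex; stable <- 0 }
    if (stable >= convits && length(ex) > 0) break
  }
  if (length(ex) == 0) return(list(exemplars = ex, labels = rep(NA_integer_, n)))
  # assign every point to its most similar exemplar; exemplars label themselves
  labels <- ex[apply(S[, ex, drop = FALSE], 1, which.max)]
  labels[ex] <- ex
  list(exemplars = ex, labels = labels)
}

# the sample data from the question
mydata <- data.frame(
  AV_COST_SKU_ALLTHETIME = c(4687.25, 4687.25, 7255.083, 7255.083, 4084.524, 4084.524, 4452.692, 4452.692, 13951.25, 13951.25, 6855, 6855, 3943.2, 3943.2, 7261.625, 7261.625, 4082.167),
  AV_COST_SKU_PER_DELTA = c(0, 0, -1, -1, 0.150505, 0.150505, -1, -1, -1, -1, 0, 0, -0.17534, -0.17534, -0.072866, -0.072866, 0),
  AV_SKU_LAST_4SEASON = c(0, 0, 0, 0, 1.333, 1.333, 0, 0, 0, 0, 0, 0, 1.667, 1.667, 1.333, 1.333, 0),
  AV_SKU_PREVIOUS_4SEASON = c(0, 0, 2, 2, 2.308, 2.308, 3.25, 3.25, 1, 1, 0, 0, 2.5, 2.5, 1.333, 1.333, 0),
  AV_SKU_ALLTHETIME = c(1.333, 1.333, 1.714, 1.714, 2.1, 2.1, 3.25, 3.25, 1.333, 1.333, 1.333, 1.333, 2, 2, 1.333, 1.333, 2),
  AV_SKU_PER_DELTA = c(0, 0, -1, -1, -0.267784, -0.267784, -1, -1, -1, -1, 0, 0, -0.199904, -0.199904, 0, 0, 0)
)

# negative squared Euclidean distance, the same quantity as negDistMat(r = 2)
S <- -unname(as.matrix(dist(mydata)))^2
off_diag <- S[row(S) != col(S)]

# tiny random noise breaks ties between duplicated rows
set.seed(1)
S_noisy <- S + 1e-12 * (abs(S) + 1) * matrix(runif(length(S)), nrow(S))

# same data, three different diagonals
p_values <- c(user = -0.2, median = median(off_diag), min = min(off_diag))
n_clusters <- sapply(p_values, function(p) length(ap_sketch(S_noisy, p)$exemplars))
print(n_clusters)
```

In apcluster itself the same knob is `p` (or `q`, a quantile of the similarities, when `p` is NA). `preferenceRange()` reports a sensible range of `p` to try, and `apclusterK()` searches over `p` for a requested number of clusters. This is consistent with the point above: the count is steered through the preference rather than capped.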
