R - Isolate clusters with specific characteristics in hclust


I've used hclust to generate a cluster dendrogram of some data, but I need to isolate all the paired clusters, i.e. all the clusters that comprise just 2 pieces of data (the first ones to be clustered together), even if they might be clustered with other data on a "higher" branch. Does anyone know how I can do that?

I've highlighted the clusters I want to isolate in the image below; hopefully that explains it better.

<img alt="Dendrogram" src="https://i.stack.imgur.com/q8glg.gif" />

I'd like to isolate all the paired data in those clusters in a way that lets me compare the clusters by their contents, for example to see which of them contain a particular type of data.


FWIW, you could extract the "forks" like this:

    hc <- hclust(dist(USArrests), "ave")
    plot(hc)

<a href="https://i.stack.imgur.com/o1qkJ.gif" rel="nofollow"><img alt="enter image description here" src="https://i.stack.imgur.com/o1qkJ.gif" /></a>

    res <- list()
    invisible(dendrapply(as.dendrogram(hc), function(x) {
      if (attr(x, "members")==2)
        if (all(sapply(x[1:2], is.leaf)))
          res <<- c(res, list(c(attr(x[[1]], "label"), attr(x[[2]], "label"))))
      x
    }))
    head( do.call(rbind, res) )
    #      [,1]          [,2]
    # [1,] "Florida"     "North Carolina"
    # [2,] "Arizona"     "New Mexico"
    # [3,] "Alabama"     "Louisiana"
    # [4,] "Illinois"    "New York"
    # [5,] "Michigan"    "Nevada"
    # [6,] "Mississippi" "South Carolina"

(just the first 6 rows of the result)
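If you'd rather not walk the dendrogram, the same forks can probably be read straight off the `merge` component of the `hclust` object: a row with two negative entries joins two single observations. The sketch below is only one way to do it. The names `is_pair`, `leaf_rows`, `pairs`, `urban` and `pair_info` are made up for illustration, and the `UrbanPop > 70` cut-off is an arbitrary stand-in for "a particular type of data".

```r
hc <- hclust(dist(USArrests), "ave")

# In hc$merge, a negative entry -j means observation j was merged at that step,
# so a row with two negative entries is a fork of exactly two leaves.
is_pair   <- rowSums(hc$merge < 0) == 2
leaf_rows <- hc$merge[is_pair, , drop = FALSE]

# Turn the (negated) observation indices back into labels.
pairs <- cbind(hc$labels[-leaf_rows[, 1]], hc$labels[-leaf_rows[, 2]])

# Illustrative comparison of the pairs by content: count how many members
# of each pair fall into an arbitrary "urban" group (assumed threshold).
urban <- rownames(USArrests)[USArrests$UrbanPop > 70]
pair_info <- data.frame(
  a       = pairs[, 1],
  b       = pairs[, 2],
  height  = hc$height[is_pair],  # height at which the pair was joined
  n_urban = (pairs[, 1] %in% urban) + (pairs[, 2] %in% urban)
)
head(pair_info)
```

The rows come out in merge order (lowest fork first) rather than in dendrogram order, so they won't line up row for row with the `dendrapply` result, but the set of pairs should be the same.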


