# Program to obtain frequency matrix of categorical data

<h3>Question</h3>

I am working on data that contains more than 300 categorical features, which I have encoded as 0s and 1s. Now I need to create a feature-by-feature matrix in which each cell holds the frequency of joint occurrence of the two features.

In the end, I am looking to create a heatmap of this frequency matrix.

So, my dataframe in R looks like this:

```
id  cat1 cat2 cat3 cat4
156    0    0    1    1
465    1    1    1    0
573    0    1    1    0
```

The output I want is:

```
     cat1 cat2 cat3 ...
cat1    0    1    0
cat2    1    0    2
cat3    1    2    0
.
.
```

where each cell value denotes the number of times the two categorical variables have appeared <em>together</em>.

We can use `outer` to apply a pairwise count to every combination of columns:

```
#Since we have only 0's and 1's in column we can directly use &
fun <- function(x, y) sum(df[, x] & df[, y])
#Get all the cat columns
n <- seq_along(df)[-1]
#Apply function to every combination of columns
mat <- outer(n, n, Vectorize(fun))
#Turn diagonals to 0
diag(mat) <- 0
#Assign rownames and column names
dimnames(mat) <- list(names(df)[n], names(df[n]))

#     cat1 cat2 cat3 cat4
#cat1    0    1    1    0
#cat2    1    0    2    0
#cat3    1    2    0    1
#cat4    0    0    1    0
```
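Because the columns hold only 0s and 1s, `sum(x & y)` is the same as the dot product of the two columns, so the whole matrix can also be obtained from one matrix product. The following is a minimal sketch of that idea, not part of the original answer; it assumes the data frame is named `df` with `id` in the first column, and rebuilds the question's sample data so that it runs on its own.

```r
# Sample data from the question (assumed here to be named df)
df <- data.frame(
  id   = c(156L, 465L, 573L),
  cat1 = c(0L, 1L, 0L),
  cat2 = c(0L, 1L, 1L),
  cat3 = c(1L, 1L, 1L),
  cat4 = c(1L, 0L, 0L)
)

# Drop the id column and convert the 0/1 columns to a matrix
X <- as.matrix(df[-1])

# crossprod(X) is t(X) %*% X: cell (i, j) counts the rows
# in which column i and column j are both 1
co <- crossprod(X)

# The diagonal holds each column's own total; set it to 0
# to match the output requested in the question
diag(co) <- 0
co
```

With 300 columns, `outer` with `Vectorize` makes roughly 90,000 separate calls to `fun`, whereas the matrix product is a single call, so this form is likely to be faster on the full data.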

We can also use `table` with `crossprod` from `base R`:

```
i1 <- as.logical(unlist(df1[-1]))
out <- crossprod(table(df1$id[row(df1[-1])][i1],
                       names(df1)[-1][col(df1[-1])][i1]))
diag(out) <- 0
out
#       cat1 cat2 cat3 cat4
#  cat1    0    1    1    0
#  cat2    1    0    2    0
#  cat3    1    2    0    1
#  cat4    0    0    1    0
```

<h3>data</h3>

```
df1 <- structure(list(id = c(156L, 465L, 573L), cat1 = c(0L, 1L, 0L),
    cat2 = c(0L, 1L, 1L), cat3 = c(1L, 1L, 1L), cat4 = c(1L, 0L, 0L)),
    class = "data.frame", row.names = c(NA, -3L))
```
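
The question's end goal is a heatmap of the frequency matrix. One possible way to draw it is base R's `heatmap()`; the sketch below is an illustration under assumptions rather than part of the answers above. It recomputes the co-occurrence matrix with `crossprod` on the 0/1 columns of the sample `df1`, and the argument choices (no dendrograms, no scaling) are one reasonable option among several.

```r
# Sample data, as in the data section
df1 <- data.frame(
  id   = c(156L, 465L, 573L),
  cat1 = c(0L, 1L, 0L),
  cat2 = c(0L, 1L, 1L),
  cat3 = c(1L, 1L, 1L),
  cat4 = c(1L, 0L, 0L)
)

# Co-occurrence counts between every pair of 0/1 columns
out <- crossprod(as.matrix(df1[-1]))
diag(out) <- 0

# Rowv = NA and Colv = NA keep the original column order and
# suppress the dendrograms; scale = "none" plots the raw counts
# instead of row-standardized values
heatmap(out, Rowv = NA, Colv = NA, scale = "none")
```

Leaving `Rowv` and `Colv` at their defaults would instead cluster the features and reorder them, which may be more readable with 300 or more columns.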