Get a user-defined function to work in data.table


I would like to know how to use a user-defined function with a data.table.

I wrote the following code using data.table to calculate the percentage of 'b' responses out of all valid responses ('a' or 'b') by two groups, grp1 and grp2.

The data (with a warning message):

    library(data.table)
    dt = data.table(rep(c("I", "II", "III", "IV")),
                    rep(c("A", "B", "C")),
                    rep(c("a", "a", "b", "b", "b"), 20))
    colnames(dt) = c("grp1", "grp2", "Q1")

The code to calculate the percentage of respondents:

dt[, sum(Q1 %in% "b")/sum(!is.na(Q1))*100, by = grp1:grp2][order(grp1, grp2)]

This produces what I need (thanks to @Frank for the help at <a href="https://stackoverflow.com/questions/42789991/calculate-respondents-by-more-than-one-group-for-a-survey-data" rel="nofollow">Calculate % respondents by more than one group for a survey data</a>):

        grp1 grp2       V1
     1:    I    A 55.55556
     2:    I    B 62.50000
     3:    I    C 62.50000
     4:   II    A 62.50000
     5:   II    B 55.55556
     6:   II    C 62.50000
     7:  III    A 50.00000
     8:  III    B 62.50000
     9:  III    C 66.66667
    10:   IV    A 66.66667
    11:   IV    B 62.50000
    12:   IV    C 50.00000

What I would like to do is create a function and use it to calculate the equivalent set of values for 50 other items. I wrote the following function to avoid repeating the same code:

    test = function(question, groupA, groupB){
      dt[, sum(get(question) %in% "b")/sum(!is.na(get(question)))*100,
         by = eval((c(groupA, groupB)))][order(groupA, groupB)]
    }
    test(question = "Q1", groupA = "grp1", groupB ="grp2")

However, this returns only the top row:

       grp1 grp2       V1
    1:    I    A 55.55556

I've read other questions on Stack Overflow (e.g. <a href="https://stackoverflow.com/questions/9705488/using-data-table-i-and-j-arguments-in-functions" rel="nofollow">Using data.table i and j arguments in functions</a>) and tried other approaches, but I haven't found a way to get it to work.

I'm new to R and would very much appreciate any feedback you may have.


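The by argument can take the column names directly as a character vector, so eval() is not needed. The single row actually comes from the chained [order(groupA, groupB)]: inside the function, groupA and groupB are the strings "grp1" and "grp2" rather than columns, so order() returns 1 and only the first row is kept. Using keyby instead of by does the grouping and the sorting in one step: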

    test = function(question, groupA, groupB){
      dt[, sum(get(question) %in% "b") / sum(!is.na(get(question))) * 100,
         keyby = c(groupA, groupB)]
    }
    ans = test(question = "Q1", groupA = "grp1", groupB ="grp2")
    #     grp1 grp2       V1
    #  1:    I    A 55.55556
    #  2:    I    B 62.50000
    #  3:    I    C 62.50000
    #  4:   II    A 62.50000
    #  5:   II    B 55.55556
    #  6:   II    C 62.50000
    #  7:  III    A 50.00000
    #  8:  III    B 62.50000
    #  9:  III    C 66.66667
    # 10:   IV    A 66.66667
    # 11:   IV    B 62.50000
    # 12:   IV    C 50.00000
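To see why chaining order() on the argument names keeps a single row, it may help to run that step on its own. This is a minimal sketch: the table res and its values are made up for illustration and only stand in for the grouped result.

```r
library(data.table)

groupA <- "grp1"
groupB <- "grp2"

# Ordering two length-one character vectors yields the single index 1
idx <- order(groupA, groupB)

# Hypothetical three-row table standing in for the grouped output
res <- data.table(grp1 = c("II", "I", "I"),
                  grp2 = c("A", "B", "A"),
                  V1   = c(3, 2, 1))

# groupA and groupB are not columns of res, so they resolve to the strings
# above and the subset keeps row 1 only
first_only <- res[order(groupA, groupB)]

# To sort by column names held in variables, setorderv() accepts a
# character vector; copy() leaves res itself untouched
sorted <- setorderv(copy(res), c(groupA, groupB))
```

With keyby the separate sorting step is unnecessary, but setorderv() can be handy when a result has to be reordered afterwards.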

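Since the goal is to repeat the calculation for about 50 items, one possible extension is to pass the table in as an argument and loop over the question names. This is only a sketch under some assumptions: pct_b, pct_b_all and the second item Q2 are hypothetical names introduced here, and the data are rebuilt with length.out so that no recycling warning is raised.

```r
library(data.table)

# Same layout as the question's data, plus a made-up second item Q2
dt <- data.table(
  grp1 = rep(c("I", "II", "III", "IV"), length.out = 100),
  grp2 = rep(c("A", "B", "C"), length.out = 100),
  Q1   = rep(c("a", "a", "b", "b", "b"), 20),
  Q2   = rep(c("b", "a", NA, "a", "b"), 20)
)

# Percentage of "b" among non-missing answers to one question, per group
pct_b <- function(data, question, groups) {
  data[, .(pct = sum(get(question) %in% "b") / sum(!is.na(get(question))) * 100),
       keyby = groups]
}

# Stack the per-question results; idcol records which question each row is from
pct_b_all <- function(data, questions, groups) {
  names(questions) <- questions
  rbindlist(lapply(questions, function(q) pct_b(data, q, groups)),
            idcol = "question")
}

res <- pct_b_all(dt, c("Q1", "Q2"), c("grp1", "grp2"))
```

Here res has one row per question and group combination, with columns question, grp1, grp2 and pct. Passing the table as an argument avoids relying on a global dt inside the function.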

