
match.fun slower than actual function in R

Question:

I have large data sets with rows that measure the same thing (essentially duplicates with some noise). As part of a larger function I am writing, I want the user to be able to collapse these rows with a function of their choosing (e.g. mean, median).

My problem is that calling the function directly is much faster than going through match.fun (which is what I need). MWE:

require(data.table)
rows <- 100000
cols <- 1000
dat <- data.table(id=sample(LETTERS, rows, replace=TRUE), matrix(rnorm(rows*cols), nrow=rows))
aggFn <- "median"
system.time(dat[, lapply(.SD, median), by=id])
system.time(dat[, lapply(.SD, match.fun(aggFn)), by=id])

On my system, timing results for the last 2 lines:

   user  system elapsed
  1.112   0.027   1.141

   user  system elapsed
  2.854   0.265   3.121

This becomes quite dramatic with larger data sets.

As a final point, I realize aggregate() can do this (and doesn't seem to suffer from this behavior), but I need to work with data.table objects due to data size.

Answer1:

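The reason is the GForce optimization that data.table applies to median. GForce works on the expression in j: when it sees lapply(.SD, median), it replaces median with its own internal, much faster grouped version. A call such as match.fun(aggFn) hides which function will be used, so data.table falls back to evaluating j once per group. You can see this if you set options(datatable.verbose=TRUE). See help("GForce") for details.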
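A minimal sketch to confirm that only the execution path differs, not the result. It assumes data.table is installed; the toy table sizes are arbitrary. With verbose output switched on, data.table prints whether GForce rewrote j (the exact message wording varies between versions), and both forms return the same numbers.

```r
library(data.table)

set.seed(1)
# Toy data; sizes are arbitrary and only keep the example fast
small <- data.table(id = sample(LETTERS[1:5], 1000, replace = TRUE),
                    matrix(rnorm(1000 * 3), nrow = 1000))
aggFn <- "median"

# Turn on data.table's diagnostic messages, remembering the old setting
old <- options(datatable.verbose = TRUE)
direct  <- small[, lapply(.SD, median), by = id]            # j is visible to GForce
via_fun <- small[, lapply(.SD, match.fun(aggFn)), by = id]  # j hides the function name
options(old)  # restore the previous verbose setting

# The results agree; only the speed differs
all.equal(direct, via_fun)
```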

If you compare with a function that GForce does not recognize (here median assigned to a different name), the timings are much closer:

fun <- median
aggFn <- "fun"
system.time(dat[, lapply(.SD, fun), by=id])
system.time(dat[, lapply(.SD, match.fun(aggFn)), by=id])

A possible workaround that keeps the optimization when the chosen function happens to be supported is to evaluate an expression built from its name, e.g., using the dreaded eval(parse()):

dat[, eval(parse(text = sprintf("lapply(.SD, %s)", aggFn))), by=id]

However, you would lose the small measure of safety that match.fun adds: eval(parse()) will run whatever code aggFn contains, not just a function name.
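One way to keep that safety without parse() is sketched below, under the assumption that the user still supplies the function name as a string. The name is checked against an allow-list (the `allowed` vector is hypothetical; adjust it to your needs), turned into a symbol with as.name(), and spliced into the call with substitute(). This relies on data.table expanding eval(expr) in j before deciding on GForce, which is the same behaviour the bquote() approach below depends on.

```r
library(data.table)

set.seed(1)
# Toy data standing in for the real table
dat <- data.table(id = sample(LETTERS, 1000, replace = TRUE),
                  matrix(rnorm(1000 * 3), nrow = 1000))
aggFn <- "median"

# Hypothetical allow-list of function names the user may choose from
allowed <- c("mean", "median", "min", "max", "sum")
if (!aggFn %in% allowed) stop("Unsupported aggregation function: ", aggFn)

# Build the call lapply(.SD, median) as a language object; no text parsing
expr <- substitute(lapply(.SD, f), list(f = as.name(aggFn)))
res <- dat[, eval(expr), by = id]
```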

If you have a list of functions the users can choose from, you could do this:

funs <- list(quote(mean), quote(median))
fun <- funs[[1]] # select
expr <- bquote(lapply(.SD, .(fun)))
a <- dat[, eval(expr), by=id]
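Inside a larger function, the same idea could look like the sketch below. The helper name `collapse_rows` and its arguments are hypothetical. Naming the list entries lets the user pick a function by string, and anything outside the list is rejected before any expression is built. It assumes data.table evaluates eval(expr) in the calling function's frame, so `expr` does not need to be global.

```r
library(data.table)

# Hypothetical helper: collapse rows within groups of `by_col`
# using one of a fixed set of functions, selected by name.
collapse_rows <- function(dt, fn_name, by_col = "id") {
  funs <- list(mean = quote(mean), median = quote(median))
  if (!fn_name %in% names(funs)) {
    stop("fn_name must be one of: ", paste(names(funs), collapse = ", "))
  }
  # Splice the chosen symbol in, giving e.g. lapply(.SD, median)
  expr <- bquote(lapply(.SD, .(funs[[fn_name]])))
  dt[, eval(expr), by = by_col]
}

# Usage on toy data
set.seed(1)
dat <- data.table(id = sample(LETTERS, 1000, replace = TRUE),
                  matrix(rnorm(1000 * 3), nrow = 1000))
res <- collapse_rows(dat, "median")
```

Keeping the choices as quoted symbols rather than function objects is what lets GForce still see a plain `median` or `mean` in j.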
