
# Is there any way to bind data to data.frame by some index?

```
# For say, I got a situation like this
user_id = c(1:5, 1:5)
time = c(1:10)
visit_log = data.frame(user_id, time)

# And I've wrote a method to calculate interval
interval <- function(data) {
  interval = c(Inf)
  for (i in seq(1, length(data$time))) {
    intv = data$time[i] - data$time[i-1]
    interval = append(interval, intv)
  }
  data$interval = interval
  return (data)
}

# But when I want to get intervals by user_id and bind them to the data.frame,
# I can't find a proper way
# Is there any method to get something like
new_data = merge(by(visit_log, INDICE=visit_log$user_id, FUN=interval))

# And the result should be
   user_id time interval
1        1    1      Inf
2        2    2      Inf
3        3    3      Inf
4        4    4      Inf
5        5    5      Inf
6        1    6        5
7        2    7        5
8        3    8        5
9        4    9        5
10       5   10        5
```

We can replace your loop with the `diff()` function, which computes the differences between adjacent elements of a vector. For example:

```
> diff(c(1,3,6,10))
[1] 2 3 4
```

We can then prepend `Inf` to the differences via `c(Inf, diff(x))`, so that the result has the same length as the input.
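As a quick illustration of that idiom, here is a minimal sketch for a single user. The `times` vector is made-up data, not taken from your example.

```r
# Hypothetical visit times for one user (illustrative data only)
times <- c(1, 6, 11)

# The first visit has no predecessor, so its interval is set to Inf;
# the remaining entries are the gaps between consecutive visits
intervals <- c(Inf, diff(times))
print(intervals)
```

The result has one entry per visit, which is what lets it be stored as a column next to `time`.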

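The next thing we need is to apply the above to each `user_id` individually. For that there are many options, but here I use `aggregate()`. Confusingly, this function returns a data frame whose `time` component is itself a matrix, with one row per `user_id` and one column per time point. We need to convert that matrix to a vector, relying upon the fact that in R, matrices are stored column by column, so `as.numeric()` returns the first interval for every user, then the second interval for every user, and so on. That matches the row order of your data only because every user has the same number of visits and the users appear in the same order within each round of visits. Finally, we add an `interval` column to the input data, as in your original version of the function.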

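```
interval <- function(x) {
  # One row per user_id; the time column is a matrix of intervals
  diffs <- aggregate(time ~ user_id, data = x, function(y) c(Inf, diff(y)))
  # Flatten the matrix column by column into a plain vector
  diffs <- as.numeric(diffs$time)
  # Attach the intervals to the input data as a new column
  x <- within(x, interval <- diffs)
  x
}
```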

Here is a slightly expanded example, with 3 time points per user, to illustrate the above function:

```
> visit_log = data.frame(user_id = rep(1:5, 3), time = 1:15)
> interval(visit_log)
   user_id time interval
1        1    1      Inf
2        2    2      Inf
3        3    3      Inf
4        4    4      Inf
5        5    5      Inf
6        1    6        5
7        2    7        5
8        3    8        5
9        4    9        5
10       5   10        5
11       1   11        5
12       2   12        5
13       3   13        5
14       4   14        5
15       5   15        5
```
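If your real data does not have that regular layout (users in arbitrary row order, or unequal numbers of visits per user), one of the other options is base R's `ave()`, which applies a function within each group and puts the results back in the original row positions. The following is only a sketch of that alternative, not the function above: the name `interval_by_user` and the sample data are made up for illustration, and it assumes the rows for each user are already sorted by `time`.

```r
# Sketch of an alternative based on ave(); `interval_by_user` is a
# hypothetical name, not part of the aggregate()-based function
interval_by_user <- function(x) {
  # ave() splits x$time by x$user_id, applies FUN to each group, and
  # returns the values in the original row positions
  x$interval <- ave(x$time, x$user_id, FUN = function(y) c(Inf, diff(y)))
  x
}

# Illustrative data: users interleaved irregularly, unequal visit counts
visit_log <- data.frame(user_id = c(2, 1, 1, 2, 1),
                        time    = c(1, 2, 5, 7, 9))
result <- interval_by_user(visit_log)
print(result)
```

Here user 1 visits at times 2, 5 and 9 and user 2 at times 1 and 7, so each user's first row gets `Inf` and the later rows get the gap since that user's previous visit.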