10924

# Finding Cumulative Sum In R Using Conditions

I need to create a new variable with the sum of the past three years' amounts for each ID.

If there are not three years' worth of data, there should be an 'NA'.

As an example:

```ID YEAR AMOUNT 1 2010 5 1 2011 2 1 2012 4 1 2013 1 1 2014 3 2 2013 4 2 2014 6 2 2015 9 3 2012 4 3 2013 7 3 2014 2 3 2015 3 ```

Here's what the result should be:

```ID YEAR AMOUNT THREE_YR 1 2010 5 NA 1 2011 2 NA 1 2012 4 11 1 2013 1 7 1 2014 3 8 2 2013 4 NA 2 2014 6 NA 2 2015 9 19 3 2012 4 NA 3 2013 7 NA 3 2014 2 13 3 2015 3 12 ```

How would I do this? Thanks!

We can use functions from `dplyr` and `zoo`. `dt2` is the final output.

```# Create example data frame dt <- read.table(text = "ID YEAR AMOUNT 1 2010 5 1 2011 2 1 2012 4 1 2013 1 1 2014 3 2 2013 4 2 2014 6 2 2015 9 3 2012 4 3 2013 7 3 2014 2 3 2015 3", header = TRUE, stringsAsFactors = FALSE) # Load packages library(dplyr) library(zoo) # Process the data dt2 <- dt %>% group_by(ID) %>% mutate(THREE_YR = rollsum(AMOUNT, k = 3, fill = NA, align = "right")) ```

### Update: ID groups with less than 3 records.

The OP asked what to do if there are IDs with only one or two rows. Honestly, I did not find a good way to solve this. The only thing I can think of is dividing the original data frame to two groups, apply the `rollsum` to the group with all records larger than or equal to three. After that, combine all groups.

```# Create example data frame dt <- read.table(text = "ID YEAR AMOUNT 1 2010 5 1 2011 2 1 2012 4 1 2013 1 1 2014 3 2 2013 4 3 2012 4 3 2013 7 3 2014 2 3 2015 3", header = TRUE, stringsAsFactors = FALSE) # Load packages library(dplyr) library(zoo) # Process the data dt2 <- dt %>% group_by(ID) %>% filter(n() >= 3) %>% mutate(THREE_YR = rollsum(AMOUNT, k = 3, fill = NA, align = "right")) %>% bind_rows(dt %>% group_by(ID) %>% filter(n() < 3)) %>% arrange(ID, YEAR) ```

With the `data.table`:
```library(data.table) setDT(dt) setorder(dt,YEAR) dt[,.(YEAR,AMOUNT,THREE_YR=AMOUNT+shift(AMOUNT,1)+shift(AMOUNT,2)),by=.(ID)] #ID YEAR AMOUNT THREE_YR # 1: 1 2010 5 NA # 2: 1 2011 2 NA # 3: 1 2012 4 11 # 4: 1 2013 1 7 # 5: 1 2014 3 8 # 6: 3 2012 4 NA # 7: 3 2013 7 NA # 8: 3 2014 2 13 # 9: 3 2015 3 12 #10: 2 2013 4 NA #11: 2 2014 6 NA #12: 2 2015 9 19 ```
Using `zoo::rollapplyr()` and `aggregate()` This will return `NA` if there are less than three members in a group.
```x <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), YEAR = c(2010L, 2011L, 2012L, 2013L, 2014L, 2013L, 2014L, 2015L, 2012L, 2013L, 2014L, 2015L), AMOUNT = c(5L, 2L, 4L, 1L, 3L, 4L, 6L, 9L, 4L, 7L, 2L, 3L)), .Names = c("ID", "YEAR", "AMOUNT" ), class = "data.frame", row.names = c(NA, -12L)) library(zoo) rsum <- aggregate(AMOUNT ~ ID, data=x, FUN=function(x) rollapplyr(x, 3, fill=NA, partial=TRUE, FUN=function(y) if (length(y) >= 3) sum(y) else NA)) x\$rsum <- do.call(c, rsum\$AMOUNT) x # ID YEAR AMOUNT rsum # 1 1 2010 5 NA # 2 1 2011 2 NA # 3 1 2012 4 11 # 4 1 2013 1 7 # 5 1 2014 3 8 # 6 2 2013 4 NA # 7 2 2014 6 NA # 8 2 2015 9 19 # 9 3 2012 4 NA # 10 3 2013 7 NA # 11 3 2014 2 13 # 12 3 2015 3 12 # remove one of the 2s x <- x[-6, ] rsum <- aggregate(AMOUNT ~ ID, data=x, FUN=function(x) rollapplyr(x, 3, fill=NA, partial=TRUE, FUN=function(y) if (length(y) >= 3) sum(y) else NA)) x\$rsum <- do.call(c, rsum\$AMOUNT) x # ID YEAR AMOUNT rsum # 1 1 2010 5 NA # 2 1 2011 2 NA # 3 1 2012 4 11 # 4 1 2013 1 7 # 5 1 2014 3 8 # 7 2 2014 6 NA # 8 2 2015 9 NA # 9 3 2012 4 NA # 10 3 2013 7 NA # 11 3 2014 2 13 # 12 3 2015 3 12 ```