Finding Cumulative Sum In R Using Conditions

I need to create a new variable with the sum of the past three years' amounts for each ID.

If there are not three years' worth of data, there should be an 'NA'.

As an example:

ID YEAR AMOUNT 1 2010 5 1 2011 2 1 2012 4 1 2013 1 1 2014 3 2 2013 4 2 2014 6 2 2015 9 3 2012 4 3 2013 7 3 2014 2 3 2015 3

Here's what the result should be:

ID YEAR AMOUNT THREE_YR 1 2010 5 NA 1 2011 2 NA 1 2012 4 11 1 2013 1 7 1 2014 3 8 2 2013 4 NA 2 2014 6 NA 2 2015 9 19 3 2012 4 NA 3 2013 7 NA 3 2014 2 13 3 2015 3 12

How would I do this? Thanks!

Answer1:

We can use functions from dplyr and zoo. dt2 is the final output.

# Create example data frame dt <- read.table(text = "ID YEAR AMOUNT 1 2010 5 1 2011 2 1 2012 4 1 2013 1 1 2014 3 2 2013 4 2 2014 6 2 2015 9 3 2012 4 3 2013 7 3 2014 2 3 2015 3", header = TRUE, stringsAsFactors = FALSE) # Load packages library(dplyr) library(zoo) # Process the data dt2 <- dt %>% group_by(ID) %>% mutate(THREE_YR = rollsum(AMOUNT, k = 3, fill = NA, align = "right"))

Update: ID groups with less than 3 records.

The OP asked what to do if there are IDs with only one or two rows. Honestly, I did not find a good way to solve this. The only thing I can think of is dividing the original data frame to two groups, apply the rollsum to the group with all records larger than or equal to three. After that, combine all groups.

# Create example data frame dt <- read.table(text = "ID YEAR AMOUNT 1 2010 5 1 2011 2 1 2012 4 1 2013 1 1 2014 3 2 2013 4 3 2012 4 3 2013 7 3 2014 2 3 2015 3", header = TRUE, stringsAsFactors = FALSE) # Load packages library(dplyr) library(zoo) # Process the data dt2 <- dt %>% group_by(ID) %>% filter(n() >= 3) %>% mutate(THREE_YR = rollsum(AMOUNT, k = 3, fill = NA, align = "right")) %>% bind_rows(dt %>% group_by(ID) %>% filter(n() < 3)) %>% arrange(ID, YEAR)

Answer2:

With the data.table:

library(data.table) setDT(dt) setorder(dt,YEAR) dt[,.(YEAR,AMOUNT,THREE_YR=AMOUNT+shift(AMOUNT,1)+shift(AMOUNT,2)),by=.(ID)] #ID YEAR AMOUNT THREE_YR # 1: 1 2010 5 NA # 2: 1 2011 2 NA # 3: 1 2012 4 11 # 4: 1 2013 1 7 # 5: 1 2014 3 8 # 6: 3 2012 4 NA # 7: 3 2013 7 NA # 8: 3 2014 2 13 # 9: 3 2015 3 12 #10: 2 2013 4 NA #11: 2 2014 6 NA #12: 2 2015 9 19

Answer3:

Using zoo::rollapplyr() and aggregate() This will return NA if there are less than three members in a group.

x <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), YEAR = c(2010L, 2011L, 2012L, 2013L, 2014L, 2013L, 2014L, 2015L, 2012L, 2013L, 2014L, 2015L), AMOUNT = c(5L, 2L, 4L, 1L, 3L, 4L, 6L, 9L, 4L, 7L, 2L, 3L)), .Names = c("ID", "YEAR", "AMOUNT" ), class = "data.frame", row.names = c(NA, -12L)) library(zoo) rsum <- aggregate(AMOUNT ~ ID, data=x, FUN=function(x) rollapplyr(x, 3, fill=NA, partial=TRUE, FUN=function(y) if (length(y) >= 3) sum(y) else NA)) x$rsum <- do.call(c, rsum$AMOUNT) x # ID YEAR AMOUNT rsum # 1 1 2010 5 NA # 2 1 2011 2 NA # 3 1 2012 4 11 # 4 1 2013 1 7 # 5 1 2014 3 8 # 6 2 2013 4 NA # 7 2 2014 6 NA # 8 2 2015 9 19 # 9 3 2012 4 NA # 10 3 2013 7 NA # 11 3 2014 2 13 # 12 3 2015 3 12 # remove one of the 2s x <- x[-6, ] rsum <- aggregate(AMOUNT ~ ID, data=x, FUN=function(x) rollapplyr(x, 3, fill=NA, partial=TRUE, FUN=function(y) if (length(y) >= 3) sum(y) else NA)) x$rsum <- do.call(c, rsum$AMOUNT) x # ID YEAR AMOUNT rsum # 1 1 2010 5 NA # 2 1 2011 2 NA # 3 1 2012 4 11 # 4 1 2013 1 7 # 5 1 2014 3 8 # 7 2 2014 6 NA # 8 2 2015 9 NA # 9 3 2012 4 NA # 10 3 2013 7 NA # 11 3 2014 2 13 # 12 3 2015 3 12

人吐槽 人点赞

Recommend

Comment

用户名: 密码:
验证码: 匿名发表

你可以使用这些语言

查看评论:Finding Cumulative Sum In R Using Conditions