46953

Finding Cumulative Sum In R Using Conditions

I need to create a new variable with the sum of the past three years' amounts for each ID.

If there are not three years' worth of data, there should be an 'NA'.

As an example:

ID YEAR AMOUNT 1 2010 5 1 2011 2 1 2012 4 1 2013 1 1 2014 3 2 2013 4 2 2014 6 2 2015 9 3 2012 4 3 2013 7 3 2014 2 3 2015 3

Here's what the result should be:

ID YEAR AMOUNT THREE_YR 1 2010 5 NA 1 2011 2 NA 1 2012 4 11 1 2013 1 7 1 2014 3 8 2 2013 4 NA 2 2014 6 NA 2 2015 9 19 3 2012 4 NA 3 2013 7 NA 3 2014 2 13 3 2015 3 12

How would I do this? Thanks!

Answer1:

We can use functions from dplyr and zoo. dt2 is the final output.

# Create example data frame dt <- read.table(text = "ID YEAR AMOUNT 1 2010 5 1 2011 2 1 2012 4 1 2013 1 1 2014 3 2 2013 4 2 2014 6 2 2015 9 3 2012 4 3 2013 7 3 2014 2 3 2015 3", header = TRUE, stringsAsFactors = FALSE) # Load packages library(dplyr) library(zoo) # Process the data dt2 <- dt %>% group_by(ID) %>% mutate(THREE_YR = rollsum(AMOUNT, k = 3, fill = NA, align = "right"))

Update: ID groups with less than 3 records.

The OP asked what to do if there are IDs with only one or two rows. Honestly, I did not find a good way to solve this. The only thing I can think of is dividing the original data frame to two groups, apply the rollsum to the group with all records larger than or equal to three. After that, combine all groups.

# Create example data frame dt <- read.table(text = "ID YEAR AMOUNT 1 2010 5 1 2011 2 1 2012 4 1 2013 1 1 2014 3 2 2013 4 3 2012 4 3 2013 7 3 2014 2 3 2015 3", header = TRUE, stringsAsFactors = FALSE) # Load packages library(dplyr) library(zoo) # Process the data dt2 <- dt %>% group_by(ID) %>% filter(n() >= 3) %>% mutate(THREE_YR = rollsum(AMOUNT, k = 3, fill = NA, align = "right")) %>% bind_rows(dt %>% group_by(ID) %>% filter(n() < 3)) %>% arrange(ID, YEAR)

Answer2:

With the data.table:

library(data.table) setDT(dt) setorder(dt,YEAR) dt[,.(YEAR,AMOUNT,THREE_YR=AMOUNT+shift(AMOUNT,1)+shift(AMOUNT,2)),by=.(ID)] #ID YEAR AMOUNT THREE_YR # 1: 1 2010 5 NA # 2: 1 2011 2 NA # 3: 1 2012 4 11 # 4: 1 2013 1 7 # 5: 1 2014 3 8 # 6: 3 2012 4 NA # 7: 3 2013 7 NA # 8: 3 2014 2 13 # 9: 3 2015 3 12 #10: 2 2013 4 NA #11: 2 2014 6 NA #12: 2 2015 9 19

Answer3:

Using zoo::rollapplyr() and aggregate() This will return NA if there are less than three members in a group.

x <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), YEAR = c(2010L, 2011L, 2012L, 2013L, 2014L, 2013L, 2014L, 2015L, 2012L, 2013L, 2014L, 2015L), AMOUNT = c(5L, 2L, 4L, 1L, 3L, 4L, 6L, 9L, 4L, 7L, 2L, 3L)), .Names = c("ID", "YEAR", "AMOUNT" ), class = "data.frame", row.names = c(NA, -12L)) library(zoo) rsum <- aggregate(AMOUNT ~ ID, data=x, FUN=function(x) rollapplyr(x, 3, fill=NA, partial=TRUE, FUN=function(y) if (length(y) >= 3) sum(y) else NA)) x$rsum <- do.call(c, rsum$AMOUNT) x # ID YEAR AMOUNT rsum # 1 1 2010 5 NA # 2 1 2011 2 NA # 3 1 2012 4 11 # 4 1 2013 1 7 # 5 1 2014 3 8 # 6 2 2013 4 NA # 7 2 2014 6 NA # 8 2 2015 9 19 # 9 3 2012 4 NA # 10 3 2013 7 NA # 11 3 2014 2 13 # 12 3 2015 3 12 # remove one of the 2s x <- x[-6, ] rsum <- aggregate(AMOUNT ~ ID, data=x, FUN=function(x) rollapplyr(x, 3, fill=NA, partial=TRUE, FUN=function(y) if (length(y) >= 3) sum(y) else NA)) x$rsum <- do.call(c, rsum$AMOUNT) x # ID YEAR AMOUNT rsum # 1 1 2010 5 NA # 2 1 2011 2 NA # 3 1 2012 4 11 # 4 1 2013 1 7 # 5 1 2014 3 8 # 7 2 2014 6 NA # 8 2 2015 9 NA # 9 3 2012 4 NA # 10 3 2013 7 NA # 11 3 2014 2 13 # 12 3 2015 3 12

Recommend

  • Get start and end index of runs of values
  • R update values based on event
  • Is it possible to add tickboxes in a DT datatable?
  • How to subset a data.frame if the column contains NAs [duplicate]
  • Looping and adding to a counter in R
  • saving file generated by TCPDF
  • How to view images from protected folder with php?
  • Textfile Structure (tables)
  • Nant, Vault & Windows Integrated Authentication
  • Javascript Callbacks with Object constructor
  • MySQL WHERE-condition in procedure ignored
  • Deserializing XML into class C#
  • vba code to select only visible cells in specific column except heading
  • Adding custom controls to a full screen movie
  • Web-crawler for facebook in python
  • Function pointer “assignment from incompatible pointer type” only when using vararg ellipsis
  • Delete MySQLi record without showing the id in the URL
  • Return words with double consecutive letters
  • Unanticipated behavior
  • Comma separated Values
  • InvalidAuthenticityToken between subdomains when logging in with Rails app
  • SQL merge duplicate rows and join values that are different
  • trying to dynamically update Highchart column chart but series undefined
  • Error creating VM instance in Google Compute Engine
  • Hits per day in Google Big Query
  • Trying to get generic when generic is not available
  • how does django model after text[] in postgresql [duplicate]
  • How to stop GridView from loading again when I press back button?
  • LevelDB C iterator
  • Bitwise OR returns boolean when one of operands is nil
  • python draw pie shapes with colour filled
  • Can't mass-assign protected attributes when import data from csv file
  • sending mail using smtp is too slow
  • Busy indicator not showing up in wpf window [duplicate]
  • costura.fody for a dll that references another dll
  • Why is Django giving me: 'first_name' is an invalid keyword argument for this function?
  • Binding checkboxes to object values in AngularJs
  • How to Embed XSL into XML
  • How can I use `wmic` in a Windows PE script?
  • java string with new operator and a literal