
dplyr how to lag by group

I have a data frame of orders and receivables with lead times. Can I use dplyr to fill in the receive column according to each group's lead time?

    df <- data.frame(
      team = c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b"),
      order = c(2, 4, 3, 5, 6, 7, 8, 5, 4, 5),
      lead_time = c(3, 3, 3, 3, 3, 2, 2, 2, 2, 2)
    )

    > df
    team order lead_time
       a     2         3
       a     4         3
       a     3         3
       a     5         3
       a     6         3
       b     7         2
       b     8         2
       b     5         2
       b     4         2
       b     5         2

I want to add a receive column like so:

    dfb <- data.frame(
      team = c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b"),
      order = c(2, 4, 3, 5, 6, 7, 8, 5, 4, 5),
      lead_time = c(3, 3, 3, 3, 3, 2, 2, 2, 2, 2),
      receive = c(0, 0, 0, 2, 4, 0, 0, 7, 8, 5)
    )

    > dfb
    team order lead_time receive
       a     2         3       0
       a     4         3       0
       a     3         3       0
       a     5         3       2
       a     6         3       4
       b     7         2       0
       b     8         2       0
       b     5         2       7
       b     4         2       8
       b     5         2       5

I was thinking along these lines, but I run into an error:

    dfc <- df %>%
      group_by(team) %>%
      mutate(receive = if_else(row_number() < lead_time, 0, lag(order, n = lead_time)))

    Error in mutate_impl(.data, dots) :
      could not convert second argument to an integer. type=SYMSXP, length = 1

Thanks for the help!

Answer1:

This looks like a bug; there may be some unintended masking of the lag function between the dplyr and stats packages. Try this workaround, which calls dplyr::lag explicitly and passes a single lag value per group:

    df %>%
      group_by(team) %>%
      # explicitly specify the source of the lag function here
      mutate(receive = dplyr::lag(order, n = unique(lead_time), default = 0))

    #Source: local data frame [10 x 4]
    #Groups: team [2]
    #     team order lead_time receive
    #   <fctr> <dbl>     <dbl>   <dbl>
    #1       a     2         3       0
    #2       a     4         3       0
    #3       a     3         3       0
    #4       a     5         3       2
    #5       a     6         3       4
    #6       b     7         2       0
    #7       b     8         2       0
    #8       b     5         2       7
    #9       b     4         2       8
    #10      b     5         2       5
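A note on why the workaround helps: dplyr::lag documents n as a single number, while the original attempt passed the whole lead_time column. The sketch below is a small variation on the workaround, not a replacement for it. It assumes lead_time is meant to be constant within each team and uses first(lead_time), which always yields exactly one value per group, whereas unique(lead_time) would return several values if a team ever had mixed lead times. The name dfc is reused from the question for illustration.

```r
library(dplyr)

df <- data.frame(
  team = c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b"),
  order = c(2, 4, 3, 5, 6, 7, 8, 5, 4, 5),
  lead_time = c(3, 3, 3, 3, 3, 2, 2, 2, 2, 2)
)

dfc <- df %>%
  group_by(team) %>%
  # Assumption: lead_time is constant within a team, so its first value
  # is the lag to apply to the whole group.
  mutate(receive = dplyr::lag(order, n = first(lead_time), default = 0)) %>%
  ungroup()

dfc$receive
# 0 0 0 2 4 0 0 7 8 5
```

The default = 0 argument already fills the first lead_time rows of each group, so the if_else(row_number() < lead_time, ...) guard from the question is not needed.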

Answer2:

We can also use shift from data.table:

    library(data.table)
    setDT(df)[, receive := shift(order, n = lead_time[1], fill = 0), by = team]
    df

    #    team order lead_time receive
    # 1:    a     2         3       0
    # 2:    a     4         3       0
    # 3:    a     3         3       0
    # 4:    a     5         3       2
    # 5:    a     6         3       4
    # 6:    b     7         2       0
    # 7:    b     8         2       0
    # 8:    b     5         2       7
    # 9:    b     4         2       8
    #10:    b     5         2       5
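To make explicit what both answers compute, here is a minimal base R sketch of the same per-group shift with no extra packages. The helper shift_down and the name df_base are hypothetical, introduced only for this illustration, and the sketch again assumes each team has a single lead time (it reads the first value of the group).

```r
df_base <- data.frame(
  team = c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b"),
  order = c(2, 4, 3, 5, 6, 7, 8, 5, 4, 5),
  lead_time = c(3, 3, 3, 3, 3, 2, 2, 2, 2, 2)
)

# Hypothetical helper: move x down by n positions and pad the top with fill.
shift_down <- function(x, n, fill = 0) {
  n <- min(n, length(x))            # never shift further than the group length
  c(rep(fill, n), head(x, length(x) - n))
}

df_base$receive <- 0
for (tm in unique(df_base$team)) {
  rows <- which(df_base$team == tm)   # row positions belonging to this team
  n <- df_base$lead_time[rows[1]]     # assumption: one lead time per team
  df_base$receive[rows] <- shift_down(df_base$order[rows], n)
}

df_base$receive
# 0 0 0 2 4 0 0 7 8 5
```

This relies on the rows within each team already being in order, the same assumption the dplyr and data.table versions make.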
