
Check if column 1 value appeared previously in dataset with a different column 2 value

Question:

I have a dataset like this:

df <- data.frame(ID = c("m1","m2","m3","m4","m5","m6","m2","m3","m5","m6","m1","m4","m5"), Year = c(1,1,1,1,1,1,2,2,2,2,3,3,3))

and I want to check whether each ID also appears in the previous year. I have code that seems to work:

df$Check <- apply(df, 1, function(x) x["ID"] %in% df[df$Year == (as.numeric(x["Year"]) - 1), "ID"])

but my dataset has 3 million rows, so this takes far too long to run. Is there a faster alternative?

Answer1:

Try

library(dplyr)
dfs <- split(df$ID, df$Year)
df$check <- unlist(mapply(`%in%`, dfs, lag(dfs)))
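The same split-and-shift idea can also be written in base R without dplyr. In the sketch below, `prev` plays the role of `lag(dfs)`. Like the original, it assumes the rows are already sorted by Year, so that `unlist` returns the flags in row order. It also treats "previous" as the previous year *present in the data*, not strictly Year - 1.

```r
# Example data from the question
df <- data.frame(ID   = c("m1","m2","m3","m4","m5","m6","m2","m3","m5","m6","m1","m4","m5"),
                 Year = c(1,1,1,1,1,1,2,2,2,2,3,3,3))

# One vector of IDs per year; the factor levels of Year sort numerically
dfs <- split(df$ID, df$Year)

# Shift the list by one position: the first year gets an empty vector,
# so none of its IDs can match
prev <- c(list(character(0)), dfs[-length(dfs)])

# Test each year's IDs against the previous year's IDs
# (assumes df is sorted by Year so the flags line up with the rows)
df$check <- unlist(Map(`%in%`, dfs, prev), use.names = FALSE)
df$check
```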

Answer2:

You may use ave: for each ID, calculate the difference between the current Year and the preceding Year (diff). Pad with a leading zero. Then check whether the result is 1 to create a logical vector:

df$check2 <- with(df, ave(Year, ID, FUN = function(x) c(0, diff(x))) == 1)
#    ID Year check check2
# 1  m1    1 FALSE  FALSE
# 2  m2    1 FALSE  FALSE
# 3  m3    1 FALSE  FALSE
# 4  m4    1 FALSE  FALSE
# 5  m5    1 FALSE  FALSE
# 6  m6    1 FALSE  FALSE
# 7  m2    2  TRUE   TRUE
# 8  m3    2  TRUE   TRUE
# 9  m5    2  TRUE   TRUE
# 10 m6    2  TRUE   TRUE
# 11 m1    3 FALSE  FALSE
# 12 m4    3 FALSE  FALSE
# 13 m5    3  TRUE   TRUE
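To see what the function passed to ave does within a single group, here is the same arithmetic applied by hand to two IDs from the example data. m5 appears in every year. m1 skips year 2. This sketch assumes each ID's Years are in ascending row order, as they are in the question's data.

```r
# Years in which each ID appears, in row order (taken from the example data)
years_m5 <- c(1, 2, 3)
years_m1 <- c(1, 3)

# Gap to the preceding appearance, padded with 0 for the first appearance
gaps_m5 <- c(0, diff(years_m5))   # 0 1 1
gaps_m1 <- c(0, diff(years_m1))   # 0 2

# A gap of exactly 1 means the ID was present in the previous year
gaps_m5 == 1   # FALSE  TRUE  TRUE
gaps_m1 == 1   # FALSE FALSE
```

ave runs this per-ID calculation for every ID and writes the results back into the original row positions. That is why the full-data result matches these flags for rows 5, 9 and 13 (m5) and rows 1 and 11 (m1).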

Similar with data.table:

For each ID (by = ID), create the new variable Check2. It checks whether the difference between the current Year and the preceding Year in the data is 1 (diff(Year) == 1), i.e. whether the preceding year is the previous year.

library(data.table)
setDT(df)[ , Check2 := c(FALSE, diff(Year) == 1), by = ID]

Edit following a comment by the OP: in case of "multiple entries of the same ID in the same year", perform the calculation on data with the duplicated rows removed (unique), then join the result back to the original data.

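df2 <- unique(df[ , .(ID, Year)])  # one row per ID-Year pair, even if other columns differ
df2[ , Check2 := c(FALSE, diff(Year) == 1), by = ID]
df[df2, on = c("ID", "Year")]      # join the flags back to every original row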
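The sketch below shows why the de-duplication step matters. It uses a hypothetical three-row table in which m1 is recorded twice in year 2. Instead of the plain join above, it writes the flag back with a data.table update join (`Check2 := i.Check2`), which modifies the original table in place.

```r
library(data.table)

# Hypothetical toy data: m1 is recorded twice in year 2
dt <- data.table(ID = c("m1", "m1", "m1"), Year = c(1, 2, 2))

# Without de-duplication the repeated year gives a diff of 0,
# so the second year-2 row is flagged FALSE
dt[, naive := c(FALSE, diff(Year) == 1), by = ID]

# De-duplicate the ID-Year pairs and compute the flag once per pair
u <- unique(dt[, .(ID, Year)])
u[, Check2 := c(FALSE, diff(Year) == 1), by = ID]

# Update join: copy each pair's flag onto all matching rows of dt
dt[u, on = c("ID", "Year"), Check2 := i.Check2]
dt
```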

Answer3:

k = length(unique(df$Year)) # how many years in the data
q = unique(df$Year) # which are the years present
func <- function(x){
  kk = df$ID[df$Year == q[x]] # get the current year's ID which are present
  kk %in% df$ID[df$Year == q[x-1]] # compare that to the previous year's ID
}
x <- sum(df$Year==unique(df$Year)[1]) #to know how many FALSE to be added initially
df$check <- c(rep(FALSE, x),unlist(lapply(2:k, func)))
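Answers 1 and 3 compare each year with the previous year present in the data and rely on the rows being sorted by Year. A different approach, not taken from any of the answers, matches the question's original `Year - 1` test directly and does not depend on row order. It builds an ID-year key for every row and looks up the key for the same ID one year earlier with a single vectorized `%in%`. The sketch below assumes Year holds whole numbers, so that `paste` formats `Year` and `Year - 1` consistently.

```r
# Example data from the question
df <- data.frame(ID   = c("m1","m2","m3","m4","m5","m6","m2","m3","m5","m6","m1","m4","m5"),
                 Year = c(1,1,1,1,1,1,2,2,2,2,3,3,3))

# Key identifying each row's ID-year pair
key_now  <- paste(df$ID, df$Year, sep = "_")

# Key for the same ID in the previous calendar year
key_prev <- paste(df$ID, df$Year - 1, sep = "_")

# TRUE when the same ID has a row in Year - 1; works in any row order
df$check_key <- key_prev %in% key_now
df$check_key
```

Because this tests for `Year - 1` itself, an ID seen in year 1 and then in year 3 gets FALSE in year 3 even if year 2 is missing from the whole dataset. That matches the question's own `apply` code, not the "previous year present" logic of Answers 1 and 3.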

Recommend

  • How to create a new variable with values from different variables if another variable equals a set v
  • Handle a table with ID repetition
  • Creating a frequency table in R
  • Numpy: 2D array access with 2D array of indices
  • How to find tail rows of a data frame that satisfy set criteria?
  • How to get rows with min values in one column, grouped by other column, while keeping other columns?
  • Python: how to split and return a list from a function to avoid memory error
  • Plot a table with R
  • What distributed message queues support millions of queues?
  • addressing in assembler
  • How to get the index of element in the List in c#
  • command line of process by name
  • Neo4j: Filter nodes based on aggregate function
  • What is corresponding c++ data type to SQL numeric(18,0) data type?
  • jQuery - resize an elements height to match window without refreshing, on window resize
  • Primefaces lazy datascroller calling load twice
  • Azure webjobs output logs indexing taking very long
  • Reduction and collapse clauses in OMP have some confusing points
  • ADO and msqli connections very slow
  • Marklogic : Query response time is very high
  • MongoDb aggregation
  • Content-Length header not returned from Pylons response
  • Play WS (2.2.1): post/put large request
  • How to use remove-erase idiom for removing empty vectors in a vector?
  • How to access EntityManager inside Entity class in EJB3
  • Retrieving value from sql ExecuteScalar()
  • vba code to select only visible cells in specific column except heading
  • Do I've to free mysql result after storing it?
  • SVN: Merging two branches together
  • Hibernate gives error error as “Access to DialectResolutionInfo cannot be null when 'hibernate.
  • Transpose CSV data with awk (pivot transformation)
  • WPF Applying a trigger on binding failure
  • How to CLICK on IE download dialog box i.e.(Open, Save, Save As…)
  • Can Visual Studio XAML designer handle font family names with spaces as a resource?
  • Linking SubReports Without LinkChild/LinkMaster
  • Sorting a 2D array using the second column C++
  • Binding checkboxes to object values in AngularJs
  • Net Present Value in Excel for Grouped Recurring CF
  • jQuery Masonry / Isotope and fluid images: Momentary overlap on window resize
  • How to load view controller without button in storyboard?