
Row wise operations in R

Question:

I have a dataframe with following layout:

id |diff
----
1 | 0
1 | 3
1 | 45
1 | 9
1 | 40
1 | 34
1 | 43
1 | 7
2 | 0
2 | 5
3 | 0
3 | 45
3 | 40

I need to add a counter in such a way that:

<ol><li>when the id changes, the counter should reset to 1</li> <li>when the id is the same and the diff is less than 10, the counter should keep the preceding counter value.</li> <li>when the id is the same and the diff is greater than 10, the counter should be incremented by 1.</li> </ol>

The output I am looking for is :

id |diff | counter
-------------
1 | 0 | 1
1 | 3 | 1
1 | 45 | 2
1 | 9 | 2
1 | 40 | 3
1 | 34 | 4
1 | 43 | 5
1 | 7 | 5
2 | 0 | 1
2 | 5 | 1
3 | 0 | 1
3 | 45 | 2
3 | 40 | 3

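The for loop solution is:

raw_data$counter <- 1  # the first row always starts at 1
for (i in 2:nrow(raw_data)) {
  raw_data$counter[i] <- ifelse(raw_data$id[i] == raw_data$id[i - 1],
    # same id: increment only when this row's diff exceeds 10
    ifelse(raw_data$diff[i] > 10, raw_data$counter[i - 1] + 1, raw_data$counter[i - 1]),
    1)  # id changed: reset
}

I am aware that the 'for' loop increases the run time. I am looking for a faster way.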

Answer1:

As the OP is <em>looking for a faster way</em>, here is a benchmark comparison of <a href="https://stackoverflow.com/a/44369684/3817004" rel="nofollow">P Lapointe's dplyr solution</a> and a data.table version.

The data.table version is a re-write of <a href="https://stackoverflow.com/a/44369684/3817004" rel="nofollow">P Lapointe's approach</a> in data.table syntax:

library(data.table) # CRAN version 1.10.4 used
DT <- fread(
"id |diff
1 | 0
1 | 3
1 | 45
1 | 9
1 | 40
1 | 34
1 | 43
1 | 7
2 | 0
2 | 5
3 | 0
3 | 45
3 | 40"
, sep = "|")
DT[, counter := cumsum(diff > 10L) + 1L, id]
DT
#    id diff counter
# 1:  1    0       1
# 2:  1    3       1
# 3:  1   45       2
# 4:  1    9       2
# 5:  1   40       3
# 6:  1   34       4
# 7:  1   43       5
# 8:  1    7       5
# 9:  2    0       1
#10:  2    5       1
#11:  3    0       1
#12:  3   45       2
#13:  3   40       3

<h3>Benchmark</h3>

For benchmarking, a larger data set of 130'000 rows is created:

# copy original data set 10000 times
DTlarge <- rbindlist(lapply(seq_len(10000L), function(x) DT))
# make id column unique again
DTlarge[, id := rleid(id)]
dim(DTlarge)
#[1] 130000 2

Timing is done with the microbenchmark package:

df1 <- as.data.frame(DTlarge)
dt1 <- copy(DTlarge)
library(dplyr)
microbenchmark::microbenchmark(
  dplyr = {
    df1 %>%
      group_by(id) %>%
      mutate(counter = cumsum(diff > 10) + 1)
  },
  dt = {
    dt1[, counter := cumsum(diff > 10L) + 1L, id]
  },
  times = 10L
)

The results show that the data.table version is about 20 times faster for this problem size:

Unit: milliseconds
  expr       min        lq      mean    median        uq      max neval
 dplyr 500.51729 505.50173 512.25642 509.64096 517.31095 535.2736    10
    dt  23.06037  23.99073  25.30913  24.71059  25.98322  30.7868    10
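
For readers who have neither package installed, the same grouped running sum can be sketched in base R with ave(). This variant was not part of the benchmark above, so no timing is claimed for it. The data frame name df is only illustrative and is rebuilt here from the question's sample data.

```r
# Sample data from the question (df is an illustrative name)
df <- data.frame(
  id   = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3),
  diff = c(0, 3, 45, 9, 40, 34, 43, 7, 0, 5, 0, 45, 40)
)

# Flag the rows whose diff exceeds 10, then let ave() apply cumsum()
# to those flags separately within each id
df$counter <- ave(as.integer(df$diff > 10), df$id, FUN = cumsum) + 1L

print(df)
```

ave() returns a vector of the same length and order as its input, so the result can be assigned straight back as a new column.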

Answer2:

Here's how to do that with dplyr:

df1 <- read.table(text="id diff
1 0
1 3
1 45
1 9
1 40
1 34
1 43
1 7
2 0
2 5
3 0
3 45
3 40",header=TRUE, stringsAsFactors=FALSE)
library(dplyr)
df1%>%
  group_by(id)%>%
  mutate(counter=cumsum(diff>10)+1)

      id  diff counter
   <int> <int>   <dbl>
 1     1     0       1
 2     1     3       1
 3     1    45       2
 4     1     9       2
 5     1    40       3
 6     1    34       4
 7     1    43       5
 8     1     7       5
 9     2     0       1
10     2     5       1
11     3     0       1
12     3    45       2
13     3    40       3
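
Why this works: within one id, diff > 10 is TRUE on exactly the rows that should bump the counter, so the running sum of those flags counts the bumps so far, and adding 1 makes each group start at 1. One caveat is worth checking on real data. The loop in the question always restarts at 1 on the first row of an id, whereas the grouped cumsum also counts that first row's diff. In the sample every id starts with diff 0, so both agree. The base R sketch below uses hypothetical helper names (counter_loop, counter_cumsum) and assumes the rows of each id are contiguous; it shows a variant that ignores the first row's diff and compares it with the loop.

```r
# Reference: the rules from the question written as an explicit loop
counter_loop <- function(id, diff) {
  counter <- rep(1L, length(id))
  for (i in seq_along(id)[-1]) {
    if (id[i] != id[i - 1]) {
      counter[i] <- 1L                      # id changed: reset
    } else if (diff[i] > 10) {
      counter[i] <- counter[i - 1] + 1L     # same id, large diff: increment
    } else {
      counter[i] <- counter[i - 1]          # same id, small diff: carry over
    }
  }
  counter
}

# Vectorised variant: the first row of each id never increments
counter_cumsum <- function(id, diff) {
  as.integer(ave(diff, id, FUN = function(d) cumsum(c(FALSE, d[-1] > 10)) + 1L))
}

# Made-up data in which the first diff of id 1 exceeds 10
id   <- c(1, 1, 1, 2, 2)
diff <- c(50, 3, 45, 0, 20)

stopifnot(identical(counter_loop(id, diff), counter_cumsum(id, diff)))
print(counter_cumsum(id, diff))
```

If the first diff of every id is guaranteed to be 0, as in the sample, the simpler cumsum(diff > 10) + 1 gives the same result.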
