Look up from different dataframes depending on a column

Question:

Supposing I have the following dataframes:

    d1 <- data.frame(index = c(1,2,3,4),
                     location = c('barn', 'house', 'restaurant', 'tomb'),
                     random = c(5,3,2,1),
                     different_col1 = c(66,33,22,11))
    d2 <- data.frame(index = c(1,2,3,4),
                     location = c('server', 'computer', 'home', 'dictionary'),
                     random = c(1,7,2,9),
                     differen_col2 = c('hi', 'there', 'different', 'column'))

What I am trying to do is get the location based on the index and on which dataframe it belongs to. So I have the following:

    data <- data.frame(src = c('one', 'one', 'two', 'one', 'two'),
                       index = c(1,4,2,3,2))

Here src indicates which dataframe the data should come from, and index is the value to look up in that dataframe's index column.

    src | index
    -------------
    one | 1
    one | 4
    two | 2
    one | 3
    two | 2

And I would like it to become:

    src | index | location
    -----------------------
    one | 1     | barn
    one | 4     | tomb
    two | 2     | computer
    one | 3     | restaurant
    two | 2     | computer

Due to the size of my data I would like to avoid merge or comparable joins (sqldf, etc).

Answer1:

Here's one way to add a new column <em>by reference</em> using data.table:

    require(data.table)
    setDT(d1); setDT(d2); setDT(data) # convert all data.frames to data.tables
    data[src == "one", location := d1[.SD, location, on="index"]]
    data[src == "two", location := d2[.SD, location, on="index"]]

.SD stands for <em>subset of data</em>; here it holds all columns of data for the rows that match the condition given in the i argument.

See the <a href="https://github.com/Rdatatable/data.table/wiki/Getting-started" rel="nofollow">vignettes</a> for more.

Instead of extracting location with a join, you could also use match in the expression to the right of :=. That approach, however, does not extend to matching on multiple columns.
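The match variant mentioned above might look like the following minimal sketch. It is not part of the original answer; it rebuilds the question's tables with data.table() directly (which keeps strings as character) and drops the columns that are not needed for the lookup.

```r
library(data.table)

# Lookup tables and query table from the question, reduced to the needed columns
d1 <- data.table(index = c(1, 2, 3, 4),
                 location = c("barn", "house", "restaurant", "tomb"))
d2 <- data.table(index = c(1, 2, 3, 4),
                 location = c("server", "computer", "home", "dictionary"))
data <- data.table(src = c("one", "one", "two", "one", "two"),
                   index = c(1, 4, 2, 3, 2))

# match() gives, for each index in the selected rows, its position in the
# lookup table's index column; := then assigns by reference to those rows only
data[src == "one", location := d1$location[match(index, d1$index)]]
data[src == "two", location := d2$location[match(index, d2$index)]]
```

As with the join version, rows whose index is missing from the lookup table would end up with NA in location.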

Answer2:

    library(dplyr)
    mutate(data, location = ifelse(src == "one",
                                   as.character(d1[index, "location"]),
                                   as.character(d2[index, "location"])))

output

      src index   location
    1 one     1       barn
    2 one     4       tomb
    3 two     2   computer
    4 one     3 restaurant
    5 two     2   computer
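Note that d1[index, "location"] selects rows by position, which happens to coincide with the values of the index column in this example. If that cannot be relied on, a variant that looks up by value could be sketched in base R as follows. This is an illustration rather than part of the original answer; d1 is deliberately reordered here to show that row order no longer matters.

```r
d1 <- data.frame(index = c(1, 2, 3, 4),
                 location = c("barn", "house", "restaurant", "tomb"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(index = c(1, 2, 3, 4),
                 location = c("server", "computer", "home", "dictionary"),
                 stringsAsFactors = FALSE)
data <- data.frame(src = c("one", "one", "two", "one", "two"),
                   index = c(1, 4, 2, 3, 2),
                   stringsAsFactors = FALSE)

# Reorder d1 so that row position and index value no longer agree
d1 <- d1[c(3, 1, 4, 2), ]

# match() finds the row whose index column equals the requested value
data$location <- ifelse(data$src == "one",
                        d1$location[match(data$index, d1$index)],
                        d2$location[match(data$index, d2$index)])
```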

Answer3:

data.table will help you to deal with Big Data much more efficiently.

You could either use match or a special data.table implementation of merge that's much faster than the merge of my original solution, as we discussed in the comments.

Here's an example:

    require(data.table)

    d1 <- data.frame(index = c(1,2,3,4),
                     location = c('barn', 'house', 'restaurant', 'tomb'),
                     random = c(5,3,2,1),
                     different_col1 = c(66,33,22,11))
    d2 <- data.frame(index = c(1,2,3,4),
                     location = c('server', 'computer', 'home', 'dictionary'),
                     random = c(1,7,2,9),
                     differen_col2 = c('hi', 'there', 'different', 'column'))
    mydata <- data.table(src = c('one', 'one', 'two', 'one', 'two'),
                         index = c(1,4,2,3,2))

    mydata.d1 <- mydata[mydata$src == "one",]
    mydata.d2 <- mydata[mydata$src == "two",]
    mydata.d1 <- merge(mydata.d1, d1, all.x = TRUE, by = "index")
    mydata.d2 <- merge(mydata.d2, d2, all.x = TRUE, by = "index")

    # If you want to keep the 'different column' values from d1 and d2:
    mydata <- rbind(mydata.d1, mydata.d2, fill = TRUE)
    mydata
       index src   location random different_col1 differen_col2
    1:     1 one       barn      5             66            NA
    2:     3 one restaurant      2             22            NA
    3:     4 one       tomb      1             11            NA
    4:     2 two   computer      7             NA         there
    5:     2 two   computer      7             NA         there

    # If you don't want to keep those 'different column' values:
    mydata <- rbind(mydata.d1[,.(index, src, location)],
                    mydata.d2[,.(index, src, location)])
    mydata
       index src   location
    1:     1 one       barn
    2:     3 one restaurant
    3:     4 one       tomb
    4:     2 two   computer
    5:     2 two   computer

Answer4:

Base solution: use a character index to choose the correct dataframe, then use mapply to pass the multiple "parallel" arguments.

    dput(dat)
    structure(list(src = c("one", "one", "two", "one", "two"),
        X. = c("|", "|", "|", "|", "|"),
        index = c(1L, 4L, 2L, 3L, 2L),
        location = structure(c(1L, 4L, 5L, 3L, 5L),
            .Label = c("barn", "house", "restaurant", "tomb",
                       "computer", "dictionary", "home", "server"),
            class = "factor")),
        .Names = c("src", "X.", "index", "location"),
        row.names = c(NA, -5L), class = "data.frame")

You may need to set stringsAsFactors = FALSE when building the data frame to ensure that src is passed as a character argument.

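    # dlist is not defined in the answer as posted; presumably it is a
    # named list keyed by the values of src:
    dlist <- list(one = d1, two = d2)

    dat$location <- mapply(function(whichd, i) dlist[[whichd]][i, 'location'],
                           whichd = dat$src, i = dat$index)
    > dat
      src X. index   location
    1 one  |     1       barn
    2 one  |     4       tomb
    3 two  |     2   computer
    4 one  |     3 restaurant
    5 two  |     2   computer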
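The mapply call performs one data frame lookup per row. For larger inputs, a vectorized variant of the same idea might loop over the lookup tables instead of over the rows. The sketch below is not from the original answer; it assumes the tables are collected in a named list (called lookups here, a hypothetical name) whose names are the values of src.

```r
d1 <- data.frame(index = c(1, 2, 3, 4),
                 location = c("barn", "house", "restaurant", "tomb"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(index = c(1, 2, 3, 4),
                 location = c("server", "computer", "home", "dictionary"),
                 stringsAsFactors = FALSE)
dat <- data.frame(src = c("one", "one", "two", "one", "two"),
                  index = c(1L, 4L, 2L, 3L, 2L),
                  stringsAsFactors = FALSE)

# Hypothetical named list: the names must equal the values used in dat$src
lookups <- list(one = d1, two = d2)

dat$location <- NA_character_
for (s in names(lookups)) {
  rows <- dat$src == s   # rows that should be filled from this table
  tbl <- lookups[[s]]
  # one vectorized match() per lookup table instead of one lookup per row
  dat$location[rows] <- tbl$location[match(dat$index[rows], tbl$index)]
}
```

Adding a third source would then only require another entry in the list.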
