35125

How do I substract a value from a column in a dataframe, with unique rows

Question:

Hard to explain the question in a title. But here I have a data frame, and you can see I have 3 stream names. I have a 3 unique values associated with each stream name. I would like these unique values to be subtracted from the value for the corresponding stream found in the value column and then appended to the data frame in a new column titled error

stream n rates means column value 1 Brooks 3 3.0 0.9629152 1 0.42707006 2 Siouxon 3 3.0 0.5831929 1 0.90503736 3 Speelyai 3 3.0 0.6199235 1 0.08554021 4 Brooks 4 7.5 0.9722707 1 1.43338843 5 Siouxon 4 7.5 0.5865031 1 0.50574543 6 Speelyai 4 7.5 0.6118634 1 0.32252396 7 Brooks 5 10.0 0.9637475 1 0.88984211 8 Siouxon 5 10.0 0.5804420 1 0.47501800 9 Speelyai 5 10.0 0.5959238 1 0.15079491 10 Brooks 6 13.0 0.9486575 1 1.32422105 11 Siouxon 6 13.0 0.5846854 1 0.39479684 12 Speelyai 6 13.0 0.5597146 1 0.37005941

Here are the "unique" values I want subtracted from the value column

> true.lwd.sp <- 0.583984402 (speelyai) > true.lwd.sx <- 0.585852702 (souixon) > true.lwd.br <- 0.944062036 (brooks)

Thanks for any help. One day I may know how to do all these simple tasks!

Answer1:

We could create a new dataset and match the 'stream' column with the corresponding column in the new dataset, get the numeric index to get corresponding 'value' from 'df2', and subtract from the 'df1' or original dataset.

df1$error <- df1$value-df2$value[match(df1$stream, df2$stream)] df1 # stream n rates means column value error #1 Brooks 3 3.0 0.9629152 1 0.42707006 -0.51699198 #2 Siouxon 3 3.0 0.5831929 1 0.90503736 0.31918466 #3 Speelyai 3 3.0 0.6199235 1 0.08554021 -0.49844419 #4 Brooks 4 7.5 0.9722707 1 1.43338843 0.48932639 #5 Siouxon 4 7.5 0.5865031 1 0.50574543 -0.08010727 #6 Speelyai 4 7.5 0.6118634 1 0.32252396 -0.26146044 #7 Brooks 5 10.0 0.9637475 1 0.88984211 -0.05421993 #8 Siouxon 5 10.0 0.5804420 1 0.47501800 -0.11083470 #9 Speelyai 5 10.0 0.5959238 1 0.15079491 -0.43318949 #10 Brooks 6 13.0 0.9486575 1 1.32422105 0.38015901 #11 Siouxon 6 13.0 0.5846854 1 0.39479684 -0.19105586 #12 Speelyai 6 13.0 0.5597146 1 0.37005941 -0.21392499 <h3>data</h3> df1 <- structure(list(stream = c("Brooks", "Siouxon", "Speelyai", "Brooks", "Siouxon", "Speelyai", "Brooks", "Siouxon", "Speelyai", "Brooks", "Siouxon", "Speelyai"), n = c(3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L), rates = c(3, 3, 3, 7.5, 7.5, 7.5, 10, 10, 10, 13, 13, 13), means = c(0.9629152, 0.5831929, 0.6199235, 0.9722707, 0.5865031, 0.6118634, 0.9637475, 0.580442, 0.5959238, 0.9486575, 0.5846854, 0.5597146), column = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), value = c(0.42707006, 0.90503736, 0.08554021, 1.43338843, 0.50574543, 0.32252396, 0.88984211, 0.475018, 0.15079491, 1.32422105, 0.39479684, 0.37005941)), .Names = c("stream", "n", "rates", "means", "column", "value"), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12")) df2 <- data.frame(stream=c('Brooks', 'Siouxon', 'Speelyai'), value=c(0.944062036, 0.585852702, 0.583984402), stringsAsFactors=FALSE)

Answer2:

Another option using data.table <em>using df2 from @akrun's answer</em>

library(data.table) setDT(df1)[stream == df2$stream, error := value - df2$value] # stream n rates means column value error #1: Brooks 3 3.0 0.9629152 1 0.42707006 -0.51699198 #2: Siouxon 3 3.0 0.5831929 1 0.90503736 0.31918466 #3: Speelyai 3 3.0 0.6199235 1 0.08554021 -0.49844419 #4: Brooks 4 7.5 0.9722707 1 1.43338843 0.48932639 #5: Siouxon 4 7.5 0.5865031 1 0.50574543 -0.08010727 #6: Speelyai 4 7.5 0.6118634 1 0.32252396 -0.26146044 #7: Brooks 5 10.0 0.9637475 1 0.88984211 -0.05421993 #8: Siouxon 5 10.0 0.5804420 1 0.47501800 -0.11083470 #9: Speelyai 5 10.0 0.5959238 1 0.15079491 -0.43318949 #10: Brooks 6 13.0 0.9486575 1 1.32422105 0.38015901 #11: Siouxon 6 13.0 0.5846854 1 0.39479684 -0.19105586 #12: Speelyai 6 13.0 0.5597146 1 0.37005941 -0.21392499

Answer3:

Akrun is right, but you need some extra code to join the datasets. Check this simple (but similar example):

library(dplyr) # your original dataset dt1 = data.frame(stream = c("A","B","C","A","B","C"), value = c(5,6,7,8,9,10)) # your dataset with the values for each case dt2 = data.frame(stream = c("A","B","C"), truevalue = c(0.58, 0.57, 0.56)) # join datasets and create the error variable result = dt1 %>% left_join(dt2, by="stream") %>% mutate(error = value - truevalue) result

The important thing is to make sure the names of the streams match in both datasets, so the join will be executed correctly.

Recommend