
Create new data frame with multiple subsets of same variable

Question:

I'd like to create a new data frame whose columns are subsets of one variable, split by the values of a different variable. For example, I'd like to take variable 'b' and put its values into separate columns, one column for each selected value of 'year'.

    set.seed(88)
    df <- data.frame(year = rep(1996:1998,3),
                     a = runif(9),
                     b = runif(9),
                     e = runif(9))

    df
      year          a          b         e
    1 1996 0.41050128 0.97679183 0.7477684
    2 1997 0.10273570 0.54925568 0.7627982
    3 1998 0.74104481 0.74416429 0.2114261
    4 1996 0.48007870 0.55296210 0.7377032
    5 1997 0.99051343 0.18097104 0.8404930
    6 1998 0.99954223 0.02063662 0.9153588
    7 1996 0.03247379 0.33055434 0.9182541
    8 1997 0.76020784 0.10246882 0.7055694
    9 1998 0.67713100 0.59292207 0.4093590

Desired output for variable 'b' for years 1996 and 1998, is:

             V1         V2
    1 0.9767918 0.74416429
    2 0.5529621 0.02063662
    3 0.3305543 0.59292207

I could probably find a way to do this with a loop, but I'm wondering if there is a dplyr method (or any other simple method) to accomplish this.

Answer1:

We subset the dataset to the rows where 'year' is 1996 or 1998, select the 'year' and 'b' columns, and unstack to get the expected output:

    unstack(subset(df, year %in% c(1996, 1998), select = c('year', 'b')), b ~ year)
    #      X1996      X1998
    #1 0.9767918 0.74416429
    #2 0.5529621 0.02063662
    #3 0.3305543 0.59292207
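If you need this for other variables or other sets of years, the same subset-then-unstack idea can be wrapped in a small function. This is only a sketch: `wide_by_year` is a hypothetical helper name (not part of base R or the original answer), and it assumes every selected year has the same number of rows, since `unstack()` returns a list rather than a data frame otherwise. Columns come out in sorted year order, not in the order given in `years`.

```r
# Rebuild the example data from the question.
set.seed(88)
df <- data.frame(year = rep(1996:1998, 3), a = runif(9), b = runif(9), e = runif(9))

# Hypothetical helper: one column of `var` per selected year.
wide_by_year <- function(data, var, years) {
  # Keep only the requested years and the two columns unstack() needs.
  keep <- data[data$year %in% years, c("year", var)]
  # Build the formula `var ~ year` from the column name.
  out <- unstack(keep, as.formula(paste(var, "~ year")))
  # Rename X1996, X1998, ... to V1, V2, ... as in the desired output.
  names(out) <- paste0("V", seq_along(out))
  out
}

res <- wide_by_year(df, "b", c(1996, 1998))
res
```

Calling `wide_by_year(df, "e", c(1997, 1998))` would do the same for variable 'e'.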

Or, using tidyverse, we select the columns of interest, filter the rows based on the 'year' column, create a sequence column within each 'year', spread to 'wide' format, and drop the sequence column:

    library(tidyverse)
    df %>%
      select(year, b) %>%
      filter(year %in% c(1996, 1998)) %>%
      group_by(year = factor(year, levels = unique(year), labels = c('V1', 'V2'))) %>%
      mutate(n = row_number()) %>%
      spread(year, b) %>%
      select(-n)
    # A tibble: 3 x 2
    #     V1     V2
    #  <dbl>  <dbl>
    #1 0.977 0.744
    #2 0.553 0.0206
    #3 0.331 0.593
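In current tidyr releases `spread()` is superseded by `pivot_wider()`. A rough equivalent of the pipeline above, assuming tidyr 1.0.0 or later, might look like the sketch below. The `year_` prefix and the name `wide` are my own choices for illustration, and the helper column `n` plays the same role as before: it records the position within each year so the rows line up.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question.
set.seed(88)
df <- data.frame(year = rep(1996:1998, 3), a = runif(9), b = runif(9), e = runif(9))

wide <- df %>%
  filter(year %in% c(1996, 1998)) %>%
  select(year, b) %>%
  group_by(year) %>%
  mutate(n = row_number()) %>%   # position within each year
  ungroup() %>%
  pivot_wider(names_from = year, values_from = b, names_prefix = "year_") %>%
  select(-n)                     # drop the helper column

wide
```

Without the `n` column, `pivot_wider()` has nothing to identify rows by and would collapse each year into a single list-valued cell.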

As only two years are needed, we can also use summarise:

    df %>%
      summarise(V1 = list(b[year == 1996]), V2 = list(b[year == 1998])) %>%
      unnest(cols = c(V1, V2))   # tidyr >= 1.0.0 expects the columns to be named

Answer2:

Another option with dplyr, mixing in some base R, resulting in a tiny bit shorter solution than @akrun's code:

    bind_cols(split(df$b, df$year)) %>% select(-'1997')
    # A tibble: 3 x 2
    #  `1996` `1998`
    #   <dbl>  <dbl>
    #1  0.977 0.744
    #2  0.553 0.0206
    #3  0.331 0.593
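The same split-based idea also works without dplyr, and picking the wanted years by name avoids having to list the ones to drop. A minimal base R sketch, assuming (as in the example data) that each selected year has the same number of rows; the names `years`, `parts` and `res` are just for illustration:

```r
# Rebuild the example data from the question.
set.seed(88)
df <- data.frame(year = rep(1996:1998, 3), a = runif(9), b = runif(9), e = runif(9))

years <- c("1996", "1998")      # split() names its pieces with character labels
parts <- split(df$b, df$year)   # named list: one numeric vector per year
# Keep the wanted years and rename the columns to match the desired output.
res <- setNames(as.data.frame(parts[years]), c("V1", "V2"))
res
```

This gives the V1/V2 column names asked for in the question rather than the year labels.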
