Selection of observations by combining criteria in R


This topic has probably been brought up before, and I guess the solution is quite simple. However, I have not managed to work it out so far. Let's say I have a data.frame (called "data") which contains 10 individuals (id) on which I collected observations at 3 time points (T):

    > data <- data.frame(id = rep(c(1:10), 3), T = gl(3, 10), X = sample(1:30), Y = sample(c("yes", "no"), 30, replace = TRUE), Z = sample(1:40, 30), Z2 = rnorm(30, mean = 5, sd = 0.5))
    > head(data)
      id T  X   Y  Z       Z2
    1  1 1 10 yes 15 5.993605
    2  2 1 18  no 22 6.096566
    3  3 1  5  no 24 5.101393
    4  4 1 15 yes 18 4.944108
    5  5 1 23  no 34 4.634176
    6  6 1 13  no 27 5.576015

I would like to create a subset of this data.frame (a new data.frame called data2) by selecting only the individuals that have "yes" (variable Y) at each of the three time points (variable T), that is, Y = "yes" for T = 1 and T = 2 and T = 3.

I know that conditions can be combined using the "&" operator, and that this could be used to relate the conditions for the 3 time points. My problem is how to write the condition for each time point: how do I tell R that I want the subjects for which Y = "yes" at T = "1", for example?

Thank you very much in advance to all. Have a great day,



You can do:

    keep.ids <- tapply(data$Y, data$id, FUN = function(x) all(x == "yes"))
    subset(data, keep.ids[factor(id)])
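To see what the two steps do, here is a minimal sketch on a small hand-made data frame. The name `d` and its values are made up for illustration and are not the data from the question. `tapply` yields one logical value per id, and indexing that result by id expands it back to one value per row. The sketch indexes by name with `as.character()` rather than by factor code, which is an assumption on my part about what is easiest to follow, not a change in the idea.

```r
# Toy data (hypothetical values): 3 individuals, 3 time points each
d <- data.frame(id = rep(1:3, 3),
                T  = gl(3, 3),
                Y  = c("yes", "no",  "yes",   # T = 1, ids 1, 2, 3
                       "yes", "yes", "yes",   # T = 2
                       "yes", "yes", "no"))   # T = 3

# One TRUE/FALSE per id: TRUE only if every Y recorded for that id is "yes"
keep.ids <- tapply(d$Y, d$id, FUN = function(x) all(x == "yes"))

# Look up each row's flag by the name of its id, then drop names and dims
keep.rows <- as.vector(keep.ids[as.character(d$id)])

# Keep only the rows of individuals that answered "yes" at all time points
d2 <- d[keep.rows, ]
print(d2)
```

In this toy example only individual 1 has "yes" at all three time points, so `d2` holds that individual's three rows.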

Or use the plyr package:

    library(plyr)
    ddply(data, "id", function(x) if (all(x$Y == "yes")) x else NULL)
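If you would rather avoid an extra package, the same per-individual test can probably also be written with base R's `ave()`, which applies a function within each group and returns a vector aligned with the rows. This is only a sketch of a third option, again on made-up toy data (`d` and its values are hypothetical), not something taken from the answer above.

```r
# Toy data (hypothetical values): 3 individuals, 3 time points each
d <- data.frame(id = rep(1:3, 3),
                T  = gl(3, 3),
                Y  = c("yes", "no",  "yes",   # T = 1, ids 1, 2, 3
                       "yes", "yes", "yes",   # T = 2
                       "yes", "yes", "no"))   # T = 3

# Within each id, all() is applied to the logical vector Y == "yes";
# ave() repeats the group result on every row belonging to that id
all.yes <- as.logical(ave(d$Y == "yes", d$id, FUN = all))

# Keep the rows of individuals with "yes" at every time point
d2 <- d[all.yes, ]
print(d2)
```

Because the test is done per individual over all of its rows, there is no need to write one condition per time point and join them with "&".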


