My problem is:
I have a large number of numeric variables for which I need to generate summary statistics. Some of the observations are coded "-99", which means the participant did not know the answer to the survey question.
When calculating means for such variables, I want to exclude the "-99" observations. Since I have a lot of variables, it would be quite onerous to use "subset" on each one.
Does anyone know an easier way?
PS: I know that for factors, the Summarize(df, exclude = "") command in the FSA package could work. I am just not sure if there is an equivalent for numeric variables.

Answer1:
Just make yourself a simple wrapper function around summary():
set.seed(1)
x <- rnorm(100)
x[sample(seq_along(x), 10)] <- -99
summary2 <- function(x) summary(x[x!=-99])
> summary(x)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-99.00000  -0.70810  -0.04209  -9.79400   0.59810   2.40200
> summary2(x)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-2.21500 -0.52640  0.07445  0.11770  0.67230  2.40200
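
Since the question is about many variables at once, here is a rough sketch of one way to extend the same idea to a whole data frame: recode -99 as NA in every numeric column, then let na.rm = TRUE do the excluding. The data frame df and its columns (age, income, group) are made up for illustration, and this is an alternative to the wrapper rather than a replacement for it.

```r
# Hypothetical survey data; column names and values are invented for this sketch.
df <- data.frame(
  age    = c(25, -99, 40, 31),
  income = c(-99, 50, 70, -99),
  group  = c("a", "b", "a", "b"),
  stringsAsFactors = FALSE
)

# Flag the numeric columns so non-numeric ones are left alone.
num_cols <- vapply(df, is.numeric, logical(1))

# Recode the "don't know" code -99 as NA in the numeric columns only.
# %in% is used instead of == so existing NAs are not turned into NA indices.
df[num_cols] <- lapply(df[num_cols], function(col) replace(col, col %in% -99, NA))

# Mean of each numeric variable, skipping the recoded values.
means <- vapply(df[num_cols], mean, numeric(1), na.rm = TRUE)
print(means)
```

Once the -99 values are NA, other functions such as summary() will report them separately as NA's instead of folding them into the statistics.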