24769

Loop over unique values R

Question:

I previously posted <a href="https://stackoverflow.com/questions/24703724/loop-overwriting" rel="nofollow">a loop question</a> and am trying another loop with no success. Help with trying to figure this out would be greatly appreciated. As of now, to get my work done I'm going to subset the data by year and run my original function as is, but one of the datasets I'm working with is a long time series. My original function calculates the number of fish at age for a given year dataset. This function works fine. What I would like to do is add a for loop that will allow the function to loop for all years and give the same information.

Data:

x <- structure(list(Year = c(2007, 2012, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2012, 2007, 2012, 2007, 2012, 2012, 2007, 2012, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2012, 2007, 2012, 2012, 2012, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012 ), Season = c("Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall"), Length = c(6, 9, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 20, 20, 21, 21, 22, 22, 23, 23, 24, 24, 25, 25, 26, 26, 27, 27, 28, 28, 29, 29, 30, 30, 31, 31, 32, 32, 33, 33, 34, 34, 35, 35, 36, 37, 37, 38, 38, 39, 40, 40, 41, 42, 42, 43, 43, 44, 44, 45, 45, 46, 47, 47, 48, 49, 50, 51, 51, 52, 52, 53, 54, 55, 58), Exp_number = c(2, 1, 3, 2, 2, 6, 4, 11, 6, 24, 13, 38, 41.208, 26, 77.096, 37, 227.704, 41, 276.064, 20, 276.536, 23, 277.008, 23, 72.832, 11, 66.096, 8, 43.888, 12, 13.472, 14, 2, 14, 4, 8, 4, 10, 5, 12, 2, 13, 5, 9, 2, 7, 1, 4, 3, 2, 2, 8, 2, 3, 2, 1, 3, 2, 5, 1, 8, 2, 2, 2, 1, 6, 1, 2, 1, 1, 4, 1, 3, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1)), .Names = c("Year", "Season", "Length", "Exp_number"), row.names = c(8L, 43L, 55L, 64L, 68L, 75L, 78L, 86L, 91L, 98L, 103L, 110L, 115L, 120L, 125L, 131L, 136L, 143L, 148L, 157L, 162L, 169L, 174L, 181L, 186L, 193L, 197L, 206L, 211L, 220L, 225L, 234L, 238L, 247L, 252L, 260L, 265L, 274L, 279L, 288L, 293L, 302L, 307L, 316L, 320L, 329L, 334L, 343L, 346L, 355L, 360L, 368L, 371L, 383L, 392L, 395L, 404L, 409L, 422L, 430L, 435L, 447L, 456L, 461L, 468L, 472L, 480L, 483L, 491L, 495L, 505L, 512L, 516L, 527L, 537L, 545L, 550L, 553L, 558L, 562L, 565L, 568L, 571L, 583L), class = "data.frame") y <- structure(c(4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 4, 5, 30, 29, 28, 17, 8, 8, 6, 16, 26, 59, 46, 77, 89, 78, 64, 51, 34, 31, 27, 19, 30, 21, 26, 13, 26, 18, 12, 8, 9, 12, 9, 6, 13, 12, 20, 10, 14, 14, 11, 8, 10, 13, 7, 6, 4, 4, 8, 2, 2, 4, 3, 0, 2, 0, 1, 1, 1, 1, 1, 1, 0.941176470588235, 0.875, 0.625, 0.666666666666667, 0.375, 0.423076923076923, 0.423728813559322, 0.391304347826087, 0.246753246753247, 0.235955056179775, 0.153846153846154, 0.203125, 0.0980392156862745, 0.0882352941176471, 0, 0.037037037037037, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0588235294117647, 0.125, 0.375, 0.333333333333333, 0.5625, 0.5, 0.457627118644068, 0.478260869565217, 0.545454545454545, 0.561797752808989, 0.564102564102564, 0.59375, 0.647058823529412, 0.411764705882353, 0.483870967741935, 0.222222222222222, 0.157894736842105, 0.0666666666666667, 0.0476190476190476, 0.0384615384615385, 0.0769230769230769, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0625, 0.0769230769230769, 0.11864406779661, 0.130434782608696, 0.207792207792208, 0.191011235955056, 0.269230769230769, 0.171875, 0.176470588235294, 0.5, 0.483870967741935, 0.481481481481481, 0.736842105263158, 0.8, 0.619047619047619, 0.576923076923077, 0.615384615384615, 0.423076923076923, 0.277777777777778, 0.333333333333333, 0.25, 0.111111111111111, 0.166666666666667, 0.111111111111111, 0, 0, 0, 0, 0, 0.142857142857143, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0112359550561798, 0.0128205128205128, 0.03125, 0.0784313725490196, 0, 0.032258064516129, 0.185185185185185, 0.105263157894737, 0.1, 0.285714285714286, 0.269230769230769, 0.307692307692308, 0.5, 0.5, 0.583333333333333, 0.625, 0.555555555555556, 0.666666666666667, 0.444444444444444, 0.5, 0.538461538461538, 0.333333333333333, 0.25, 0.1, 0, 0.214285714285714, 0, 0.125, 0.1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.037037037037037, 0, 0.0333333333333333, 0.0476190476190476, 0.115384615384615, 0, 0.0769230769230769, 0.166666666666667, 0.0833333333333333, 0.125, 0.222222222222222, 0.166666666666667, 0.444444444444444, 0.333333333333333, 0.230769230769231, 0.333333333333333, 0.4, 0.9, 0.571428571428571, 0.357142857142857, 0.545454545454545, 0.5, 0.4, 0.230769230769231, 0.142857142857143, 0.333333333333333, 0.25, 0, 0.375, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.037037037037037, 0, 0, 0, 0, 0, 0, 0.0555555555555556, 0, 0, 0.111111111111111, 0, 0, 0.166666666666667, 0.230769230769231, 0.25, 0.25, 0, 0.214285714285714, 0.214285714285714, 0.272727272727273, 0, 0.4, 0.461538461538462, 0.285714285714286, 0, 0.75, 0.5, 0.25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0833333333333333, 0.1, 0, 0.0714285714285714, 0.142857142857143, 0.0909090909090909, 0.125, 0.1, 0.307692307692308, 0.285714285714286, 0.333333333333333, 0, 0.25, 0.25, 0, 1, 0.5, 0, 0, 0.5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0714285714285714, 0.0909090909090909, 0, 0, 0, 0.285714285714286, 0.333333333333333, 0, 0, 0, 0, 0, 0.25, 0.333333333333333, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.25, 0, 0, 0, 0, 0, 0.25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125, 0.5, 0, 0.25, 0.666666666666667, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.5, 0, 0, 0, 0, 0.5, 0, 1), .Dim = c(57L, 13L), .Dimnames = list(NULL, c("len", "nl", "A0", "A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "A9", "A10"))) z <- structure(list(Length = c(4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60)), .Names = "Length", row.names = c(NA, -57L), class = "data.frame")

Original function - subset x for one year to run.

subx <- subset(x,Year==2007) myfunction <- function (x,y,z) { #x - expanded length data by year and season #y - alk as a percent by season #z - length bins from alk percent data #merge z and y to make sure both data sets have the same number of length bins comb<-merge(z,x,by="Length",all=T) #replace NA by column comb$Year[is.na(comb$Year)]<-unique(x$Year) comb$Exp_number[is.na(comb$Exp_number)] <- 0 #catch at age catchatage<-comb[,4]*y[,3:13] columntotal<-colSums(catchatage[1:dim(catchatage)[1],1:dim(catchatage)[2]]) #assign and combine age, number of fish at age and year Number=as.data.frame(columntotal) Age=rownames(Number);Age<-as.numeric(gsub("A","",Age)) #get rid of the letter A in age Year=rep(unique(x$Year),length(columntotal)) age<-as.data.frame(cbind(Year,Age,Number=Number[,1])) #reorder age age<-age[order(age[,2]),] return(age) } myfunction(subx,y,z)

My function with a loop for unique year values - use entire x dataset.

myfunction_2 <- function (x,y,z) { #x - expanded length data by year and season #y - alk as a percent by season #z - length bins from alk percent data #loop through years in survey #get unique year values from combined year season dataset y_levels<-unique(x$Year) for (i in length(y_levels)){ #subset the data subset_data<-x$Year==y_levels[i] #merge z and y to make sure both data sets have the same number of length bins comb<-merge(z,subset_data,by="Length",all=T) #replace NA by column comb$Year[is.na(comb$Year)]<-unique(x$Year[i]) comb$Exp_number[is.na(comb$Exp_number)] <- 0 #catch at age catchatage<-comb[,4]*y[,3:13] columntotal<-colSums(catchatage[1:dim(catchatage)[1],1:dim(catchatage)[2]]) #assign and combine age, number of fish at age and year Number=as.data.frame(columntotal) Age=rownames(Number);Age<-as.numeric(gsub("A","",Age)) #get rid of the letter A in age Year=rep(unique(subset_data$Year[i]),length(columntotal)) age<-as.data.frame(cbind(Year,Age,Number=Number[,1])) #reorder age age<-age[order(age[,2]),] return(age) } } myfunction_2(x,y,z)

The error message I receive is:

<blockquote>

Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column

</blockquote>

So I think my loop must not be subsetting the data by unique year.

Thank you.

Answer1:

There are several thing that should be changed in order for this code to work:

<ol><li>subset_data<-x$Year==y_levels[i] does not actually define a subset, it should rather be subset_data <- subset(x, Year==y_levels[i])</li> <li>in the loop, for (i in length(y_levels)) should be for (i in 1:length(y_levels)), otherwise the loop will only apply to year 2012</li> <li>return(age) is inside the loop when it should be outside</li> <li>the results of each iteration are not combined</li> </ol>

After correcting point 1, it should get much easier to correct the rest.

Another suggestion: your second function should use the first one, the code will be easier to read.

Finally, another way to loop without a "for" statement would be by used an lapply (or rather do.call, see @konvas' comment):

do.call(rbind, lapply(unique(x$Year), function(yy) myfunction(subset(x,Year==yy),y,z) ))

Answer2:

Although your code should work (after fixing the errors suggested by @VincentGuillemot), you can make myfunction a bit more readable, if you want. I think it would be worth to modify your starting data frame, right from the beginning, to include every combination of Length and Year (rather than doing it in myfunction, for one year at a time).

So, since what you are after is all length-year combinations, how about something like

# create a data frame consisting of all length-year combinations data <- expand.grid(Length = z$Length, Year = unique(x$Year)) data <- merge(data, x, all = TRUE) # merge with x data <- merge(data, as.data.frame(y), by.x = "Length", by.y = "len") # merge with y data$Exp_number[is.na(data$Exp_number)] <- 0 # set missing Exp_number values to 0

At this stage your data is in one data frame (instead of 3) and the missing values have been taken care of (except for the column Season, which you don't seem to care about). I find it easier to perform analysis on d (rather than x, y, z) and you can focus on the actual computations rather than on merging data frames and replacing NAs.

Now, your function will look something like

myfunction <- function(d) { # d is a subset of data for a given year catchatage <- d$Exp_number * d[grep("^A[0-9]*$", names(d))] Number <- colSums(catchatage) Age <- as.numeric(gsub("A", "", names(Number))) result <- data.frame(Year = d$Year[1], Age = Age, Number = Number) result[order(result$Age), ] }

I prefer using column names and grep instead of column indices, because using indices can lead to errors that go undetected, if the order/structure of the data changes (on the other hand, the names of the variables rarely change).

It remains to apply myfunction to subsets of data by Year, and combine the results. This can be done in many ways, using lapply as @VincentGuillemot suggested in his post, or by (or several other non-base methonds like plyr, dplyr, data.table if you are interested to look into them)

do.call(rbind, by(data, list(data$Year), myfunction))

Recommend