## R is driving me loopy! – A brief blog on loops in R.

The etymology of R can be traced back to a few things… R could have originated from “AARRRRR!!!” (damn you, R, I hate you, you and your apostrophes in the wrong place). Or from “Aaaaaaa” (I’m not sure how I achieved that result, but I’m a little chuffed by my nifty bit of code work).

I’ve been getting a few more Aaaaaa’s than ARARRARRRR’s of late, so I thought it’d be good to pass on some of my wisdom. One great part of any computer language is the ability to batch tasks that would normally involve a brain-numbing amount of mouse clicking or command prompt typing.

In R there are a number of ways to get a loop going.

The most basic loop can be achieved using for:

```r
for (i in 1:10)
{ mat=matrix(c(i,i))
  print(mat)
}
```

The above loop prints ten small matrices, one per pass. Each has a single column holding the current value of i twice, with i increasing from one to ten.
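If what you are after is one two-column matrix with the values one to ten running down the rows, one option is to create the matrix first and fill in a row on each pass of the loop. This is only a minimal sketch; the name `mat_all` is mine and not part of the original code.

```r
# Pre-allocate a 10 x 2 matrix, then fill one row per iteration.
# `mat_all` is an illustrative name, not from the original post.
mat_all <- matrix(NA_integer_, nrow = 10, ncol = 2)
for (i in 1:10)
{ mat_all[i, ] <- c(i, i)
}
print(mat_all)
```

Creating the matrix before the loop means R does not have to rebuild it on every pass, which matters once the loop gets long.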

If you wanted you could turn the above loop into a function. That’s right! You can easily create your own functions in R.

```r
Matrix.Gen <- function(x)
{
  for (i in 1:10)
  { mat=matrix(c(i,i))
    print(mat)
  }
}

Matrix.Gen ()
```

To run your own function you just type Matrix.Gen () into your R command line. When creating a loop in R it is important to make sure you understand how the loops are nested inside your function. This is especially important when you have more than one loop. There are two schools of thought on bracketing code. In the first, you just make sure you close a bracket once you open it.

Eg:

```r
Matrix.Gen <- function(x) {
for (i in 1:10) {
mat=matrix(c(i,i))
print(mat)  }
}
```

I personally don’t like this method: if you lose a bracket it can get really confusing! I prefer to line up my brackets and indent the code each time I introduce a loop.

Eg:

```r
Matrix.Gen <- function(x)
{  for (i in 1:10)
   {  mat=matrix(c(i,i))
      print(mat)
   }
}
```

If you set the open and close brackets in a line then they are a lot easier to track. I also like to place my brackets under the function or for loop so you can easily link each bracket to its loop.

A good use for a loop in R is creating a function that runs through your model fitting or model selection. Using a loop to do this is much faster than manually entering a single model at a time. The code I will display below runs an all-model selection method for 249 individual species. This bit of code runs two loops, often referred to as a nested loop. A nested loop is a loop within a loop, and this is where you need to start taking care with your bracketing, otherwise it can get really confusing.
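Before the real example, here is a toy nested loop to show the order things happen in. It is just a sketch with made-up sizes (3 and 2) and a made-up counter called `count`.

```r
# Toy nested loop: the outer j loop runs 3 times and, for each j,
# the inner i loop runs 2 times, giving 3 x 2 = 6 passes in total.
count <- 0
for (j in 1:3)
{ for (i in 1:2)
  { count <- count + 1
    print(paste("j =", j, "i =", i))
  }
}
print(count)
```

The inner loop runs all the way through for every single pass of the outer loop, which is why the bracketing has to be right.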

First, import the data as csv files. Three files are called into this loop: the environmental data, the species data and the models I want to fit.

```r
Env_AusNZ360 = read.csv ("EnvPoints10m360lon.csv")
Oph_spp = read.csv ("Spp20_Oph_AusNZ.csv")
mods = read.csv("PPM_mod_select_lin_poly.csv")
```

This next bit of code creates a factor R can recognize based on my species data. I do this so I can get R to call in a single species at a time. These individual species will be called in the first loop (the j loop).

```r
Oph_spp = Oph_spp[,c(1,2,3,4,5,6,7,9,11,12)]
WhichSpp = factor(Oph_spp$Spp, exclude = NA)
classes = levels(WhichSpp)
nClasses = length(classes)
```

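
To see what `factor` and `levels` are doing here, this small sketch runs the same steps on a made-up species vector. The `toy_` names and the species labels are invented for illustration only.

```r
# Toy stand-in for the species column (the names here are made up).
toy_spp <- c("sp_b", "sp_a", NA, "sp_b", "sp_c")
toy_factor <- factor(toy_spp, exclude = NA)   # NA is not treated as a species
toy_classes <- levels(toy_factor)             # each species appears once
toy_n <- length(toy_classes)                  # how many species there are
print(toy_classes)
print(toy_n)
```

So `classes` ends up as a list of unique species names, and `nClasses` tells the j loop how many times to run.
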
```r
fit.ppm <- function(x)
{ for (j in 1:nClasses)                # outer loop: one pass per species
  { print(classes[j])
    dat2add <- data.frame(who = Oph_spp[,1], Oph_spp[,2:3], depth = Oph_spp[,5], Oph_spp[,6:9])
    for (i in 1:78)                    # inner loop: one pass per model
    { ft.ppm0.4 = ppm(mods[i,2], dat2add[dat2add$who==classes[j],], Env_AusNZ360, 0.5)
      mods[i,3] <- classes[j]          # record the species
      mods[i,4] <- ft.ppm0.4$ll        # record the ll value of the fit
    }
    print(mods)
    # append=TRUE adds each species' results to the end of the file
    write.table(mods, "C:\\Possion PP\\PPMresults_Spp_10m_05scale.txt", append=TRUE, sep=".", quote=FALSE)
  }
  print("Finished")
}
```

So we’re going down a level (and if you’ve seen Inception you’ll get that link). To clarify: the i loop is nested within the j loop, so it’ll run 78 models for each species I have. I’ve then closed the i loop and asked R to save my model results to my results text file within the j loop. This means it’ll append the results for each species as they are solved by R.
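If you want to try the same pattern without my data, here is a rough sketch that runs on R’s built-in `iris` data set. Everything in it is a stand-in: `lm()` replaces `ppm()`, the two formulas replace my 78 models, and the `results` table is a made-up name.

```r
# Stand-in example: `iris` and lm() replace the post's species data and ppm().
spp <- levels(iris$Species)                       # 3 species
forms <- c("Sepal.Length ~ Sepal.Width",
           "Sepal.Length ~ Petal.Length")         # 2 candidate models
# One row per species/model pair, filled in by the loops below.
results <- data.frame(species = rep(spp, each = length(forms)),
                      model   = rep(forms, times = length(spp)),
                      ll      = NA_real_,
                      stringsAsFactors = FALSE)
row <- 0
for (j in seq_along(spp))                         # outer loop: species
{ dat <- iris[iris$Species == spp[j], ]
  for (i in seq_along(forms))                     # inner loop: models
  { fit <- lm(as.formula(forms[i]), data = dat)
    row <- row + 1
    results$ll[row] <- as.numeric(logLik(fit))    # store the log-likelihood
  }
}
print(results)
```

Here the results are collected in one table and printed at the end, rather than written to a file after each species.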

So loops can be confusing, but they can greatly speed up your outputs if you get the basics right. Another way to speed up calculations in R is to introduce parallelization. Most modern computers have more than one core or processor, which means that instead of running a single loop like the code above, you can set your computer to run several iterations at the same time. How much faster your loops get depends on the number of cores you have. Many modern computers have 4-6, which means that if you have a large data set or something that is slow, you can speed things up by as much as 4-6 times in the best case (in practice a little less, because splitting up the work has its own overhead)! I know what you’re thinking: that sounds complicated? In fact, that sounds way above my head? You’d have to be a computer programmer or something like that to get parallelization working? Well, luckily in R all the hard work has been done.
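If you are not sure how many cores your machine has, you can ask R. This sketch uses the `parallel` package, which ships with R but is not one of the packages used in the rest of this post; `n_cores` is just a name I picked.

```r
# parallel ships with base R; detectCores() can return NA on some systems.
library(parallel)
n_cores <- detectCores()
if (is.na(n_cores)) n_cores <- 1   # fall back to a single core
print(n_cores)
```

By default this counts logical cores, so on some machines the number may be higher than the number of physical cores.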

All you need is two packages that you can download with ease. I use Linux for parallelization, but there are Windows versions available.

For Linux the packages are:

```r
install.packages("foreach")
install.packages("doMC")
```

For Windows:

```r
install.packages("foreach")
install.packages("doSNOW")
```

Then load the packages (swap doSNOW for doMC on Linux):

```r
library("foreach")
library("doSNOW")
```

Next, tell R how many cores to use. The more cores you use the faster the loop will run, but the harder your machine will have to work. For the purposes of this example I’m going to set it to four.

```r
# General form: registerDoMC(cores=x), where x is the number of cores
registerDoMC(cores=4)
```

For Windows it’s slightly different:

```r
cl <- makeCluster(4)
registerDoSNOW(cl)
```

To get R to start parallelizing, all you need to do is convert your for loops to foreach loops:

```r
Matrix.Gen <- function(x)
{ for (i in 1:10)
  { mat=matrix(c(i,i))
    print(mat)
  }
}
```

Will become:

```r
Matrix.Gen <- function(x)
{ foreach(i=1:10) %dopar%
  { mat=matrix(c(i,i))
    print(mat)
  }
}
```

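Now the foreach call will parallelize your loop based on the number of cores you assign. With four cores, a loop whose iterations each do a decent amount of work can run up to about four times faster! A tiny loop like this one will not, because handing the work out to the cores takes longer than the work itself. The same can be done for the nested loop example above.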

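```r
fit.ppm <- function(x)
{ dat2add <- data.frame(who = Oph_spp[,1], Oph_spp[,2:3], depth = Oph_spp[,5], Oph_spp[,6:9])
  # Only the outer (species) loop is parallelized. Changes made to mods inside
  # %dopar% stay on the worker, so each species returns its own results table
  # and .combine=rbind stacks them.
  # With doSNOW you may also need the .export/.packages arguments so the
  # workers can see ppm and the data objects.
  results <- foreach(j=1:nClasses, .combine=rbind) %dopar%
  { res <- mods
    for (i in 1:78)
    { ft.ppm0.4 = ppm(res[i,2], dat2add[dat2add$who==classes[j],], Env_AusNZ360, 0.5)
      res[i,3] <- classes[j]
      res[i,4] <- ft.ppm0.4$ll
    }
    res
  }
  print(results)
  write.table(results, "C:\\Possion PP\\PPMresults_Spp_10m_05scale.txt", sep=".", quote=FALSE)
  print("Finished")
}
```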
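
One thing worth knowing about foreach is that it hands back the value of each iteration, a bit like a function does, instead of changing objects that live outside the loop. The sketch below shows this with the small matrix example. It uses `%do%`, which runs one iteration at a time, so it works without registering any cores; `stacked` is a made-up name.

```r
library(foreach)
# foreach returns the value of each iteration; .combine = rbind stacks them.
# %do% runs sequentially, so no backend is needed for this sketch.
stacked <- foreach(i = 1:10, .combine = rbind) %do%
{ c(i, i)
}
print(dim(stacked))
```

Once a backend is registered, swapping `%do%` for `%dopar%` runs the same loop in parallel and collects the results in the same way.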

I hope this has been useful. Looping and parallelizing your code should speed up your analyses. And while your loop is running you can continue to work on other projects or enjoy a cup of tea and a Monte Carlo biscuit, like I’m going to do now. Until next time. Skip.


### 4 Responses to R is driving me loopy! – A brief blog on loops in R.

1. lizmartinresearch says:

Nice post, Skip!

2. thabes says:

Hi there, I have read your great post! I have some trouble looping over my data with foreach. On the Stack Overflow forum there is no response to my question. If you have something to say, please guide me!
http://stackoverflow.com/questions/23682712/foreach-loop-returns-only-result-of-first-data-in-the-list

best regards

• skiptoniam says:

I’ve replied on Stack Overflow.

3. thabes says:

Dear @Skiptoniam, thank you for your answer. That’s exactly what I was looking for. I successfully loaded my data and read them. On the other hand, as in your answer, the result (values[m,]) repeats itself every 10 data. In my case I read 70 files from the file directory. Now the results are correct after ‘foreach’, but I have 4900 outputs: 'values <- matrix(NA,length(data.list),70) foreach(m=1:length(data.list),.combine = "c") %dopar% { values[m,] <- main.fun()}'. Also, how can I write the output to a .txt file?