In R, it is usually better to avoid explicit loops. For highly individual tasks, however, vectorization is not always possible, so a loop is needed, provided the problem can be decomposed into separate iterations.
Which kinds of loops exist in R, and which one should you use in which situation?
Most programming languages have for and while loops (some also have until loops). In R, these loops run sequentially and are comparatively slow.
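# for loop: iterate over the elements of x
for (i in x) {
  # task
}
# while loop: start at y and increment i until it exceeds x
i <- y
while (i <= x) {
  # task
  i <- i + 1
}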
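As a small, self-contained illustration of the two templates above, both loops can be written out for a made-up task (squaring each element of a hypothetical vector x) and checked against the vectorized equivalent:

```r
x <- c(2, 4, 6, 8)

# for loop: preallocate the result, then fill it element by element
res_for <- numeric(length(x))
for (i in seq_along(x)) {
  res_for[i] <- x[i]^2
}

# while loop: the counter must be initialized and incremented by hand
res_while <- numeric(length(x))
i <- 1
while (i <= length(x)) {
  res_while[i] <- x[i]^2
  i <- i + 1
}

# vectorized version: a single call, no explicit loop
res_vec <- x^2
# res_for, res_while and res_vec all equal c(4, 16, 36, 64)
```

Preallocating the result with numeric() avoids growing the vector inside the loop, which is a common cause of slow R loops.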
Even for prototyping, these loops are sometimes too slow.
So how can the speed be improved?
There are three options in R:
apply loops:
Normally, apply is used to compute standard statistics over the rows, the columns, or both of a matrix or data frame. With a small trick, however, apply can also replace a loop. The syntax is:
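# F takes the iteration index i as its first argument; x, y, z, ... are further inputs
F <- function(i, x, y, z, ...) {
  # task
}
# each row of the one-column data frame holds one index; extra arguments are passed on to F
apply(as.data.frame(seq_along(vector)), MARGIN = 1, FUN = F, x = x, y = y, z = z)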
Here the vector is not used directly in the calculation; its positions serve as the index i instead.
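To make the index trick concrete, here is a small runnable sketch. The task (summing a window of k elements starting at position i) and the names window_sum and k are hypothetical, chosen only because each iteration needs more than one element of x:

```r
# Hypothetical task: sum of x[i], ..., x[min(i + k - 1, n)]
window_sum <- function(i, x, k) {
  n <- length(x)
  sum(x[i:min(i + k - 1, n)])
}

x <- c(1, 2, 3, 4, 5)

# apply() receives a one-column data frame of indices; with MARGIN = 1
# each row (a single index) is handed to window_sum as its first argument,
# and x and k are forwarded through apply's ... argument
res_apply <- apply(as.data.frame(seq_along(x)), MARGIN = 1,
                   FUN = window_sum, x = x, k = 3)
# res_apply equals c(6, 9, 12, 9, 5)
```

Note that the argument is spelled MARGIN in upper case; R matches argument names case-sensitively.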
The sapply function is usually even faster:
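F <- function(i, x, y, z, ...) {
  # task
}
# sapply loops over the indices directly; extra arguments are passed on to F
sapply(seq_along(vector), FUN = F, x = x, y = y, z = z)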
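The same hypothetical window_sum task can be run with sapply, and, since lapply also appears among the options, with lapply as well. This sketch shows the difference in what the two return:

```r
# Hypothetical per-index task (same as a windowed sum of width k)
window_sum <- function(i, x, k) {
  sum(x[i:min(i + k - 1, length(x))])
}

x <- c(1, 2, 3, 4, 5)

# sapply() iterates over the indices and simplifies the results to a vector
res_sapply <- sapply(seq_along(x), FUN = window_sum, x = x, k = 3)

# lapply() does the same work but always returns a list
res_lapply <- lapply(seq_along(x), FUN = window_sum, x = x, k = 3)
res_unlisted <- unlist(res_lapply)
# res_sapply and res_unlisted both equal c(6, 9, 12, 9, 5)
```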
Parallelization:
Loops and apply-style functions can also run in parallel. This requires the following packages:
library("doParallel")
library("parallel")
library("foreach")
First, define the number of cores to use, leaving at least one free:
# keep one core free for the rest of the system
NumOfCores <- detectCores() - 1
registerDoParallel(NumOfCores)
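On a single-core machine, detectCores() - 1 evaluates to 0, and detectCores() can return NA when the core count cannot be determined. A defensive variant, as a sketch (n_detected is a hypothetical helper name):

```r
library(parallel)

# Keep one core free, but never request fewer than one worker;
# if detectCores() returns NA, fall back to a single worker
n_detected <- detectCores()
NumOfCores <- if (is.na(n_detected)) 1L else max(1L, n_detected - 1L)
```

The result can then be passed to registerDoParallel() or makeCluster() as before.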
Then either use a foreach loop:
foreach::foreach(x = seq_along(vector), .combine = rbind, .inorder = TRUE, .multicombine = FALSE) %dopar% {
  # task
}
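With .combine = rbind, the results of the iterations are stacked row by row, so scalar results come back as a one-column matrix; .combine = c returns a plain vector instead.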
If the order of the results is not important, setting .inorder = FALSE can improve performance: the results are then combined in the order in which the iterations finish, not in the order in which they were submitted.
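A complete, runnable version of the foreach template might look like this. It assumes the foreach and doParallel packages are installed; the windowed-sum task and the choice of two workers are hypothetical:

```r
library(foreach)
library(doParallel)  # also attaches the parallel package (makeCluster)

x <- c(1, 2, 3, 4, 5)

# Register a small explicit cluster (2 workers) as the %dopar% backend
cl <- makeCluster(2)
registerDoParallel(cl)

# .combine = c collects the iteration results into a vector;
# .inorder = TRUE keeps them in index order
res_foreach <- foreach(i = seq_along(x), .combine = c, .inorder = TRUE) %dopar% {
  sum(x[i:min(i + 2, length(x))])  # hypothetical task: window of width 3
}

# Shut down the workers and return to sequential execution
stopCluster(cl)
registerDoSEQ()
# res_foreach equals c(6, 9, 12, 9, 5)
```

foreach exports variables referenced in the loop body (here x) to the workers automatically.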
Or use parSapply, which first needs a cluster of worker processes:
clusters <- makeCluster(NumOfCores)
parSapply(cl = clusters, X = seq_along(vector), FUN = F, x = x, y = y, z = z)  # plus any further arguments of F
stopCluster(clusters)  # shut down the worker processes when done
Here, all data the function needs must be passed explicitly as arguments in the call: unlike the ordinary sapply, the worker processes cannot access the objects in your workspace.
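A runnable sketch of parSapply with all inputs passed explicitly. It uses only the parallel package that ships with R; window_sum, k and the two-worker cluster are hypothetical choices for illustration:

```r
library(parallel)

# Hypothetical per-index task; everything it needs arrives as an argument
window_sum <- function(i, x, k) {
  sum(x[i:min(i + k - 1, length(x))])
}

x <- c(1, 2, 3, 4, 5)

clusters <- makeCluster(2)
# x and k are passed explicitly because the worker processes do not see
# the objects in the main session's global environment
res_par <- parSapply(cl = clusters, X = seq_along(x), FUN = window_sum,
                     x = x, k = 3)
stopCluster(clusters)
# res_par equals c(6, 9, 12, 9, 5)
```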
Rcpp:
First, you need a working C++ compiler; on Windows, this means installing Rtools.
library("Rcpp")
Then define a function in C++; Rcpp compiles the code into a shared library that R can load.
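#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double NameOfFunction(int i, NumericVector y) {
  double result = 0.0;
  // task: compute result for index i (R passes 1-based indices, so the i-th element is y[i - 1])
  return result;
}
Compile it with sourceCpp() (for example, after saving the code in a .cpp file), then call it in R:
sapply(X = seq_along(Vector), FUN = NameOfFunction, y = Vector)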
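Calling a compiled function once per index from sapply still pays R's per-call overhead. A common alternative, sketched here, is to move the whole loop into C++. It assumes Rcpp and a compiler are available; cppFunction() compiles a C++ snippet given as a string, and windowSumCpp is a hypothetical name for the same windowed-sum task:

```r
library(Rcpp)

# The entire loop runs in C++, so R calls the compiled function only once
cppFunction('
NumericVector windowSumCpp(NumericVector x, int k) {
  int n = x.size();
  NumericVector out(n);
  for (int i = 0; i < n; ++i) {
    double s = 0.0;
    // C++ indices start at 0, R indices at 1
    for (int j = i; j < n && j < i + k; ++j) {
      s += x[j];
    }
    out[i] = s;
  }
  return out;
}')

res_cpp <- windowSumCpp(c(1, 2, 3, 4, 5), 3)
# res_cpp equals c(6, 9, 12, 9, 5)
```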
But which kind of loop should you use when?
From experience, I recommend basing the decision on the number of iterations and on the cost of each iteration:
Number of iterations | Cheap iterations | Costly iterations
Low | for loop, while loop | Rcpp, foreach
Large | Rcpp, sapply, apply, lapply, for loop, while loop | parSapply, Rcpp
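Because the right choice depends on the task, it can be worth timing candidate approaches on your own data. A minimal sketch with base R's system.time(), again using the hypothetical window_sum task; absolute timings vary by machine, so no numbers are shown:

```r
# Hypothetical per-index task used only for the comparison
window_sum <- function(i, x, k) sum(x[i:min(i + k - 1, length(x))])

set.seed(1)
x <- runif(10000)

# Time two candidate approaches on the same input
t_for <- system.time({
  res_for <- numeric(length(x))
  for (i in seq_along(x)) res_for[i] <- window_sum(i, x, 5)
})
t_sapply <- system.time(
  res_sapply <- sapply(seq_along(x), window_sum, x = x, k = 5)
)

# Results must agree before their timings are worth comparing
stopifnot(isTRUE(all.equal(res_for, res_sapply)))
print(c(for_loop = t_for[["elapsed"]], sapply = t_sapply[["elapsed"]]))
```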
Posted 10 May 2021