Normally, it is better to avoid loops in R and to use vectorized operations instead. But for highly specialized tasks, vectorization is not always possible. In that case a loop is needed, provided the problem can be decomposed into separate iterations.

Which kinds of loops exist in R, and which one should you use in which situation?

Almost every programming language has for and while loops (and sometimes until loops). These loops run sequentially, and in R they are not particularly fast.

for (i in x) {
  # loop body
}

Even for prototyping, such loops are sometimes too slow.
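To make the difference between a loop and a vectorized expression concrete, here is a minimal sketch that computes the same result both ways. The vector x and the squaring operation are illustrative assumptions, not taken from a specific task.

```r
# Illustrative data (an assumption for this sketch)
x <- c(1.5, 2, 3.5, 4)

# Sequential for loop: preallocate the result, then fill it element by element
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# Vectorized equivalent: one expression, no explicit loop
squares_vec <- x^2

# Both approaches give the same values
all.equal(squares_loop, squares_vec)
```

When a vectorized form like this exists, it is usually the first thing to try; the loop variants discussed here are for the cases where it does not.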

But how can you improve the speed?

There are three options in R:

  1. apply loops
  2. parallelization
  3. Rcpp

apply loops:

Normally, you use apply to calculate standard statistics over the rows, the columns, or both. But with a small trick you can repurpose the apply function as a loop. The syntax is:

F <- function(i, x, y, z,…)


apply(as.data.frame(1:length(vector)), MARGIN = 1, FUN = F)

In this case the vector is not used for direct calculation; its positions serve as the index “i” instead.
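The index trick can be sketched as follows. The function name scale_at, the data x and the argument weight are hypothetical names chosen for illustration; the extra arguments are handed to the function through apply's "..." argument.

```r
# Illustrative data and parameter (assumptions for this sketch)
x <- c(10, 20, 30)
weight <- 0.5

# The function receives the index i and looks up the value itself
scale_at <- function(i, x, weight) {
  x[i] * weight
}

# One row per index; MARGIN = 1 calls scale_at once per row
res_apply <- apply(as.data.frame(seq_along(x)), MARGIN = 1,
                   FUN = scale_at, x = x, weight = weight)

# Drop any names before inspecting the result
unname(res_apply)
```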

The sapply function is even faster.

F <- function(i, x, y, z,…)


sapply(1:length(vector), FUN = F)
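As a minimal sketch of why looping over an index can be useful, the following example computes a trailing window sum, where each iteration needs more than one element of the data. The names window_sum, x and width are assumptions made for this illustration.

```r
# Illustrative data and window width (assumptions for this sketch)
x <- c(1, 2, 3, 4, 5)
width <- 2

# Sum of the current element and up to (width - 1) elements before it
window_sum <- function(i, x, width) {
  first <- max(1, i - width + 1)
  sum(x[first:i])
}

# sapply calls window_sum once per index and simplifies the result to a vector
res_sapply <- sapply(seq_along(x), FUN = window_sum, x = x, width = width)
res_sapply
```

Using seq_along(x) instead of 1:length(x) avoids iterating over c(1, 0) when x happens to be empty.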


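parallelization:

You can also run loops and apply functions in parallel. You need the parallel package (for detectCores, makeCluster and parSapply) and, for foreach loops with %dopar%, the foreach package plus a registered parallel backend such as doParallel.
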
First, define the number of cores. Leave at least one core free:


NumOfCores <- detectCores() - 1



Then either use a foreach loop:


foreach::foreach(x = 1:length(vector), .combine = rbind, .inorder = TRUE, .multicombine = FALSE) %dopar%



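With .combine = rbind, this loop stacks the results of the iterations row by row, so scalar results end up in a one-column matrix. Use .combine = c if you want a plain vector.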

If the order of the results is not important, you can improve performance with .inorder = FALSE. The results are then combined as they arrive instead of in the sequence of the iterations.
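A complete foreach loop might look like the following sketch. It assumes the foreach and doParallel packages are installed; the data x, the cluster variable cl and the squaring body are illustrative. It uses .combine = c so that the result is a plain vector.

```r
library(foreach)
library(doParallel)

# Leave one core free; fall back to 1 if the core count is unknown (NA)
n_cores <- max(1, parallel::detectCores() - 1, na.rm = TRUE)

# Start the workers and register them as the backend for %dopar%
cl <- parallel::makeCluster(n_cores)
registerDoParallel(cl)

# Illustrative data (an assumption for this sketch)
x <- c(2, 4, 6, 8)

# Each iteration runs on a worker; the results are concatenated in order
res_foreach <- foreach(i = seq_along(x), .combine = c, .inorder = TRUE) %dopar% {
  x[i]^2
}

# Always shut the workers down when finished
parallel::stopCluster(cl)

res_foreach
```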


Or use the parSapply function:


clusters <- makeCluster(NumOfCores)

parSapply(cl = clusters, X = 1:length(vector), FUN = F, x = x, y = y, z = z,… )


In this case you must pass all data the function needs as arguments inside the parentheses (or export it to the workers with clusterExport). Unlike an ordinary sapply call, the workers cannot access your workspace directly.
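The following is a minimal parSapply sketch under the same idea. The function square_at, the data x and the fixed cluster size of two workers are assumptions made for the illustration; note that x is passed explicitly as an argument.

```r
library(parallel)

# The worker function gets everything it needs through its arguments
square_at <- function(i, x) {
  x[i]^2
}

# Illustrative data (an assumption for this sketch)
x <- c(2, 4, 6, 8)

# Two workers are enough for a demonstration
cl <- makeCluster(2)

# x = x sends the data to the workers along with each call
res_par <- parSapply(cl = cl, X = seq_along(x), FUN = square_at, x = x)

# Release the workers
stopCluster(cl)

res_par
```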




Rcpp:

First, you need a working C++ toolchain; on Windows this means installing Rtools.

Then define a function in C++ and compile it into a shared library that R can load (for example with Rcpp::sourceCpp).


#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double NameOfFunction(NumericVector Vector)



Then you can call it in R:


sapply(X = 1:length(testVec), FUN = NameOfFunction, y = Vector)
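As a self-contained sketch of the whole round trip, the example below compiles a small C++ function from within R using Rcpp::cppFunction, which is an alternative to a separate .cpp file. The function name sumSquaresCpp and its body are hypothetical and chosen only for illustration; a C++ compiler must be available. Note that named arguments in the R call have to match the parameter names of the C++ signature.

```r
library(Rcpp)

# Compile a C++ function and make it callable from R.
# cppFunction adds the Rcpp include and namespace automatically.
cppFunction('
double sumSquaresCpp(NumericVector v) {
  double total = 0.0;
  // The loop runs in compiled code, so it is cheap even for long vectors
  for (int i = 0; i < v.size(); ++i) {
    total += v[i] * v[i];
  }
  return total;
}
')

# Call it like any other R function
single_result <- sumSquaresCpp(c(1, 2, 3))

# Or apply it to several vectors
list_result <- sapply(list(c(1, 2), c(3, 4)), sumSquaresCpp)
```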


But when should you use which kind of loop?


From experience, I recommend basing the decision on the number of iterations and the cost of each iteration.



                              Not costly                                           Costly
Low number of iterations      for-loop, while-loop                                 Rcpp, foreach
Large number of iterations    Rcpp, sapply, apply, lapply, for-loop, while-loop    parSapply, Rcpp
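Since the best choice depends on your own workload, it can help to time the candidates on a representative problem. The sketch below uses system.time from base R; the data size n and the per-iteration work (a square root plus one) are arbitrary assumptions, and the timings will differ from machine to machine.

```r
# Illustrative workload (assumptions for this sketch)
n <- 1e5
x <- runif(n)

# Time a preallocated for loop
time_for <- system.time({
  res_for <- numeric(n)
  for (i in seq_len(n)) {
    res_for[i] <- sqrt(x[i]) + 1
  }
})

# Time the same work with sapply over the index
time_sapply <- system.time(
  res_sapply <- sapply(seq_len(n), function(i) sqrt(x[i]) + 1)
)

# Compare elapsed wall-clock times in seconds
print(time_for["elapsed"])
print(time_sapply["elapsed"])
```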


