
Normally, it is better to avoid loops in R. But for highly individual tasks, vectorization is not always possible. In that case a loop is needed, provided the problem can be decomposed into separate iterations.

Which kinds of loops exist in R, and which one should you use in which situation?

Almost every programming language has for- and while-loops (and sometimes until-loops). These loops run sequentially and, in R, they are not particularly fast.

for (i in x) {
  task
}

i <- y
while (i <= x) {
  task
  i <- i + 1
}

 

Sometimes they are too slow even for prototyping.
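To make the two loop forms concrete, here is a minimal sketch that builds a running total once with a for-loop and once with a while-loop, and compares both with the vectorized cumsum. The input values and variable names are made up for the illustration.

```r
# Hypothetical input data for the illustration
values <- c(3, 1, 4, 1, 5, 9, 2, 6)

# for-loop: iterate over the positions of the vector
total_for <- numeric(length(values))
running <- 0
for (i in seq_along(values)) {
  running <- running + values[i]
  total_for[i] <- running
}

# while-loop: same task, with the counter managed by hand
total_while <- numeric(length(values))
running <- 0
i <- 1
while (i <= length(values)) {
  running <- running + values[i]
  total_while[i] <- running
  i <- i + 1
}

# Vectorized equivalent, for comparison
total_vec <- cumsum(values)
```

When a vectorized function such as cumsum exists for the task, it is usually the better choice; the loops only earn their place when no such function fits.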

But how to improve speed?

There are three options in R:

  1. apply loops
  2. parallelization
  3. Rcpp

apply loops:

Normally, you use apply to calculate standard statistics over the rows or columns of a matrix or data frame. But with a small trick you can also use the apply function as a loop. The syntax is:

F <- function(i, x, y, z,…) {
  task
}

# the argument name is MARGIN (upper case); margin = 1 would not be matched
apply(as.data.frame(seq_along(vector)), MARGIN = 1, FUN = F)

In this case the vector is not used for the calculation directly. Instead, its positions serve as the index “i” that is passed to the function.
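As a small worked example of this index trick, the sketch below multiplies two vectors element by element. The data and the function name revenue_at are hypothetical, and the extra data is passed through apply's additional arguments.

```r
# Hypothetical data: two vectors that the task combines element by element
prices <- c(10, 20, 30, 40)
amounts <- c(1, 2, 3, 4)

# The function receives the index i (a one-element row) plus the data
revenue_at <- function(i, x, y) {
  x[i] * y[i]
}

# One row per index; MARGIN = 1 makes apply() call the function once per row
index_df <- as.data.frame(seq_along(prices))
revenue_apply <- apply(index_df, MARGIN = 1, FUN = revenue_at,
                       x = prices, y = amounts)
```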

The sapply function is even faster, and it can iterate over the index vector directly:

F <- function(i, x, y, z,…) {
  task
}

sapply(seq_along(vector), FUN = F)
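Here is a minimal sketch of the sapply variant for a task that really needs the index: the change of each element relative to its predecessor. The data and the function name change_at are assumptions made for the example.

```r
# Hypothetical data for the illustration
series <- c(5, 7, 6, 10, 12)

# Task that needs the index: change relative to the previous element
change_at <- function(i, x) {
  if (i == 1) return(NA_real_)
  x[i] - x[i - 1]
}

# sapply() iterates over the index vector directly, no data frame required
changes <- sapply(seq_along(series), FUN = change_at, x = series)
```

For this particular task diff(series) would do the same job; the pattern pays off when the body of the function cannot be vectorized.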

Parallelization:

Loops and apply functions can also be run in parallel. You need the following packages:

library("doParallel")

library("parallel")

library("foreach")

 

First, define the number of cores to use. Leave at least one core free:

 

NumOfCores <- detectCores() - 1

registerDoParallel(NumOfCores)

 

Then either use a foreach loop:

 

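# write TRUE/FALSE in full: F was defined as a function above, and T/F can be overwritten
foreach::foreach(x = seq_along(vector), .combine = rbind, .inorder = TRUE, .multicombine = FALSE) %dopar% {
  task
}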

 

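This loop collects the results of all iterations. With .combine = rbind they are bound together row by row, so scalar results end up in a one-column matrix; use .combine = c if you want a plain vector.

If the order of the results is not important, you can improve performance by setting .inorder = FALSE. The results are then combined as they become available rather than in the order of the iterations.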
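A complete, minimal sketch of such a parallel loop could look like this. It assumes the doParallel and foreach packages are installed, uses two workers (an arbitrary choice for the example), and registers an explicit cluster object so that it can be shut down afterwards; the input vector is made up.

```r
library(foreach)
library(doParallel)

# Two workers are an assumption for the sketch; adjust to your machine
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Hypothetical input
numbers <- 1:8

# Each iteration returns one value; .combine = c collects them in a vector
squares <- foreach(i = seq_along(numbers), .combine = c, .inorder = TRUE) %dopar% {
  numbers[i]^2
}

# Release the workers when finished
parallel::stopCluster(cl)
```

For a task this cheap the overhead of starting workers outweighs any gain; the pattern is meant for iterations that each take noticeable time.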

 

Or use parSapply:

 

clusters <- makeCluster(NumOfCores)

parSapply(cl = clusters, X = seq_along(vector), FUN = F, x = x, y = y, z = z,… )

# shut the workers down when you are finished
stopCluster(clusters)

 

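In this case it is important to pass every object the function needs as an argument inside the parentheses (or to export it to the workers with clusterExport). The workers run in separate R sessions and cannot access your workspace the way an ordinary sapply call can.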
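The following sketch shows the explicit hand-over of data to the workers. The data, the function scale_at, and the argument names values and multiplier are hypothetical; two workers are assumed.

```r
library(parallel)

# Two workers are an assumption for the sketch
cl <- makeCluster(2)

# Hypothetical data and task; argument names are chosen so that they do not
# collide with parSapply's own arguments (cl, X, FUN)
weights <- c(0.5, 1.5, 2.5)
scale_at <- function(i, values, multiplier) {
  values[i] * multiplier
}

# values and multiplier travel to the workers as extra arguments
scaled <- parSapply(cl = cl, X = seq_along(weights), FUN = scale_at,
                    values = weights, multiplier = 10)

# Release the workers when finished
stopCluster(cl)
```

Because scale_at only touches its own arguments, nothing from the calling workspace has to exist on the workers.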

 

Rcpp:

 

First, make sure a C++ toolchain is available. On Windows this means installing Rtools.

 

library("Rcpp")

 

Define a function in C++ and compile it into a shared library, for example by saving the code in a .cpp file and calling sourceCpp() on that file:

 

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
// i is the index supplied by sapply, y the data vector passed as y = ...
double NameOfFunction(int i, NumericVector y) {
  task
}

 

Then you can call it in R:

 

sapply(X = seq_along(testVec), FUN = NameOfFunction, y = Vector)
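As a self-contained sketch of this workflow, the example below compiles a tiny C++ function from a string with cppFunction instead of a separate file. The function squareAt and the test data are made up, and a working C++ toolchain is assumed.

```r
library(Rcpp)

# Compile a small C++ function from a string; requires a C++ toolchain
cppFunction("
double squareAt(int i, NumericVector y) {
  // R indices start at 1, C++ indices at 0
  return y[i - 1] * y[i - 1];
}
")

# Hypothetical input
testVec <- c(1.5, 2, 3)

# The compiled function is called from sapply() like any R function
squared <- sapply(X = seq_along(testVec), FUN = squareAt, y = testVec)
```

Calling a compiled function once per index still pays R's call overhead on every iteration, so in practice it is often worth moving the whole loop into the C++ function.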

 

But when should you use which kind of loop?

 

From experience, I recommend basing the decision on the number of iterations and the cost of each iteration.

 

 

| | Not costly | Costly |
|---|---|---|
| Low number of iterations | for-loop, while-loop | Rcpp, foreach |
| Large number of iterations | Rcpp, sapply, apply, lapply, for-loop, while-loop | parSapply, Rcpp |
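To place a task in one of these four cells, it helps to measure what a single iteration costs. The sketch below times a small sample of iterations with system.time and extrapolates to the full run; the task one_iteration and all numbers are hypothetical stand-ins.

```r
# Hypothetical task standing in for one iteration of a real loop
one_iteration <- function(i) {
  sum(sqrt(seq_len(1000)) * i)
}

# Planned number of iterations (an assumption for the example)
n_iterations <- 200

# Time a small sample of iterations and extrapolate to the full run
sample_size <- 20
elapsed <- system.time(
  for (i in seq_len(sample_size)) one_iteration(i)
)[["elapsed"]]

cost_per_iteration <- elapsed / sample_size
estimated_total <- cost_per_iteration * n_iterations
```

If the estimated total is small, a plain loop or sapply is probably enough; if a single iteration is already expensive, the parallel or compiled options are more likely to pay off.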

 

 


© 2019 Data Science Central
