
A Data Scientist’s Perspective on Microsoft R

This article was written by Lixun Zhang.

As a data scientist, I have experience with R. Naturally, when I was first exposed to Microsoft R Open (MRO, formerly Revolution R Open) and Microsoft R Server (MRS, formerly Revolution R Enterprise), I wanted answers to three questions:

  • What do R, MRO, and MRS have in common?
  • What’s new in MRO and MRS compared with R?
  • Why should I use MRO or MRS instead of R?

The publicly available information on MRS either describes it at a high level or explains specific functions and their underlying algorithms. Comparisons of R, MRO, and MRS also tend to stay at a high level, without much detail about functions and packages, which is the level data scientists are most familiar with, and they don’t answer the questions above comprehensively. So I designed my own tests (the code behind them is available on GitHub). Below are my answers to the three questions. MRO can be installed with an optional math library, the Intel Math Kernel Library (MKL); unless noted otherwise, the observations hold whether or not MKL is installed with MRO.


What do R, MRO, and MRS have in common?

After installing R, MRO, and MRS, you’ll notice that everything you can do in R can be done in MRO or MRS. For example, you can use glm() to fit a logistic regression and kmeans() to carry out cluster analysis. As another example, you can install packages from CRAN. In fact, a package installed in R can be used in MRO or MRS and vice versa if the package is installed in a library tree that’s shared among them. You can use the command .libPaths() to set and get library trees for R, MRO and MRS. Finally, you can use your favorite IDEs such as RStudio and Visual Studio with RTVS for R, MRO or MRS. In other words, MRO and MRS are 100% compatible with R in terms of functions, packages, and IDEs.
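As a minimal sketch of the point above, the following uses only base R functions, so it should run the same way in R, MRO, or MRS. The datasets (mtcars, iris), the model formula, and the shared library folder name are illustrative assumptions, not taken from the original tests.

```r
# Logistic regression with glm(): model transmission type (am) in mtcars.
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial())
print(coef(fit))

# Cluster analysis with kmeans() on the four numeric columns of iris.
set.seed(42)  # k-means starts from random centers, so fix the seed
clusters <- kmeans(iris[, 1:4], centers = 3)
print(table(clusters$cluster, iris$Species))

# Get the current library trees.
print(.libPaths())

# Set the library trees: prepend a shared folder (hypothetical location).
# Packages installed there are visible to any R distribution that also
# lists the folder in its library trees.
shared_lib <- file.path(tempdir(), "shared-r-library")
dir.create(shared_lib, showWarnings = FALSE)
.libPaths(c(shared_lib, .libPaths()))
print(.libPaths()[1])
```

In practice the shared folder would be a permanent directory rather than one under tempdir(), which is used here only so the sketch leaves nothing behind.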

What’s new in MRO and MRS compared with R?

While everything you do in R can be done in MRO and MRS, the reverse is not true, because of the additional components in MRO and MRS. MRO lets users install an optional math library, MKL, for multithreaded performance. This library shows up as a package named “RevoUtilsMath” in MRO.
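One way to see whether MKL is present in a given session is to look for the RevoUtilsMath package. The sketch below assumes that package exposes getMKLthreads(), and it falls back gracefully on plain R where the package is absent. The matrix size and the timing step are illustrative assumptions, not a benchmark from the original tests.

```r
# Check for the MKL-backed package without failing when it is missing.
if (requireNamespace("RevoUtilsMath", quietly = TRUE)) {
  mkl_threads <- RevoUtilsMath::getMKLthreads()
  message("MKL is available; threads in use: ", mkl_threads)
} else {
  mkl_threads <- NA_integer_
  message("RevoUtilsMath not found; using the default math library.")
}

# Time a matrix multiplication, an operation that a multithreaded
# math library can speed up.
set.seed(1)
n <- 500
a <- matrix(rnorm(n * n), nrow = n, ncol = n)
elapsed <- system.time(b <- a %*% a)[["elapsed"]]
message("Elapsed seconds for a ", n, " x ", n, " product: ", elapsed)
```

Running the same script in R and in MRO with MKL gives a rough feel for the difference on a particular machine.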

MRS comes with more packages and functions than R. From the package perspective, most of the additional ones are not on CRAN and are available only after installing MRS. One such example is the RevoScaleR package. MRS also installs the MKL library by default. As for functions, MRS has High Performance Analysis (HPA) versions of many base R functions, which are included in the RevoScaleR package. For example, the HPA version of glm() is rxGlm() and for kmeans() it is rxKmeans(). These HPA functions can be used in the same way as their base R counterparts with additional options. In addition, these functions can work with a special data format (XDF) that’s customized for MRS.
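The parallel between the base R functions and their HPA versions can be sketched as follows. The RevoScaleR calls run only where MRS is installed, so they are guarded. The datasets, formulas, and XDF file location are illustrative assumptions, and the argument names should be checked against the RevoScaleR documentation that ships with MRS.

```r
# Base R versions, available in R, MRO, and MRS.
base_fit <- glm(am ~ wt + hp, data = mtcars, family = binomial())
set.seed(42)
base_km <- kmeans(iris[, 1:4], centers = 3)

# HPA versions, available only where RevoScaleR (MRS) is installed.
if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  library(RevoScaleR)

  # rxGlm() takes a formula, data, and family much like glm().
  hpa_fit <- rxGlm(am ~ wt + hp, data = mtcars, family = binomial())

  # rxKmeans() takes a one-sided formula naming the clustering variables.
  hpa_km <- rxKmeans(
    ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
    data = iris, numClusters = 3
  )

  # Write a data frame to an XDF file (hypothetical location), then fit
  # the same model against the file instead of the in-memory data frame.
  xdf_file <- file.path(tempdir(), "mtcars.xdf")
  rxImport(inData = mtcars, outFile = xdf_file, overwrite = TRUE)
  xdf_fit <- rxGlm(am ~ wt + hp, data = xdf_file, family = binomial())
} else {
  message("RevoScaleR not found; only the base R versions were run.")
}
```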

