Data scientists, you can’t not-know APIs & Dockers

We have all read tons of literature on developing analytics models and extracting information and insights from blobs of data. Mostly, this work can run standalone on your own system, without any need to interact with other machines. But as businesses face ever-rising competition, the mounting cost of maintaining engagement and shrinking realized profits, productization is no longer a buzzword waiting to see the light of day. It is happening, and it is here to stay. This article from Forbes highlights how giants like Salesforce and Uber are leveraging it.

As a data analyst or scientist, you may well be expected to turn your clustering or recommender solution into a self-service product or web application. If you are a product manager keen to add Amazon-, Spotify- or Netflix-style recommendations to your existing product, linking statistical models to your application is imperative.

With this context, let me introduce the API, or Application Programming Interface. APIs are gateways through which two separate applications, codebases, platforms or tools can talk to each other and exchange information. In effect, an API lets you call a function on another computer from your own. If you are an application developer, reading this has probably already brought a familiar smile to your face: APIs are nothing new to developers. For budding data scientists, though, they may be. You have probably already used one without knowing it if you have connected your Python or R code to a database to ingest data directly, or set up a Tableau dashboard to auto-refresh with Google Analytics data. Just as a URL responds with a webpage (HTML), an API responds either by sending data or by changing the data in a database.
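To make "calling a function on another computer" concrete, here is a minimal sketch in Python (the other language this article mentions), using only the standard library. The function square, the /square path and the query parameter x are hypothetical names chosen for illustration. The server here runs on your own machine, but the client reaches the function purely through an HTTP request, which is the same pattern Plumber sets up for R.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse
from urllib.request import ProxyHandler, build_opener


def square(x: float) -> float:
    """The (hypothetical) function we want other programs to call remotely."""
    return x * x


class SquareHandler(BaseHTTPRequestHandler):
    """Turns 'GET /square?x=3' into a call to square(3) and replies with JSON."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/square":
            self.send_error(404, "unknown endpoint")
            return
        try:
            # Query-string values always arrive as text, so convert to a number.
            x = float(parse_qs(url.query)["x"][0])
        except (KeyError, ValueError):
            self.send_error(400, "expected a numeric query parameter x")
            return
        body = json.dumps({"result": square(x)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def call_square(base_url: str, x: float) -> float:
    """Client side: call the remote function over HTTP and decode the JSON reply."""
    opener = build_opener(ProxyHandler({}))  # local call, so bypass any configured proxy
    with opener.open(f"{base_url}/square?x={x}", timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))["result"]


def demo_roundtrip(x: float) -> float:
    """Start the server on a free local port, call it once, then shut it down."""
    server = HTTPServer(("127.0.0.1", 0), SquareHandler)  # port 0: let the OS pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return call_square(f"http://127.0.0.1:{port}", x)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo_roundtrip(3))  # 9.0
```

The client never imports square; it only knows a URL and the shape of the JSON reply, which is what lets the two sides live on different machines or even be written in different languages.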

Both R and Python offer ready-made API packages, such as googleAnalyticsR and RSiteCatalyst for R and googleanalytics and omniture for Python, that import data by connecting to the respective tools. For custom requirements, however, you will have to write your own. Rserve, rApache and Plumber are a few R packages that can be used to build custom APIs. In this article, I will focus on creating an API in R with Plumber, followed by a brief introduction to setting it up with Docker.

Plumber

In R, the Plumber package can be used to create an API. Plumber lets you create APIs simply by decorating existing R code with special comment annotations that start with #* or #'.

The following code has two parts: first, creating the model, and second, creating an endpoint (an endpoint is the logic that generates a response to an API request). @get, @post, @put, @delete and @head are five Plumber annotations that tell Plumber to call the R function when the server receives an HTTP request with the matching method and path.
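Conceptually, each annotation registers a function under an HTTP method and a path, and the server looks up that pair whenever a request arrives. The Python sketch below imitates the idea with a decorator. It is not how Plumber is implemented; the route decorator, the ROUTES table and the /echo and /sum endpoints are all hypothetical.

```python
from typing import Callable, Dict, Tuple
from urllib.parse import parse_qs, urlparse

# Hypothetical registry: (HTTP method, path) -> handler function.
ROUTES: Dict[Tuple[str, str], Callable[..., object]] = {}


def route(method: str, path: str):
    """Decorator standing in for a Plumber annotation such as '#* @get /echo'."""
    def register(func: Callable[..., object]) -> Callable[..., object]:
        ROUTES[(method.upper(), path)] = func
        return func
    return register


@route("GET", "/echo")
def echo(msg: str = "") -> str:
    return f"The message is: {msg}"


@route("POST", "/sum")
def add(a: str, b: str) -> float:
    # Query parameters arrive as strings, so convert them first
    # (the R endpoint in this article does the same with as.numeric).
    return float(a) + float(b)


def dispatch(method: str, url: str) -> Tuple[int, object]:
    """Find the handler for this method and path, and call it with the query parameters."""
    parsed = urlparse(url)
    handler = ROUTES.get((method.upper(), parsed.path))
    if handler is None:
        return 404, "no endpoint for this method and path"
    params = {name: values[0] for name, values in parse_qs(parsed.query).items()}
    return 200, handler(**params)


if __name__ == "__main__":
    print(dispatch("GET", "/echo?msg=hello"))  # (200, 'The message is: hello')
    print(dispatch("POST", "/sum?a=1&b=2"))    # (200, 3.0)
    print(dispatch("GET", "/sum?a=1&b=2"))     # (404, 'no endpoint for this method and path')
```

Note that the method is part of the lookup key: in this sketch, a GET to a path registered only for POST finds no handler, which mirrors why choosing @get versus @post matters.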

Sample code skeleton:

Note: The model could be created and saved separately in an .rds file and then loaded in the prediction code file, but I have combined them into one file here.

—————————————————————————————

#save the file as sample_model.R
# creating regression model on iris dataset
irisData <- iris
reg_model <- lm(Petal.Length ~ Petal.Width, data = irisData)
# Now that the model is created, expose it through a browser-based input of petal width. This is called defining an 'endpoint'. Since the input arrives as a query parameter in the URL, a GET request is enough.
#* @param width_of_petal Petal width (numeric)
#* @get /petal_length_prediction
predict_length <- function(width_of_petal){
# query parameters arrive as text, so convert to a number
width_of_petal <- as.numeric(width_of_petal)
# input data frame for prediction
input_data <- data.frame(Petal.Width = width_of_petal)
# prediction with the regression model created above
predict(reg_model, input_data)
}

—————————————————————————————

Use a separate code file for the plumb() function, which turns the previous R file, sample_model.R, into a service. This service accepts HTTP requests and translates them into calls to the R code.

—————————————————————————————

# save the file as plumb.R

library(plumber)

new_plumb <- plumb("sample_model.R")

# Port 80 is the default port for HTTP, i.e. the port web browsers use to request pages and data from a web server

# 0.0.0.0 is a 'no particular address' placeholder. In the context of servers, it means all IPv4 addresses on the local machine

# start the API service on port 80

new_plumb$run(port=80, host="0.0.0.0")

—————————————————————————————

Once you run the plumb.R file (for example, with Rscript plumb.R), you should see:

> `Starting server to listen on port 80`

Now open a browser and, to reach the API, go to:

http://127.0.0.1/petal_length_prediction?width_of_petal=1

VOILA!

The browser should show ‘3.3135’ as the predicted petal length.
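A browser is fine for a quick check, but other applications will usually call the API from code. Below is a minimal Python client sketch, assuming the service from plumb.R is reachable at http://127.0.0.1 on port 80. It builds the query string with urlencode, which avoids stray spaces in the URL. It also assumes (not verified here) that Plumber's default JSON serializer returns the prediction as a one-element array such as [3.3135]; parse_prediction accepts a bare number too, in case it does not.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the plumb.R service (or the Docker container) is listening locally on port 80.
BASE_URL = "http://127.0.0.1"


def build_prediction_url(width_of_petal: float, base_url: str = BASE_URL) -> str:
    """Build the request URL; urlencode yields 'width_of_petal=1' with no stray spaces."""
    query = urlencode({"width_of_petal": width_of_petal})
    return f"{base_url}/petal_length_prediction?{query}"


def parse_prediction(body: str) -> float:
    """Decode the JSON reply: a one-element array like [3.3135] (assumed) or a bare number."""
    data = json.loads(body)
    if isinstance(data, list):
        data = data[0]
    return float(data)


def predict_petal_length(width_of_petal: float, base_url: str = BASE_URL,
                         timeout: float = 5.0) -> float:
    """Call the running API and return the predicted petal length."""
    with urlopen(build_prediction_url(width_of_petal, base_url), timeout=timeout) as resp:
        return parse_prediction(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds the URL; calling predict_petal_length(1) needs the API to be running.
    print(build_prediction_url(1))  # http://127.0.0.1/petal_length_prediction?width_of_petal=1
```

With the API running, predict_petal_length(1) should return the same value the browser shows. Because the Docker container is also published on port 80, the same client works against it unchanged.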

This gets your API running in R, but on its own it is of limited use: your machine would have to act as the server, keep port 80 open at all times and accept every incoming request.

Enter Docker, which helps host this code on a server machine.

DOCKER

The Docker logo gives a good overview of what it is. Think of a server machine as a merchant ship. A merchant ship is expected to transport hundreds or thousands of products, each with its own packaging and handling needs. Shipping containers remove this complexity by bringing uniformity: from the outside they all look the same, which standardizes storage and transport, but inside each one carries different goods and serves its own purpose. A Docker container plays the role of the shipping container. Just as a merchant ship carries many containers, a Linux server can host tens of applications, irrespective of their technical requirements.

‘Docker is a virtualization tool that helps developers easily pack, ship, and run any application as a lightweight, portable, self-sufficient container, which can run virtually anywhere.’ A container is a standard unit of software that packages up code and all its dependencies (classes, libraries, CSS etc.) so the application runs quickly and reliably when moved from one computing environment to another.

To create a Docker container, you need a Docker image. According to TechTarget, a Docker image includes the elements needed to run an application as a container, such as code, config files, environment variables, libraries and runtime. When the image is deployed to a Docker environment, it can be executed as a Docker container; the docker run command creates a container from a given image. Docker images are also reusable assets that can be deployed on any host.

To get started with Docker, you first have to install it on your system, using the Docker Desktop installers for Mac and Windows. Since this is a beginner-level article, I will focus on creating a container from an existing, already-maintained Docker image. Preconfigured Docker images suited to our task can be found through the Rocker project and on Docker Hub; rocker/r-ver and trestletech/plumber are good places to start. The image we will use is trestletech/plumber, which has Plumber preinstalled, so you do not even need R installed on your own system.

Once Docker is installed, open PowerShell (Windows) or a terminal (Mac) and run the following command:

—————————————————————————————

# this command downloads (pulls) the image; it does not create a container yet
docker pull trestletech/plumber

—————————————————————————————

Now that the image is downloaded, we need to copy our R files into it, set the API entrypoint and open port 80. To add these steps on top of the base image, we need a Dockerfile. A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image.

—————————————————————————————

# example Dockerfile
# specify the base image
FROM trestletech/plumber
# install any additional package that is needed (replace package_name)
RUN R -e "install.packages('package_name')"
# copy the analytics model and scoring R scripts into the directory /newdir
RUN mkdir /newdir
COPY sample_model.R /newdir
COPY plumb.R /newdir
WORKDIR /newdir
# document that the container listens on port 80 (docker run -p actually publishes it)
EXPOSE 80
# specify what happens when the container starts: run plumb.R as the container's entrypoint
ENTRYPOINT ["Rscript", "plumb.R"]

—————————————————————————————

To build the image from this Dockerfile, enter the following command in PowerShell. Refer to the Docker documentation on docker build for more details.

—————————————————————————————

docker build -t demoplumb .
#(here . represents the current directory – the directory where that Dockerfile is stored)

—————————————————————————————

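Once the build finishes, start a container from the image with the following command (-p 80:80 maps port 80 on your machine to port 80 in the container, and --rm removes the container once it stops):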

—————————————————————————————

docker run --rm -p 80:80 demoplumb

—————————————————————————————

The following command stops all containers on the machine, not only the one started above:

—————————————————————————————

docker stop $(docker ps -a -q)

—————————————————————————————

To test the output, go to a browser window and type:

http://127.0.0.1/petal_length_prediction?width_of_petal=1

The expected output is ‘3.3135’, the predicted petal length.

Even now, this whole setup is local to your machine. If you want everyone on the internet to access your API, you would have to deploy this Docker container to a cloud service such as Amazon Elastic Container Service (ECS) on AWS or Google Kubernetes Engine (GKE) on Google Cloud.
