It has been a few months since I started using Docker containers for my data science experiments. Containers are becoming very popular these days as a way to make experiments reproducible.

The Motivation

I bought a new laptop during Thanksgiving shopping. It was a good deal for a Windows laptop (Windows 10 Pro). I started trying to set up Docker on Windows and soon realized that it would be a long learning curve. Fortunately for me, my old laptop was still in good condition and had 16 GB of RAM. I thought it would make an excellent server for my use case, and I could open a few ports and access it from anywhere. I do most of my data science work in R and wanted to try Python and practice it as much as possible. This struck me as a perfect platform.

Initial Setup

Below are the steps I followed to get the initial setup up and running. I have to mention that setting up Docker was super easy on Linux, so a comparison with the Windows experience would not be fair.

* Install Ubuntu 16.04.1 (64-bit)

* Refer to the Docker documentation (Docker Docs)

* Add the Docker repository's GPG key to the APT keyring

sudo apt-get install apt-transport-https ca-certificates
sudo apt-key adv \
--keyserver hkp://ha.pool.sks-keyservers.net:80 \
--recv-keys 58118E89F3A912897C070ADBF76221572C52609D

* Add the docker repository for my Ubuntu version

echo "deb https://apt.dockerproject.org/repo ubuntu-xenial main" | sudo tee

* Update the APT package index and verify that it is using the right repository.

sudo apt-get update
apt-cache policy docker-engine

* Install Docker

sudo apt-get install docker-engine

* Start Docker

sudo service docker start

* Run the docker “Hello World” to test.

sudo docker run hello-world

* If you want to run docker commands without using sudo, add your user to the docker group (replace renjith with your own user name, then log out and back in for the change to take effect)

sudo groupadd docker
sudo usermod -aG docker renjith
docker run hello-world

* Configure Docker to start on system boot.

sudo systemctl enable docker
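
The setup steps above can be sanity-checked with a small script. The sketch below is not taken from the Docker documentation: `check_docker_setup` is a hypothetical helper name, and it only inspects the local machine (whether the `docker` command is on the PATH and whether a user belongs to the `docker` group) without starting any container.

```shell
# Hypothetical helper: report whether the docker CLI is on the PATH and
# whether the given user (default: the current user) is in the docker group.
check_docker_setup() {
    local user="${1:-$(id -un)}"

    if command -v docker >/dev/null 2>&1; then
        echo "docker CLI: found"
    else
        echo "docker CLI: missing"
    fi

    # id -nG prints the user's group names separated by spaces;
    # put one name per line so grep -x can match the whole name.
    if id -nG "$user" 2>/dev/null | tr ' ' '\n' | grep -qx docker; then
        echo "docker group: $user is a member"
    else
        echo "docker group: $user is not a member"
    fi
}
```

Calling `check_docker_setup` with no argument checks the current user. A freshly added group membership only shows up in new login sessions.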

OK, we are done installing and setting up Docker. Let’s move on to data science.

The Magic Commands

Kaggle deserves thanks for publishing Docker images that can be downloaded and run in just a few minutes.

There are 3 Kaggle Docker images that I am aware of:

1. kaggle/python
2. kaggle/rstats
3. kaggle/julia

Since I am more interested in Python and R, I have downloaded only those. However, the steps remain the same for Julia. The commands are slightly different from the ones Kaggle gives, because I changed them to work on Ubuntu. The commands below work perfectly on Ubuntu 16.04; Mac users can refer to the commands published by Kaggle.

sudo docker run -v $PWD:/tmp/working -w=/tmp/working -p 8888:8888 --rm -it kaggle/python \
jupyter notebook --no-browser --ip="0.0.0.0" --notebook-dir=/tmp/working

sudo docker run -it -p 8787:8787 --rm -v $PWD:/tmp/working kaggle/rstats /bin/bash \
-c "rstudio-server restart & /bin/bash"
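
For readers new to `docker run`, the flags used above do the following: `-v $PWD:/tmp/working` mounts the current directory into the container, `-w` sets the working directory inside it, `-p 8888:8888` maps a host port to a container port (host port first), `--rm` removes the container when it exits, and `-it` attaches an interactive terminal. As a sketch, the hypothetical helper below (`jupyter_cmd` is an illustrative name, not something Kaggle provides) prints the Jupyter command for a different host port or directory so that it can be reviewed before running. It does not call Docker itself.

```shell
# Hypothetical helper: print (do not run) the Jupyter command from above,
# with the host port and the mounted directory made configurable.
#   $1 = host port (default 8888)
#   $2 = directory to mount (default: current directory)
jupyter_cmd() {
    local port="${1:-8888}"
    local dir="${2:-$PWD}"
    printf 'sudo docker run -v %s:/tmp/working -w=/tmp/working -p %s:8888 --rm -it kaggle/python ' "$dir" "$port"
    printf 'jupyter notebook --no-browser --ip="0.0.0.0" --notebook-dir=/tmp/working\n'
}
```

For example, `jupyter_cmd 9999 /data` prints a command that mounts `/data` and serves the notebook on host port 9999, which helps when port 8888 is already taken on the host.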

Both the Python notebook and the RStudio web interface can be opened in a web browser at the URLs below.

http://<your machine IP>:8888
http://<your machine IP>:8787
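
On Ubuntu, `hostname -I` lists the machine's IP addresses, which is one way to fill in the placeholder above. Here is a minimal sketch, assuming the first address reported is the one reachable from your other devices (that may not hold on machines with several network interfaces). `notebook_urls` is a hypothetical helper name.

```shell
# Hypothetical helper: print the two URLs for a given IP address.
# With no argument it falls back to the first address from `hostname -I`
# (a Linux-specific option; this is an assumption about the host system).
notebook_urls() {
    local ip="${1:-$(hostname -I 2>/dev/null | awk '{print $1}')}"
    echo "Jupyter: http://${ip}:8888"
    echo "RStudio: http://${ip}:8787"
}
```

Passing the address explicitly, as in `notebook_urls 10.0.0.5`, skips the lookup.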

Python and IPython consoles can be opened directly as below.

docker run -v $PWD:/tmp/working -w=/tmp/working --rm -it kaggle/python python "$@"
docker run -v $PWD:/tmp/working -w=/tmp/working --rm -it kaggle/python ipython
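
To avoid retyping these long commands, they can be wrapped in shell functions, for example in `~/.bashrc`. This is a sketch, not an official recipe: the function names `kpython` and `ikpython` and the `DOCKER` override variable are assumptions made for illustration. Inside a function, `"$@"` forwards whatever arguments the function received, so `kpython script.py` would run `script.py` from the mounted directory inside the container.

```shell
# Illustrative wrappers around the two console commands above.
# DOCKER is a hypothetical override: set DOCKER=echo to print the command
# instead of running it (a dry run).

# Run the container's Python, forwarding all arguments (e.g. a script name).
kpython() {
    "${DOCKER:-docker}" run -v "$PWD":/tmp/working -w=/tmp/working --rm -it kaggle/python python "$@"
}

# Open an IPython console in the container, forwarding any extra arguments.
ikpython() {
    "${DOCKER:-docker}" run -v "$PWD":/tmp/working -w=/tmp/working --rm -it kaggle/python ipython "$@"
}
```

Running `DOCKER=echo kpython script.py` prints the resulting command line without starting a container, which is a cheap way to check the wrapper before relying on it.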

Final Notes

The best part of all this is that you can try all those amazing kernels for Kaggle data science problems using these Docker images without package installation errors. If, like me, you have struggled to maintain all your Python and R packages and you run a lot of experiments, I am sure you will appreciate how cool a Docker platform is.

