It has been a few months since I started using Docker containers for my data science experiments. Docker containers have become very popular for reproducible experiments.
I bought a new laptop during the Thanksgiving sales. It was a good deal for a Windows laptop (Windows 10 Pro). I started trying to set up Docker on Windows and soon realized that it would be a long learning curve. Fortunately, my old laptop was still in good condition and had 16 GB of RAM. I thought it would make an excellent server for my use case, and I could open a few ports and access it from anywhere. I do most of my data science work in R and wanted to try Python and practice it as much as possible. This struck me as a perfect platform.
Below are the steps I followed to get the initial setup up and running. I have to mention that setting up Docker was super easy on Linux, and it is not fair to compare it to the Windows experience.
* Installed Ubuntu 16.04.1 - 64 Bit
* Refer Docker Documentation - [Docker Docs](
* Add relevant key to the key-chain
sudo apt-get install apt-transport-https ca-certificates
sudo apt-key adv \
* Add the docker repository for my Ubuntu version
echo "deb https://apt.dockerproject.org/repo ubuntu-xenial main" | sudo tee
* Update APT package index and verify it is using the right repository.
sudo apt-get update
apt-cache policy docker-engine
* Install Docker
sudo apt-get install docker-engine
* Start Docker
sudo service docker start
* Run the docker “Hello World” to test.
sudo docker run hello-world
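* If you want to run docker commands without sudo, add your user to the docker group (replace renjith below with your own username), then log out and back in so the new group membership takes effect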
sudo groupadd docker
sudo usermod -aG docker renjith
docker run hello-world
* Configure Docker to start on system boot.
sudo systemctl enable docker
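To confirm that the steps above took effect, a few quick checks can be bundled into a shell function. This is a minimal sketch, not part of the official install procedure: the function names `has_group` and `check_docker_setup` are made up for illustration, and it assumes the standard `id` utility plus the `docker info` subcommand.

```shell
# Hypothetical helpers for sanity-checking a fresh Docker install.

# Succeeds if the space-separated group list in $1 contains the group named in $2.
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints one line per check and returns non-zero if any check fails.
check_docker_setup() {
    docker_check_failed=0

    if command -v docker >/dev/null 2>&1; then
        echo "ok: docker client found"
    else
        echo "FAIL: docker client is not on PATH"
        docker_check_failed=1
    fi

    # id -nG lists the groups of the current session, which is what matters here.
    if has_group "$(id -nG)" docker; then
        echo "ok: this session is in the docker group"
    else
        echo "FAIL: this session is not in the docker group (log out and back in after usermod)"
        docker_check_failed=1
    fi

    # docker info needs to talk to the daemon, so it fails if the daemon is
    # stopped or if this user lacks permission to reach it.
    if docker info >/dev/null 2>&1; then
        echo "ok: daemon reachable without sudo"
    else
        echo "FAIL: cannot reach the daemon as this user"
        docker_check_failed=1
    fi

    return "$docker_check_failed"
}
```

Running `check_docker_setup` in a new terminal after logging back in should show where the setup stands.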
Ok, we are done with setting up and installing Docker. Let’s move on to Data Science.
The Magic Commands
Kaggle deserves thanks for publishing these Docker images, which can be downloaded and run in just a few minutes.
There are 3 Kaggle Docker images that I am aware of:
1. kaggle/python
2. kaggle/rstats and
3. kaggle/julia
Since I am more interested in Python and R, I have downloaded only those two. However, the steps remain the same for Julia. The commands are slightly different from the ones given by Kaggle, because I changed them to work on Ubuntu. The commands below work perfectly on Ubuntu 16.04; Mac users can refer to the commands given by Kaggle in the above link.
sudo docker run -v $PWD:/tmp/working -w=/tmp/working -p 8888:8888 --rm -it kaggle/python \
jupyter notebook --no-browser --ip="0.0.0.0" --notebook-dir=/tmp/working
sudo docker run -it -p 8787:8787 --rm -v $PWD:/tmp/working kaggle/rstats /bin/bash \
-c "rstudio-server restart & /bin/bash"
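To avoid retyping these long commands, they can be wrapped in shell functions, for example in `~/.bashrc`. The sketch below is only a suggestion: the function names `kjupyter` and `krstudio` are invented here, the docker arguments are copied from the commands above, and `sudo` is dropped on the assumption that your user is already in the docker group.

```shell
# Hypothetical wrapper names; the docker arguments mirror the commands above.

# Start Jupyter from the kaggle/python image, serving the current directory on port 8888.
kjupyter() {
    docker run -v "$PWD":/tmp/working -w=/tmp/working -p 8888:8888 --rm -it kaggle/python \
        jupyter notebook --no-browser --ip="0.0.0.0" --notebook-dir=/tmp/working
}

# Start RStudio Server from the kaggle/rstats image on port 8787,
# with the current directory mounted at /tmp/working.
krstudio() {
    docker run -it -p 8787:8787 --rm -v "$PWD":/tmp/working kaggle/rstats /bin/bash \
        -c "rstudio-server restart & /bin/bash"
}
```

Quoting `"$PWD"` keeps the volume mount intact when the project directory has spaces in its name. After reloading the shell, `cd` into a project folder and run `kjupyter` or `krstudio`.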
Both the Python notebook and the RStudio web interface can be opened at the URLs below in a web browser.
http://<your machine IP>:8888
http://<your machine IP>:8787
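If you are not sure what the machine IP is, the server can print ready-made URLs itself. This is a rough sketch: `service_url` is a hypothetical helper, and it assumes that the first address reported by `hostname -I` (available on Ubuntu) is the one your browser can reach, which may not hold on machines with several network interfaces.

```shell
# Build a URL from a host and a port (hypothetical helper).
service_url() {
    printf 'http://%s:%s\n' "$1" "$2"
}

# hostname -I lists the host's addresses on Linux; take the first one.
# Assumption: that first address is the one reachable from your browser.
host_ip=$(hostname -I 2>/dev/null | awk '{print $1}')

# Fall back to localhost if no address was found.
service_url "${host_ip:-localhost}" 8888   # Jupyter notebook
service_url "${host_ip:-localhost}" 8787   # RStudio Server
```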
Python and IPython consoles can be opened directly as below.
docker run -v $PWD:/tmp/working -w=/tmp/working --rm -it kaggle/python python "$@"
docker run -v $PWD:/tmp/working -w=/tmp/working --rm -it kaggle/python ipython
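The console commands are most useful as shell functions, because the function's arguments can then be forwarded to the interpreter inside the container. The sketch below shows one way to do that; the names `kpython` and `ikpython` are only illustrative, and it assumes your user can run docker without sudo.

```shell
# Hypothetical wrapper names. "$@" forwards whatever arguments the function
# receives to the interpreter running inside the container.

# Plain Python console, or a script run when a file name is given.
kpython() {
    docker run -v "$PWD":/tmp/working -w=/tmp/working --rm -it kaggle/python python "$@"
}

# IPython console with the same mount and working directory.
ikpython() {
    docker run -v "$PWD":/tmp/working -w=/tmp/working --rm -it kaggle/python ipython "$@"
}
```

With these defined, `kpython` alone opens a console, while `kpython train.py` would run a script (here a made-up `train.py`) from the current directory, since that directory is what gets mounted at /tmp/working.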
The best part of all this is that you can try all those amazing kernels from Kaggle data science problems inside these containers without dependency errors. If you, like me, have struggled to maintain all your Python and R packages and run a lot of experiments, I am sure you will admire how cool a Docker platform is.