Big Data as a Service: Easily Running a Cloudera Quickstart Image with Docker in GCP

It’s no secret that container technology (popularly known as Docker) is becoming one of the top choices for software projects [1], but what about data projects and clusters? Many companies and projects intend to take advantage of it; examples include Cloudera [2] and the apache-spark-on-k8s project [3]. If you want to learn more about what is being called “Big Data as a Service”, I suggest checking the talk by Anant Chintamaneni and Nanda Vijaydev (BlueData) at Strata Data Conference New York 2018 [4].

In this article, I will guide you through a few simple steps to get the Cloudera Quickstart Image v5.13 running remotely on a Google Cloud instance. Let’s get the job done!

Prerequisites

1. Have a Google Cloud account (just log in with your Gmail account and you automatically get $300 of credit for one year) [5]

2. Create a new project

Let's start

1. First, create a VM instance

2. Define the basic machine specs (it is important to allow HTTP and HTTPS traffic)

3. Connect using SSH
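If you prefer the command line to the GCP console, steps 1 to 3 can be sketched with the gcloud CLI, assuming it is installed and pointed at the project you created. The instance name, zone, machine type, image and disk size are illustrative assumptions, not the values used in this article. The http-server and https-server network tags are what the console’s “Allow HTTP/HTTPS traffic” checkboxes add. Each command goes through a small run helper that only prints it when DRY_RUN=1, so you can review it first.

```shell
# Sketch of steps 1-3 with the gcloud CLI.
# Assumption: every value below is illustrative; adjust it to your project.
INSTANCE="${INSTANCE:-cloudera-quickstart}"
ZONE="${ZONE:-us-central1-a}"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Steps 1-2: create the VM; the two network tags allow HTTP and HTTPS traffic.
create_vm() {
  run gcloud compute instances create "$INSTANCE" \
    --zone="$ZONE" \
    --machine-type=e2-standard-4 \
    --image-family=debian-12 \
    --image-project=debian-cloud \
    --boot-disk-size=50GB \
    --tags=http-server,https-server
}

# Step 3: open an SSH session to the new VM.
connect_vm() {
  run gcloud compute ssh "$INSTANCE" --zone="$ZONE"
}
```

For example, run DRY_RUN=1 create_vm to preview the command, then create_vm followed by connect_vm. A larger boot disk also helps avoid the out-of-space problem mentioned in step 8.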

4. Install Docker

curl -sSL https://get.docker.com/ | sh

5. Update the package database with the Docker package

sudo apt-get update
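Piping a remote script straight into sh works, but a slightly more cautious variant of steps 4 and 5 downloads Docker’s convenience script first so you can read it before running it, then checks that the daemon answers. This is a sketch assuming a Debian or Ubuntu VM; hello-world is Docker’s standard test image. Commands go through a small run helper that only prints them when DRY_RUN=1.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Steps 4-5: fetch the install script to a file, run it, refresh apt.
install_docker() {
  run curl -fsSL https://get.docker.com -o get-docker.sh   # inspect get-docker.sh before the next line
  run sudo sh get-docker.sh
  run sudo apt-get update
}

# Confirm the client can reach the daemon and run a throwaway container.
verify_docker() {
  run sudo docker version
  run sudo docker run --rm hello-world
}
```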

6. Get the Cloudera Quickstart Image

sudo wget https://downloads.cloudera.com/demo_vm/docker/cloudera-quickstart-v...

7. Extract the tar file

tar xzf cloudera-quickstart-vm-*-docker.tar.gz

8. Import the image into Docker. Note: you may run out of disk space; in that case, remove the tar.gz file and re-run the import.

sudo docker import cloudera-quickstart-vm-5.13.0-0-beta-docker.tar
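docker import also accepts an optional repository:tag after the file, which gives the image a readable name so you do not have to copy its ID from step 9. A minimal sketch under two assumptions: the name cloudera/quickstart:latest is chosen for this example (the article imports without a name), and you pass the paths of your own .tar.gz and extracted .tar files. It also applies the out-of-space advice above by deleting the archive before importing. Commands only print when DRY_RUN=1.

```shell
# Assumption: IMAGE_NAME is a name picked for this sketch.
IMAGE_NAME="${IMAGE_NAME:-cloudera/quickstart:latest}"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

import_quickstart() {
  archive="$1"   # the .tar.gz downloaded in step 6
  tarball="$2"   # the .tar extracted from it in step 7
  run df -h .                                   # how much disk space is left?
  run rm -f "$archive"                          # not needed once extracted
  run sudo docker import "$tarball" "$IMAGE_NAME"
  run sudo docker images "$IMAGE_NAME"          # step 9: the ID is listed next to the name
}
```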

9. Check the image ID

sudo docker images

10. Run the container

sudo docker run --hostname=quickstart.cloudera --privileged=true -t -i -p 8777:8888 -p 7190:7180 -p 90:80 b46c7719892d /usr/bin/docker-quickstart

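Here is what each parameter does [7]:

· sudo docker run: main command to create and start the container
· --hostname=quickstart.cloudera: The pseudo-distributed configuration assumes this hostname
· --privileged=true: Required for HBase, MySQL-backed Hive metastore, Hue, Oozie, Sentry and Cloudera Manager
· -t: Allocate a pseudoterminal. Once services are started, a Bash shell takes over. This switch starts a terminal emulator to run the services.
· -i: Keep STDIN open, needed if you want to use the terminal, either immediately or by connecting to it later.
· -p 8777:8888: Map the Hue port in the guest (8888) to port 8777 on the host.
· -p 7190:7180: Map the Cloudera Manager port in the guest (7180) to port 7190 on the host.
· -p 90:80: Map port 80 in the guest to port 90 on the host.
· b46c7719892d: Docker image ID obtained in step 9 (replace it with your own)
· /usr/bin/docker-quickstart: Entry point that starts the CDH services and then runs the Bash shell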
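The command from step 10 can be wrapped so that the host ports and the image are parameters instead of being hard-coded. In this minimal sketch the defaults reproduce the step 10 command exactly; b46c7719892d is this article’s image ID (yours will differ, so pass your own ID or image name as the first argument), and the HUE_PORT, CM_PORT and WEB_PORT variable names are made up for this example. The command only prints when DRY_RUN=1.

```shell
# Hypothetical variable names; defaults match the host ports used in step 10.
HUE_PORT="${HUE_PORT:-8777}"   # host port for Hue (container port 8888)
CM_PORT="${CM_PORT:-7190}"     # host port for Cloudera Manager (container port 7180)
WEB_PORT="${WEB_PORT:-90}"     # host port for container port 80

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

start_quickstart() {
  image="${1:-b46c7719892d}"   # this article's image ID; pass your own
  run sudo docker run --hostname=quickstart.cloudera --privileged=true -t -i \
    -p "$HUE_PORT:8888" -p "$CM_PORT:7180" -p "$WEB_PORT:80" \
    "$image" /usr/bin/docker-quickstart
}
```

For example, HUE_PORT=8000 start_quickstart cloudera/quickstart:latest exposes Hue on host port 8000 instead. Keeping the container ports fixed and varying only the host side matches how the -p host:container mapping works.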

11. Test the services




Hue (port 8777)

  • Note: to access these services, you first have to allow the ports you defined in step 10 in the GCP firewall. For security, try to open only those ports (in my setup I opened all of them).

Username and password: cloudera

Hue UI!

Cloudera running (port 90)
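Instead of opening every port, a firewall rule can open only the three host ports from step 10, and only to your own address. This sketch assumes the default VPC network; the rule name allow-cloudera-quickstart and the network tag cloudera-quickstart are hypothetical, the instance name and zone are illustrative values, and MY_IP must be set to your public IPv4 address. check_quickstart then prints the HTTP status code curl receives from each port (000 means nothing answered). Commands only print when DRY_RUN=1.

```shell
# Assumptions: illustrative instance name/zone, hypothetical rule name and tag.
INSTANCE="${INSTANCE:-cloudera-quickstart}"
ZONE="${ZONE:-us-central1-a}"
RULE_TAG="${RULE_TAG:-cloudera-quickstart}"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Allow only Hue (8777), Cloudera Manager (7190) and port 90, only from MY_IP.
open_quickstart_ports() {
  run gcloud compute firewall-rules create allow-cloudera-quickstart \
    --allow=tcp:8777,tcp:7190,tcp:90 \
    --source-ranges="${MY_IP:?set MY_IP to your public IP}/32" \
    --target-tags="$RULE_TAG"
  run gcloud compute instances add-tags "$INSTANCE" --zone="$ZONE" --tags="$RULE_TAG"
}

# Print "<port> <HTTP status>" for each service on the VM's external IP.
check_quickstart() {
  host="$1"
  for port in 8777 7190 90; do
    run curl -s -o /dev/null -w "$port %{http_code}\n" "http://$host:$port/"
  done
}
```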

12. Exit the container

Just type Ctrl+d

Go further

If you do not pass the -d flag to docker run, your terminal automatically attaches to the container. To run it in the background instead, add -d to the docker run options. It must come before the image ID, because anything after the image is passed as an argument to /usr/bin/docker-quickstart:

sudo docker run -d --hostname=quickstart.cloudera --privileged=true -t -i -p 8777:8888 -p 7190:7180 -p 90:80 b46c7719892d /usr/bin/docker-quickstart

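To reconnect to the shell, look up the container ID with docker ps and attach to it (256e31278a92 is the ID in my case). To stop the container, type Ctrl+d; to detach without stopping it, press Ctrl+p followed by Ctrl+q.

sudo docker ps
sudo docker attach 256e31278a92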
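This workflow is easier with a container name than with an ID that changes on every run. A sketch assuming the hypothetical container name quickstart (set with --name) and this article’s image ID b46c7719892d as the default image (pass your own as the first argument): start detached, confirm it is running, attach, and stop it when you are done. Commands only print when DRY_RUN=1.

```shell
# Assumption: "quickstart" is a container name chosen for this sketch.
CONTAINER="${CONTAINER:-quickstart}"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# -d (and every other docker option) must come before the image.
start_detached() {
  image="${1:-b46c7719892d}"
  run sudo docker run -d -t -i --name "$CONTAINER" --hostname=quickstart.cloudera \
    --privileged=true -p 8777:8888 -p 7190:7180 -p 90:80 \
    "$image" /usr/bin/docker-quickstart
}

reattach() {
  run sudo docker ps --filter "name=$CONTAINER"   # is it still running?
  run sudo docker attach "$CONTAINER"             # Ctrl+p, Ctrl+q detaches again
}

stop_quickstart() {
  run sudo docker stop "$CONTAINER"
}
```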


In this article, I showed how easy it is to start using the Cloudera Quickstart Image with Docker.

See you in the next article! Happy Learning!



[1] https://www.theserverside.com/feature/The-benefits-of-container-dev...

[2] http://community.cloudera.com/t5/CDH-Manual-Installation/CDH-on-Kub...

[3] https://github.com/apache-spark-on-k8s/spark

[4] https://conferences.oreilly.com/strata/strata-ny-2018/public/schedu...

[5] https://cloud.google.com

[6] https://www.digitalocean.com/community/tutorials/how-to-install-and...

[7] https://www.cloudera.com/documentation/enterprise/5-6-x/topics/quic...

Tags: Apache Spark, Big Data, Cloudera, Dockers, Google Cloud

