
Big Data as a Service: easily get a Cloudera Quickstart Image running with Docker on GCP

It’s no secret that container technology (popularly known as Docker) is becoming one of the top choices in software projects [1], but what about data projects and clusters? Many companies and projects intend to take advantage of it; examples include Cloudera [2] and the apache-spark-on-k8s project [3]. If you want more information about what is called “Big Data as a Service”, I suggest checking the talk by Anant Chintamaneni and Nanda Vijaydev (BlueData) at the last Strata Data Conference [4].

In this article, I will guide you through a few simple steps to get a Cloudera Quickstart Image v5.13 running remotely on a Google Cloud instance. Let’s get the job done!

Prerequisites

1. Have a Google Cloud account (just log in with your Gmail account and you automatically get $300 of credit for one year) [5]

2. Create a new project

Let’s start

1. First, create a VM instance


2. Define the basic tech specs (important: allow HTTP and HTTPS traffic)


3. Connect using SSH


4. Install Docker

curl -sSL https://get.docker.com/ | sh

5. Update the package database with the Docker package

sudo apt-get update


6. Get the Cloudera Quickstart Image

sudo wget https://downloads.cloudera.com/demo_vm/docker/cloudera-quickstart-v…


7. Extract the tar file

tar xzf cloudera-quickstart-vm-*-docker.tar.gz


8. Import the image into Docker. *You might run out of disk space; in that case, remove the tar.gz file and re-run the import.

sudo docker import cloudera-quickstart-vm-5.13.0-0-beta-docker.tar


9. Check the container image ID

sudo docker images


10. Run the container

sudo docker run --hostname=quickstart.cloudera --privileged=true -t -i -p 8777:8888 -p 7190:7180 -p 90:80 b46c7719892d /usr/bin/docker-quickstart


Let’s go through the parameters [7]:

· sudo docker run: main command to start the container

· --hostname=quickstart.cloudera: the pseudo-distributed configuration assumes this hostname

· --privileged=true: required for HBase, the MySQL-backed Hive metastore, Hue, Oozie, Sentry and Cloudera Manager

· -t: Allocate a pseudoterminal. Once services are started, a Bash shell takes over. This switch starts a terminal emulator to run the services.

· -i: keeps STDIN open, which you need if you want to use the terminal, either immediately or by attaching to it later.

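· -p 8777:8888: maps the Hue port in the guest (8888) to port 8777 on the host.

· -p 7190:7180 and -p 90:80: map the Cloudera Manager port (7180) and port 80 in the guest to ports 7190 and 90 on the host.

· b46c7719892d: the Docker image ID obtained in step 9 (yours will be different)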
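The flags above are always the same; only the -p pairs and the image ID change from one setup to another. As a minimal sketch (not part of the original walkthrough), the following POSIX shell function rebuilds the step 10 command from a list of host:guest port pairs. The names start_quickstart, QUICKSTART_PORTS and DRY_RUN are hypothetical; with DRY_RUN=1 the function only prints the command instead of running it.

```shell
#!/bin/sh
# Minimal sketch: rebuild the step 10 "docker run" command from host:guest port pairs.
# start_quickstart, QUICKSTART_PORTS and DRY_RUN are hypothetical names.

# host:guest pairs from step 10 (Hue, Cloudera Manager, port 80)
QUICKSTART_PORTS="8777:8888 7190:7180 90:80"

start_quickstart() {
  image="$1"  # image ID from step 9, e.g. b46c7719892d
  # Reuse the function's positional parameters to build the command safely
  set -- sudo docker run --hostname=quickstart.cloudera --privileged=true -t -i
  for pair in $QUICKSTART_PORTS; do  # unquoted on purpose: split on spaces
    set -- "$@" -p "$pair"
  done
  set -- "$@" "$image" /usr/bin/docker-quickstart
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"  # dry run: only print the command
  else
    "$@"       # actually start the container
  fi
}
```

For example, DRY_RUN=1 start_quickstart b46c7719892d prints the same command as step 10; drop DRY_RUN=1 to really start the container. Keeping the ports in one variable also makes it easier to keep them in sync with the ports you open in the firewall.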

11. Test the services

Spark


Hive


HBase


Hue (port 8777)*


  • *To access it, you first have to allow the ports you defined in step 10 in the firewall. For security, open only those ports (in my screenshot I opened all of them).
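If you prefer the command line to the Cloud Console, a firewall rule that opens only the three host ports from step 10 can be created with gcloud. This is a minimal sketch under assumptions: the function name open_quickstart_ports and the DRY_RUN switch are hypothetical, the rule name and source range are yours to choose, and because no --network flag is given, the rule is created in the project's default network.

```shell
#!/bin/sh
# Minimal sketch: open only the host ports from step 10 (8777, 7190, 90).
# open_quickstart_ports and DRY_RUN are hypothetical names.
# Usage: open_quickstart_ports RULE_NAME SOURCE_CIDR

open_quickstart_ports() {
  if [ $# -ne 2 ]; then
    echo "usage: open_quickstart_ports RULE_NAME SOURCE_CIDR" >&2
    return 1
  fi
  # Ingress rule allowing TCP traffic on the three mapped ports from one CIDR range
  set -- gcloud compute firewall-rules create "$1" \
    --allow=tcp:8777,tcp:7190,tcp:90 \
    --source-ranges="$2"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"  # dry run: only print the command
  else
    "$@"       # actually create the rule
  fi
}
```

For example, DRY_RUN=1 open_quickstart_ports allow-quickstart 203.0.113.7/32 prints the gcloud command for a single example IP address. Without --target-tags the rule applies to every instance in the network, so consider tagging the VM and adding that flag if the project hosts other machines.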

User and password: cloudera


Hue UI!


Cloudera running (port 90)


12. Exit the container

Just press Ctrl+D


Go further

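You can run the container in the background by passing the -d flag to docker run; without it, your terminal automatically attaches to the container. Note that -d has to come before the image ID, because anything after the image is passed to /usr/bin/docker-quickstart instead of to Docker: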

sudo docker run -d --hostname=quickstart.cloudera --privileged=true -t -i -p 8777:8888 -p 7190:7180 -p 90:80 b46c7719892d /usr/bin/docker-quickstart

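If you want to reconnect to the shell, look up the container ID with docker ps (256e31278a92 in my case) and attach to it. To stop the container, press Ctrl+D; to detach and leave it running, press Ctrl+P followed by Ctrl+Q.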

sudo docker ps
sudo docker attach 256e31278a92

Conclusion

In this article, I showed how easy it is to start using the Cloudera Quickstart Image with Docker.

See you in the next article! Happy Learning!


Links:

[1] https://www.theserverside.com/feature/The-benefits-of-container-dev…

[2] http://community.cloudera.com/t5/CDH-Manual-Installation/CDH-on-Kub…

[3] https://github.com/apache-spark-on-k8s/spark

[4] https://conferences.oreilly.com/strata/strata-ny-2018/public/schedu…

[5] https://cloud.google.com

[6] https://www.digitalocean.com/community/tutorials/how-to-install-and…

[7] https://www.cloudera.com/documentation/enterprise/5-6-x/topics/quic…