Kubernetes is a great system for managing clusters of containers (whether in the cloud or on-premises), but deploying and managing containerized applications for ML can be a challenging task.
Kubeflow is known as the machine learning toolkit for Kubernetes. It is an open source project that makes deployments of machine learning workflows on Kubernetes simple, portable, and scalable. It is used by data scientists and ML engineers who want to build, experiment with, test, and serve their ML workloads across various environments.
Some of the main components that make Kubeflow useful include Jupyter notebooks for interactive development, Katib for hyperparameter tuning, Kubeflow Pipelines for orchestrating ML workflows, and training and serving components for popular frameworks.
Until now, the Kubeflow community has presented applications running on CPUs or GPUs. FPGAs can be used to speed up ML applications, but so far their integration and deployment have been hard.
InAccel FPGA manager makes the deployment and integration of FPGAs with higher-level programming frameworks much easier. With InAccel's FPGA Kubernetes plugin, applications can be easily accelerated without worrying about resource management and utilization of the FPGA cluster.
A complete guide on how to set up a machine learning application using FPGAs with Kubeflow on any existing Kubernetes cluster is provided in the Tutorial Labs.
Hyperparameter tuning is the process of optimizing hyperparameter values to maximize the predictive accuracy of a model. If you don't use Katib or a similar system for hyperparameter tuning, you need to run many training jobs yourself, manually adjusting the hyperparameters to find the optimal values.
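To make that concrete, the sketch below (on synthetic stand-in data, with made-up parameter ranges) shows the kind of manual search loop that Katib automates: every hyperparameter combination means launching and evaluating another training run yourself.

```python
# A minimal sketch of manual hyperparameter search, the repetitive work
# that Katib automates. Data and parameter ranges are made up for the demo.
from itertools import product

import numpy as np
import xgboost as xgb

# Synthetic stand-in data, just to make the loop runnable.
X = np.random.rand(500, 20)
y = np.random.randint(0, 2, size=500)
dtrain = xgb.DMatrix(X, label=y)

best_score, best_params = None, None
for max_depth, eta in product([4, 6, 8], [0.1, 0.3]):
    params = {"objective": "binary:logistic", "eval_metric": "error",
              "max_depth": max_depth, "eta": eta}
    # Cross-validate this combination; one more full training run per trial.
    cv = xgb.cv(params, dtrain, num_boost_round=50, nfold=3)
    score = 1.0 - cv["test-error-mean"].iloc[-1]
    if best_score is None or score > best_score:
        best_score, best_params = score, {"max_depth": max_depth, "eta": eta}

print(best_params, best_score)
```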
Searching for the best parameters takes valuable time away from other stages of the data science lifecycle. Tools that merely monitor and automate this repetitive training process are not enough on their own; the training itself also needs to be accelerated, so that professionals can concentrate more on stages like business understanding and data mining.
XGBoost is a powerful machine learning library that has recently been dominating applied machine learning and makes it quite easy to build a predictive model. Improving that model, however, is difficult because of its many parameters, and careful hyperparameter tuning is required to fully leverage its advantages over other algorithms.
InAccel has previously released an IP core for accelerated XGBoost on FPGAs. This IP core helped demonstrate the advantages of FPGAs in the ML domain and offered the data science community the chance to experiment with, deploy, and utilize FPGAs in order to speed up their ML workloads. With the Python, Java, and Scala APIs provided, engineers do not need to change their code at all or worry about configuring FPGAs.
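A minimal sketch of that "no code changes" point follows: the XGBoost calls stay exactly the same, and only the tree_method value selects the implementation. The data here is synthetic and the FPGA method name is not shown, since it is specific to InAccel's build.

```python
# Ordinary XGBoost training code; with InAccel's accelerated build only the
# tree_method value would change to point at the FPGA implementation.
import numpy as np
import xgboost as xgb

X = np.random.rand(1000, 32)            # synthetic stand-in data
y = np.random.randint(0, 10, size=1000)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "multi:softmax",
    "num_class": 10,
    "max_depth": 6,
    "eta": 0.3,
    "tree_method": "hist",              # CPU baseline; swap for the FPGA method name
}
booster = xgb.train(params, dtrain, num_boost_round=50)
```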
Concerning Katib, there are three steps to run your own experiment: prepare the training code in a Docker image, define the experiment specification (parameters, search algorithm, metrics, and trial template), and submit the experiment and monitor its results.
SVHN is a real-world image dataset obtained from house numbers in Google Street View images. It consists of 99,289 samples and 10 classes, with every sample being a 32-by-32 RGB image (3,072 features).
The training code for step 1 can be found on GitHub and is included in the inaccel/jupyter:scipy Docker image.
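The published script is the reference, but a rough sketch of its shape is given below; the dataset source, flag names, and metric format here are assumptions for illustration, not the actual code. A Katib-friendly training entrypoint reads each trial's hyperparameters from command-line flags and reports its metrics on stdout.

```python
# Sketch of a Katib-friendly XGBoost training entrypoint for SVHN.
# Dataset source, flag names and metric format are assumptions, not the
# published step-1 script.
import argparse
import time

import xgboost as xgb
from sklearn.datasets import fetch_openml
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

parser = argparse.ArgumentParser()
# Katib substitutes each trial's hyperparameter values into these flags.
parser.add_argument("--max-depth", type=int, default=6)
parser.add_argument("--eta", type=float, default=0.3)
parser.add_argument("--num-round", type=int, default=100)
args = parser.parse_args()

# SVHN: 99,289 samples, 10 classes, 32x32 RGB images (3,072 features).
X, y = fetch_openml("SVHN", version=1, return_X_y=True, as_frame=False)  # assumed source
y = y.astype(int) - y.astype(int).min()   # normalize labels to 0..9
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

params = {
    "objective": "multi:softmax",
    "num_class": 10,
    "max_depth": args.max_depth,
    "eta": args.eta,
    "tree_method": "hist",  # placeholder; the FPGA method name with InAccel's build
}

start = time.time()
booster = xgb.train(params, xgb.DMatrix(X_train, label=y_train),
                    num_boost_round=args.num_round)
elapsed = time.time() - start

preds = booster.predict(xgb.DMatrix(X_test)).astype(int)
# Katib's default StdOut metrics collector parses "name=value" lines.
print(f"accuracy={accuracy_score(y_test, preds):.4f}")
print(f"time={elapsed:.1f}")
```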
After defining the parameters, the search algorithm, the metrics, and the other trial specifications, we create a TrialTemplate YAML. In this file we describe the trial job: the Docker image, the training command with the tuned hyperparameters substituted as arguments, and the FPGA resources requested through the InAccel Kubernetes plugin.
For a CPU-only implementation, we just need to change the tree_method to exact, hist, etc.
Finally, we submit the experiment and navigate to the monitoring screen.
In the above plots we see the objective metrics, accuracy and time, along with the three hyperparameters we chose to tune. We can keep the best combination, inspect it further, or retry with another experiment. We notice that the accuracy is the same for both executions, but the CPU-only training takes 1,100 seconds on average, while the FPGA-accelerated one takes only 245 seconds. This means that InAccel XGBoost achieves up to a 4.5x speedup on this use case.
You will find a step-by-step tutorial here.
The following video also presents a complete walkthrough of how to submit a new experiment using Katib and highlights the extra steps needed for the FPGA deployment, along with a small comparison of CPU and FPGA execution times.
Authors:
Vangelis Gkiastas
ML Engineer
Copyright: InAccel, Inc.