How to make ML engineers 5x more efficient

Emerging applications such as machine learning (ML), big data analytics, and artificial intelligence (AI) have created a need for many companies to hire a highly skilled and experienced workforce. Demand for data scientists, ML engineers, and data engineers is booming and is expected to keep growing in the coming years. The January report from Indeed, one of the top job sites, showed a 29% increase in demand for data scientists year over year and a 344% increase since 2013.

Salaries and OpEx

At the same time, the personnel cost of these skilled employees is increasing rapidly. According to Indeed, the average salary for a machine learning engineer is about $145,000 per year. According to Glassdoor, data scientist, with a median salary of $110,000, is now the hottest job in America [1].

As demand for data scientists and machine learning engineers grows, these numbers can be expected to rise, posing a significant challenge for many companies. It is therefore important for companies to give these engineers the best available tools to help them be more efficient.

ML tasks

Training machine learning models is one of the typical tasks of ML engineers, and it is one that consumes a significant amount of their time. In a typical machine learning application, practitioners must apply the appropriate data pre-processing, feature engineering, feature extraction, and feature selection methods to make the dataset amenable to machine learning. Following those preprocessing steps, practitioners must then perform algorithm selection and hyperparameter optimization to maximize the predictive performance of their final machine learning model.

In many cases, automated ML (AutoML) tools can automate the end-to-end process of applying machine learning to real-world problems, including the search for the best hyperparameters. AutoML is very useful, but it can take many hours to complete.

Because engineers have to train many models to find the best solution, training time matters: these tasks must run fast for such highly skilled engineers to be efficient and productive.
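To illustrate why a hyperparameter search can take hours, here is a minimal sketch that counts the configurations in a small grid and multiplies by the per-model training times reported in the use case later in this article (18.7 minutes on CPU, 1.2 minutes on FPGA). The search space, the function names, and the assumption that every configuration takes the same time and runs sequentially are illustrative only; they are not taken from any specific AutoML tool.

```python
from itertools import product

# Hypothetical search space; names and values are illustrative only.
search_space = {
    "C": [0.01, 0.1, 1.0, 10.0],
    "penalty": ["l1", "l2"],
    "max_iter": [100, 200, 500],
}


def enumerate_configs(space):
    """Return every combination in the grid as a list of dicts."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


def total_search_minutes(n_configs, minutes_per_model):
    """Wall-clock minutes if the models are trained one after another."""
    return n_configs * minutes_per_model


configs = enumerate_configs(search_space)
# Per-model times come from the article's use case; assumed constant per configuration.
cpu_minutes = total_search_minutes(len(configs), 18.7)
fpga_minutes = total_search_minutes(len(configs), 1.2)

print(f"{len(configs)} configurations")
print(f"CPU:  {cpu_minutes / 60:.1f} hours")
print(f"FPGA: {fpga_minutes:.1f} minutes")
```

With this 24-point grid, the sequential search takes about 7.5 hours at 18.7 minutes per model and under half an hour at 1.2 minutes per model.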

The rise of specialized accelerators

Typical processors provide high flexibility but lack performance. According to David Patterson, domain-specific accelerators, such as FPGAs, are the only path left to keep increasing the performance of computing systems for applications like machine learning.

Specialized accelerators like FPGAs can provide up to 20x speedup compared to typical processors while being more energy-efficient and cost-efficient than GPUs and CPUs. That is why cloud providers like AWS, Alibaba, and Huawei have started deploying FPGAs in their data centers and making them available to the public.

Use case: Training logistic regression, 15x more models

In this use case, we show how ML engineers can be more efficient and productive on a typical real-world example using FPGA-based accelerators in the cloud. Training logistic regression on the large MNIST dataset (libsvm format) takes around 18.7 minutes on a typical 16-core processor instance on AWS that costs $1.15/hour. On the other hand, the ML engineer can train the same model in just 1.2 minutes (about 15x faster) using an FPGA-accelerated instance (f1.4x) that costs $3.3/hour. That means the ML engineer can train 15x more models in the same time without any changes to their code.
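The figures above can be checked with a few lines of arithmetic. The sketch below computes the speedup and the instance cost of training a single model from the stated times and hourly prices; the function names are illustrative, and the cost covers instance time only, with no software licensing.

```python
def speedup(baseline_minutes, accelerated_minutes):
    """How many times faster the accelerated run is."""
    return baseline_minutes / accelerated_minutes


def cost_per_model(minutes_per_model, price_per_hour):
    """Instance cost in dollars for one training run."""
    return minutes_per_model / 60 * price_per_hour


cpu_minutes, cpu_price = 18.7, 1.15    # 16-core CPU instance, from the text
fpga_minutes, fpga_price = 1.2, 3.3    # f1.4x instance, from the text

factor = speedup(cpu_minutes, fpga_minutes)
cpu_cost = cost_per_model(cpu_minutes, cpu_price)
fpga_cost = cost_per_model(fpga_minutes, fpga_price)

print(f"speedup: {factor:.1f}x")
print(f"CPU cost per model:  ${cpu_cost:.3f}")
print(f"FPGA cost per model: ${fpga_cost:.3f}")
```

The speedup works out to roughly 15.6x, which the text rounds to 15x. Although the FPGA instance costs more per hour, each trained model costs less in instance time because the run is so much shorter.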

Cost saving

Using FPGA-based accelerators also has cost benefits. On a yearly basis, using 10 servers (r5d.4x: $1.15/hour) for training costs around $24k (assuming 8 hours per day for 262 working days). Using the f1 instances, the cost drops to $8.9k ($3.3/hour for the f1 instances and $3/hour for the InAccel accelerated ML suite). That means more than 2.5x cost savings.
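As a rough check of these numbers, the sketch below recomputes the yearly CPU cost from the stated assumptions and compares it with the stated f1 figure. The text does not say how many f1 instance-hours the $8.9k covers, so that figure is taken as given and the implied number of hours is only derived from it; the names are illustrative.

```python
HOURS_PER_DAY = 8      # from the text
WORKING_DAYS = 262     # from the text


def yearly_cost(n_instances, price_per_hour,
                hours_per_day=HOURS_PER_DAY, days=WORKING_DAYS):
    """Yearly cost in dollars for instances used on working days only."""
    return n_instances * price_per_hour * hours_per_day * days


cpu_yearly = yearly_cost(10, 1.15)      # 10 r5d.4x servers
f1_hourly = 3.3 + 3.0                   # f1 instance plus InAccel suite, per hour
f1_yearly_stated = 8900.0               # figure stated in the text, not derived here

savings_factor = cpu_yearly / f1_yearly_stated
implied_f1_hours = f1_yearly_stated / f1_hourly   # hours the stated figure would buy

print(f"CPU yearly cost: ${cpu_yearly:,.0f}")
print(f"Savings factor:  {savings_factor:.2f}x")
print(f"Implied f1 hours per year: {implied_f1_hours:.0f}")
```

The CPU side comes to about $24.1k, and dividing by the stated $8.9k gives roughly 2.7x, consistent with "more than 2.5x".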

At the same time, ML engineers can be 15x more productive, as they can test 15x more models in the same amount of time. Assuming a salary of $145k per year and that training takes 33% of an engineer's time, productivity with accelerators can increase by about 5x. A group of 5 ML engineers costs the company $725k per year. By using hardware accelerators, the ML team can be more productive and the company can save more than $580k/year in salaries.
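The text does not spell out the formula behind the 5x and $580k figures. One reading that reproduces them, shown here as a sketch under that assumption, scales the 15x training speedup by the 33% share of time spent on training and then treats the team as delivering the output of a team five times its size.

```python
SPEEDUP = 15            # training speedup, from the text
TRAINING_SHARE = 0.33   # share of time spent on training, from the text
SALARY = 145_000        # yearly salary per ML engineer, from the text
TEAM_SIZE = 5           # engineers in the team, from the text

# Assumed reading: productivity gain = speedup x share of time spent training.
productivity_factor = round(SPEEDUP * TRAINING_SHARE)   # 4.95 rounds to 5

team_cost = TEAM_SIZE * SALARY
# Salary cost of the same output if the team is productivity_factor times as productive.
equivalent_cost = team_cost / productivity_factor
savings = team_cost - equivalent_cost

print(f"Productivity factor: {productivity_factor}x")
print(f"Team cost: ${team_cost:,}")
print(f"Estimated yearly savings: ${savings:,.0f}")
```

Under this reading, the team cost is $725k and the estimated savings come to $580k per year, matching the figures in the text.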
