Kaggle's Rachel Tatman on what to do when applying deep learning is overkill

Deep learning, an emerging branch of machine learning, has garnered a lot of recognition in the technology field over the last decade. It is regarded as a game-changer in AI, with distinct progress in computer vision, natural language processing (NLP), speech and other areas of machine learning. This year, an Indeed survey ranked ‘deep learning engineer’ as the best tech job in the USA.

Though deep learning has many benefits and a very appealing track record, not everybody can afford it. Its downsides include large data requirements, high cost and long computing times. Below is a breakdown of Rachel Tatman’s talk “Put down the deep learning: When not to use neural networks and what to do instead” at the PyCon 2019 conference, which delved into these problems with deep learning. Tatman is a data science advocate at Kaggle.

Deep learning models require a very large amount of data in order to outperform other techniques. According to Tatman, the compute alone for a simple deep learning image generation model can cost around $60,000, and this cost increases with the complexity of the model. Deep learning can also require expensive GPUs and hundreds of machines, which drives the cost up further. Many less experienced practitioners find deep learning difficult to adopt, because there is no standard theory to guide them in choosing deep learning tools; that choice depends on the user’s knowledge of topology, training method and other parameters. Finally, training large deep learning models takes a lot of time.

As the talk progresses, Tatman presents three types of models that can be used instead of deep learning: regression-based models, tree-based models and distance-based models. Let’s take a brief look at each of them below:

The most interpretable: Regression-based models

The biggest advantage of regression-based models is that, unlike deep learning, they are “well-principled”: there is a well-understood family of regression models, each suited to particular kinds of problems. Users can simply work through a flowchart and decide on the best type of regression model for their data.
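Tatman’s flowchart itself is not reproduced in this write-up, but its core idea, letting the type of outcome variable narrow down the regression family, can be sketched as a simple lookup. The categories and the `suggest_regression` helper below are a simplified, hypothetical stand-in based on standard textbook pairings, not her actual chart.

```python
def suggest_regression(outcome: str, grouped: bool = False) -> str:
    """Suggest a regression family for an outcome type (simplified, hypothetical)."""
    families = {
        "continuous": "linear regression",
        "binary": "logistic regression",
        "count": "Poisson regression",
        "ordinal": "ordinal (proportional odds) regression",
        "categorical": "multinomial logistic regression",
    }
    if outcome not in families:
        raise ValueError(f"unknown outcome type: {outcome!r}")
    family = families[outcome]
    # Data collected in groups (e.g. repeated measurements per subject)
    # points toward a mixed-effects version of the same family.
    return f"mixed-effects {family}" if grouped else family


print(suggest_regression("binary"))
print(suggest_regression("continuous", grouped=True))
```

A real flowchart would ask further questions (for instance, whether count data are overdispersed), which this sketch ignores.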

Regression models are also “fast to fit”: much faster than a neural network, especially “if you’re working with a well-optimized library. The Python regression libraries tend to vary wildly, so you might want to do a little bit of shopping around.” They also work well with small data; Tatman said she has worked with a dataset of just eight dozen data points. She added that because regression models are easy to interpret, she was able to learn many useful and interesting things from that data.
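As a rough illustration of why regression is quick to fit and easy to interpret on small data, the sketch below fits ordinary least squares in closed form, in plain Python, to 96 synthetic points (matching the “eight dozen” above). The data, the noise level and the `fit_simple_linear_regression` helper are assumptions for illustration; in practice you would reach for a well-optimized library.

```python
import random


def fit_simple_linear_regression(xs, ys):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the n's cancel out.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: 96 points drawn from y = 2 + 3x plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(96)]
ys = [2 + 3 * x + rng.gauss(0, 1) for x in xs]

intercept, slope = fit_simple_linear_regression(xs, ys)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

The fitted slope reads directly as “each extra unit of x adds about 3 to y”, the kind of interpretability Tatman describes, and there is no iterative training loop at all.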

One drawback of regression models is that they need a bit more data preparation than some other methods. They also require validation, because regression models rest on strong assumptions about the distribution of the data points or of the errors.
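One assumption that often needs validating is that the errors have roughly constant spread. The sketch below is a crude, hypothetical check in plain Python: fit a line, split the residuals at the median x, and compare their standard deviations. It is not a formal test; a residual plot or a dedicated statistical test is more thorough, and the ratio at which you start to worry is a judgment call.

```python
import random
import statistics


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope


def residual_spread_ratio(xs, ys):
    """Ratio of residual standard deviations: high-x half over low-x half.

    Values far from 1 suggest the error spread changes with x, violating
    the constant-variance assumption of ordinary linear regression.
    """
    intercept, slope = fit_line(xs, ys)
    pairs = sorted(zip(xs, ys))  # order observations by x
    residuals = [y - (intercept + slope * x) for x, y in pairs]
    half = len(residuals) // 2
    return statistics.stdev(residuals[half:]) / statistics.stdev(residuals[:half])


# Hypothetical data: same line, once with constant noise, once with noise growing in x.
rng = random.Random(1)
xs = [rng.uniform(1, 10) for _ in range(400)]
ys_ok = [2 + 3 * x + rng.gauss(0, 1) for x in xs]
ys_bad = [2 + 3 * x + rng.gauss(0, 0.5 * x) for x in xs]

ratio_ok = residual_spread_ratio(xs, ys_ok)
ratio_bad = residual_spread_ratio(xs, ys_bad)
print(round(ratio_ok, 2), round(ratio_bad, 2))
```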

Tatman also declared that if she had to use a single machine learning model for the rest of her life, it would be a mixed-effects regression model. Mixed-effects models extend linear regression to data that are collected and summarized in groups. They are mainly used to estimate the expected (mean) values for the population as a whole, while allowing individual groups to deviate from them. She concedes that “you need to do a little bit more hands-on stuff, you need to do your validation, you probably need to do some additional data cleaning,” but the trade-off is some extra time in exchange for needing far less money and data.
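For a concrete picture of data “collected in groups”, one common mixed-effects form is the random-intercept linear model below, for observation i in group j. It is a standard textbook formulation chosen here for illustration; this write-up does not say which variant Tatman prefers.

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

The fixed effects β₀ and β₁ describe the population-level (mean) relationship, while each group’s random intercept u_j shifts that group up or down. Setting σ_u² = 0 recovers ordinary linear regression.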

Want to know more about regression?

With so many benefits, regression-based models are well worth a try. Read our book ‘Python Machine Learning By Example’ by Yuxi (Hayden) Liu to learn about regression algorithms and their evaluation. This example-based practical guide also shows you how to build your own machine learning systems using other models, such as Support Vector Machines and text analysis algorithms.

The user-friendliest: Tree-based models

Tree-based models work like a decision tree. At each node, the model checks a feature and, depending on that feature’s value, decides which path to follow. Going down that path, it checks the feature at the next node, and so on. In this way, it works recursively, cutting the decision region into smaller and smaller chunks. Tatman also noted that developers generally opt for a random forest rather than a single tree. A random forest is an ensemble model that combines many different decision trees into a single model.
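The recursive splitting described above can be sketched in a few dozen lines of plain Python. This is a minimal, hypothetical CART-style classifier, not any particular library’s implementation; Gini impurity is an assumed choice of split criterion (entropy is another common one), and real libraries add many refinements.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: the chance that two random picks from `labels` disagree."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) split with the lowest weighted Gini impurity."""
    best, best_score = None, gini(labels)  # only accept splits that beat "no split"
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively split the data; each leaf stores its majority label."""
    split = None if depth >= max_depth else best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, lab) for r, lab in zip(rows, labels) if r[f] <= t]
    right = [(r, lab) for r, lab in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [lab for _, lab in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [lab for _, lab in right], depth + 1, max_depth))


def predict(tree, row):
    """Follow the feature checks from the root down to a leaf."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Hypothetical data: [hours_studied, hours_slept] -> exam result.
rows = [[1, 4], [2, 8], [3, 5], [6, 7], [7, 4], [8, 8]]
labels = ["fail", "fail", "fail", "pass", "pass", "pass"]
tree = build_tree(rows, labels)
print(tree)  # (0, 3, 'fail', 'pass'): "if feature 0 <= 3 then fail else pass"
```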

Per Tatman, “If you’re in the machine learning community you might actually associate random forests with Kaggle, and from 2010 to 2016, about two-thirds of all Kaggle competition winners used random forests.” By contrast, “less than half use some form of deep learning. Also, random forests continue to do very well today.”

For classification, random forests deliver better performance than logistic regression. They also don’t need much data cleaning or model validation. Random forests also don’t require the user to convert categorical variables; they simply take the values as they are and produce a corresponding output. There are also many easy-to-use packages for tree-based models, such as XGBoost, LightGBM and CatBoost. In short, tree-based models are the most user-friendly models, especially for classification.
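To make the ensemble idea and the point about categorical variables concrete, here is a deliberately tiny random-forest-style sketch in plain Python. Each member is a one-split “stump” trained on a bootstrap sample with a randomly chosen feature, it tests categorical values directly by equality, and the forest predicts by majority vote. The stump simplification, the toy fruit data and all function names are assumptions for illustration; real random forests grow full trees, and library support for raw categorical values varies.

```python
import random
from collections import Counter


def fit_stump(rows, labels, features):
    """Choose the (feature, value) equality test with the fewest training errors.

    Categorical values such as "red" are compared directly, with no one-hot
    or numeric encoding. Each side of the split predicts its majority label.
    """
    best = None
    for f in features:
        for v in sorted({r[f] for r in rows}):
            match = [lab for r, lab in zip(rows, labels) if r[f] == v]
            rest = [lab for r, lab in zip(rows, labels) if r[f] != v]
            yes = Counter(match).most_common(1)[0][0]
            no = Counter(rest).most_common(1)[0][0] if rest else yes
            errors = sum(lab != yes for lab in match) + sum(lab != no for lab in rest)
            if best is None or errors < best[0]:
                best = (errors, f, v, yes, no)
    return best[1:]  # (feature, value, label_if_equal, label_otherwise)


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Bagging plus random feature choice: each stump sees a bootstrap sample
    (rows drawn with replacement) and k randomly chosen features."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))  # sqrt(number of features), a common default
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        features = rng.sample(range(n_features), k)
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], features))
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = [yes if row[f] == v else no for f, v, yes, no in forest]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical data: (color, texture) -> ripeness, left as raw strings.
rows = [("red", "soft"), ("red", "soft"), ("red", "hard"), ("red", "soft"),
        ("green", "hard"), ("green", "hard"), ("green", "hard"), ("green", "soft")]
labels = ["ripe"] * 4 + ["unripe"] * 4

forest = fit_forest(rows, labels)
print(predict_forest(forest, ("red", "soft")), predict_forest(forest, ("green", "hard")))
```

Averaging many high-variance learners, each trained on a different resample, is what makes the ensemble more stable than any single tree.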

The drawbacks of trees and random forests are that they can easily overfit and are more sensitive to differences between datasets. They are also less interpretable, and require more compute and training time than regression models. Thus, tree-based models require little money, but do need some data, and time to train on big datasets.

The most lightweight: Distance-based models

For the final type, Tatman uses a single umbrella term to group together a large set of methods, such as K-nearest neighbors, Gaussian mixture models and support vector machines. These models work on the basic idea that “points closer together to each other in a particular feature space are more likely to be in the same group.”

The K-nearest neighbors model decides the label of a point by the majority label among its nearest neighbors. Gaussian mixture models describe the distribution of the data points as a mixture of several different Gaussians. A support vector machine places its decision boundary as far as possible from the nearest data points of each class.
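The K-nearest neighbors rule is simple enough to sketch directly. The version below is a minimal, unoptimized illustration that assumes Euclidean distance (other distance measures are common) and compares the query against every training point; libraries typically use spatial indexes to avoid that. The `knn_predict` name and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point, nearest first.
    dists = sorted(
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    )
    nearest = [label for _, label in dists[:k]]
    return Counter(nearest).most_common(1)[0][0]


# Hypothetical data: two well-separated clusters.
points = [(1, 1), (1, 2), (2, 1), (5, 5), (5, 6), (6, 5)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(points, labels, (1.5, 1.5)))  # A
print(knn_predict(points, labels, (5.5, 5)))    # B
```

There is no training step at all: the “model” is the stored data, which is why these methods are so lightweight to set up.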

Distance-based models, particularly support vector machines, work very well with small datasets. They also tend to train 10 times faster than a regression model on the same data. In terms of accuracy, distance-based models lag behind other models, but they perform well for quick-and-dirty modeling. They are good at classification, but are a little slower than regression-based models. Overall, distance-based models take very little time, require very little money and are extremely lightweight.
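Keeping the boundary as far as possible from the data corresponds to maximizing the SVM’s margin. As a sketch under stated assumptions, the code below trains a linear soft-margin SVM with the Pegasos stochastic subgradient method in plain Python; the method, the hyperparameters (`lam`, `epochs`) and the six-point toy dataset are choices made here for illustration, not something taken from the talk.

```python
import random


def train_linear_svm(points, labels, lam=0.1, epochs=200, seed=0):
    """Train a linear soft-margin SVM with Pegasos-style stochastic subgradient descent.

    `labels` must be +1 or -1. A constant 1.0 is appended to every point so the
    bias is learned as an extra weight (which also regularizes it, a simplification).
    """
    rng = random.Random(seed)
    data = [(list(p) + [1.0], y) for p, y in zip(points, labels)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step-size schedule
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Regularization step: shrink the weights (smaller weights = wider margin).
            w = [(1 - eta * lam) * wi for wi in w]
            # Hinge-loss step: points inside the margin or misclassified pull the boundary.
            if margin < 1:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def svm_predict(w, point):
    """Return +1 or -1 depending on which side of the hyperplane `point` falls."""
    score = sum(wi * xi for wi, xi in zip(w, list(point) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical small dataset: two classes on opposite sides of the origin.
points = [(-2, -2), (-2, -1), (-1, -2), (2, 2), (2, 1), (1, 2)]
labels = [-1, -1, -1, 1, 1, 1]
w = train_linear_svm(points, labels)
print(svm_predict(w, (1, 1)), svm_predict(w, (-1, -1)))
```

With only six points, training is a few thousand cheap updates, which illustrates why such models are comfortable with small datasets.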

To conclude, Tatman says that the choice of model should depend on how much time and money the individual or organization has. Most important, though, is the model’s performance. Tatman adds, “based on empirical evidence right now it looks like deep learning will perform the best on a given data set given sufficient time, money and compute.” Watch Tatman’s full talk for a detailed comparison of the three models.

You can learn more about all of the above machine learning models from our book ‘Python Machine Learning By Example’ by Yuxi (Hayden) Liu. The book will help you implement machine learning classification and regression algorithms from scratch in Python, and shows how to optimize the performance of a machine learning model for your application.
