Machine Learning practitioners have different personalities. Some are “I am an expert in X, and X can train on any type of data” people, where X is some algorithm; others are “right tool for the right job” people. Many also subscribe to a “Jack of all trades, master of one” strategy: they have one area of deep expertise and know a little about the other fields of Machine Learning. That said, no one can deny that, as practising Data Scientists, we have to know the basics of some common machine learning algorithms, which help us engage with a problem from a new domain. This is a whirlwind tour of common machine learning algorithms, with quick resources to help you get started on them.

1. Principal Component Analysis (PCA)/SVD

PCA is an unsupervised method for understanding the global properties of a dataset consisting of vectors. The covariance matrix of the data points is analyzed to understand which dimensions (mostly) or data points (sometimes) are more important, i.e. have high variance amongst themselves but low covariance with others. One way to think of the top PCs of a matrix is as the eigenvectors of its covariance matrix with the highest eigenvalues. SVD is essentially another way to calculate the ordered components, but you don’t need to form the covariance matrix of the points to get them.

This algorithm helps one fight the curse of dimensionality by producing data points with reduced dimensions.

Libraries:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.svd.html
http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html

Introductory Tutorial:
https://arxiv.org/pdf/1404.1100.pdf

2a. Least Squares and Polynomial Fitting

Remember your Numerical Analysis code in college, where you fit lines and curves to points to get an equation? You can use the same techniques to fit curves in Machine Learning for very small datasets with low dimensions.
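As a minimal sketch of the line fitting described above, the snippet below fits a straight line with `numpy.polyfit` and then repeats the fit with `numpy.linalg.lstsq`. The data points are made up for illustration (they lie exactly on y = 2x + 1), and the variable names are assumptions, not part of the original article.

```python
import numpy as np

# Hypothetical toy data: five points that lie exactly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Degree-1 polynomial fit; coefficients are returned highest power first.
slope, intercept = np.polyfit(x, y, deg=1)

# The same fit through the least-squares solver:
# build the design matrix [x, 1] and solve A @ w ~= y.
A = np.column_stack([x, np.ones_like(x)])
w, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)

print("polyfit:", slope, intercept)
print("lstsq:  ", w)
```

Both calls should recover a slope close to 2 and an intercept close to 1; raising `deg` in `polyfit` fits a higher-order curve instead of a line.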
(For large data or datasets with many dimensions, you might just end up terribly overfitting, so don’t bother.) OLS has a closed-form solution, so you don’t need complex optimization techniques.

As is obvious, use this algorithm to fit simple curves and regressions.

Libraries:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.lstsq.html
https://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.polyfit.html

Introductory Tutorial:
https://lagunita.stanford.edu/c4x/HumanitiesScience/StatLearning/asset/linear_regression.pdf

2b. Constrained Linear Regression

Least Squares can get confused by outliers, spurious fields and noise in data. We thus need constraints to decrease the variance of the line we fit on a dataset. The right way to do this is to fit a regularized linear regression model, which ensures that the weights do not misbehave. Models can use an L1 norm penalty (LASSO), an L2 norm penalty (Ridge Regression) or both (Elastic Net). Mean Squared Loss is optimized.

Use these algorithms to fit regression lines with constraints, avoiding overfitting and masking noise dimensions from the model.

Libraries:
http://scikit-learn.org/stable/modules/linear_model.html

Introductory Tutorial(s):
https://www.youtube.com/watch?v=5asL5Eq2x0A
https://www.youtube.com/watch?v=jbwSCwoT51M

3. K-Means Clustering

Everyone’s favourite unsupervised clustering algorithm. Given a set of data points in the form of vectors, we can make clusters of points based on the distances between them. It’s an Expectation-Maximization-style algorithm that iteratively assigns each point to its nearest cluster centre and then moves each centre to the mean of its assigned points. The inputs to the algorithm are the number of clusters to generate and the number of iterations in which it will try to converge.
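The two inputs mentioned above map onto the `n_clusters` and `max_iter` parameters of scikit-learn’s `KMeans`. The sketch below is only an illustration: the two blobs of points are synthetic, and the seed and parameter values are arbitrary assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: two well-separated blobs in 2-D.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(50, 2))
points = np.vstack([blob_a, blob_b])

# n_clusters is K; max_iter caps the assign-then-move iterations per run.
kmeans = KMeans(n_clusters=2, max_iter=100, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)   # cluster index for every point
centres = kmeans.cluster_centers_     # one centre per cluster

print(centres)
```

With blobs this far apart, each blob should end up in its own cluster, with the centres near (0, 0) and (5, 5). Which blob gets label 0 is arbitrary.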
As is obvious from the name, you can use this algorithm to create K clusters in a dataset.

Library:
http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html

Introductory Tutorial(s):
https://www.youtube.com/watch?v=hDmNF9JG3lo
https://www.datascience.com/blog/k-means-clustering

4. Logistic Regression

Logistic Regression is constrained Linear Regression with a nonlinearity applied after the weights (mostly the sigmoid function, though you can use tanh too), which restricts the outputs to lie close to the +/- classes (1 and 0 in the case of sigmoid). The Cross-Entropy Loss is optimized using methods like Gradient Descent or L-BFGS. A note to beginners: Logistic Regression is used for classification, not regression. You can also think of Logistic Regression as a one-layered Neural Network. NLP people will often use it under the name Maximum Entropy Classifier.

The sigmoid, 1 / (1 + e^(-z)), is an S-shaped curve that squashes any real input into the range (0, 1).

Use LR to train simple but very robust classifiers.

Library:
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

Introductory Tutorial(s):
https://www.youtube.com/watch?v=-la3q9d7AKQ

5. SVM (Support Vector Machines)

SVMs are linear models like Linear/Logistic Regression; the difference is that they have a different, margin-based loss function. (The derivation of Support Vectors is one of the most beautiful mathematical results I have seen, along with eigenvalue calculation.) You can optimize the loss function using optimization methods like L-BFGS or even SGD.

Another innovation in SVMs is the use of kernels on data to do feature engineering.
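To make the kernel idea concrete, here is a small sketch that trains scikit-learn’s `SVC` with a linear kernel, the RBF kernel and a hand-written kernel passed as a callable. The concentric-circles data and the `squared_dot_kernel` function are illustrative assumptions, not something the article prescribes.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical toy data: a small circle of one class inside a ring of the other.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.03, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

def squared_dot_kernel(A, B):
    """Illustrative custom kernel k(a, b) = (a . b)^2, returned as a Gram matrix."""
    return (A @ B.T) ** 2

scores = {}
for name, kernel in [("linear", "linear"), ("rbf", "rbf"), ("custom", squared_dot_kernel)]:
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)  # accuracy on held-out points

print(scores)
```

A straight line cannot separate a circle from the ring around it, so the linear kernel should score well below the other two. The custom kernel works here because squaring the dot product implicitly adds squared-coordinate features, in which the two classes become linearly separable.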
If you have good domain insight, you can replace the good old RBF kernel with smarter ones and profit.

One unique thing that SVMs can do is learn one-class classifiers.

SVMs can be used to train a classifier (and even regressors).

Library:
http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html

Introductory Tutorial(s):
https://www.youtube.com/watch?v=eHsErlPJWUU

Note: SGD-based training of both Logistic Regression and SVMs is available in scikit-learn’s SGDClassifier (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html), which I often use as it lets me check both LR and SVM with a common interface. You can also train it on datasets larger than RAM using mini-batches.

6. Feedforward Neural Networks

These are basically multilayered Logistic Regression classifiers: many layers of weights separated by non-linearities (sigmoid, tanh, relu + softmax and the cool new selu). Another popular name for them is Multi-Layered Perceptrons. FFNNs can be used for classification and, as autoencoders, for unsupervised feature learning.

[Figure: a Multi-Layered Perceptron]
[Figure: an FFNN as an autoencoder]

FFNNs can be used to train a classifier or to extract features as autoencoders.

Libraries:
http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier
http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html
https://github.com/keras-team/keras/blob/master/examples/reuters_mlp_relu_vs_selu.py

Introductory Tutorial(s):
http://www.deeplearningbook.org/contents/mlp.html
http://www.deeplearningbook.org/contents/autoencoders.html
http://www.deeplearningbook.org/contents/representation.html

7. Convolutional Neural Networks (Convnets)

Almost every state-of-the-art vision-based Machine Learning result in the world today has been achieved using Convolutional Neural Networks. They can be used for image classification, object detection or even segmentation of images.
Pioneered by Yann LeCun in the late 80s and early 90s, Convnets feature convolutional layers which act as hierarchical feature extractors. You can use them on text too (and even graphs).

Use convnets for state-of-the-art image and text classification, object detection and image segmentation.

Libraries:
https://developer.nvidia.com/digits
https://github.com/kuangliu/torchcv
https://github.com/chainer/chainercv
https://keras.io/applications/

Introductory Tutorial(s):
http://cs231n.github.io/
https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/

8. Recurrent Neural Networks (RNNs)

RNNs model sequences by applying the same set of weights recursively to the aggregator (hidden) state and the input at each time t (given a sequence with inputs at times 0..t..T, the hidden state at time t is the output of the RNN’s step t-1). Pure RNNs are rarely used now, but their counterparts like LSTMs and GRUs are state of the art in most sequence modelling tasks.

[Figure: an RNN, where f is a densely connected unit and a nonlinearity; nowadays f is generally an LSTM or GRU]
[Figure: the LSTM unit, which is used instead of a plain dense layer in a pure RNN]

Use RNNs for any sequence modelling task, especially text classification, machine translation and language modelling.

Libraries:
https://github.com/tensorflow/models (Many cool NLP research papers from Google are here)
https://github.com/wabyking/TextClassificationBenchmark
http://opennmt.net/

Introductory Tutorial(s):
http://cs224d.stanford.edu/
http://www.wildml.com/category/neural-networks/recurrent-neural-networks/
http://colah.github.io/posts/2015-08-Understanding-LSTMs/

9. Conditional Random Fields (CRFs)

CRFs are probably the most frequently used models from the family of Probabilistic Graphical Models (PGMs). They are used for sequence modelling like RNNs and can be used in combination with RNNs too.
Before neural systems such as Neural Machine Translation came in, CRFs were the state of the art, and in many sequence tagging tasks with small datasets they will still learn better than RNNs, which require a larger amount of data to generalize. They can also be used in other structured prediction tasks such as image segmentation. A CRF models each element of a sequence (say, a sentence) such that its neighbours affect the label of a component in the sequence, instead of all labels being independent of each other.

Use CRFs to tag sequences (in text, images, time series, DNA etc.).

Library:
https://sklearn-crfsuite.readthedocs.io/en/latest/

Introductory Tutorial(s):
http://blog.echen.me/2012/01/03/introduction-to-conditional-random-fields/
7-part lecture series by Hugo Larochelle on YouTube: https://www.youtube.com/watch?v=GF3iSJkgPbA

10. Decision Trees

Let’s say I am given an Excel sheet with data about various fruits and I have to tell which ones look like apples. What I will do is ask the question “Which fruits are red and round?” and split the fruits into those that answer yes and those that answer no. Now, not all red and round fruits are apples, and not all apples are red and round. So I will ask “Which fruits have red or yellow colour hints on them?” of the red and round fruits, and “Which fruits are green and round?” of the fruits that are not red and round. Based on these questions I can tell with considerable accuracy which are apples. This cascade of questions is what a decision tree is. However, this is a decision tree based on my intuition. Intuition cannot work on high-dimensional and complex data. We have to come up with the cascade of questions automatically by looking at tagged data. That is what Machine Learning based decision trees do. Earlier versions like CART trees were once used for simple data, but with bigger and larger datasets, the bias-variance tradeoff needs to be handled with better algorithms.
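To see a cascade of questions being learned from tagged data, the sketch below fits scikit-learn’s `DecisionTreeClassifier` on a tiny made-up fruit table and prints the learned questions with `export_text`. The feature names, rows and labels are hypothetical and chosen only to mirror the apple example above.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical fruit table: each row is [is_red, is_round]; label 1 means "apple".
feature_names = ["is_red", "is_round"]
X = [[1, 1], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1]]
y = [1, 1, 0, 0, 0, 0]

# The tree chooses its own questions (splits) from the labelled rows.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Print the learned cascade of questions as indented text.
print(export_text(tree, feature_names=feature_names))

# Classify a red, round fruit and a non-red, round fruit.
predictions = tree.predict([[1, 1], [0, 1]])
print(predictions)
```

On this toy table the tree only needs two questions, one about colour and one about shape, to separate the apples from everything else.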
The two common tree-based algorithms used nowadays are Random Forests (which build different classifiers on random subsets of attributes and combine them for output) and Boosting Trees (which train a cascade of trees one on top of another, each correcting the mistakes of the ones below it).

Decision Trees can be used to classify data points (and even for regression).

Libraries:
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
http://xgboost.readthedocs.io/en/latest/
https://catboost.yandex/

Introductory Tutorial(s):
http://xgboost.readthedocs.io/en/latest/model.html
https://arxiv.org/abs/1511.05741
https://arxiv.org/abs/1407.7502
http://education.parrotprediction.teachable.com/p/practical-xgboost-in-python

TD Algorithms (Good To Have)

If you are still wondering how any of the above methods can solve tasks like defeating the Go world champion, as DeepMind did: they cannot. All 10 types of algorithms we talked about before this are pattern recognizers, not strategy learners. To learn strategies that solve a multi-step problem, like winning a game of chess or playing an Atari console, we need to let an agent free in the world and have it learn from the rewards and penalties it faces. This type of Machine Learning is called Reinforcement Learning. A lot (not all) of the recent successes in the field are a result of combining the perception abilities of a Convnet or LSTM with a set of algorithms called Temporal Difference Learning. These include Q-Learning, SARSA and some other variants.
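As a rough sketch of what Q-Learning looks like in its simplest, tabular form, the snippet below learns to walk right along a made-up five-state corridor. The environment, the constants and the purely random exploration policy are all assumptions chosen to keep the example small. Q-Learning is off-policy, so it can still learn the greedy policy while behaving randomly.

```python
import numpy as np

N_STATES, GOAL = 5, 4      # hypothetical corridor: states 0..4, goal at state 4
LEFT, RIGHT = 0, 1
ALPHA, GAMMA = 0.5, 0.9    # learning rate and discount factor

def step(state, action):
    """Deterministic toy environment: reward 1 only when the goal is reached."""
    next_state = min(state + 1, GOAL) if action == RIGHT else max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

rng = np.random.default_rng(0)
q = np.zeros((N_STATES, 2))  # one value per (state, action) pair

for _ in range(500):         # episodes
    state, done = 0, False
    while not done:
        action = int(rng.integers(2))  # explore uniformly at random
        next_state, reward, done = step(state, action)
        # TD target: reward now plus discounted value of the best next action.
        target = reward + (0.0 if done else GAMMA * q[next_state].max())
        q[state, action] += ALPHA * (target - q[state, action])
        state = next_state

greedy_policy = q[:GOAL].argmax(axis=1)  # best action in each non-goal state
print(greedy_policy)
```

After training, the greedy policy should choose RIGHT in every non-goal state. Deep variants replace the table `q` with a neural network trained to minimise the same TD error.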
These algorithms are a smart play on Bellman’s equations to get a loss function that can be trained with the rewards an agent gets from the environment.

These algorithms are mostly used to automatically play games :D, with other applications in language generation and object detection.

Libraries:
https://github.com/keras-rl/keras-rl
https://github.com/tensorflow/minigo

Introductory Tutorial(s):
Grab the free Sutton and Barto book: https://web2.qatar.cmu.edu/~gdicaro/15381/additional/SuttonBarto-RL-5Nov17.pdf
Watch the David Silver course: https://www.youtube.com/watch?v=2pWv7GOvuf0

These are the 10 machine learning algorithms which you can learn to become a data scientist.

You can also read about machine learning libraries here.

We hope you liked the article. You can check demos of ParallelDots AI APIs here.

]]>

“The thinking in AI has changed from ‘What’s possible?’ to ‘How do I do this?’” explains Rafiq Ajani at a McKinsey educational AI forum. Natural Language Processing (NLP), an important subfield of AI that deals with how to program computers to process and analyze large amounts of natural language data, is similarly no longer a “nice to have” but a “must have” technology.

Companies using Natural Language Processing are already seeing its business impact, from improved customer experience to business growth. An organization that commits to NLP can enjoy the benefits of a shared understanding of data and goals, improved decision-making, and fact-based analysis that avoids guesswork and allows for refined planning and forecasting at every level of the organization.

But as important as it is to use NLP to automate business processes, so is choosing the right NLP vendor if you have decided to outsource the NLP development. Generally, it is cheaper to outsource the development of AI than to build it in-house. However, there are certain things you need to keep in mind when choosing an NLP partner to ensure your business interests are not compromised.

In this blog, we look at some of the key questions you need to ask before choosing an NLP vendor for your organization.

1. What is my business use-case?

Does your business want to use NLP to reduce costs by automating a process, or to gain insights from unstructured data? A good business objective motivates every new investment. Having a clear understanding of your use-case will avoid frustration in the long run and set the KPIs for the project at the outset.

For instance, you may decide that you need an NLP solution to automatically triage your customer support requests, thus freeing up time for your agents to work on other, more complex tasks.
For such projects, it is important to work with an NLP partner who can help you set the KPIs for the project based on the amount and type of data available, and who can also account for your unique business requirements. The partner should then build a solution that can achieve those KPIs.

2. How accurate is the NLP solution?

Accurate data analysis is the key to making informed business decisions, especially in the case of unstructured or open-ended texts. Hence, before choosing an NLP vendor, it is important to know how accurate their current solution is on your data. A lot of off-the-shelf NLP solutions may not work on your data with very high accuracy. Lower accuracy may directly affect your business objective. In addition to its immediate impact on the business objective, it can also create a disconnect between what is believed to be happening and the reality.

3. Can the solution be customized according to my needs?

There are almost no plug-and-play solutions in NLP: NLP architectures need to adapt to your unique data and comply with your business regulations. Extending the previous point, if the standard NLP solution does not perform with high accuracy on your data, it is important to ask the vendor if they can customize their model to perform well on it.

A good vendor should be able to fine-tune their model on your data without having to build it from scratch. They should be able to properly understand your needs and requirements and adapt to the challenges. This would ensure that you have the most appropriate solution in every environment. They should also allow extensive customization as per your KPI demands, regardless of how complex they may be.

4. What amount of training data would be required?

It is essential to know how much training data is required if your vendor proposes to customize their algorithm on your data. While more data generally means better accuracy for the AI algorithm, building a large corpus of annotated data is a task in itself.
It can quickly become very expensive and can stall your project. Your vendor should be able to build a good model using a smaller amount of data. Also, there should be a self-learning loop in the way AI is deployed in your organization, such that it improves from human input.

5. Does the model improve on continuous usage?

No NLP solution will perform at 99% accuracy from the start. Therefore, it is important to ask your vendor if the model improves with continuous usage. An ideal vendor should be able to deploy their solution in such a manner that it learns from human feedback, if available (human-in-the-loop).

InData Lab explains this setup very well in their blogs: “humans step in when algorithms are not up to the task. When the machine isn’t sure what the answer is, it relies on a human, then adds the human’s judgment to the model. This way the algorithm learns faster and the need for future human intervention is reduced.”

For improved accuracy and optimum results, the NLP solution must improve as more and more data is fed to it from human input.

6. Is the NLP solution affordable at scale?

Any NLP solution with usage-based pricing should be affordable at scale. The solution itself should be scalable so that it can be rolled out across your entire organization. Your vendor should be able to provide a solution that can be adapted to and applied in every environment as required. It should maintain uniformity and consistency throughout and bring out meaningful results without breaking the bank.

7. Do they provide an on-premise solution?

NLP solutions need data to process, which sometimes may contain sensitive information, especially in the financial services and healthcare industries. Your vendor should be able to propose a way to deploy their NLP solution on your private cloud or on-premise.
For such cases, their solution should perform optimally in your infrastructure without causing many headaches for your IT team.

ParallelDots NLP APIs are specifically designed to fit into your existing security architecture. They ensure that all your data remains safely behind your firewall and within your security controls.

8. What are its integration capabilities?

You should choose a vendor that offers comprehensive integration capabilities, including end-to-end integration with your CRM, Customer Support, and Business Intelligence applications. A well-integrated application improves the workflow of the end users and further enhances their productivity.

You can try our Text Analysis API demo here. You can also find a list of free resources to learn NLP here. Read our other blog to learn how NLP is automating text analysis processes for enterprises.