**Logistic regression (LR)** models estimate the probability of a binary response based on one or more predictor variables. Unlike in linear regression, the dependent variable is categorical. LR has become very popular, perhaps because the procedure is so widely available in software. Although LR is a good choice for many situations, it doesn't work well for *all* situations. For example:

- In propensity score analysis where there are many covariates, LR performs poorly.
- For classification, LR usually requires more variables than Support Vector Machines (SVM) to achieve the same (or better) misclassification rate for multivariate and mixture distributions.

In addition, LR is prone to issues like overfitting and multicollinearity.
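Before looking at alternatives, a minimal sketch of what an LR model actually computes may help. Binary LR treats the log-odds of the response as a linear function of the predictors and maps that back to a probability with the logistic (sigmoid) function. The coefficients below are made up for illustration. `logistic` and `predict_proba` are hypothetical helper names, not a library API.

```python
import math


def logistic(z: float) -> float:
    """Map a real-valued linear predictor to a probability in (0, 1).

    The two branches avoid overflow in math.exp for large |z|.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(intercept: float, coefs: list[float], x: list[float]) -> float:
    """P(y = 1 | x) under a binary logistic regression model.

    The log-odds are linear in the predictors:
        log(p / (1 - p)) = intercept + sum_j coefs[j] * x[j]
    """
    z = intercept + sum(b * xj for b, xj in zip(coefs, x))
    return logistic(z)


# Hypothetical coefficients and predictor values, for illustration only.
p = predict_proba(-1.0, [0.5, 2.0], [2.0, 0.5])
print(round(p, 4))  # 0.7311  (linear predictor z = -1 + 1 + 1 = 1)
```

Because the model is linear on the log-odds scale, each coefficient has a direct interpretation. This is part of LR's appeal, and it is also the structural assumption that the alternatives below relax in different ways.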

A **wide range of alternatives** is available, from statistics-based procedures (e.g. log-binomial, ordinary or modified Poisson regression, and Cox regression) to those rooted more deeply in data science, such as machine learning and neural network theory. Which one you choose depends largely on what tools you have available, what theory (e.g. statistics vs. neural networks) you want to work with, and what you're trying to achieve with your data. For example, tree-based methods are a good alternative for assessing risk factors, while Neural Networks (NN) and Support Vector Machines (SVM) work well for propensity score estimation and classification.

There are literally hundreds of viable alternatives to logistic regression, so it isn't possible to discuss them all within the confines of a single blog post. What follows is an outline of some of the more popular choices.

- Tree-Based Methods
- Neural Networks and Support Vector Machines
- K-Nearest Neighbor
- Traditional Statistical Methods

In machine learning, perhaps the best-known tree-based methods are AQ11 and ID3, which automatically generate trees from data. In the statistics community, Classification And Regression Trees (CART) is probably the best known. All of these tree-based methods work by recursively partitioning the sample space, which, put simply, creates a structure that resembles a tree with branches and leaves.
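The recursive partitioning idea can be sketched in a few lines of Python. This is a simplified, CART-style illustration that assumes numeric features, a 0/1 response, and Gini impurity as the split criterion. ID3, by contrast, uses information gain. Each leaf stores the share of positive cases, which is the probability of occurrence the tree reports for that branch. The function names and toy data are hypothetical. `max_depth` and `min_size` act as a crude stopping rule, not a full pruning procedure.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1**2 - (1.0 - p1) ** 2


def best_split(rows, labels):
    """Try every feature/threshold pair; return the (feature, threshold)
    with the lowest weighted Gini impurity, or None if nothing improves."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3, min_size=2):
    """Recursively partition the sample space. Leaves hold P(y = 1)."""
    split = None
    if depth < max_depth and len(labels) >= min_size:
        split = best_split(rows, labels)
    if split is None:
        return {"leaf": True, "p": sum(labels) / len(labels), "n": len(labels)}
    j, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return {
        "leaf": False,
        "feature": j,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left],
                           depth + 1, max_depth, min_size),
        "right": build_tree([r for r, _ in right], [y for _, y in right],
                            depth + 1, max_depth, min_size),
    }


def predict_proba(tree, x):
    """Follow the branches for x and return the leaf's probability."""
    while not tree["leaf"]:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["p"]


# Hypothetical toy data: feature 0 is continuous, feature 1 is binary.
rows = [[1, 0], [2, 1], [3, 0], [4, 1], [5, 0], [6, 1]]
labels = [0, 0, 0, 1, 1, 1]
tree = build_tree(rows, labels)
print(tree["feature"], tree["threshold"])                       # 0 3
print(predict_proba(tree, [2, 1]), predict_proba(tree, [5, 0]))  # 0.0 1.0
```

Note that nothing here assumes a particular distribution or a linear relationship. The tree simply searches for the split that makes the resulting groups as pure as possible.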

For identifying risk factors, tree-based methods such as CART and conditional inference tree analysis may outperform logistic regression. The key difference is that logistic regression makes assumptions about the underlying data structure, while tree-based methods make no such assumptions. Another important difference is *how* the models identify risk factors: logistic regression derives odds ratios for significant factors, while tree-based methods represent risk factors as splits in the tree ("ramifications"), and a probability of occurrence is assigned to each terminal node (leaf).
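To see where LR's odds ratios come from, consider the simplest case: one binary risk factor. The sketch below fits LR by Newton-Raphson maximum likelihood on a hypothetical 2x2 table with made-up counts. It then checks that the exponentiated coefficient, exp(b1), matches the odds ratio computed directly from the table, (a·d)/(b·c). The fitting routine `fit_logistic_1d` is a hypothetical, illustrative helper. In practice you would use a statistics package.

```python
import math


def fit_logistic_1d(x, y, iters=25):
    """Fit log(p / (1 - p)) = b0 + b1 * x by Newton-Raphson (maximum likelihood)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        # Score vector (g0, g1) and observed information matrix [[h00, h01], [h01, h11]]
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += information^{-1} @ score (2x2 inverse written out)
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical 2x2 table:
#              case   non-case
# exposed       30       70
# unexposed     10       90
x = [1] * 100 + [0] * 100
y = [1] * 30 + [0] * 70 + [1] * 10 + [0] * 90
b0, b1 = fit_logistic_1d(x, y)
or_table = (30 * 90) / (70 * 10)
print(round(math.exp(b1), 3), round(or_table, 3))  # 3.857 3.857
```

With more predictors, exp(b_j) is an odds ratio *adjusted* for the other terms in the model. A tree, by contrast, reports a probability per leaf rather than one multiplicative effect per factor.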

In terms of overall performance, there are some important differences. Nagy (2009) found that the trees outperformed logistic regression, both by identifying more risk factors and by classifying items (horses, in the author's study) more accurately. However, Nagy found "No difference...between the two tree-based methods regarding the structure and prediction accuracy of the trees."

Tree-based methods may outperform LR when it comes to **classification**, but they are *more* prone to overfitting than LR. This can be combated by "pruning" the tree. Another option is to try both LR and a decision tree to see which gives you the most desirable results.

Logistic regression is commonly used for **Propensity Score (PS) analysis**, but there are some cases where LR doesn't work well, including models with many covariates and response surfaces that aren't hyperplanes. Neural Networks (NN) and Support Vector Machines (SVM) are good alternatives that provide more stable estimates in most cases, although NNs tend to outperform SVMs. Keller et al. (2013) recommend estimating propensity scores with both LR *and* NNs. If the NNs achieve better balance, you then have the opportunity to re-specify the LR model or to use the NN estimates instead.
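Comparing LR and NN propensity scores this way requires a balance measure. One common choice is the standardized mean difference (SMD) of each covariate between treated and control groups after inverse-probability-of-treatment weighting. This is an assumption; the text does not name a specific diagnostic. The propensity model that yields smaller SMDs gives better balance. The data and the two candidate score vectors below are hypothetical. `ps_good` equals the true treatment probability in each stratum, and `ps_flat` is a deliberately uninformative model.

```python
import math


def weighted_mean_var(values, weights):
    """Weighted mean and (population-style) weighted variance."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, var


def standardized_mean_difference(x, treated, ps=None):
    """Absolute SMD of covariate x between treated (1) and control (0) units.

    If propensity scores `ps` are given, units are weighted by the inverse
    probability of the treatment they received: 1/ps if treated,
    1/(1 - ps) if control. The denominator uses one common convention,
    the square root of the average of the two group variances.
    """
    if ps is None:
        w = [1.0] * len(x)
    else:
        w = [1.0 / p if t == 1 else 1.0 / (1.0 - p) for t, p in zip(treated, ps)]
    xt = [(v, wi) for v, t, wi in zip(x, treated, w) if t == 1]
    xc = [(v, wi) for v, t, wi in zip(x, treated, w) if t == 0]
    m1, v1 = weighted_mean_var([v for v, _ in xt], [wi for _, wi in xt])
    m0, v0 = weighted_mean_var([v for v, _ in xc], [wi for _, wi in xc])
    return abs(m1 - m0) / math.sqrt((v1 + v0) / 2.0)


x = [0, 0, 0, 0, 1, 1, 1, 1]        # a binary covariate
treated = [1, 0, 0, 0, 1, 1, 1, 0]
ps_good = [0.25] * 4 + [0.75] * 4   # true P(treated | x) in each stratum
ps_flat = [0.5] * 8                 # an uninformative model

print(round(standardized_mean_difference(x, treated), 3))           # 1.155 (unweighted)
print(round(standardized_mean_difference(x, treated, ps_flat), 3))  # 1.155
print(round(standardized_mean_difference(x, treated, ps_good), 3))  # 0.0
```

In a real analysis you would compute the SMD for every covariate under the LR scores and under the NN scores. If the NN scores balance noticeably better, that is the signal to revisit the LR specification.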

As mentioned above, tree-based methods tend to outperform LR when it comes to **classification**. However, SVMs are gaining popularity as an alternative. SVMs combine computer algorithms with theoretical results, which has earned them a good reputation for classification. Several authors (as cited in Salazar et al., 2012) found that SVMs outperformed LR in several key areas. For example, for multivariate and mixture distributions, SVM requires fewer variables than LR to achieve the same (or better) misclassification rate (MCR). Neural networks also perform well, especially if you have sparse binary data.
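As a rough illustration of how an SVM classifier and the misclassification rate fit together, the sketch below trains a linear soft-margin SVM by full-batch subgradient descent on the hinge loss. It then reports the MCR, the fraction of observations assigned to the wrong class. This is a teaching sketch under simple assumptions: labels are coded -1/+1, there is no kernel, and the learning rate `lr`, regularization `lam`, and toy data are all hypothetical. For real work you would normally use a dedicated solver such as LIBSVM.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit w, b minimizing  lam/2 * ||w||^2 + mean_i max(0, 1 - y_i (w.x_i + b))
    by full-batch subgradient descent. Labels y_i must be -1 or +1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the L2 penalty
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # point inside the margin: hinge loss is active
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(w, b, x):
    """Class +1 if x falls on the positive side of the hyperplane, else -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def misclassification_rate(X, y, w, b):
    """Fraction of observations whose predicted class differs from y."""
    return sum(predict(w, b, xi) != yi for xi, yi in zip(X, y)) / len(X)


# Hypothetical, linearly separable toy data.
X = [[2, 2], [3, 3], [2, 3], [3, 2], [-2, -2], [-3, -3], [-2, -3], [-3, -2]]
y = [1, 1, 1, 1, -1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(misclassification_rate(X, y, w, b))  # 0.0
```

Unlike LR, the SVM does not model a probability at all. It only seeks the separating hyperplane with the widest margin, which is one reason the two methods can behave differently on the same data.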

It's important to note, though, that SVMs and NNs aren't a "miracle" alternative to LR. While some studies report that one method is superior, other studies often directly contradict them. Several factors must be taken into account when deciding to switch methods, including your comfort level, your area of expertise, and the specifics of your data. For example, if you are only using a single variable to classify new observations, SVM is a good alternative. However, the polynomial SVM is not recommended for this purpose because it produces a higher misclassification rate. SVM is also likely to perform better than LR if your data have strong correlation structures.

Widely available in statistics and data mining packages, **K-nearest neighbor (KNN)** is a simple instance-based learning (IBL) algorithm. Because it's so simple to implement, it's often a first choice for classification. As well as being easy, it usually gives results that are good enough for many applications. It was originally developed by Fix & Hodges (as cited in Kirk, 2014), whose work focused on classification with unknown distributions.
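Because KNN is instance-based, there is no real training step. The method stores the training data and classifies a new point by majority vote among the k closest stored points. The minimal sketch below assumes numeric features and Euclidean distance; other distance metrics are common. `knn_predict` and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points.

    Uses Euclidean distance (math.dist, Python 3.8+). With two classes,
    an odd k avoids tied votes.
    """
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: two well-separated groups.
train_X = [[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, [1.5, 1.5]))  # 0
print(knn_predict(train_X, train_y, [6.5, 6.5]))  # 1
```

The whole "model" is the stored data plus the choice of k and distance metric. That is what makes KNN easy to implement, and also why the value of k matters so much.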

KNN performs well in many situations, and for classification is often the "outright winner" (Bichler et al., 2004). In terms of ease of interpreting output, calculation time, and predictive power, Srivastava (2018) reports that LR and KNN are practically identical.

One of the major problems with KNN is choosing a value for "k", which can seem quite arbitrary. Many methods exist for choosing k, ranging from guess and check (which is exactly what it sounds like: you guess, and then check) to a multitude of algorithms that optimize k for a given training set. Kirk (2014) provides a great overview of the "choosing k" problem (pp. 25-29). For more detail about algorithms, he recommends Florian Nigsch et al.'s article Melting Point Prediction Employing K-Nearest Neighbor Algorithms an....
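One of the simpler data-driven ways to pick k is leave-one-out cross-validation (LOOCV). It is shown here as a generic illustration, not as one of the specific algorithms Kirk or Nigsch et al. describe. For each candidate k, every training point is classified using all the others, and the k with the lowest error is kept. The toy data (two clusters plus one mislabeled point) and the candidate list are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest points (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loocv_error(X, y, k):
    """Leave-one-out misclassification rate of k-NN for one value of k."""
    wrong = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]  # every point except the i-th
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], k) != y[i]:
            wrong += 1
    return wrong / len(X)


def choose_k(X, y, candidates=(1, 3, 5, 7)):
    """Return the candidate k with the lowest LOOCV error (smaller k wins ties)."""
    errors = {k: loocv_error(X, y, k) for k in candidates}
    best = min(candidates, key=lambda k: (errors[k], k))
    return best, errors


# Hypothetical data: two clusters plus one point labeled 1 inside cluster 0.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6], [0.5, 0.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]
best_k, errors = choose_k(X, y)
print(best_k)  # 3
print(errors)
```

Here k = 1 is fooled by the single mislabeled point, which is the nearest neighbor of every point in its cluster. A slightly larger k outvotes it. LOOCV is cheap for small data sets but requires one pass per point, so it becomes costly as the training set grows.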

Traditional statistical methods are time tested and shouldn't be overlooked in favor of ML algorithms or Neural networks just for the sake of appearing "up to date". In some cases, **traditional methods outperform even the most tried and trusted modern algorithms**. For example, in *Comparing Classification Methods for Campaign Management*, Bichler et al. concluded that "...**stepwise logistic regression** performed best and dominated all other methods."

**Discriminant analysis** is a popular, long-standing tool for classification. In a practical sense, there are only minor differences between discriminant analysis and logistic regression (Michie et al. 1994, as cited in Bichler et al., 2004). In fact, LR and linear discriminant analysis lead to the same model, with log-odds that are linear in the predictors, for normally distributed data with equal covariances and for independent binary attributes (Bichler, 2004).
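The link between the two methods can be made concrete for a single predictor. Suppose both classes are normal with means `mu0` and `mu1` and a common standard deviation `sigma`. Bayes' rule then gives a posterior whose log-odds are linear in x, which is exactly the logistic form: b1 = (mu1 - mu0)/sigma², b0 = (mu0² - mu1²)/(2 sigma²) + log(prior1/(1 - prior1)). The sketch below checks this numerically with made-up parameter values. The two methods still estimate b0 and b1 differently from data. LDA uses class means and a pooled variance, while LR maximizes the conditional likelihood, so fitted models can differ in finite samples.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and sd sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def lda_posterior(x, mu0, mu1, sigma, prior1):
    """P(y = 1 | x) by Bayes' rule, with Gaussian classes sharing one sigma."""
    f1 = prior1 * normal_pdf(x, mu1, sigma)
    f0 = (1 - prior1) * normal_pdf(x, mu0, sigma)
    return f1 / (f0 + f1)


def lda_log_odds_coefficients(mu0, mu1, sigma, prior1):
    """Intercept and slope of the implied linear log-odds b0 + b1 * x."""
    b1 = (mu1 - mu0) / sigma**2
    b0 = (mu0**2 - mu1**2) / (2 * sigma**2) + math.log(prior1 / (1 - prior1))
    return b0, b1


# Hypothetical class parameters.
mu0, mu1, sigma, prior1 = 0.0, 2.0, 1.5, 0.3
b0, b1 = lda_log_odds_coefficients(mu0, mu1, sigma, prior1)
for x in (-2.0, 0.0, 1.5, 3.0):
    lr_form = 1 / (1 + math.exp(-(b0 + b1 * x)))
    # The last two columns should match to within floating-point rounding.
    print(x, lda_posterior(x, mu0, mu1, sigma, prior1), lr_form)
```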

Other, notable statistics-based alternatives:

- **Log-binomial regression**: The log-binomial model is built directly on the binomial distribution (which is also the underlying mechanism for LR), but it can end up with convergence problems.
- **Poisson regression**: Good for large sample sizes, but may estimate probabilities greater than 1. It also tends to give conservative (overly wide) confidence intervals.
- **Poisson with robust variance estimator (modified Poisson)**: Good for large sample sizes, but may estimate probabilities greater than 1.
- **Cox regression**: A good alternative, although it doesn't estimate probabilities.
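The log-binomial and Poisson-type models above use a log link, so their exponentiated coefficients are risk ratios rather than the odds ratios LR reports. The sketch below uses a hypothetical 2x2 table with a common outcome. It shows how far the two measures can diverge, and why a log link can produce "probabilities" above 1: exp(b0 + b1·x) has no upper bound. The helper names are illustrative only.

```python
import math


def risk_ratio(a, b, c, d):
    """2x2 table: a/b = cases/non-cases among exposed, c/d = among unexposed."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio of the same 2x2 table."""
    return (a * d) / (b * c)


# Hypothetical table with a common outcome: risk 60% (exposed) vs 30% (unexposed).
a, b, c, d = 60, 40, 30, 70
rr = risk_ratio(a, b, c, d)
or_ = odds_ratio(a, b, c, d)
print(rr, or_)  # 2.0 3.5

# Log link: p(x) = exp(b0 + b1 * x). Fitting this table exactly gives
# b0 = log(baseline risk) and b1 = log(RR). Extrapolating to x = 2
# (e.g. a hypothetical higher dose level) shows nothing keeps p(x) below 1.
b0, b1 = math.log(0.3), math.log(rr)
probs = [round(math.exp(b0 + b1 * x), 3) for x in (0, 1, 2)]
print(probs)  # [0.3, 0.6, 1.2]
```

When the outcome is rare, the odds ratio and risk ratio are close. When it is common, as here, they can differ substantially, which is a common reason to prefer these log-link alternatives.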

Bichler, M. & Kiss, C. (2004). A Comparison of Logistic Regression, k-Nearest Neighbor, and Decisi.... AMCIS 2004 Proceedings.

Keller, B. S. B., Kim, J.-S. & Steiner, P. M. (2013). Abstract: Data Mining Alternatives to Logistic Regression for Prope..., 48(1), 164. DOI: 10.1080/00273171.2013.752263

Kirk, M. (2014). Thoughtful Machine Learning: A Test-Driven Approach. O'Reilly Media.

Nagy, K. (2009). Chapter 3. Tree-based methods as an alternative to logisti...

Nigsch, F. et al. (2006). Melting Point Prediction Employing K-Nearest Neighbor Algorithms an.... Journal of Chemical Information and Modeling, 46(6), pp. 2412-2422.

Salazar, D. et al. (2012). Comparison between SVM and Logistic Regression: Which One is Better... Revista Colombiana de Estadística, Special Issue in Biostatistics, June 2012, 35(2), pp. 223-237.

© 2020 Data Science Central ®
