In the customer management lifecycle, customer churn refers to a customer's decision to end the business relationship. It is also referred to as the loss of clients or customers. Customer loyalty and customer churn always add up to 100%: if a firm has a 60% loyalty rate, its churn rate is 40%. Per the 80/20 customer profitability rule, 20% of customers generate 80% of revenue. It is therefore important to predict which customers are likely to end the business relationship and to understand the factors driving their decisions. In this blog post, we show how a logistic regression model built in R can be used to identify customer churn in a telecom dataset.
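The loyalty/churn complement and the 80/20 rule are simple arithmetic. The following is a minimal Python sketch of both. The function names `churn_rate` and `top_customer_share` and the revenue figures are hypothetical: they are not from the post, and the figures were chosen only so that the top 20% of customers produce 80% of revenue.

```python
def churn_rate(loyalty_rate: float) -> float:
    """Churn rate implied by a loyalty (retention) rate; the two always sum to 1 (100%)."""
    if not 0.0 <= loyalty_rate <= 1.0:
        raise ValueError("loyalty_rate must be a fraction between 0 and 1")
    return 1.0 - loyalty_rate


def top_customer_share(revenues, top_fraction=0.2):
    """Fraction of total revenue generated by the top `top_fraction` of customers."""
    ranked = sorted(revenues, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))  # number of customers in the top group
    return sum(ranked[:k]) / sum(ranked)


print(f"{churn_rate(0.60):.0%}")  # prints 40%

# Hypothetical revenue per customer for ten customers (illustrative only).
revenues = [400, 400, 50, 40, 30, 30, 20, 15, 10, 5]
print(f"{top_customer_share(revenues):.0%}")  # prints 80%
```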
The telecom dataset contains details for 7000+ unique customers, with each customer represented in a single row. The dataset's variables fall into two groups:
Input variables: these are also called predictors or independent variables.
Output variables: these are also called response or dependent variables. Since the output variable (Churn value) takes the binary values "0" or "1", predicting it is a classification problem in supervised machine learning.
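Because churn is coded as 0/1, logistic regression models the probability of churn as the sigmoid of a linear combination of the predictors. The post builds its model in R. For reference, R's `glm()` with `family = binomial` fits this kind of model by iteratively reweighted least squares. The sketch below is a library-free Python illustration of the same model, trained by plain batch gradient descent on log-loss for transparency. The single predictor, the toy tenure values, and the function names are hypothetical and not taken from the post's dataset.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(churn = 1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent on log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # derivative of log-loss w.r.t. the linear score
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def predict(b0, b1, x, threshold=0.5):
    """Classify as churn (1) when the predicted probability reaches the threshold."""
    return 1 if sigmoid(b0 + b1 * x) >= threshold else 0


# Hypothetical toy data: tenure in years and a 0/1 churn label.
TENURE = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0, 6.0]
CHURN = [1, 1, 1, 0, 1, 0, 0, 0, 0]

b0, b1 = fit_logistic(TENURE, CHURN)
print(f"b0={b0:.2f}, b1={b1:.2f}")  # a negative b1 means longer tenure lowers churn odds
```

The toy labels deliberately overlap (a churner at 2.5 years, a non-churner at 2.0), so the fitted coefficients stay finite, as they would on real data.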
From the model summary, the response variable (churn) is affected by the tenure interval, contract period, paper billing, senior citizen, and multiple lines variables. The significance of each variable is indicated by the codes printed next to its coefficient in R's summary output (*** for p < 0.001, ** for p < 0.01, * for p < 0.05, and a dot for p < 0.1). Rerunning the model with only these significant predictors (independent variables) will affect the model's performance and accuracy.
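R's summary output prints the legend `0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1`. The helper below is a minimal Python sketch of that mapping. The name `significance_code` is hypothetical. The helper treats each cut point as the inclusive upper bound of its band, which is our reading of how R bins p-values. The p-values in the usage lines are made up and do not come from the post's model.

```python
# Cut points and symbols from R's "Signif. codes" legend.
# Assumption: each cut point is the inclusive upper bound of its band.
CUTPOINTS = [(0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, ".")]


def significance_code(p_value: float) -> str:
    """Return the significance symbol R would print next to a coefficient's p-value."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    for upper, symbol in CUTPOINTS:
        if p_value <= upper:
            return symbol
    return " "  # not significant at the 0.1 level


# Hypothetical p-values, for illustration only.
for p in [2e-16, 0.004, 0.03, 0.08, 0.4]:
    print(f"{p:<8g} {significance_code(p)!r}")
```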
To read the original post and get the code downloads, click here
Comments
Hi, I am not able to download the code and see the model diagrams. Can anyone help me, please?
Dead link :(
@Davide,
What Raghavan said is true in the context of a logistic regression model. Turning a continuous variable into "bins" of similar values helps the logistic regression algorithm pick out the riskier vs. less risky bins. For example, if customers with low and high tenure are high risk but those with middle tenure are low risk, a single linear tenure term cannot model that relationship, whereas cutting the variable into 3 bins can. This way, the logistic regression can assign each group its own risk.
However, decision tree models do NOT typically benefit from discretizing continuous features. In fact, discretizing typically makes a tree model worse, which is sometimes the price we pay for interpretability.
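The binning idea in the comment above can be illustrated with a minimal Python sketch. In R this step is typically done with `cut()`, which turns a numeric vector into a factor. The three tenure intervals (0-12, 13-48, 49+ months), the function names, and the customer records below are all hypothetical. They were chosen so that short- and long-tenure customers churn more often than mid-tenure ones. That U-shaped pattern is one a single linear tenure term cannot capture, but per-bin indicators can.

```python
from collections import defaultdict


def tenure_bin(months: float) -> str:
    """Assign a tenure (in months) to one of three hypothetical intervals."""
    if months < 0:
        raise ValueError(f"negative tenure: {months}")
    if months <= 12:
        return "0-12"
    if months <= 48:
        return "13-48"
    return "49+"


def churn_rate_by_bin(tenures, churned):
    """Observed churn rate (churners / customers) within each tenure interval."""
    counts = defaultdict(lambda: [0, 0])  # label -> [churners, customers]
    for months, y in zip(tenures, churned):
        tally = counts[tenure_bin(months)]
        tally[0] += y
        tally[1] += 1
    return {label: churners / total for label, (churners, total) in counts.items()}


# Hypothetical customers where both short and long tenures churn more often.
tenures = [2, 5, 10, 20, 30, 40, 45, 55, 60, 70]
churned = [1, 1, 0, 0, 0, 0, 0, 1, 1, 0]
print(churn_rate_by_bin(tenures, churned))
```

A logistic regression given these bins as a factor estimates a separate coefficient per interval, so the middle bin can have a lower risk than both ends.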
@Davide,
Transforming the continuous variable into a meaningful factor variable helps the business identify the tenure ranges in which customers are churning. In general, the tenure variable affects model performance (the churn prediction) whether we use it as a continuous variable or as a factor variable.
@Nommel, please click the link at the bottom of the post that says "read original post". The confusion matrix (CM) is located there.
Hey guys, I have been looking over and over and I can't find the confusion matrix.
@Liad,
Can you double-check your calculation? I'm getting the following:
Accuracy: 80%
Precision: 66%
Recall: 52%
F1: 0.58 (rounded to 2 decimal places)
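The metrics in the comment above follow from the four confusion-matrix counts, and F1 is the harmonic mean of precision and recall. The following is a minimal Python sketch; the function names are hypothetical. The counts in the second usage line are made up and are not the post's confusion matrix. Because the commenter's precision and recall are already rounded, recomputing F1 from them can differ slightly from a value computed from the raw counts.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion-matrix counts (assumes no zero denominators)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # share of predicted churners who actually churned
    recall = tp / (tp + fn)     # share of actual churners the model caught
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def f1_score(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# The commenter's precision (66%) and recall (52%) reproduce the reported F1.
print(round(f1_score(0.66, 0.52), 2))  # prints 0.58

# Hypothetical counts, for illustration only.
print(classification_metrics(tp=40, fp=10, fn=20, tn=130))
```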
Very interesting
Posted 29 March 2021
© 2021 TechTarget, Inc.