I am working on logistic regression.
In my dataset there are a few variables like JobSatisfactionLevel, with values ranging from 1 to 4, where 1 is Least Satisfied and 4 is V.High Satisfied.
1. Should I leave this variable as numeric when applying logistic regression, or should I convert it to a categorical variable with the labels
Least  Medium  High  V.High
and then create dummy variables like
is_JobSatisfactionLevelMedium, is_JobSatisfactionLevelHigh, is_JobSatisfactionLevelVHigh
before applying logistic regression?
2. I have another variable, Salary. Some employees have very high salaries (they are outliers relative to the rest of the dataset). What should be done with these employees?
Which is the correct approach? Please suggest.

Tags: logistic, regression

Replies to This Discussion


I just want to understand more about what you want to achieve. What is your response (dependent) variable? Please give a bit more context.

It's better to have continuous variables in logistic regression than binary variables. If you create variables like is_JobSatisfactionLevelMedium, they will be boolean (0/1), but they can still be used. I prefer to keep variables continuous.
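To see what keeping the variable continuous implies, here is a small sketch of how a fitted logistic model turns each coding into a probability. The coefficients are hypothetical, chosen only for illustration and not estimated from any data. With numeric coding there is a single slope, so every one-step increase in satisfaction changes the log-odds by the same amount. With dummy coding (level 1 as the reference), each level gets its own offset.

```python
import math


def sigmoid(z):
    """Logistic function: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    """Inverse of the logistic function."""
    return math.log(p / (1.0 - p))


def prob_numeric(level, b0, b1):
    """Numeric coding: log-odds = b0 + b1 * level (one slope for all steps)."""
    return sigmoid(b0 + b1 * level)


def prob_dummy(level, b0, offsets):
    """Dummy coding: log-odds = b0 + offsets[level]; the reference level has offset 0."""
    return sigmoid(b0 + offsets.get(level, 0.0))


if __name__ == "__main__":
    # Hypothetical coefficients, for illustration only
    b0, b1 = -1.0, 0.5
    offsets = {1: 0.0, 2: 0.1, 3: 1.2, 4: 1.3}  # uneven steps between levels
    for level in range(1, 5):
        print(level,
              round(prob_numeric(level, b0, b1), 3),
              round(prob_dummy(level, b0, offsets), 3))
```

If the real effect jumps mainly between Medium and High, as in the hypothetical offsets above, a single numeric slope cannot represent that, while three dummies can. The trade-off is two extra coefficients to estimate.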

For the salary variable, have you considered binning? You can take a more systematic approach: plot a histogram to see how salary is distributed in your dataset, and then choose the bins accordingly.
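A minimal sketch of that idea in plain Python. `histogram_counts` gives equal-width counts as a text stand-in for the plot. `quantile_bin_edges` picks cut points so that each bin holds roughly the same number of employees, which is one common way to choose bins for a skewed variable (an assumption here, not the only option). All function names and the salary figures are hypothetical.

```python
import bisect
import statistics


def histogram_counts(values, n_bins):
    """Equal-width histogram counts, a text stand-in for plotting."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Put the maximum value into the last bin rather than one past it
        idx = min(int((v - lo) / width), n_bins - 1) if width else 0
        counts[idx] += 1
    return counts


def quantile_bin_edges(values, n_bins):
    """Cut points that split the data into roughly equal-sized bins."""
    return statistics.quantiles(values, n=n_bins)


def assign_bins(values, edges):
    """Bin index (0 .. len(edges)) for each value. Very high salaries
    all land in the top bin instead of stretching the scale."""
    return [bisect.bisect_right(edges, v) for v in values]


if __name__ == "__main__":
    # Hypothetical salaries (in thousands) with one extreme value
    salaries = [30, 32, 35, 40, 41, 45, 50, 52, 55, 400]
    print(histogram_counts(salaries, 4))  # the single high earner stands alone
    edges = quantile_bin_edges(salaries, 4)
    print(edges)
    print(assign_bins(salaries, edges))
```

The binned salary can then be dummy-coded like any other categorical variable. Equal-width bins are the alternative when the bin boundaries themselves should be easy to interpret.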

I am quite new to data science, so the approach I suggested may be quite naive.

If your variable has four levels, then you should convert it into a categorical variable. For that, you don't have to assign any labels.
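To illustrate that no labels are needed, here is a minimal sketch that dummy-codes the raw 1 to 4 codes directly. It assumes the smallest code (1, Least Satisfied) is the reference level. That level gets no column and is represented by all dummies being 0, which keeps the dummies from being perfectly collinear with the intercept. The function name and the column-name prefix are hypothetical.

```python
def dummy_coding(codes, reference=None, prefix="JobSatisfactionLevel"):
    """One 0/1 column per level except the reference level.

    Works directly on the raw codes (e.g. 1-4); no text labels are needed.
    Levels are taken from the data, so a level that never occurs gets no
    column. The reference level defaults to the smallest code.
    """
    levels = sorted(set(codes))
    if reference is None:
        reference = levels[0]
    kept = [lvl for lvl in levels if lvl != reference]
    names = [f"{prefix}_{lvl}" for lvl in kept]
    rows = [[1 if code == lvl else 0 for lvl in kept] for code in codes]
    return names, rows


if __name__ == "__main__":
    names, rows = dummy_coding([1, 2, 3, 4, 2])
    print(names)  # ['JobSatisfactionLevel_2', 'JobSatisfactionLevel_3', 'JobSatisfactionLevel_4']
    print(rows)   # [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Mapping the codes to readable names such as Medium, High and V.High only changes the column names, not the model.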

And if your variable has outliers, you should remove those outliers before applying the model.
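One way to make "outlier" concrete before removing anything is Tukey's fences: flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR. This rule is a swapped-in choice, not something specified in the reply. The sketch below drops whole rows, so each employee's response value is removed along with the salary. The row layout (dicts with a `"salary"` key) and the data are hypothetical.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Tukey's fences: (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(rows, key, k=1.5):
    """Split rows into (kept, dropped) based on rows[i][key]."""
    low, high = iqr_fences([r[key] for r in rows], k)
    kept = [r for r in rows if low <= r[key] <= high]
    dropped = [r for r in rows if not (low <= r[key] <= high)]
    return kept, dropped


if __name__ == "__main__":
    # Hypothetical employees: salary in thousands and a 0/1 response
    salaries = [30, 32, 35, 40, 41, 45, 50, 52, 55, 400]
    rows = [{"salary": s, "left": i % 2} for i, s in enumerate(salaries)]
    kept, dropped = drop_outliers(rows, "salary")
    print(len(kept), [r["salary"] for r in dropped])
```

It is worth printing how many rows a rule like this removes before fitting. A larger `k` flags fewer employees as outliers.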

Hope this helps.


© 2021 TechTarget, Inc.