I am working on a logistic regression model.
1. My dataset has a few variables such as JobSatisfactionLevel, with values in the
range 1 to 4, where 1 is Least Satisfied and 4 is V.High Satisfied.
My question is: should I leave this variable as numeric when applying logistic regression,
OR
should I convert it to a categorical variable with the labels
Least, Medium, High, V.High
and then create dummy variables such as
is_JobSatisfactionLevelMedium, is_JobSatisfactionLevelHigh, is_JobSatisfactionLevelVHigh
before applying logistic regression?
2. I have another variable, Salary. Some employees have a very high salary (they look like outliers w.r.t. the dataset).
What should be done with such employees whose salary is very high?
Please suggest which approach is correct.


Replies to This Discussion

Hi,

Just wanted to understand more about what you want to achieve. What are you regressing against? Please give a bit of context.

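In my view it is better to keep such variables numeric for logistic regression rather than converting them to binary variables. If you create variables like is_JobSatisfactionLevelMedium, they will be boolean in nature, but logistic regression can still handle them. Note that keeping the 1 to 4 coding assumes each step up in satisfaction has the same effect on the log-odds, whereas dummy variables let each level have its own effect. My own preference is to keep the variable numeric.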
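To make the two options concrete, here is a minimal sketch in plain Python. The label names in LEVEL_LABELS and the helper dummy_encode are my own illustrative choices, not something from your dataset, and "Least" is assumed to be the reference level (which is why it gets no dummy column).

```python
# Hypothetical labels for the 1..4 codes described in the question.
LEVEL_LABELS = {1: "Least", 2: "Medium", 3: "High", 4: "VHigh"}


def dummy_encode(level, reference=1):
    """Return 0/1 indicators for every level except the reference level."""
    if level not in LEVEL_LABELS:
        raise ValueError(f"unexpected level: {level!r}")
    return {
        f"is_JobSatisfactionLevel{label}": int(level == code)
        for code, label in LEVEL_LABELS.items()
        if code != reference
    }


# Made-up example values for five employees.
levels = [1, 2, 3, 4, 2]

# Option A: keep the variable numeric (one column, one coefficient).
numeric_feature = [float(v) for v in levels]

# Option B: dummy variables (three columns, one coefficient per level).
dummy_features = [dummy_encode(v) for v in levels]
```

If you are using pandas, pandas.get_dummies with drop_first=True does a similar job on a categorical column. Either set of features can then be passed to the logistic regression.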

For the salary question, have you considered binning? You can be a bit more systematic in your approach: plot a histogram to see how salary is distributed in your dataset, and then choose the bins accordingly.
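A rough sketch of that idea, using only the Python standard library. The salary figures are made up, and the two helper functions are hypothetical names of mine. The first counts values in equal-width bins (a text stand-in for a histogram), the second assigns each salary to a quantile-based bin so that the very high salaries simply land in the top bin.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles

# Made-up salaries; the last two stand in for the very high earners.
salaries = [32000, 35000, 38000, 41000, 45000,
            47000, 52000, 58000, 250000, 400000]


def equal_width_histogram(values, n_bins=5):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:  # all values identical: everything goes in the first bin
        return [len(values)] + [0] * (n_bins - 1)
    # The maximum value would index one past the end, so clamp it.
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    return [counts.get(i, 0) for i in range(n_bins)]


def quantile_bin(values, n_bins=4):
    """Assign each value a bin index 0..n_bins-1 using quantile cut points."""
    cuts = quantiles(values, n=n_bins)
    return [bisect_right(cuts, v) for v in values]


hist = equal_width_histogram(salaries)
salary_bins = quantile_bin(salaries)
```

With this toy data the equal-width histogram puts eight of the ten salaries in the first bin, which is the kind of skew the plot would reveal. The quantile bins spread employees more evenly. In pandas, pandas.cut and pandas.qcut cover the equal-width and quantile cases respectively.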

I am quite new to data science, so the approach I suggested may be quite naive.
