
I am working on a project and am not sure of the best approach. I am a student and still learning, so I thought I would post here and see what feedback I get. This is not a school project (I am not cheating); it is a side project at work dealing with conference registrations.

Here is an overview:

  • Registrations are tracked for various conferences (same general conference, different locations).
  • Problem: These conferences are free, so a lot of people register but do not attend. 
  • I want to create a model to estimate final attendance based on the following variables:
    • Registration Days Out: the number of days between registration and the conference date (for example, if the conference is on January 5th and the person registers on January 3rd, this value is 2)
    • Degree (NP, PA, RN, MD, other)
    • Gender
    • Miles from location (how far the person lives from the conference location)
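For reference, here is a minimal sketch of computing "Registration Days Out" from two dates. It assumes both dates are available as Python `datetime.date` values; the year 2020 below is an arbitrary placeholder.

```python
from datetime import date


def registration_days_out(conference_date: date, registration_date: date) -> int:
    """Whole days between the registration date and the conference date."""
    return (conference_date - registration_date).days


# Example from the post: conference on January 5th, registration on January 3rd.
print(registration_days_out(date(2020, 1, 5), date(2020, 1, 3)))
```

Subtracting `date` objects yields a `timedelta`, so month and year boundaries are handled automatically.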

I want to calculate the likelihood that a registrant will attend. For example, maybe a female PA who registers 30 days out is 60% likely to attend. Ultimately, I would like to create a report that updates an "estimated attendance" value every time someone registers. 
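Once a model supplies each registrant's probability of attending, the running "estimated attendance" can be sketched as the sum of those probabilities (by linearity of expectation, the expected head count equals the sum of the individual probabilities). The class name and the probabilities below are hypothetical, and the model that produces each probability is assumed, not shown.

```python
class AttendanceEstimate:
    """Hypothetical running estimate of final attendance.

    Assumes some fitted model (not shown) gives each registrant's probability
    of attending; the expected head count is the sum of those probabilities.
    """

    def __init__(self) -> None:
        self.registrants = 0
        self.expected_attendance = 0.0

    def add_registration(self, p_attend: float) -> float:
        """Record one new registration and return the updated estimate."""
        if not 0.0 <= p_attend <= 1.0:
            raise ValueError(f"probability must be in [0, 1], got {p_attend}")
        self.registrants += 1
        self.expected_attendance += p_attend
        return self.expected_attendance


# Hypothetical probabilities for three registrants (e.g. 0.6 for the female PA
# who registered 30 days out).
estimate = AttendanceEstimate()
for p in (0.6, 0.5, 0.25):
    estimate.add_registration(p)
print(f"{estimate.registrants} registered, about {estimate.expected_attendance:.2f} expected")
```

Because each new registration only adds one term, the report can update the estimate incrementally without refitting anything.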

My first thought was to run a multiple regression analysis and use the resulting equation for this calculation. To do that, I would have to categorize the "Registration Days Out" variable (e.g., 80-89, 70-79, 60-69, etc.). I'm not sure that is the best model to use, though.
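The binning described above can be sketched as follows, assuming 10-day bins and a single open-ended top bin; the 90-day cap and the function names are hypothetical choices. In a regression, categorical bins like these typically enter as dummy (indicator) variables, with one bin left out as the reference level.

```python
from typing import List

# Hypothetical bin layout: 0-9, 10-19, ..., 80-89, then an open-ended 90+ bin.
LEVELS: List[str] = [f"{lo}-{lo + 9}" for lo in range(0, 90, 10)] + ["90+"]


def days_out_bin(days_out: int, width: int = 10, cap: int = 90) -> str:
    """Map a days-out value to a bin label such as '60-69' (or '90+')."""
    if days_out < 0:
        raise ValueError("registration cannot come after the conference date")
    if days_out >= cap:
        return f"{cap}+"
    low = (days_out // width) * width
    return f"{low}-{low + width - 1}"


def dummy_code(label: str, levels: List[str]) -> List[int]:
    """Indicator columns for `label`, dropping levels[0] as the reference bin."""
    if label not in levels:
        raise ValueError(f"unknown level: {label}")
    return [int(label == level) for level in levels[1:]]


print(days_out_bin(30))                    # bin for a registrant 30 days out
print(dummy_code(days_out_bin(30), LEVELS))
```

Binning lets each range have its own effect; the alternative is to use days out directly as a single numeric predictor, which needs fewer parameters but assumes a simpler relationship.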

Any feedback/ideas/direction is appreciated.


Tags: Help, choice, model, regression





© 2020 Data Science Central ®
