# Need Help Determining the Best Approach

I am working on a project and am not sure of the best approach. I am a student and still learning, so I thought I would post here and see what kind of feedback I get. To be clear, this is not a school project and I am not cheating; it is a side project at work that deals with conference registrations.

Here is an overview:

• Registrations are tracked for various conferences (same general conference, different locations).
• Problem: These conferences are free, so a lot of people register but do not attend.
• I want to create a model to estimate final attendance based on the following variables:
    • Registration Days Out (for example, if the conference date is January 5th and the person registers on January 3rd, this value would be 2)
    • Degree (NP, PA, RN, MD, other)
    • Gender
    • Miles from location (how far the person lives from the conference location)
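To make the "Registration Days Out" definition concrete, here is a rough sketch of how I would compute it. The function name and the year are placeholders I made up for illustration; only the January 5th / January 3rd example comes from my description above.

```python
from datetime import date


def registration_days_out(conference_date: date, registration_date: date) -> int:
    """Number of days between the registration and the conference.

    0 means the person registered on the day of the conference.
    """
    return (conference_date - registration_date).days


# Example from the list above: conference on January 5th, registration on January 3rd.
# The year 2024 is arbitrary (an assumption, not real data).
days_out = registration_days_out(date(2024, 1, 5), date(2024, 1, 3))
print(days_out)  # 2
```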

I want to calculate the likelihood that a registrant will attend. For example, maybe a female PA who registers 30 days out is 60% likely to attend. Ultimately, I would like to create a report that updates an "estimated attendance" value every time someone registers.
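As a sketch of how I picture the "estimated attendance" value: if each registrant gets a probability of attending, the estimate would simply be the sum of those probabilities, recomputed whenever a new registration comes in. The probabilities below are made up for illustration, and the function name is just a placeholder.

```python
def estimated_attendance(attend_probabilities):
    """Expected number of attendees: the sum of each registrant's
    probability of attending."""
    return sum(attend_probabilities)


# Hypothetical probabilities for three registrants (not real data).
probabilities = [0.60, 0.45, 0.80]
estimate = estimated_attendance(probabilities)
print(round(estimate, 2))  # 1.85

# A new registrant arrives with a hypothetical 50% chance of attending,
# so the estimate is recomputed with that probability included.
probabilities.append(0.50)
updated_estimate = estimated_attendance(probabilities)
print(round(updated_estimate, 2))  # 2.35
```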

My first thought was to do a multiple regression analysis and use the resulting equation to make this calculation. To do that, I would have to categorize the "Registration Days Out" variable into ranges (e.g. 80-89, 70-79, 60-69, etc.). I'm not sure if that is the best model to use, though.
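For the categorizing step, this is roughly what I have in mind: ten-day ranges such as 70-79 and 60-69. The function name and the fixed width of 10 are my own assumptions for illustration, not a settled design.

```python
def days_out_bin(days_out: int, width: int = 10) -> str:
    """Map a 'Registration Days Out' value to a range label such as '70-79'."""
    if days_out < 0:
        raise ValueError("days_out must be non-negative")
    low = (days_out // width) * width  # lower edge of the range
    return f"{low}-{low + width - 1}"


print(days_out_bin(75))  # 70-79
print(days_out_bin(30))  # 30-39
print(days_out_bin(2))   # 0-9
```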

Any feedback/ideas/direction is appreciated.

Thanks!

Tags: Help, choice, model, regression
