
I recently started working as a data scientist and have been assigned a project. The team wants to build a prediction model that forecasts the number of incident tickets for each month. What should I take into consideration before I start building the model? Also, are there any study materials or case studies that anybody would recommend?



Replies to This Discussion

Maybe start with the basics: number of calendar days in the month, number of business days in the month, number of tickets in the prior month.
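Those three basic features can be computed with nothing but the Python standard library. This is a minimal sketch under a few assumptions: business days are taken to be Monday through Friday with no holiday calendar (swap in your own company calendar if you have one), and the monthly ticket history is a hypothetical list of ((year, month), count) pairs sorted by month.

```python
import calendar
from datetime import date

def month_features(year, month):
    """Calendar days and business days in a month.
    Assumption: business days = Monday-Friday; public holidays are ignored."""
    days_in_month = calendar.monthrange(year, month)[1]
    business_days = sum(
        1
        for day in range(1, days_in_month + 1)
        if date(year, month, day).weekday() < 5  # weekday(): 0=Monday ... 6=Sunday
    )
    return days_in_month, business_days

def build_rows(monthly_tickets):
    """monthly_tickets: hypothetical list of ((year, month), ticket_count), sorted by month.
    The first month is skipped because it has no prior month to use as a feature."""
    rows = []
    for i in range(1, len(monthly_tickets)):
        (year, month), tickets = monthly_tickets[i]
        calendar_days, business_days = month_features(year, month)
        rows.append({
            "year": year,
            "month": month,
            "calendar_days": calendar_days,
            "business_days": business_days,
            "prior_month_tickets": monthly_tickets[i - 1][1],
            "tickets": tickets,  # the value the model should predict
        })
    return rows

print(month_features(2019, 2))  # (28, 20)
```

Each row then holds the three inputs plus the target, which is the shape most regression tools (Excel included) expect.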

Likely much depends on the nature of these tickets. Check for seasonal effects if applicable to your business or the mix of businesses served (e.g., a retail business may be busier in Q4, a tax firm might be busier around tax season, a school might be busier in the final weeks of the term). Cross-reference with software release schedules if that applies to your business (if you're supporting a product, tickets may spike when updates are released or be muted when updates are held back). Cross-reference with the number of clients, employees, students, or whoever it is that submits IT tickets, since there will be fewer submissions when there are fewer people to submit them, and vice versa. Look at historical submissions, and ask subject matter experts whether they can explain any spikes or dips.
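A quick way to check for seasonality is to compare the average ticket count for each calendar month against the overall average; values well above 1.0 hint at a busy season. Dividing by the number of people who can submit tickets separates real changes in the ticket rate from changes in headcount. This sketch assumes hypothetical inputs: the history as ((year, month), count) pairs and a per-month user count that you would have to pull from HR or customer records yourself.

```python
from collections import defaultdict
from statistics import mean

def seasonal_index(history):
    """history: list of ((year, month), ticket_count).
    Returns {calendar_month: average tickets in that month / overall monthly average}."""
    by_month = defaultdict(list)
    for (year, month), tickets in history:
        by_month[month].append(tickets)
    overall = mean(tickets for _, tickets in history)
    return {month: mean(counts) / overall for month, counts in sorted(by_month.items())}

def tickets_per_user(history, user_counts):
    """user_counts: hypothetical {(year, month): number of people able to submit tickets}."""
    return {ym: tickets / user_counts[ym] for ym, tickets in history}
```

With only one year of data every month appears once, so the index just restates that year; two or more years are needed before a repeating pattern means much.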

Sounds interesting--good luck!

Thank you, Justin, for the really good guidance.

I have created a simple model in Excel using the number of calendar days in the month, the number of business days in the month, and the number of tickets in the prior month.
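A model built from those three inputs is presumably a multiple linear regression, the kind of least-squares fit that Excel's LINEST function or the Analysis ToolPak regression performs. For reference, here is a dependency-free sketch of ordinary least squares that solves the normal equations by Gauss-Jordan elimination. In practice NumPy or scikit-learn would be the usual choice; the function names are hypothetical and the feature order (calendar days, business days, prior-month tickets) is just an assumption matching the three inputs above.

```python
def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.
    X: list of feature rows without an intercept column.
    Returns [intercept, b1, b2, ...]."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("Features are collinear; drop or combine some columns.")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    # A is now diagonal; divide each right-hand side by its diagonal entry
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(coefs, features):
    """Apply fitted coefficients to one row of features."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], features))
```

Calendar days and business days move together, so with a short history their individual coefficients can be unstable even when the overall prediction is reasonable; it is worth checking whether dropping one of them changes the fit much.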

Would I be able to use location as one of the parameters for the model? If so, how should I convert the string values to integers?

I am working with application incident data. There are a number of patches every month, and as far as I can see, incidents definitely spike on those days.

Interesting--wonder if you could get some kind of patch calendar or patch release schedule to help predict the spikes.
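If a patch calendar is available, one simple way to use it is to turn it into a "patches this month" count and add that as another feature. This sketch assumes the calendar can be exported as a list of dates and that each model row is a dict with year and month keys; the function names are hypothetical.

```python
from collections import Counter
from datetime import date

def patch_counts_by_month(patch_dates):
    """Count patches per (year, month) from a list of datetime.date objects."""
    return Counter((d.year, d.month) for d in patch_dates)

def add_patch_feature(rows, patch_dates):
    """Add a 'patches' column to each row; months with no patches get 0."""
    counts = patch_counts_by_month(patch_dates)
    for row in rows:
        row["patches"] = counts[(row["year"], row["month"])]  # Counter returns 0 if missing
    return rows

rows = add_patch_feature(
    [{"year": 2019, "month": 1}, {"year": 2019, "month": 2}],
    [date(2019, 1, 8), date(2019, 1, 22), date(2019, 3, 5)],  # made-up patch dates
)
```

For forecasting, this only helps if the patch schedule for the upcoming month is known in advance.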

I suppose location could be tried, but I would wonder whether location was serving more as a proxy for the real driver (e.g., more people at one location than at another, people at one location working more hours than people at another site, or one location being open more days a week than another). I would also want to confirm that whatever it was about location that drove IT tickets in the past is something we would expect to continue to drive IT tickets in the future.
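One way to test the proxy idea is to normalize ticket counts by headcount at each site. If the per-person rates come out similar, headcount rather than location itself is the more likely driver. The site names and numbers below are made up for illustration.

```python
def tickets_per_head(tickets_by_site, headcount_by_site):
    """Tickets per person at each site; both inputs are hypothetical {site: number} dicts."""
    return {site: tickets_by_site[site] / headcount_by_site[site] for site in tickets_by_site}

rates = tickets_per_head({"site 1": 200, "site 2": 50}, {"site 1": 400, "site 2": 100})
print(rates)  # {'site 1': 0.5, 'site 2': 0.5}
```

Here site 1 files four times as many tickets, but only because it has four times as many people.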

Never hurts to take a look, though. If the locations are just categories like "site 1", "site 2", and so forth, the easiest thing might be to code them as dummy variables, meaning you have an "is site 1" bit, an "is site 2" bit, and so on for each of the N categories (or perhaps N-1, depending on how you implement it). Search for "modeling categorical data" for some ideas.
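Mapping sites to arbitrary integers (site 1 → 1, site 2 → 2, ...) is usually a poor fit for a linear model, because it implies the sites are ordered and evenly spaced. Dummy coding avoids that. Here is a plain-Python sketch of the N-1 version: with an intercept in the model, keeping all N indicators would make them sum to the intercept column (perfect collinearity), so one category is dropped and becomes the baseline. If you use pandas, `pandas.get_dummies(..., drop_first=True)` does the same job; the function below is a hypothetical helper.

```python
def dummy_encode(values, drop_first=True):
    """One 0/1 indicator column per category.
    drop_first=True keeps N-1 columns; the dropped (alphabetically first) category
    becomes the baseline absorbed by the model's intercept."""
    categories = sorted(set(values))
    if drop_first:
        categories = categories[1:]
    columns = [f"is_{c.replace(' ', '_')}" for c in categories]
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return columns, rows

columns, rows = dummy_encode(["site 1", "site 2", "site 3", "site 1"])
print(columns)  # ['is_site_2', 'is_site_3']
print(rows)     # [[0, 0], [1, 0], [0, 1], [0, 0]]
```

Each coefficient on an "is_site_k" column is then read as the difference from site 1, holding the other features fixed.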

Good luck!




© 2019 Data Science Central ®
