Predicting the chance that your tax return will be audited

What are your chances of being audited? And how can data science help answer this question, for each of us individually?

Some factors increase the chance of an audit, including:

  • High income
  • Being self-employed vs. being a partner in an LLC
  • Filing early
  • Having earnings from both W-2 wages and self-employment
  • Business losses 4 years in a row
  • More than 40% of your total deductions falling in the travel and restaurant categories
  • Large donations
  • High home office deductions

Companies such as Deloitte or KPMG probably compute tax audit risks quite accurately (including the penalties in case of an audit) for their large clients, because they have access to large internal databases of audited and non-audited clients.

But for you and me, how could data science help predict our risk of an audit? Is there publicly available data to build a predictive model? Or is this a good idea for a new startup: a website where

  1. Users can anonymously answer questions about their tax return and previous audits (if any), or submit an anonymized version of their tax returns
  2. User data is gathered and analyzed to give a real-time answer to users checking their audit risk
  3. As more people use the system, the predictions improve and the confidence intervals for estimated audit probabilities shrink

Indeed, this would amount to reverse-engineering the IRS algorithms.
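To make the third point concrete, here is a minimal sketch of how the confidence interval around an estimated audit rate narrows as more users report their outcome. It assumes a simple binomial model and uses the Wilson score interval; the function names, the 2% audit rate and the user counts are illustrative assumptions, not figures from any real data set.

```python
import math


def wilson_interval(audited, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = audited / total
    z2 = z * z
    denom = 1 + z2 / total
    center = (p_hat + z2 / (2 * total)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / total + z2 / (4 * total * total)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def interval_width(audited, total):
    """Width of the Wilson interval; smaller means a more precise estimate."""
    low, high = wilson_interval(audited, total)
    return high - low


if __name__ == "__main__":
    # Hypothetical segment of users with an assumed 2% audit rate.
    for total in (100, 1_000, 10_000, 100_000):
        audited = round(0.02 * total)
        low, high = wilson_interval(audited, total)
        print(f"{total:>7} users: [{low:.4f}, {high:.4f}]")
```

In practice the estimate would be computed per segment of similar taxpayers (for example, self-employed with home office deductions), so the relevant sample size is the number of users in that segment, not the total user base.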

Which metrics should be used to assess tax audit risk? And how accurate can the prediction be? Could such a system attain 99.5% accuracy, that is, wrong predictions for only 0.5% of taxpayers? Right now, if you tell someone "you won't be audited this year", you are correct 98% of the time, so that is the baseline any model has to beat. More interestingly, what level of accuracy can be achieved for higher-risk taxpayers?
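The 98% figure also shows why raw accuracy is a poor metric here. The sketch below, with a hypothetical population of 1,000 taxpayers of whom 2% are audited, scores the "nobody gets audited" rule: it reaches 98% accuracy while catching no audits at all, which is why precision and recall on the audited group are worth tracking. The function name and the sample population are assumptions made for illustration.

```python
def classification_metrics(actual, predicted):
    """Accuracy, precision and recall for boolean audit labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # audited, predicted audited
    tn = sum(1 for a, p in pairs if not a and not p)  # not audited, predicted not
    fp = sum(1 for a, p in pairs if not a and p)      # false alarm
    fn = sum(1 for a, p in pairs if a and not p)      # missed audit
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical population: 20 audited out of 1,000 (the 2% implied above).
actual = [True] * 20 + [False] * 980
always_no = [False] * 1000  # the "you won't be audited" rule for everyone

baseline = classification_metrics(actual, always_no)
print(baseline)
```

A useful model would need to keep accuracy at or above this baseline while raising recall on the audited group well above zero.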

Finally, if your model predicts both the risk of an audit and the penalty if audited, then you can make smarter decisions about which risks to take and which to avoid. This is pure decision science to recoup dollars from the IRS, not by exploiting tax law loopholes (dangerous), but through honest, fair and smart analytics: in a nutshell, outsmarting the IRS data science algorithms rather than outsmarting tax laws.
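As a rough sketch of that decision step, the two model outputs can be combined into an expected cost and compared with the tax saved by a given choice. The function names and all dollar amounts below are hypothetical, and the model deliberately ignores anything beyond a single audit probability and a single penalty figure.

```python
def expected_audit_cost(audit_prob, penalty_if_audited):
    """Expected cost of an audit: probability times penalty."""
    if not 0.0 <= audit_prob <= 1.0:
        raise ValueError("audit_prob must be between 0 and 1")
    return audit_prob * penalty_if_audited


def expected_net_benefit(tax_saved, audit_prob, penalty_if_audited):
    """Tax saved by a choice minus the expected audit cost it carries."""
    return tax_saved - expected_audit_cost(audit_prob, penalty_if_audited)


def break_even_probability(tax_saved, penalty_if_audited):
    """Audit probability at which the expected net benefit drops to zero."""
    if penalty_if_audited <= 0:
        raise ValueError("penalty_if_audited must be positive")
    return tax_saved / penalty_if_audited


if __name__ == "__main__":
    # Hypothetical numbers: $3,000 saved, 5% audit risk, $10,000 penalty.
    print(expected_net_benefit(3_000, 0.05, 10_000))
    print(break_even_probability(3_000, 10_000))
```

Under these assumptions, a choice is worth making only while the predicted audit probability stays below the break-even value, which is why a tight confidence interval on that probability matters.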
