
Proposal for a new type of scoring system

In digital analytics, scoring Internet traffic is used to detect click fraud and to find the types of search keywords that convert well (to a sale). For large ad networks, conversion data is quite often poor or hard to work with: some clicks have a 0.2% conversion rate, others 30%, depending on the type of website, price, product, conversion type, and other factors (even the hour of the day has an impact).

One way to build a generic scoring system that predicts whether a click is genuine is to rely on IP flags rather than conversion metrics. By IP flags, I mean IP blacklists such as Spamhaus, Barracuda, or Adometry user-IP and referral (web domain) blacklists, with various reason codes indicating why the IPs in question are blacklisted.

Since these third-party blacklists are themselves the output of scoring systems used by the vendors in question (Spamhaus, etc.), our generic score would be built on third-party scores: a meta-score blending multiple scores, possibly even scores that predict conversions.
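A minimal sketch of such a meta-score, assuming each source (a blacklist hit encoded as 0 or 1, or a conversion-based model) already yields a "probability the click is bad" in [0, 1]. The source names and weights below are hypothetical placeholders, not values from any vendor; in practice the weights would have to be fitted on whatever labeled data exists.

```python
from typing import Mapping

# Hypothetical per-source weights (assumption, not vendor-provided values).
WEIGHTS = {"spamhaus": 0.5, "barracuda": 0.3, "conversion_model": 0.2}


def meta_score(source_scores: Mapping[str, float],
               weights: Mapping[str, float] = WEIGHTS) -> float:
    """Blend per-source scores in [0, 1] into a single score in [0, 1].

    Sources with no score for this click are skipped, and the weights of
    the remaining sources are renormalized so the result stays in [0, 1].
    """
    blended = 0.0
    total_weight = 0.0
    for source, weight in weights.items():
        if source in source_scores:
            blended += weight * source_scores[source]
            total_weight += weight
    if total_weight == 0.0:
        raise ValueError("no known source scores for this click")
    return blended / total_weight
```

For example, a click whose IP is flagged by Spamhaus but not by Barracuda, with no conversion-model score available, gets 0.5 / 0.8 = 0.625. A weighted average is only the simplest choice; a logistic model over the same inputs would be a natural next step.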

In practice, I've found that:

  • a few data buckets in which 6% of IP addresses are Spamhaus-blacklisted are extremely bad, so all traffic with the features defining the bucket in question should be blocked
  • numerous large data buckets with 0% Spamhaus-blacklisted IP addresses are clean and should not be blocked
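The bucket analysis described above can be sketched as follows, assuming each click is a record with an `"ip"` field plus whatever features define a bucket (hour of day, referrer, etc.). The record layout, the 5% block threshold and the minimum bucket size are hypothetical choices for illustration only.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Set, Tuple

Click = Dict[str, object]  # hypothetical click record, e.g. {"ip": ..., "hour": ...}


def bucket_blacklist_rates(
    clicks: Iterable[Click],
    key_fn: Callable[[Click], Hashable],
    blacklist: Set[str],
) -> Dict[Hashable, Tuple[int, int]]:
    """Return {bucket: (blacklisted_clicks, total_clicks)}."""
    counts: Dict[Hashable, list] = defaultdict(lambda: [0, 0])
    for click in clicks:
        bucket = key_fn(click)
        counts[bucket][1] += 1
        if click["ip"] in blacklist:
            counts[bucket][0] += 1
    return {b: (bad, total) for b, (bad, total) in counts.items()}


def classify_bucket(bad: int, total: int,
                    block_rate: float = 0.05, min_clicks: int = 1000) -> str:
    """Label a bucket "block", "allow" or "review" (thresholds are assumptions)."""
    if total < min_clicks:
        return "review"  # too little traffic to decide either way
    if bad / total >= block_rate:
        return "block"   # high concentration of blacklisted IPs
    if bad == 0:
        return "allow"   # large bucket with no blacklisted IPs
    return "review"
```

Buckets are defined entirely by `key_fn`, so combining features is just a matter of returning a tuple, e.g. `lambda c: (c["hour"], c["referrer"])`.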

Has such a strategy been used in other industries, such as finance or marketing? Of course, the challenge is to identify data buckets with either a very high or a very low concentration of blacklisted IP addresses, using decision trees, feature selection, and Internet topology mappings, and to compute confidence intervals for the scores.
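For the confidence intervals, one standard option (our choice here, not something the post specifies) is the Wilson score interval for a binomial proportion, applied to each bucket's share of blacklisted IPs. The decision rule and its thresholds below are hypothetical: a bucket is blocked only if even the lower bound is high, and allowed only if even the upper bound is low.

```python
import math
from typing import Tuple


def wilson_interval(bad: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for the proportion bad/total (z=1.96 is ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = bad / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def decide(bad: int, total: int,
           block_rate: float = 0.03, allow_rate: float = 0.001) -> str:
    """Hypothetical rule: act only when the whole interval clears a threshold."""
    low, high = wilson_interval(bad, total)
    if low >= block_rate:
        return "block"
    if high <= allow_rate:
        return "allow"
    return "review"
```

With these settings, a bucket with 60 blacklisted IPs out of 1,000 (6%) has an interval of roughly 4.7% to 7.6% and is blocked, a bucket with 0 out of 5,000 is allowed, while a bucket with 0 out of 100 stays under review because its upper bound is still above 3%. This is how the interval keeps small buckets from being declared clean too early.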



© 2019 Data Science Central ®
