Fake traffic undetected by Google Analytics

Or, to put it differently: when your metrics lie to you, how do you find out, and what should you do?

The purpose of this article is to make Google aware of the problem, so that it can fix its Google Analytics reports by filtering out the fake traffic. This scheme also affects many companies that compute website rankings. Many websites now have wrong traffic stats, and should consider home-made solutions to filter out fake traffic that vendors fail to detect.

I used just a tiny bit of data science and tiny data (from Google Analytics real time stats - number of users, top locations, pages visited) to unearth the following:

  • The fake traffic probably started in South America, weeks ago, impacting Alexa web traffic measurements
  • It impacted 5% of the specialized web domains that I closely follow
  • We noticed it today and yesterday in our Google Analytics report (and earlier in our Alexa metrics, though it was not US-based back then)

The following is an updated analysis as of October 22, 2014 at 11 pm PST, as the pattern has evolved since yesterday.

Here's the fingerprint, to detect this bogus traffic:

  • Takes place after 6 pm PST for now
  • Very short visits, many IP addresses used
  • Traffic spike lasts a few hours per targeted website
  • URL visits are done in pure alphabetical order (see picture below)
  • There's no referral domain
  • The fake traffic is 100% desktop
  • The fake traffic now appears to come 100% from the US, specifically from Spokane, WA
  • It might or might not be related to the Alexa metric attack described above (the attack in question could have been a test before deploying in the US)

Evidently, these patterns will change over time.
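
For readers building a home-made filter, the fingerprint above can be turned into a simple rule-based check. The following is only a minimal sketch, not what Google or any vendor does: the Visit record, its field names, and the thresholds (5 seconds, 95% alphabetical, at least 20 visits) are all assumptions made for illustration, and would need tuning against your own logs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Visit:
    """One page view. Every field name here is hypothetical."""
    timestamp: float          # seconds since epoch
    url: str                  # path of the page requested
    referrer: Optional[str]   # None or "" when there is no referral domain
    device: str               # e.g. "desktop", "mobile", "tablet"
    duration_s: float         # time spent on the page, in seconds


def alphabetical_fraction(urls: List[str]) -> float:
    """Share of consecutive URL pairs that are in non-decreasing order."""
    if len(urls) < 2:
        return 0.0
    in_order = sum(1 for a, b in zip(urls, urls[1:]) if a.lower() <= b.lower())
    return in_order / (len(urls) - 1)


def looks_like_fake_burst(
    visits: List[Visit],
    max_duration_s: float = 5.0,   # assumed cutoff for "very short visits"
    min_alpha: float = 0.95,       # assumed cutoff for "alphabetical order"
    min_visits: int = 20,          # too few visits prove nothing
) -> bool:
    """Return True when a burst of visits matches the whole fingerprint."""
    if len(visits) < min_visits:
        return False
    ordered = sorted(visits, key=lambda v: v.timestamp)
    no_referrer = all(not v.referrer for v in ordered)
    all_desktop = all(v.device == "desktop" for v in ordered)
    all_short = all(v.duration_s <= max_duration_s for v in ordered)
    alphabetical = alphabetical_fraction([v.url for v in ordered]) >= min_alpha
    return no_referrer and all_desktop and all_short and alphabetical
```

The check is applied to one burst at a time (for instance, all visits in a given evening hour), since the bot uses many IP addresses and cannot be grouped by visitor. Requiring every criterion at once keeps false positives low, at the cost of missing the bot as soon as one of its patterns changes.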

What are they trying to achieve?

It could be

  • a disgruntled employee,
  • a company hurt by terrible Alexa or other rankings, seeking revenge,
  • a Google Analytics competitor (vendor) who wants to prove that its traffic reports are more accurate than those of Google or Alexa (since the perpetrator knows which traffic is real and which is fake),
  • one of your competitors (playing with some traffic generation robot or using a rogue traffic generator vendor) trying to make you believe that your traffic looks great, so that you relax and stop competing with them,
  • a marketing ploy by low-traffic websites to sell advertising by artificially boosting their traffic stats (though I don't understand why we would be hit, unless they want to make it look more natural by faking even competitor traffic),
  • black hat SEO to dilute your user engagement metrics, to eventually get you penalized by Google.

The scheme is (for now at least) extremely easy to detect. It will take a few days (my guess: less than one week, let's see) before Google and other companies find a fix.

Potential improvements for Google Analytics

Showing artificial traffic (robot) in a separate box or tab. I guess that robot traffic is filtered out by Google, based on empirical evidence (see section 3 after clicking on the link), but it would be useful if Google Analytics reported this automated traffic separately. Note that the bot in question triggered the Google Analytics JavaScript tracking code, which is quite unusual. Nevertheless, this bot was easy to detect, and should not have been counted as normal traffic. Empirical evidence, based on measurements from various web traffic measurement companies, suggests that this bot and other bots also inflate the traffic stats of other websites.
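
As an illustration of what separate reporting could look like in a home-made solution, here is a small sketch that tallies page views into two tables instead of one. It assumes each hit has already been labeled as automated or not by some detector of your own; the function name and the (url, is_automated) input format are hypothetical, and this is not how Google Analytics works internally.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def split_traffic_report(
    hits: Iterable[Tuple[str, bool]],
) -> Dict[str, Dict[str, int]]:
    """Count page views per URL, keeping automated traffic in its own table.

    Each hit is a (url, is_automated) pair; the flag is assumed to come
    from a separate bot-detection step.
    """
    human: Counter = Counter()
    automated: Counter = Counter()
    for url, is_automated in hits:
        if is_automated:
            automated[url] += 1
        else:
            human[url] += 1
    return {"human": dict(human), "automated": dict(automated)}
```

Keeping both tables, rather than silently dropping flagged hits, lets you see how large the robot traffic is and check whether the detector is misclassifying real visitors.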

Comment by Enexseenge on July 27, 2015 at 4:42pm

This is so interesting.

What ever happened?

Comment by Vincent Granville on October 28, 2014 at 7:11am

We are now on day #7, and the same fake traffic with the Netscape user agent is showing up again in Google Analytics. We are going to look for a more accurate web analytics vendor, or maybe a home-made solution. This bot is so easy to detect that I don't understand why it is not filtered out.

Comment by Vincent Granville on October 23, 2014 at 9:48pm

This is now day #3 with the fake traffic. It is still undetected by Google (though I'm going to contact them). I thought it would take less than one week for Google to fix its Google Analytics reports, so there are still 4 days to go before I lose my bet. By the way, the fake traffic started much earlier today (around 3 pm PST on 10/23), and later in the evening it moved from Spokane, WA to Ashburn, VA, where tons of Amazon EC2 servers are located. It was also found originating from Boardman, OR.

If it's that easy to manufacture fake traffic (that is not detected by traditional filters), I guess we are going to see a lot more of this, and web traffic statistics will become meaningless. For those interested, the user agent generating this traffic is Netscape 4.5.

This is an example of metric corruption - a general class of nefarious actions aimed at making your KPIs show wrong values, usually for financial gain or competitive advantage. Another example of metric corruption is when a business hacker creates fake traffic to make your Google ads have a low engagement rate, to kick you out of AdWords, or make it far more expensive for you to stay on - draining your advertising budget. We tested it successfully a while back in our data science research lab. Similar techniques can be used to indirectly erase organic search results from Google's keyword-webpage value pair index.

Comment by David Hite on October 22, 2014 at 6:47pm

good call
