While looking at the 2,000+ most popular live articles on DSC, based on Google Analytics numbers, we found an interesting pattern.

The bottom 50% of these articles, those with pageview counts between 145 and 500 (so the least popular of the most popular group), have gaps in the pageview distribution; see below. The gaps are also widespread in the other half, the top 50%.

Our intern Livan is investigating this, as it is part of our new growth hacking strategy to be deployed soon (details to be published soon; the project involves categorizing popular articles, identifying time-sensitive articles such as events, and much more).

This study covered 2,000+ articles totaling more than 3 million page views, out of more than 40,000 live articles.

If you look at the above figure, 38 articles had 145 pageviews and 71 articles had 161 pageviews, but none had a pageview count strictly between 145 and 161. Note that the gap between consecutive observed counts (161 - 145 = 16, 177 - 161 = 16) is always equal to 16. Any idea why this is happening?
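For readers who want to check their own exports for the same pattern, here is a minimal Python sketch. It assumes you have a plain list of per-article pageview counts; the function name `distribution_gaps` and the sample data are hypothetical (only the counts for 145 and 161 come from the post, the number of articles at 177 is made up for illustration).

```python
from collections import Counter


def distribution_gaps(pageviews):
    """Return (histogram, gaps).

    histogram maps each pageview count to the number of articles with that count.
    gaps lists the differences between consecutive distinct pageview counts.
    """
    histogram = Counter(pageviews)
    distinct = sorted(histogram)
    gaps = [b - a for a, b in zip(distinct, distinct[1:])]
    return histogram, gaps


# Hypothetical sample shaped like the pattern described above:
# 38 articles at 145 and 71 at 161 are from the post; the 5 articles at 177 are invented.
sample = [145] * 38 + [161] * 71 + [177] * 5

histogram, gaps = distribution_gaps(sample)
print(histogram[145], histogram[161])  # articles at each observed count
print(gaps)                            # differences between consecutive observed counts
print(set(gaps))                       # a single value here means a constant gap
```

If `set(gaps)` collapses to a single value over a large report, the counts are landing on a regular grid rather than on arbitrary integers, which is the oddity described here.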

We've found other oddities in Google Analytics reporting, such as the fact that all sessions with only one pageview last zero seconds (see section 3 after clicking on the link, where we propose a solution to this issue).
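One plausible explanation for the zero-second sessions, sketched below in Python, is that duration is measured as the time between the first and the last recorded hit of a session. That is an assumption made for illustration, not a statement of GA's internals; the function name `session_duration` and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def session_duration(hit_times):
    """Duration measured as (last hit - first hit).

    Assumption: nothing is recorded when the visitor leaves,
    so the time spent on the final page is invisible.
    """
    if not hit_times:
        return timedelta(0)
    return max(hit_times) - min(hit_times)


start = datetime(2015, 1, 1, 12, 0, 0)  # arbitrary illustrative timestamp

single_page_session = [start]
two_page_session = [start, start + timedelta(seconds=90)]

print(session_duration(single_page_session))  # only one hit, so first == last
print(session_duration(two_page_session))
```

Under this assumption a visitor who reads one article for ten minutes and leaves looks identical to one who bounces immediately, because there is no second timestamp to subtract from.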

Replies to This Discussion

I've used GA since their Urchin days...and it's simply not something that can be relied upon as your sole analytics tool.

I've always run 2-4 independent analytics systems, preferably using different tracking mechanisms, with GA and StatCounter typically being my defaults.

When I've run REST-based systems with tracking internally integrated into the controller handling the REST calls (meaning I know EXACTLY what's been served up and to whom), I've found both systems to have missing data...with GA being a significantly worse offender.

To really use the techniques that people now call "growth hacking" (a sketchy term for what many of us have been doing since the late '90s, when the enabling technology first emerged), you need to have individualized page-by-page tracking with each visitor being uniquely tagged and tracked (even if they've not yet gone through an identity-event). That way you can do long-term longitudinal tracking at an individual level...and you can aggregate up from the raw data in whatever ways the data suggests.

GA is simply not the tool for analytics at that level of granularity!

Dale, here's more about GA discrepancies. Thanks for your contribution!

Here's what Livan (our intern) told me: On the other hand, I checked page views for the top 5,000 articles during the last month (recent data), and the plots do not show the gap of 16.

So the issue occurs only when you request large reports.

GA seems to have an issue with certain race conditions. If a new resource request comes too quickly...or if a request consumes too many server resources, then sometimes the JS won't fully execute...hence no record.

If you need comprehensive coverage, a more certain approach is to use a non-JS-based web-beacon system (still not 100% certain)...though one is hard to find as a 3rd-party solution.

My preferred approach is to build on top of a fully RESTful architecture and have the REST controller generate database log entries for each request. This is 100% accurate and can allow granularity below the level of a full page request (handling AJAX-based resource requests as well as page requests).
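The controller-side logging described above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the poster's actual system: the table layout, the `handle_request` entry point, the visitor IDs, and the use of an in-memory SQLite database are all hypothetical choices made to keep the example self-contained.

```python
import sqlite3
from datetime import datetime, timezone


def open_log(path=":memory:"):
    """Open (or create) the request log. In-memory by default for the demo."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS request_log ("
        "id INTEGER PRIMARY KEY, ts TEXT NOT NULL, visitor_id TEXT NOT NULL, "
        "method TEXT NOT NULL, resource TEXT NOT NULL, is_ajax INTEGER NOT NULL)"
    )
    return conn


def handle_request(conn, visitor_id, method, resource, is_ajax=False):
    """Hypothetical controller entry point: log the request, then serve it."""
    conn.execute(
        "INSERT INTO request_log (ts, visitor_id, method, resource, is_ajax) "
        "VALUES (?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), visitor_id, method, resource, int(is_ajax)),
    )
    conn.commit()
    return f"200 OK {resource}"  # stand-in for the real response


def pageviews_by_resource(conn):
    """Aggregate up from the raw log: full page requests only, AJAX excluded."""
    cursor = conn.execute(
        "SELECT resource, COUNT(*) FROM request_log "
        "WHERE is_ajax = 0 GROUP BY resource ORDER BY resource"
    )
    return dict(cursor.fetchall())


conn = open_log()
handle_request(conn, "visitor-1", "GET", "/articles/1")
handle_request(conn, "visitor-2", "GET", "/articles/1")
handle_request(conn, "visitor-1", "GET", "/api/comments?article=1", is_ajax=True)

counts = pageviews_by_resource(conn)
total_rows = conn.execute("SELECT COUNT(*) FROM request_log").fetchone()[0]
print(counts)
print(total_rows)
```

Because every served request writes one row tagged with a visitor ID, per-visitor longitudinal views and per-page totals are both just queries over the same raw table, and AJAX calls are captured without any client-side JavaScript having to finish executing.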

Unfortunately...the REST approach isn't an after-the-fact bolt on...the site has to be architected with that in mind from the beginning.

But it's absolutely THE approach for SaaS-based product sites where good product analytics can translate very directly into increased growth and revenue!!!




This site is also using the legacy tracking code and the universal tracking code for two different accounts. Multi-account tracking is very reasonable in some situations, but using both tracking scripts is bound to cause conflicts.

I'd highly suggest using one JS script and pushing to both accounts.

If you think GA is not as precise as you would like it to be, look into setting up Piwik, which will give you control over just about everything and access to the raw data stored in your own MySQL database.
