
Here I am talking about the Google algorithm used to attribute an article to the blog where it was first posted, rather than to subsequent re-posts from authorized (syndicated) or unauthorized (plagiarized) sources. If instead you are looking for lead or sales attribution modeling, click here.

The problem we face is as follows: we occasionally post authorized guest blogs, with a link at the bottom pointing to the original article, typically on the author's personal blog. Google's website attribution algorithm would benefit if it could find, in our guest blogs, Google-approved tags telling Google that the DSC article in question is not the original and should not be indexed. For instance, a link that says "click here for the original version," with a URL such as

john-smith-blog.com/data-science-article-by-john-smith.html?GoogleTag=DoNotIndexDSCversion

Of course, the tag DoNotIndexDSCversion would be encrypted: you would have to sign in to your Google account to obtain the correct encrypted tag, and the Google attribution algorithm would ignore the tag unless it is found on the initiating website, in this case DSC. Maybe Google (or anyone else) could even patent this little trick.
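As a rough sketch of the proposed markup (the domain, path, and tag value are the hypothetical placeholders from the example above; this is not an existing Google feature), the link in the DSC copy of a guest blog might look like:

```html
<!-- Hypothetical markup inside the DSC (re-posted) copy of a guest blog.
     The encrypted tag would tell Google's attribution algorithm to credit
     the original source and skip indexing this copy. -->
<a href="http://john-smith-blog.com/data-science-article-by-john-smith.html?GoogleTag=DoNotIndexDSCversion">
  Click here for the original version
</a>
```

The attribution algorithm would honor the tag only when it appears on the re-posting site (DSC), so a plagiarist could not use it to claim someone else's article.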

What do you think of my solution? I believe it is robust against black-hat SEO: unlike the current Google attribution algorithm, it cannot be abused by fraudsters, right? Or is there already a solution that I am not aware of?

The issue is that Google's attribution algorithm is poor at matching an article with the source where it was first published. In addition, it is unable to recognize guest blogs, and the results are unpredictable, ranging from unjustified Google penalties (when the algorithm erroneously concludes that we are plagiarists) to unjustified boosts in organic traffic (when the algorithm erroneously attributes the content to us).

Technical note: you can use the robots.txt file on your web server to ask search engines not to crawl (and hence not index) some of your web pages; separately, you can block traffic from specific countries by blocking IP ranges at the server level. This technically solves the problem. However, it is a cumbersome process: you may not have access to robots.txt if you work with platforms such as WordPress or Ning, and editors and authors want a very simple solution like the one proposed in this article. On Ning, since you can choose the URL (path) attached to any article, it would be very easy to add the Google tag DoNotIndexDSCversion to the DSC URL itself.
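For reference, a minimal robots.txt sketch that asks all crawlers to skip a re-posted article (the path below is a hypothetical placeholder, not an actual DSC URL):

```text
# robots.txt, served from the site root
# Ask all crawlers to skip the re-posted copy of a guest blog
User-agent: *
Disallow: /profiles/blogs/guest-blog-repost-example
```

Note that this requires one rule per re-posted article, which illustrates why editors would prefer a tag embedded in the article's own URL or markup.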

For more articles about Google, click here. For articles about this very specific issue and how to improve Google's attribution algorithm, click here.

© 2019 Data Science Central ®