
Which metrics should an author / website attribution algorithm use? In recent tests, I posted original content on website A, then re-posted it on website B (two different domain names). Yet in searches using 3- or 4-token keyword extracts from my article, website B showed up first in Google search results, while website A either did not show up at all or appeared only on search result page 5 or 6.

Of course, if the time lag between the two posts is too small (< 24 hours), Google has no way to know where the article was first posted. Also, if site B has a much better score or page rank than A, Google will favor B regardless of attribution / copyright. Misattribution can also be caused by inefficiencies in the keyword / landing page matching algorithm (AKA Google's keyword relevancy algorithm). More on optimizing keyword relevancy later: I will explain in a future post how to build a system that assigns a user intent to a raw user keyword, then maps the raw keyword together with its user intent (or meta keyword, or keyword cluster) to more relevant search results.
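
To make the time-lag point concrete, here is a minimal sketch of one possible attribution metric: credit the site where the article was first seen, but only when the gap to the next copy is large enough to be trusted. This is an illustration, not how Google actually works. The function name `attribute_original`, the `(site, first_seen)` input format, and the use of 24 hours as the threshold are all assumptions, the last one borrowed from the remark above.

```python
from datetime import datetime, timedelta

# Assumption: 24 hours, taken from the "< 24 hours" remark in the post.
MIN_LAG = timedelta(hours=24)

def attribute_original(copies, min_lag=MIN_LAG):
    """Return the site credited as the original source, or None if ambiguous.

    copies: list of (site, first_seen) pairs, where first_seen is the
    datetime at which a crawler first observed the article on that site.
    """
    if not copies:
        return None
    ordered = sorted(copies, key=lambda c: c[1])
    if len(ordered) == 1:
        return ordered[0][0]
    earliest, runner_up = ordered[0], ordered[1]
    # If the two earliest sightings are too close together, the order
    # cannot be trusted, so no site is credited.
    if runner_up[1] - earliest[1] < min_lag:
        return None
    return earliest[0]

copies = [
    ("website-b.example", datetime(2020, 1, 5, 9, 0)),
    ("website-a.example", datetime(2020, 1, 1, 9, 0)),
]
print(attribute_original(copies))
```

A metric like this deliberately ignores site score or page rank, so a stronger site B would not win simply by being stronger. In practice it would have to be combined with other signals, since first-seen times depend on how often each site is crawled.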

Also, how does Google handle curated re-posts, that is, re-posted articles where only the first paragraph is copied and then augmented with original comments and other references?
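
One way to quantify how much of a curated re-post is copied is to measure the share of its short token sequences that also occur in the original. The sketch below uses 3-token sequences, matching the 3- or 4-token keyword extracts mentioned above. The helper names `shingles` and `containment` are hypothetical, and nothing here is meant to describe Google's actual handling of curated content.

```python
def shingles(text, k=3):
    """Return the set of k-token sequences (shingles) found in text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}

def containment(original, repost, k=3):
    """Fraction of the re-post's k-token shingles that also occur in the original.

    A value near 1.0 suggests a plain copy; a low value suggests that most
    of the re-post is new material (comments, other references).
    """
    repost_shingles = shingles(repost, k)
    if not repost_shingles:
        return 0.0
    shared = repost_shingles & shingles(original, k)
    return len(shared) / len(repost_shingles)

original = "which metrics should an attribution algorithm use"
curated = "which metrics should an editor pick when curating posts"
print(containment(original, curated))
```

A full re-post scores 1.0, while a curated re-post that keeps only the first paragraph scores lower the more original commentary it adds. Where to draw the line between the two is a judgment call that this sketch does not settle.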


Replies to This Discussion

Is it possible to have the website record the time at which new content is posted?

Google allows trusted publishers to post a time stamp and follow some protocol. If Google accepts you as a trusted publisher, your content even gets featured in Google News. I don't remember the details, but search for "how to be published on Google news" and you'll find how it works.

© 2020 Data Science Central ®