Many unscrupulous bloggers re-post copyrighted material on their blogs without permission. The problem is compounded by the fact that Google can give credit to the illegal version and erroneously penalize the original as plagiarism, even though timestamps make it obvious which version came first. The problem is described here.

One easy way to find out whether your articles are re-posted without permission is to do a Google search for the title or for key sentences from your article. However, smart plagiarists will change the title and scramble the content.

Here is an easy trick to find them:

Put blank, invisible, one-by-one-pixel tracking images at random places in your article. In most cases, the plagiarist copies and pastes your article (or parts of it), so those tracking images will also be posted (still invisible) on his blog, along with the stolen content. Then, by looking at your web log statistics, you will be able to find the source of the plagiarism, assuming that the tracking images are hosted on a server that you monitor.
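
As a rough illustration of the log check described above, here is a minimal Python sketch. It assumes the server writes access logs in the common "combined" format and that the pixel lives at a single known path; the host names, the `/img/t.gif` path, and the sample log lines are all made up for the example. It counts the pages on other hosts whose visitors ended up loading the pixel.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical values: replace with your own host names and pixel path.
OWN_HOSTS = {"example.com", "www.example.com"}
PIXEL_PATH = "/img/t.gif"

# Picks the request path and the referer out of a combined-format log line.
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)

def foreign_referers(lines):
    """Count pages on other hosts that caused the tracking pixel to load."""
    hits = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if not m:
            continue
        # Ignore any query string when comparing against the pixel path.
        path = m.group("path").split("?", 1)[0]
        if path != PIXEL_PATH:
            continue
        referer = m.group("referer")
        host = urlparse(referer).hostname  # None for "-" or an empty referer
        if host and host not in OWN_HOSTS:
            hits[referer] += 1
    return hits

# Made-up sample log lines, for illustration only.
sample = [
    '203.0.113.5 - - [10/Oct/2017:13:55:36 +0000] "GET /img/t.gif HTTP/1.1" 200 43 "http://copycat.example.net/stolen-post" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Oct/2017:13:56:01 +0000] "GET /img/t.gif HTTP/1.1" 200 43 "http://www.example.com/my-article" "Mozilla/5.0"',
    '198.51.100.9 - - [10/Oct/2017:13:57:12 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

for page, count in foreign_referers(sample).most_common():
    print(count, page)
```

Note that this only works when the visitor's browser sends a referer; requests that arrive without one cannot be traced back to a page this way.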

Has anyone successfully used this strategy? What are its drawbacks?

Replies to This Discussion

Hi Vincent,

One problem I see is that any mildly sophisticated plagiarist will easily spot and strip any image tags found in the raw HTML code. I guess it depends on how they are plagiarizing the material...

Kenneth Graves

Hi Kenneth,

Much of the plagiarism that I see is still somewhat rudimentary. Of course it could just be the tip of the iceberg, with tons of professional plagiarism that I don't even see.

Vincent

This is made easier by the fact that "invisible" graphics will stick out like a sore thumb in a text-based browser (back when I was webmastering, the page wasn't done until it looked good in Lynx). The trick will catch stupid crooks, but not smart ones.

I think I'd post a master copy to an internal URL not normally accessible to non-administrators, with a good provenance audit trail.  Then I would replace one or two more or less innocuous lines in the document with a programmed element that would present a very slightly modified version, while capturing, as best possible, an IP and timestamp, for presentation in the displayed text.  

Then not only would one have documentation of the originality, but also something like a general version of a 'canary trap' which could be used to demonstrate expropriation and possibly even 'fingerprint' the thief.  It would be too general to always precisely ID a party, but at least it could really narrow it down at worst. 
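
To make the canary-trap idea a little more concrete, here is a minimal Python sketch under some loud assumptions: the template sentence, the interchangeable wordings, and the IP addresses are all invented, and the variant is keyed on the visitor's IP alone (a real setup would also log the timestamp alongside it, as suggested above). Each visitor gets a slightly different wording, and a suspected copy can later be matched against the visitors in the log.

```python
import hashlib

# Hypothetical "innocuous" slots: each holds interchangeable wordings.
SLOTS = [
    ("in most cases", "most of the time", "as a rule"),
    ("easy", "simple", "straightforward"),
    ("find out", "discover", "determine"),
]
# Hypothetical sentence; {0}, {1}, {2} are filled from the slots above.
TEMPLATE = "Here is one {1} trick: {0}, you can {2} who copied you."

def fingerprint(ip):
    """Map a visitor IP to one variant index per slot, using a stable hash."""
    digest = hashlib.sha256(ip.encode("utf-8")).digest()
    return tuple(digest[i] % len(slot) for i, slot in enumerate(SLOTS))

def render(ip):
    """Return the sentence variant that would be served to this visitor."""
    choice = fingerprint(ip)
    words = [slot[k] for slot, k in zip(SLOTS, choice)]
    return TEMPLATE.format(*words)

def candidates(copied_text, seen_ips):
    """List the logged IPs whose variant appears verbatim in a suspected copy."""
    return [ip for ip in seen_ips if render(ip) in copied_text]

# Made-up visitor log and stolen text, for illustration only.
log = ["203.0.113.5", "198.51.100.7", "192.0.2.44"]
stolen = "Intro text. " + render("198.51.100.7") + " More text."
print(candidates(stolen, log))
```

With only 27 possible variants, several visitors can share one wording, so the result is a shortlist rather than a positive ID, which matches the "narrow it down" caveat above. More slots give a finer fingerprint.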

Hi Vincent,

I can recommend a few successful approaches:

1. Intentionally misspelling a word, or transposing two characters in an obvious command, e.g.: sodu instead of sudo, exce instead of exec. However, this can reflect badly on your proofreading abilities.

2. Commands are useful in another way, as they are invariably followed by other words, which you can name uniquely, e.g.: name a file after your birthdate, output170479.txt. A search for the two in combination should quickly identify any plagiarists. This is trickier to spot, even for the technically competent plagiarist, and is no reflection on your proofreading skills.

3. Put an unusual, rarely used word, or better still a phrase, into a comment line. Google it (in double quotes) first, to check that there aren't millions of people using the same phrase.

Lazy plagiarists, which I'm guessing covers 99.9% of them, either won't go to the time-consuming lengths of searching for and changing anything that looks like it might snag them, or lack the technical expertise to verify that your code is 100% correct. I've caught a couple of people in the SQL Server world on "reputable" sites by using this technique in my own articles; it does work. Just put a reminder in your calendar every six months for what to search for, and change phrases occasionally.
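
Once a search turns up a suspicious page, checking it for the planted markers can be scripted. Here is a minimal Python sketch: the first two markers are the examples from the list above, while the rare phrase and the suspect text are invented for illustration. A page is flagged only when at least two markers appear together, in the spirit of searching for them in combination.

```python
import re

# Planted markers. The first two come from the examples above;
# the phrase is hypothetical.
MARKERS = ["sodu", "output170479.txt", "quietly obstreperous index"]

def markers_found(text, markers=MARKERS):
    """Return the planted markers that occur in text as whole tokens, ignoring case."""
    found = []
    for marker in markers:
        # Require non-word characters (or the text boundary) on both sides.
        pattern = r"(?<!\w)" + re.escape(marker) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(marker)
    return found

def looks_copied(text, minimum=2):
    """Treat a page as a likely copy when at least `minimum` markers co-occur."""
    return len(markers_found(text)) >= minimum

# Made-up suspect text, for illustration only.
suspect = "Run sodu apt-get update, then write the result to output170479.txt."
print(markers_found(suspect))
print(looks_copied(suspect))
```

Requiring two or more markers reduces false alarms from a single typo that someone else happened to make independently.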

The lazy are easy to hunt down. It's just a matter of getting inside their theivish heads, then thinking how they think (or rather, don't), and further enervating their weakness. ;-)

Spot the trio of intentional snags in that last sentence!

An approach borrowed from my early days of seeding list-management data with known seed names and contact details works well. You can apply it by including traceable examples in your blog post, drawn from your own experience and history, when explaining the subject.

This worked for me a couple of years ago with a blog post titled 'Does Volume mean Value for Big Data in Marketing?': the article was not only copied but also sold on to a third party!


© 2017   Data Science Central   Powered by
