So far we have considered web scraping for positive uses, but scraping also has a negative side: stealing other bloggers’ proprietary content. Let’s look at some anti-scraping WordPress (WP) plugins.
When it comes to web content ownership, the main indicator is indexing, done mostly by Google. If content is scraped and reposted immediately, Google might be fooled into indexing the copy as the original, while the genuine source gets treated as duplicate content from a content farm. A higher-ranking site may have a better chance of being indexed earlier than the site that published the original, and the latter might even get flagged as spam. This is not necessarily a common pattern, but such cases have happened in the past. It seems absurd, but a published RSS feed makes it easy: offenders can watch the feed and scrape new content for reposting as soon as it appears.
We consider the following approaches, each with a corresponding WordPress plugin, for fighting this:
- Append a “branded mark” message that appears only in the feed, so that scraped copies of the feed carry it.
- Delay publishing posts to the RSS feed for a set time after posting, so that scraper sites have no chance to be indexed first.
1. My signature on the thief’s site
How about inserting a signature into the RSS feed, so that when the content is scraped and reposted it keeps its “branded mark”? This is easily done with the Anti Feed-Scraper Message plugin. After installing and activating it, go to Settings -> Anti Feed-Scraper on the WordPress Dashboard and keep or edit the default message:
[postname] originally appeared on [sitename] on [postdate],
and your signature is then appended to each feed item, so it travels with the content when bots grab and repost the feed. Smart. Unless the scrapers know how to cut it off…
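The plugin’s own code isn’t shown here, but the idea is easy to sketch outside WordPress. Below is a minimal Python sketch, assuming a plain RSS 2.0 feed without XML namespaces; the names `render_signature`, `brand_feed` and `TEMPLATE` are hypothetical, not the plugin’s API. It fills the `[postname]`, `[sitename]` and `[postdate]` placeholders of the message above and appends the result to each item’s description, so the mark exists only in the feed, not on the site itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical template mirroring the default message; placeholders in brackets.
TEMPLATE = "[postname] originally appeared on [sitename] on [postdate]"


def render_signature(template: str, post_name: str, site_name: str, post_date: str) -> str:
    """Replace each bracketed placeholder with the post's actual values."""
    return (template.replace("[postname]", post_name)
                    .replace("[sitename]", site_name)
                    .replace("[postdate]", post_date))


def brand_feed(rss_xml: str, site_name: str, template: str = TEMPLATE) -> str:
    """Append the rendered signature to every <item>'s <description>.

    Only the feed XML is changed; the posts on the site stay untouched.
    """
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        date = item.findtext("pubDate", default="")
        desc = item.find("description")
        if desc is None:
            # Items without a description get one holding just the signature.
            desc = ET.SubElement(item, "description")
        signature = render_signature(template, title, site_name, date)
        desc.text = f"{desc.text}\n{signature}" if desc.text else signature
    return ET.tostring(root, encoding="unicode")
```

Because the signature is plain text inside the description, anyone reposting the feed verbatim reposts your attribution too; a scraper that post-processes the text can still strip it, as noted above.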
2. The Feed delay plugin
Google’s distributed indexing system starts indexing new web pages quite quickly. So if you simply delay RSS publication for a while, your original page gets indexed before scrapers can copy it from the feed, which protects your authorship from the duplicate-content threat. The plugin keeps new posts out of the feed right after publication; just set up the length of the delay.
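The delay logic can be sketched in a few lines of Python. This is an illustration, not the plugin’s code: it assumes each post is a `(title, published_at)` pair with a timezone-aware `datetime`, and the function name `visible_in_feed` and the 30-minute default are hypothetical choices. A post is live on the site as soon as it is published, but it only enters the feed once the delay has passed.

```python
from datetime import datetime, timedelta, timezone


def visible_in_feed(posts, now, delay=timedelta(minutes=30)):
    """Return the titles of posts published at least `delay` before `now`.

    `posts` is a list of (title, published_at) pairs with timezone-aware
    datetimes. Newer posts are already on the site but held back from the
    feed, giving search engines time to index the original first.
    """
    cutoff = now - delay
    return [title for title, published_at in posts if published_at <= cutoff]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    posts = [
        ("Old post", datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)),
        ("Fresh post", datetime(2024, 1, 1, 11, 50, tzinfo=timezone.utc)),
    ]
    print(visible_in_feed(posts, now))  # ['Old post']
```

Comparing against a single cutoff time keeps the filter cheap; choosing the delay is a trade-off between giving indexers a head start and keeping legitimate feed readers up to date.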
Anti-scraping tools that protect against content theft by content farms are now both necessary and handy for bloggers. In later posts we will continue the anti-scraping theme, reviewing more tools and methods.
If you have questions about anti-scraping tools, or just want to know more about protecting your web data, feel free to comment or send us your question through ‘Contact us’.