There has been a great deal of activity these past few weeks in the area of anti-scraping WordPress plugins, including two new plugins designed to protect your feed by ensuring that attribution for a post follows the work wherever it is scraped.
The two plugins, RSS Footer and FeedEntryHeader, both work on the same principle. In both cases, they take your regular WordPress-created feed and insert links within it in hopes that the scraper, when he or she grabs the content, will also pick up the reciprocal link.
But what makes these plugins different are the details of how they operate. They amount to a set of minor tweaks that, in theory, could mean big changes in the way scraping is handled.
RSS Footer, written by Joost de Valk, takes the idea of link insertion to another level. Unlike most such plugins, which simply link back to the home page of the blog, RSS Footer links back to the article itself, using the original title as the link text.
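To make the idea concrete, here is a minimal sketch of footer-style attribution. It is written in Python purely for illustration; RSS Footer itself is a WordPress plugin, and this is not its code. The function name, the wording of the footer and the example URL are all assumptions.

```python
from html import escape


def add_footer_attribution(entry_html, post_title, post_url):
    """Append a link back to the original article to a feed entry.

    Hypothetical helper: it illustrates linking to the post itself
    (not the blog's home page) with the post title as the link text.
    """
    link = '<a href="{}">{}</a>'.format(
        escape(post_url, quote=True),  # safe inside an attribute value
        escape(post_title),            # safe as link text
    )
    footer = "<p>Originally published as {}.</p>".format(link)
    # The attribution goes after the content, at the foot of the entry.
    return entry_html + "\n" + footer


entry = add_footer_attribution(
    "<p>Body of the post.</p>",
    "My Post Title",
    "https://example.com/my-post/",
)
print(entry)
```

A scraper that republishes the whole entry with its HTML intact would carry this link, and its title-based anchor text, along with the content.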
According to Google’s Matt Cutts, this is the best way to ensure that content repurposed on other sites does not negatively impact you in the search results. Theoretically, this could further ensure not only that spammers don’t gain from the content they have lifted, but also that their victims don’t feel any ill effects.
This could go a long way to mitigate many of the greatest concerns that come with being scraped and help a lot of bloggers sleep better at night.
FeedEntryHeader, however, takes things even further. Its feed statement is completely customizable, including the option of using the post URL and post title, but it is displayed at the top of the feed entry rather than in the footer. This helps to ensure that all scrapers, even those taking only the summary of the post, link back to the original site.
Another interesting twist that FeedEntryHeader provides is that the default message uses the URL of the blog post as the anchor text, rather than the title of the post or the site. This, according to Stephen Cronin, the plugin’s author, is meant to help in cases where the scraper strips the HTML out and leaves behind only the text.
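The reasoning behind the URL-as-anchor-text choice can be sketched in a few lines. Again, this is an illustrative Python sketch under assumptions, not FeedEntryHeader's actual code: the function names, the message wording and the crude regular-expression tag stripper (standing in for whatever a scraper might do) are all hypothetical.

```python
import re
from html import escape


def add_header_attribution(entry_html, post_url):
    """Prepend an attribution whose anchor text is the post URL itself."""
    safe_url = escape(post_url, quote=True)
    header = '<p>Originally published at <a href="{0}">{0}</a></p>'.format(safe_url)
    # The attribution goes before the content, at the top of the entry.
    return header + "\n" + entry_html


def strip_tags(html):
    """Crude tag removal, standing in for a scraper that drops HTML."""
    return re.sub(r"<[^>]+>", "", html)


entry = add_header_attribution("<p>Body text.</p>", "https://example.com/my-post/")
print(strip_tags(entry))
# Originally published at https://example.com/my-post/
# Body text.
```

Because the URL is part of the visible text and not only the `href` attribute, it is still there after the tags are removed, though only as plain text and no longer as a working link. A title-based anchor would leave behind just the title.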
This is certainly an interesting way to handle a known weakness in such plugins. However, both plugins have issues that may limit their effectiveness when going into battle.
The biggest problem with both of these plugins is that, as Cronin pointed out, many spam bloggers have begun stripping out HTML tags to avoid any issues with formatting or diluting their PageRank. Though a naked URL, as with FeedEntryHeader, is better than nothing in that regard, it is unclear exactly how much weight the search engines give it.
Furthermore, other types of scraping, including keyword scraping and translated scraping, are on the rise and, in many of these cases, the entire entry may not be scraped. There is no guarantee that the links will be grabbed at all, even if they are placed at the top.
Of course, while placing copyright information at the top of the feed entry, as FeedEntryHeader does, will have a higher “hit” rate than placing it at the bottom, it could also irritate many people who read the feed. Where RSS subscribers seem to tolerate information in the footer of the entry, it is more of a question how they will feel about a protection mechanism in the first paragraph.
After all, if it is the first thing a scraper sees, it will be the first thing human readers see as well.
Still, it is easy to imagine many different situations where these two plugins will be very useful and there is little reason to doubt that they will both become standards among many WordPress installations.
Caveats aside, these two plugins are probably the most advanced ones out there at what they do. Though Copyfeed brings other features to the table, including digital fingerprinting and disabling feed access, for pure link insertion these two plugins are currently tops, each in its own way.
As for which to use, that will come down to the individual Webmaster and, specifically, whether he or she wants the notice at the top or the bottom. The top will ensnare more scrapers, though still not all, and the bottom will be easier for readers to accept.
The bottom line, though, is that if you are a WordPress user with your own install and you are not running one of these or some other kind of feed extension plugin, you need to seriously look at doing so. The plugins are widely available, simple to use and can provide a great deal of help in detecting, mitigating and even stopping RSS scraping.
There is little reason not to take action, but a lot to lose if you don’t.