RSS scraping is one of the most frustrating forms of plagiarism to deal with. In addition to taking the entirety of one’s site, it is done by the most callous of plagiarists, spammers, for the worst of all reasons: to sell products or gain search engine prominence.
Worse still, protecting RSS feeds is a daunting challenge. Open by nature, RSS feeds are ripe for scraping and preventing such theft without impacting legitimate readership is almost impossible.
However, as blogging has advanced, new tools and services have emerged to help deal with feed scraping. One of the more popular services, FeedBurner, has been one of the best tools for both generating feed statistics and detecting scraping.
In the meantime though, new WordPress plugins have come along to offer different forms of protection against scraping.
The question becomes simple: are we safer using FeedBurner, or are we better off going it alone with the tools that are now available to us?
The answer is not entirely clear.
The Power of FeedBurner
There is little doubt that FeedBurner is a powerful and well-rounded statistical analysis service for RSS feeds. It provides detailed readership analysis including subscribers, views and the RSS readers used.
Recently, FeedBurner also launched an “uncommon uses” feature that points out suspicious uses of a feed and gives the owner the chance to follow up. It has been used to spot many spam bloggers and feed scrapers that unwittingly fell into FeedBurner’s trap.
FeedBurner also offers several “Feed Flares” that can add copyright or other information to feed articles. In addition, it makes it possible to move or hide the original feed (if the user has such control over their server), making it virtually impossible for a scraper to use anything but the FeedBurner feed.
This all combines to make FeedBurner a powerful and easy-to-use feed protection service. However, new WordPress plugins, at least one of which is not compatible with FeedBurner, may provide at least an equal level of protection.
Plug Me In
Antileech, by Owen Winkler, is a useful and popular WordPress plugin that stops many scrapers cold, replacing the feed’s content with links to the originating site and other content that is useless to the scraper. It detects scrapers by looking at both the user agent string and the IP address of those accessing the feed.
However, Antileech only works on the original WordPress feed. It cannot protect a feed that is run by FeedBurner. Though it can be used to protect the original feed (the one generated by WordPress and used by FeedBurner), the main public feed is still unprotected.
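To make the detection approach described for Antileech more concrete, here is a minimal sketch in Python of the general idea: check the user agent string and IP address of whoever requests the feed, and serve suspected scrapers a link back to the source instead of the full article. This is not Antileech’s actual code (the plugin itself runs inside WordPress). The block lists, the `FeedItem` type and the function names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# Hypothetical block lists. A real plugin would let the site owner maintain these.
BLOCKED_AGENT_FRAGMENTS = ("examplescraperbot", "feedgrabber")
BLOCKED_IPS = {"203.0.113.7"}


@dataclass
class FeedItem:
    title: str
    link: str
    body: str


def is_suspected_scraper(user_agent: str, ip_address: str) -> bool:
    """Return True if the request matches a blocked IP or user agent fragment."""
    if ip_address in BLOCKED_IPS:
        return True
    agent = user_agent.lower()
    return any(fragment in agent for fragment in BLOCKED_AGENT_FRAGMENTS)


def render_item(item: FeedItem, user_agent: str, ip_address: str) -> str:
    """Serve the full body to normal readers and a link-only stub to suspected scrapers."""
    if is_suspected_scraper(user_agent, ip_address):
        return f'Read this article at its source: <a href="{item.link}">{item.title}</a>'
    return item.body
```

Because the check relies on known user agents and addresses, a scraper that changes either one will slip through, which is why detection after the fact still matters.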
Also useful is the Digital Fingerprint Plugin by MaxPower. This plugin adds a unique phrase or word to the body of each post in the RSS feed and searches the Internet for all instances of that “fingerprint”. Pages containing that unique identifier are almost certainly scrapers as the fingerprint does not appear on the actual site.
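The fingerprint idea can be sketched in a few lines of Python. This is only an illustration of the technique, not the plugin’s own code: the function names are hypothetical, the fingerprint is derived from an assumed site secret, and the “search” step is simulated with a dictionary of already-fetched pages rather than a real search engine query.

```python
import hashlib
from typing import Dict, List


def make_fingerprint(site_secret: str, length: int = 16) -> str:
    """Derive a token that is very unlikely to appear anywhere by accident."""
    digest = hashlib.sha256(site_secret.encode("utf-8")).hexdigest()
    return "fp" + digest[:length]


def add_fingerprint(feed_body: str, fingerprint: str) -> str:
    """Append the fingerprint to the feed copy of a post only, not the site copy."""
    return f"{feed_body}\n<p>{fingerprint}</p>"


def find_scraped_pages(pages: Dict[str, str], fingerprint: str, own_site: str) -> List[str]:
    """Return URLs outside our own site whose text contains the fingerprint."""
    return sorted(
        url
        for url, text in pages.items()
        if fingerprint in text and not url.startswith(own_site)
    )
```

Since the fingerprint is added only to the feed, any page that contains it must have taken its copy from the feed rather than from the site itself.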
Unlike Antileech, the Digital Fingerprint Plugin does work well with FeedBurner. However, since FeedBurner already provides decent detection of scraped content, it can only add an extra layer of vigilance.
However, in combination with Antileech, it can be a very powerful tool. Since Antileech cannot automatically detect every bot that attempts to access your site, searching for scrapers becomes more crucial than ever. Fortunately, that is a task that the Digital Fingerprint Plugin automates nicely, making the two a natural pair.
Seen in that light, the major limitation of FeedBurner becomes clear. While it does a decent job detecting scraping, it doesn’t offer a way to stop it. Cutting off or redirecting feed scrapers is easy if one controls the server, but there is no way to do so when they are pulling their content from a FeedBurner feed.
To Burn or Not to Burn
The question becomes this: Looking at it solely from a content theft perspective, is it better to use FeedBurner or to rely instead on readily available plugins and control over the server?
Though the question is clear, the answer is not. Since every situation is different, there is no one universal answer. Consider the following scenarios:
Users of Blogspot, WordPress.com or Other Hosted Services: Individuals who do not run their own server cannot install plugins or take advantage of access rules that can block scraping. They are, generally, much better off using FeedBurner if possible. Though most services will not allow them to hide their non-FeedBurner feed, FeedBurner still provides a strong layer of protection from most scrapers.
Users of Another Blogging Platform: Though WordPress is the most common blogging platform, it is not the only one. The best solution will depend on the platform and the tools available to it. FeedBurner, in most cases, will still likely wind up being the best solution as the plugins that would replace it are not available.
WordPress Users with Own Installs: If you are a WordPress user with your own install, the situation has changed. Your feed may be extremely safe without FeedBurner. The plugins available to WordPress users can easily replace most FeedBurner functions with a little work. This includes other FeedBurner specialties such as statistical analysis and monetization. Though FeedBurner may offer additional benefits in the way of feed reliability and ease of use, it’s no longer the clear winner it once was.
These are, of course, just some very broad scenarios. The decision is going to come down to individual Webmasters deciding what is right for them and their site.
I’ve been a big fan of FeedBurner since day one and I’ve made no secret of that. Though their services are still valuable, many of their features, especially in regard to content theft, can now be matched or exceeded by free plugins that don’t require a feed to be reburned.
That’s going to be appealing to many people, especially those who would rather cut off scrapers than deal with their hosts, and will work out better in many environments.
My hope is that, someday, FeedBurner will bring this feature into their own service and allow their users to control their feeds on par with what they have on their own servers. Though it might increase complexity and raise new support issues, it would be a tremendous leap forward for sites that are repeatedly scraped.
Personally, I doubt that I will abandon FeedBurner. I simply have too much invested in FeedBurner’s stats and analysis to walk away. Furthermore, I use the Digital Fingerprint Plugin with great success and have no trouble dealing with hosts as needed. Besides, Plagiarism Today doesn’t seem to be a repeated target of scraping.
The reasons for that are pretty clear.
However, others, especially new bloggers, should consider their options carefully. If they are entering a field where scraping might become a major problem, it’s important to carve out a strategy that deals with the issue in the most effective way possible.
Whether that strategy involves FeedBurner will be up to the individual to decide.
Tags: Content Theft, Copyright, Copyright Law, DMCA, Plagiarism, Scraping, Spam, Splogging, Splogs, FeedBurner, WordPress, Plugins