When bloggers and Webmasters look at the issue of RSS scraping, they typically think solely about their main RSS feed. It is the feed that contains their posts, the one that is most subscribed to and the one most prominently displayed.
When one thinks of “subscribing” to a site, most think exclusively about the main feed.
However, on many sites, the main RSS feed only makes up a small fraction of the total content on the site itself. On many active sites, the bulk of the material is posted by visitors via the comments feature.
Though this reliance on user-generated content (UGC) brings with it a slew of new risks, one of the more overlooked dangers is the ability for such content to be scraped and republished the same as the original posts.
The problem is that many blogging platforms, including WordPress, offer a special feed for the comments, which is overlooked by most bloggers and nearly all visitors.
Though the problem is not widespread yet, it is an issue worth pondering and planning for, especially with large sites that receive a large volume of comments.
Complications with Comments
When compared to traditional RSS scraping, comment scraping presents a different set of problems.
First, the Webmaster does not hold the copyright to the majority of the comments posted on a site, just the ones they posted themselves. Thus, they cannot always file a DMCA notice or send a cease and desist letter for the comments that are taken. This makes stopping such scraping very difficult once it has taken place.
Second, most of the tools and plugins that are designed to protect RSS feeds do not work with the comments feed. Though one can certainly register their comments feed with FeedBurner and take advantage of those tools, there is little else that can be done, at this time, to protect the comments feed from scraping.
Finally, since almost no one remembers to check the comments on their site or look for comments that they left elsewhere, detection of such scraping will almost never take place. To make matters worse, since comments are not often picked up by blog search engines, those search engines are less likely to note that the spam site is actually a copy.
However, many of the dangers of being scraped remain with this kind of scraping. Search engine rankings can be affected through either direct penalties or increased competition. Confusion about the origin of the work can be created, especially since the original work, a comment, was turned into a blog post. Traffic can also be siphoned off.
Worse still, if readers discover that they are being scraped so blatantly, they are less likely to participate in the discussion and may stop commenting altogether, resulting in a decrease of material on the site and a loss of community members.
All in all, if your site receives a large volume of comments or relies heavily upon them, it makes sense to take a few moments and look at protecting your commenters from being scraped, both for the benefit of your visitors and yourself.
Since cessation of this kind of scraping can be very difficult, it is important to focus more on prevention.
The easiest way to prevent such scraping is to do as this site does and simply not offer a comments RSS feed. Such feeds are rarely subscribed to, offer little value to users and, on busy sites, are too noisy to be practical. There are much better ways to allow your users to subscribe to discussions they participate in, such as via email or a system such as CoComment.
Eliminating the comment feed can be achieved by either deleting or renaming the file associated with it. On WordPress, the file is in the root directory and is named wp-commentsrss2.php. Removing that file should not affect any other functions but would eliminate the comment RSS feed.
If you wish to keep your comments RSS feed, you can rename the file above to a random name and then create a new FeedBurner feed using that address. With FeedBurner's FeedFlare feature, you can then easily add a footer to each item in the feed. That footer can attribute the source of the comment, link back to your site, and aid detection by acting as a kind of digital fingerprint.
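To make the idea of an attribution footer with a digital fingerprint concrete, here is a minimal Python sketch. It is not how FeedBurner or FeedFlare works internally. The site name, secret key, token format, and function names are all hypothetical and chosen only for illustration. The idea is that each comment gets a short token that is hard to guess, so that searching for the token later can turn up copies.

```python
import hashlib
import hmac
import re
from html import escape

# Hypothetical values, for illustration only.
SITE_NAME = "Example Blog"
SECRET_KEY = b"replace-with-a-private-random-value"

# Assumed token format: the letters "fp" followed by 16 hex characters.
TOKEN_RE = re.compile(r"\bfp[0-9a-f]{16}\b")


def fingerprint(comment_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a short, hard-to-guess token tied to one comment."""
    digest = hmac.new(key, comment_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "fp" + digest[:16]


def build_footer(comment_id: str, comment_url: str) -> str:
    """Build an HTML footer that credits the source and carries the token."""
    return (
        "<p>Originally posted as a comment on "
        f'<a href="{escape(comment_url, quote=True)}">{escape(SITE_NAME)}</a>. '
        f"Ref: {fingerprint(comment_id)}</p>"
    )


def find_tokens(page_text: str) -> list[str]:
    """Return every fingerprint-shaped token found in a block of text."""
    return TOKEN_RE.findall(page_text)


if __name__ == "__main__":
    footer = build_footer("1234", "https://example.com/post/#comment-1234")
    print(footer)
    print(find_tokens(footer))
```

A scraper that republishes the feed verbatim will carry the token along with the comment, so a periodic search for your own tokens is one possible way to spot copies. A scraper that strips footers would defeat this, so treat it as a detection aid rather than protection.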
You can also edit the feed by hand by opening up the includes folder in your WP install and editing the feed-rss2-comments.php file located there. Simply add the content you want displayed after the “<?php comment_text_rss() ?>” tag and it should display after the comment body in the feed.
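The effect of that hand edit, a footer appearing after each comment body in the feed, can also be illustrated outside of WordPress. The Python sketch below is not the WordPress template and does not use any WordPress API. It assumes a plain RSS 2.0 document with no XML namespaces, where each item has a link and a description element. The footer wording and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical footer text; {link} is filled in per item.
FOOTER = '<p>Comment originally posted at <a href="{link}">{link}</a></p>'


def append_footer(rss_xml: str) -> str:
    """Append an attribution footer to the description of every feed item.

    Assumes simple, namespace-free RSS 2.0. Real feeds often carry
    namespaced elements (such as content:encoded) that would need
    extra handling not shown here.
    """
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        desc = item.find("description")
        if desc is None:
            desc = ET.SubElement(item, "description")
        # Escape the link for HTML; ElementTree escapes again for XML on output.
        desc.text = (desc.text or "") + FOOTER.format(link=escape(link, quote=True))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        '<rss version="2.0"><channel><title>Comments</title>'
        "<item><title>Comment by A</title>"
        "<link>https://example.com/post/#comment-1</link>"
        "<description>Nice post.</description></item>"
        "</channel></rss>"
    )
    print(append_footer(sample))
```

Because the footer is written into the description itself, any scraper that republishes the item unchanged also republishes the link back to the original comment.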
If you use another blogging platform, you will have to look up directions specific to it.
No matter what application you use, this kind of editing requires a fairly high comfort level with RSS and PHP, making this a less appealing solution. Given that FeedBurner is now free, it would likely make more sense to go ahead and use their tools if at all possible, at least until easy plugins are offered to fix this issue.
The good news in all of this is that, right now, this kind of content theft is very rare. Since comment feeds are not sent out over pinging services and usually aren’t promoted directly on the site itself, most scrapers don’t pick them up. Besides, there is more than enough original blog material to keep the spam machine rolling for now.
Currently, the only sites with any cause for real concern are larger ones that have very robust communities around them. Sites with only a few short comments are of little use to scrapers. Spammers need dependable, lengthy, keyword-rich feeds to scrape and most comment feeds simply do not meet that bill.
However, scraping and spam blogging are constantly evolving arts. As Webmasters grow wiser to the problem of RSS scraping and scrapers’ thirst for new material continues to grow, the spam bloggers may be pushed to alternative sources of content. The comments feed we all forget about could be a target.
It is best to think about that possibility now and take steps today, just in case things change tomorrow.
Unlike the main feed, on most sites, the comment feed is a largely useless tool. It presents more dangers than benefits and, for the most part, can be removed or locked down with little impact to the end users.
With other tools better able to stimulate the conversation, there is not much of a case for keeping it around at all, especially since it is almost useless without the main feed, but it can be easily protected if desired.
All in all, how one uses their comment feed is up to them. But one thing is almost certain: we cannot afford to ignore it for much longer.