Protecting the Comment Feed

By Jonathan Bailey • Aug 1st, 2007 • Category: Articles, Prevention

When bloggers and Webmasters look at the issue of RSS scraping, they typically think solely about their main RSS feed. It is the feed that contains their posts, the one that is most subscribed to and the one most prominently displayed.

When one thinks of “subscribing” to a site, most think exclusively about the main feed.

However, on many sites, the main RSS feed only makes up a small fraction of the total content on the site itself. On many active sites, the bulk of the material is posted by visitors via the comments feature.

Though this reliance on user-generated content (UGC) brings with it a slew of new risks, one of the more overlooked dangers is the ability for such content to be scraped and republished the same as the original posts.

The problem is that many blogging platforms, including WordPress, offer a special feed for the comments, which is overlooked by most bloggers and nearly all visitors.

But as bloggers take steps to protect their main feed, the comments feed might become a more interesting and valuable target.

Though the problem is not widespread yet, it is an issue worth pondering and planning for, especially with large sites that receive a large volume of comments.

Complications with Comments

When compared to traditional RSS scraping, comment scraping provides a different set of problems.

First, the Webmaster does not hold the copyright to the majority of the comments posted on a site, just to the ones they posted themselves. Thus, they cannot always file a DMCA notice or send a cease and desist letter over the comments that are taken. This makes stopping such scraping very difficult once it has taken place.

Second, most of the tools and plugins that are designed to protect RSS feeds do not work with the comments feed. Though one can certainly register their comments feed with FeedBurner and take advantage of those tools, there is little else that can be done, at this time, to protect the comments feed from scraping.

Finally, since almost no one remembers to check the comments on their site or look for comments that they left elsewhere, detection of such scraping will almost never take place. To make matters worse, since comments are not often picked up by blog search engines, those engines are less likely to note that the spam site is actually a copy.

However, many of the dangers of being scraped remain with this kind of scraping. Search engine rankings can be affected through either direct penalties or increased competition; confusion about the origin of the work can be created, especially since the original work, a comment, was turned into a blog post; and traffic can be siphoned off.

Worse still, if readers discover that they are being scraped so blatantly, they are less likely to participate in the discussion and may stop commenting altogether, resulting in a decrease of material on the site and a loss of community members.

All in all, if your site receives a large volume of comments or relies heavily upon them, it makes sense to take a few moments and look at protecting your commenters from being scraped, both for the benefit of your visitors and yourself.

Taking Steps

Since cessation of this kind of scraping can be very difficult, it is important to focus more on prevention.

The easiest way to prevent such scraping is to do as this site does and simply not offer a comments RSS feed. Such feeds are rarely subscribed to, offer little value to users and, on busy sites, are too noisy to be practical. There are much better ways to allow your users to subscribe to discussions they participate in, such as via email or a system such as CoComment.

Eliminating the comment feed can be achieved by either deleting or renaming the file associated with it. On WordPress, the file is in the root directory and is named wp-commentsrss2.php. Removing that file should not affect any other functions but would eliminate the comment RSS feed.
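The renaming option can be sketched in a few lines of Python. This is only an illustration, not a WordPress tool: the function name rename_comment_feed, the "feed-" prefix and the 16-character random suffix are all assumptions made for the example, and the only detail taken from the article is the file name wp-commentsrss2.php in the install's root directory.

```python
import secrets
from pathlib import Path


def rename_comment_feed(wp_root: Path, feed_name: str = "wp-commentsrss2.php") -> Path:
    """Rename the comment feed file to a hard-to-guess name and return the new path.

    Hypothetical helper: wp_root is assumed to be the WordPress root directory.
    """
    original = wp_root / feed_name
    if not original.is_file():
        raise FileNotFoundError(f"no comment feed file at {original}")
    # 16 random hex characters; the "feed-" prefix is an arbitrary choice.
    new_path = wp_root / f"feed-{secrets.token_hex(8)}.php"
    original.rename(new_path)
    return new_path
```

Calling rename_comment_feed(Path("/path/to/wordpress")) returns the new location of the file, which is worth recording somewhere private, since the point of the random name is that scrapers cannot guess it.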

If you wish to keep your comments RSS feed, you can rename the file above to a random name and then create a new FeedBurner feed using that address. With FeedBurner's FeedFlare feature, you can then easily add footers to each post that attribute the source of the post, link back to your site, and aid detection, similar to a digital fingerprint.
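To make the idea of a footer that doubles as a digital fingerprint more concrete, here is a minimal sketch in Python. It does not show how FeedBurner or FeedFlare work internally; the function names, the secret key, and the footer wording are all hypothetical. The assumption is simply that each comment gets a short token that only your site can generate, so that searching the Web for the token turns up scraped copies.

```python
import hashlib
import hmac


def fingerprint(secret: bytes, comment_id: int) -> str:
    """Derive a short, searchable token that is unique to one comment.

    Hypothetical scheme: an HMAC-SHA256 of the comment ID, truncated to 16 hex characters.
    """
    digest = hmac.new(secret, str(comment_id).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def attribution_footer(site_name: str, comment_url: str, token: str) -> str:
    """Build a plain-text footer that attributes the comment, links back, and carries the token."""
    return (
        f"Originally posted as a comment on {site_name}: {comment_url} "
        f"(ref {token})"
    )


# Example with made-up values; the secret should be private to the site owner.
token = fingerprint(b"change-me", 1234)
print(attribution_footer("Example Blog", "https://example.com/post#comment-1234", token))
```

Because the token is derived from a secret key, a scraper cannot produce a matching one for content it did not take from your feed, while you can regenerate it at any time to check whether a suspicious page carries your footer.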

You can also edit the feed by hand by opening up the includes folder in your WP install and editing the feed-rss2-comments.php file located there. Simply add the content you want displayed after the “<?php comment_text_rss() ?>” tag and it should display after the comment body in the feed.

If you use another blogging platform, you will have to look up directions specific to it.

No matter what application you use, this kind of editing requires a fairly high comfort level with RSS and PHP, making this a less appealing solution. Given that FeedBurner is now free, it would likely make more sense to go ahead and use their tools if at all possible, at least until easy plugins are offered to fix this issue.

Good News

The good news in all of this is that, right now, this kind of content theft is very rare. Since comment feeds are not sent out over pinging services and usually aren’t promoted directly on the site itself, most scrapers don’t pick them up. Besides, there is more than enough original blog material to keep the spam machine rolling for now.

Currently, the only sites with any cause for real concern are larger ones that have very robust communities around them. Sites with only a few short comments are of little use to scrapers. Spammers need dependable, lengthy, keyword-rich feeds to scrape and most comment feeds simply do not fit that bill.

However, scraping and spam blogging are constantly evolving. As Webmasters grow wiser to the problem of RSS scraping and scrapers’ thirst for new material continues to grow, the spam bloggers may be pushed to alternative sources of content. The comments feed we all forget about could be a target.

It is best to think about that possibility now and take steps today, just in case things change tomorrow.

Conclusions

Unlike the main feed, on most sites, the comment feed is a largely useless tool. It presents more dangers than benefits and, for the most part, can be removed or locked down with little impact to the end users.

With other tools better able to stimulate the conversation, there is not much of a case for keeping it around at all, especially since it is almost useless without the main feed, but it can be easily protected if desired.

All in all, how one uses their comment feed is up to them. But one thing is almost certain: we cannot afford to ignore it for much longer.



Jonathan Bailey is The Webmaster and author of Plagiarism Today, which he founded in 2005 as a way to help Webmasters going through content theft problems get accurate information and stay up to date on the rapidly-changing field. He is also a consultant to Webmasters and companies to help them devise practical content protection strategies and develop good copyright policies.

  • That's a really good point and it really is overlooked. People can definitely steal comments and use them on their sites, especially if it's a forum type site.
  • This is even more of a reason to use a Numly Number as a copyright license in your RSS feed. Many of the splogging tools on the market today do not strip out these license identifiers but rather treat them as content.

    Having the Numly Number embedded in the feed and ultimately appearing on someone else's web site is like advertising that your content has been lifted. You can even use the Numly Number to track back to the original copyright submission and find or contact the author.
  • JB
    Webd360,

    There are a lot of ways a scraper could use comments on their site, none of them good. I am yet to see an obvious example of this but, with the way scraping is growing, I am worried that it might happen soon enough.

    Chris,

    Would you recommend having an ESN for the entire feed or one per comment? How would this work in this case?

    Please clarify as I am very interested in this!
  • I would recommend having an ESN embedded in each post so that the feed contains multiple ESNs. If your feed only shows the first 250 characters or so of your post, put it in the title or at the top of your post to ensure that it is picked up in the feed.

    If you are running WordPress, you may be able to embed the Numly plugin into your comments with a minor tweak. If so, this would be the best of both worlds and allow you to protect your contributors' works with copyrights on their behalf.

    What do you think?
  • JB
    Chris,

    It sounds like an interesting idea, but my fear is that it would be cost prohibitive. I'm already pushing my ESN count at the end of every month and if I had new ESNs assigned for every comment, I would be well over the limit.

    Is there, perhaps, a way to reduce the number of ESNs needed?
  • This thread itself is an example of the value of the comments. I would have never thought for a moment anyone would scrape comments from a page. I've just used stumble to send this post to my partner, who will hopefully come along behind me, and begin installing the nimbly .. numbly.. yeah that to my blogs soon :)

    Thanx, yet again!
  • JB
    Cathy: Are you talking about Numly? If so, I support the move.

    It is a strange thought, someone scraping the comment feed, but I can see reason for it. I don't think too many have tried it yet though. Sadly, it's only a matter of time.