Digital Fingerprints to Detect RSS Scraping

By Jonathan Bailey • Oct 4th, 2006 • Category: Articles, News, Products

Though I’m a little late to this party, I want to talk about Kirk Montgomery’s recent Digital Fingerprint Wordpress Plugin.

It’s a new and interesting way to detect potential scraping of your Wordpress feed and to discourage content theft. By turning the splogger’s favorite weapon, the search engines, against them, it aims to make detecting scraping as easy as possible.

It is a plugin with many benefits and uses, but also a few drawbacks. However, as a beta program, it still has a lot of room to grow and develop.

The Big Idea

The idea behind the plugin is fairly simple. You take a unique word or phrase (for my test I used a semi-random collection of letters and numbers) and embed that code into every entry of your RSS feed.

Then, whenever that unique phrase appears in a search engine, you have a potential scraper and can take actions to stop him or her.
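
To make the idea concrete, here is a minimal sketch in Python of generating a fingerprint and adding it to each feed entry. It is not the plugin's actual code (the plugin runs inside Wordpress); the function names, the 16-character length and the sample entries are all assumptions made for illustration.

```python
import secrets
import string

# Lowercase letters and digits: a semi-random token of the kind described above.
ALPHABET = string.ascii_lowercase + string.digits


def make_fingerprint(length: int = 16) -> str:
    """Return a random token that is unlikely to occur anywhere else on the web."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def embed_fingerprint(entry_html: str, fingerprint: str) -> str:
    """Append the fingerprint to a feed entry unless it is already present."""
    if fingerprint in entry_html:
        return entry_html
    return f"{entry_html}\n<p>{fingerprint}</p>"


# Hypothetical feed entries, used only to show the call pattern.
fingerprint = make_fingerprint()
entries = ["<p>First post.</p>", "<p>Second post.</p>"]
marked = [embed_fingerprint(entry, fingerprint) for entry in entries]
```

The same fingerprint goes into every entry, so a single search for that one string should surface any indexed copy of any post.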

The MaxPower plugin makes it easy to do all of these things. By adding a button to your Wordpress editor to embed the fingerprint into the entry and a special page to the admin area to display the search results, it enables you to both protect and monitor your works without leaving the administration area.

However, the plugin isn’t without limitations and drawbacks. Fortunately, most of these are issues that may be resolved in later updates to the plugin.

Problems

The main weakness in the plugin is likely its reliance on the search engines. Since search engines are getting better at filtering out spam and duplicate content, it’s very likely that a scraper might not appear in them. Though the plugin mitigates that somewhat by polling as many search engines as possible (five by my count), many known sploggers will still not appear.
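
The polling step can be sketched roughly as follows, assuming the search results for the fingerprint have already been fetched from each engine. The engine names, URLs and the `find_suspects` helper are hypothetical, and no real search engine API is called here.

```python
from urllib.parse import urlparse


def find_suspects(results_by_engine, own_hosts):
    """Map each foreign host to the sorted list of engines that reported it."""
    suspects = {}
    for engine, urls in results_by_engine.items():
        for url in urls:
            host = (urlparse(url).hostname or "").lower()
            # Any host that is not ours but carries the fingerprint is suspect.
            if host and host not in own_hosts:
                suspects.setdefault(host, set()).add(engine)
    return {host: sorted(engines) for host, engines in suspects.items()}


# Hypothetical results for a fingerprint search on three engines.
results = {
    "engine_a": ["http://myblog.example/post-1", "http://splog.example/copy"],
    "engine_b": ["http://splog.example/copy-2"],
    "engine_c": [],
}
suspects = find_suspects(results, own_hosts={"myblog.example"})
```

An engine that returns nothing, like `engine_c` here, shows only that the copy is not indexed there, not that no copy exists. That is the weakness described above.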

This may be improved later by offering an invisible image which can be tracked by an internal hit counter.
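
One way such a tracking image might work is sketched below. This is only a guess at the approach, not a planned feature of the plugin: the counter URL, the `HitCounter` class and the use of the HTTP referrer are all assumptions, and it would not catch a scraper that strips images from the feed.

```python
from collections import Counter
from urllib.parse import quote, urlparse


def tracking_pixel(counter_url: str, post_id: str) -> str:
    """Build a 1x1 image tag whose requests can be counted per post."""
    return (
        f'<img src="{counter_url}?post={quote(post_id)}" '
        'width="1" height="1" alt="">'
    )


class HitCounter:
    """Count image requests whose referrer is not one of our own hosts."""

    def __init__(self, own_hosts):
        self.own_hosts = set(own_hosts)
        self.hits = Counter()

    def record(self, referrer: str) -> None:
        host = urlparse(referrer).hostname
        # Ignore empty referrers and requests coming from our own pages.
        if host and host not in self.own_hosts:
            self.hits[host] += 1


# Hypothetical counter URL and referrers, used only to show the call pattern.
pixel = tracking_pixel("http://myblog.example/hit.gif", "42")
counter = HitCounter(own_hosts={"myblog.example"})
for referrer in ["http://splog.example/page", "http://myblog.example/post", ""]:
    counter.record(referrer)
```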

Second, though the administration interface is a convenient place for many Wordpress users, others, including myself, use applications like BlogDesk to write their blog entries. Using the plugin then requires editing the entry by hand and visiting the admin area later to check on the results. Many only visit their administration area to check on comments, install plugins or make theme changes. Automatic insertion of the fingerprint could ease this somewhat.

Also a concern is that any legitimate reader viewing your RSS feed will also see the fingerprint. This could cause some confusion, especially if it is placed in the middle of the entry, and an explanation as to what it is might reduce its effectiveness.

Finally, the biggest overall obstacle the plugin has to overcome is providing protection that Feedburner cannot. Feedburner, by detecting where a feed is actually used, does not rely on the search engines and can provide very early warning of possible infringement. Also, its protection is invisible to the end user.

Though the MaxPower plugin is more convenient for most users, it is not necessarily better protection at this time. That may change, though, in future versions.

Conclusions

For a beta, the plugin is already very useful and is a must-have for any Wordpress site that cannot or does not use Feedburner (sadly, plugins are not allowed on free Wordpress.com accounts). Even sites that use Feedburner will likely find it a valuable second layer of protection.

Most of the kinks and problems with the plugin can easily be ironed out in future versions (it is still in beta), and it could easily grow into a must-have plugin for all Wordpress users worried about content theft.

Though it certainly shouldn’t be the only layer of defense, if at all possible, it can definitely be a valuable one.


Short URL to this Post: http://copybyte.com/z/c8

Jonathan Bailey is The Webmaster and author of Plagiarism Today, which he founded in 2005 as a way to help Webmasters going through content theft problems get accurate information and stay up to date on the rapidly-changing field. He is also a consultant to Webmasters and companies to help them devise practical content protection strategies and develop good copyright policies.

  • JB
    MaxPower,

    I'm glad that you liked the article. After re-reading it, I realized that my tone was harsher than intended. I really do love the plugin and, though it has some weaknesses, I think it's VERY valuable, especially considering it was just at Beta 1 when I wrote about it.

    While this plugin is only as strong as search engines’ ability to index splogs, sites that don’t get indexed probably aren’t that important in the big picture. Can it be easy to make money splogging if you aren’t in the major indexes? I think not, but I could be wrong.

    Sadly, yes. Other alternatives do exist, including comment/trackback spam and spam pinging. Granted, you are correct that a splog would be MUCH less effective without a good search engine presence, but I still don't want my content appearing on pages that comment spammers link to.

    You are right that those without a search engine presence are much less important; I'm just not quite ready to completely ignore them. Besides, the best solution is to detect theft before it hits the search engines, especially Google, to avoid potential penalties.

    As to adding a tracking image, the main splogs I am familiar with remove links and images (A HREF and IMG) leaving only the text. Wouldn’t this render tracking images useless?

    Very true. I guess I should have thought that one through better. The ideal solution would probably be tracking requests for the feed page itself. Since most sploggers host their apps on their own servers, their requests could easily be distinguished from normal use.

    It’s not clear to me if FeedBurner does anything at all other than report ‘uncommon usage’. I can’t block IPs using FeedBurner, and I lose a substantial amount of control over the feed.

    I talked with FB about this shortly after my article on cloaking to stop scraping. I was told that they were going to look into something similar for them. No word yet. It takes a while to develop these kinds of features.

    As far as the splog goes, email FeedBurner tech support about it. Let them know and they'll track it down. If you can do that, you'll improve the metrics for everyone. I, personally, have not encountered this problem (I've had a few scrape my summaries via Technorati feeds) so I can't comment.

    I don't think the solution is to quit FeedBurner, but rather, to use it in tandem with your plugin. The more layers, the better.

    Awareness is always the first step.

    Well put! I agree there one hundred percent.

    Thank you for the wonderful plugin and the update!
  • Thanks for your feedback regarding this plugin -- your thoughts are well received and written fairly. The issues you present can and will be addressed over time. Many people have already requested that their fingerprint be automatically inserted in posts when published; hopefully the next version will include this feature.

    While this plugin is only as strong as search engines' ability to index splogs, sites that don't get indexed probably aren't that important in the big picture. Can it be easy to make money splogging if you aren't in the major indexes? I think not, but I could be wrong.

    Then again, stealing is stealing, although with copyright issues I understand that the magnitude of the offence is partly determined by 'actual damage'. So basically I'm all over the map with that one.

    As to adding a tracking image, the main splogs I am familiar with remove links and images (A HREF and IMG) leaving only the text. Wouldn't this render tracking images useless?

    I have been doing some FeedBurner research as well. It's not clear to me if FeedBurner does anything at all other than report 'uncommon usage'. I can't block IPs using FeedBurner, and I lose a substantial amount of control over the feed. One big splog that constantly re-uses my content gets it via FeedBurner, yet it is never indicated as an uncommon usage. I'm trying to figure out a way to deal with this issue other than by quitting FeedBurner (which provides a number of good free services).

    Anyway, if nothing else I hope this plugin opens up people's eyes to the massive connected problems of splogging, RSS, and advertising. Awareness is always the first step. Thanks for the review!