Google/FeedBurner To Tackle Scraping?
When FeedBurner announced that it was being bought out by Google last month, several things were immediately clear.
First, it became clear that the owners of FeedBurner were going to receive a $100 million payday. Second, it was clear that Google would gain access to a wealth of statistics about RSS feeds not previously available to it. Finally, it was clear that Google was making yet another foray into the advertising market, buying up the budding FeedBurner Advertising Network.
However, many questions have gone unanswered. Though FeedBurner attempted to head off some of the questions with their FAQ on the subject, many of the more difficult questions remain. Will Google integrate FeedBurner into its analytics program? Will the FeedBurner ad network be integrated into Adsense? What new features will Google bring to FeedBurner?
Possibly the toughest question, though, is what impact, if any, this merger will have on feed scraping and spam blogging. With the largest search engine, largest ad network and most popular spam blog host merging with a company managing nearly 750,000 feeds, it seems only logical to expect at least some changes in how Google fights scraping and spam.
A Wealth of Knowledge
With millions of subscribers, hundreds of thousands of feeds and nearly half a million publishers, FeedBurner knows more about RSS feeds, who is subscribed to them and how they are being used than just about anyone else.
They already offer an uncommon uses feature that tracks non-conventional feed subscribers, as well as a feed flare service that can help track feed use.
Google, by purchasing FeedBurner, will be gaining access to all of this knowledge. More than just an RSS platform, FeedBurner provides a bird’s-eye view of RSS. Though Google is no stranger to hosting RSS feeds through its Blogspot service, FeedBurner’s analytics and broad range of users, almost all of whom are non-spammers, bring a depth and breadth that Google has not had in this area.
If done correctly, this merger could be not just a strategic buyout but also a marriage of the minds, combining Google’s dominance in search and advertising with FeedBurner’s knowledge of RSS. That marriage could, at least theoretically, prove very formidable when dealing with scraping and spamming.
A Tag Team
With the purchase of FeedBurner, Google will not only be indexing but hosting the feeds for hundreds of thousands of blogs, including many of the most popular. It will have the ability to know what sites and servers are accessing the feed, which of those are suspicious and where to find sites using feeds questionably.
At the simplest end, this can be used to bolster duplicate content detection. If a site scrapes a feed and republishes it, even if the use is permitted and the scraper is not a spam site, it is still duplicate content and needs to be indexed below the original work.
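As a rough illustration of what duplicate content detection could look like, here is a minimal sketch that compares two pieces of text by their overlapping word sequences and then treats the earliest-seen copy as the original. This is not how Google or FeedBurner actually do it; the function names, the three-word window and the 0.8 threshold are all assumptions made for the example.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one window: treat the whole thing as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items divided by all items."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_probable_copy(original, candidate, threshold=0.8):
    """Hypothetical rule: flag the candidate if shingle overlap is high."""
    return jaccard(shingles(original), shingles(candidate)) >= threshold


def pick_original(copies):
    """copies: list of (url, first_seen) pairs; the earliest sighting wins.

    Assumes the feed host knows when each copy first appeared.
    """
    return min(copies, key=lambda copy: copy[1])[0]
```

The point of the second function is the one made above: whoever hosts the feed sees the post first, so it is in a good position to say which copy should rank as the original.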
More advanced than that would be checking to see which sites and networks scrape a large number of feeds. By detecting hidden elements in the feed itself, something FeedBurner already does, Google can see which sites are scraping large amounts of content. From there, they can deindex the worst infringers or even cut the applicable Adsense account if appropriate.
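To make the idea concrete, the sketch below counts how many distinct feeds turn up on each site and flags the sites above a cutoff. It assumes that some form of hidden-element tracking already produces a list of (site, feed) sightings; that input format, the function name and the cutoff of 50 feeds are hypothetical, not a description of FeedBurner's actual system.

```python
from collections import defaultdict


def flag_bulk_republishers(sightings, min_feeds=50):
    """Return sites on which at least min_feeds distinct feeds were seen.

    sightings: iterable of (site, feed_id) pairs, one per place a feed's
    hidden marker was found (assumed input, see the note above).
    """
    feeds_by_site = defaultdict(set)
    for site, feed_id in sightings:
        # A set ignores repeat sightings of the same feed on the same site.
        feeds_by_site[site].add(feed_id)
    return sorted(
        site for site, feeds in feeds_by_site.items() if len(feeds) >= min_feeds
    )
```

A count like this only produces candidates. Legitimate aggregators and feed readers also touch a large number of feeds, so anything flagged would still need a closer look before a site is deindexed or an account is cut.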
Finally, if feed licensing terms can be better expressed, it might be possible for Google to use this information to automatically mark scrapers as spammers. Not only could they deny access to the feed, but they could also take action to eliminate the spam from their search engine and cut the hosting or advertising for the site if applicable.
The end result of all of this would be that FeedBurner would become not just a way to track and monitor your feed, but also a means to claim your content and ensure that Google knows you are the original creator of the work, not the scraper that came along immediately after.
It’s a nice dream, but unfortunately it is most likely just a fantasy.
Puncturing the Dreams
The problem with this vision is that Google has already proved incapable of coordinating its spam-fighting efforts across its different arms. Filing a DMCA notice with Blogspot will not result in any action being taken against the owner’s Adsense account, even if the site is clearly spam. Likewise, a notice to the search engine will have no effect on a Blogspot account, and a notice against Adsense will have no effect on either the site or how it is indexed.
Google, it seems, is a true case of the right hand not knowing what the left one is doing.
This is frustrating, as it requires legitimate Webmasters to double or triple their efforts to ensure that a spam blog and the gains from it are entirely wiped out. It slows down the pace of spam fighting and makes stopping the practice of spam blogging, not just the splogs themselves, a more daunting task.
Since Google has thus far been unable to coordinate its efforts across three areas, and there has been no clear push to remedy that, it seems unlikely that the addition of a fourth front will help this matter. I am hopeful that it will, and I can certainly see the potential, but I am not counting on it.
Still, this could be the opportunity that Google has been looking for to not just battle spam blogs, but to potentially get ahead of them.
Conclusions
Google has an incredible opportunity with the purchase of FeedBurner. More than just a potentially savvy business move, it’s the chance to do some real good and make some tremendous progress in the war on spam.
To do this would take a great deal of effort, likely cost a good sum of money and probably cost Google a significant portion of its revenue stream. However, it would make the Web a much better place for all of us.
There is little doubt that Google’s purchase of FeedBurner will imbue them with a lot of new knowledge and lead to some interesting possibilities. What remains to be seen is how they use that knowledge.
Though all of this article is just conjecture and guesswork, it shows that the possibilities for this marriage are great. However, it will be up to Google to determine how this alignment is used and whether the potential is squandered.
Let us hope that this will be a fruitful marriage, producing more than just extra lining for Google’s pockets…