Today, after some serious thought, I’ve decided to show exactly how it’s being done and to highlight two services that, while serving end users who are tired of partial RSS feeds, are also helping to feed spammers.
These services are RSS expanders, meaning they convert short RSS feeds into full ones. Both are public, free to use and have been in operation for some time. More recently, however, I’ve been seeing spammers use them, and clients have come to me confused as to how a site is scraping their full content when they publish only a partial feed.
To understand how they work and what they mean, we have to take a look at two example services and what they can do.
What Is a Feed Expander?
The idea behind a feed expander is fairly simple: it takes a short RSS feed, one that carries either truncated content or headlines only, and converts it into a full RSS feed. It does this by following the URLs in the feed, extracting the content from each Web page and then building a new feed from the results.
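To make those three steps concrete (follow each link, pull the content off the page, build a new feed), here is a minimal Python sketch. It is an illustration under stated assumptions, not how FeedEx or FeedExpander actually work: the function names are hypothetical, it assumes an RSS 2.0 feed whose items carry `<link>` elements, and it treats the text of every `<p>` element as the article body, which is far cruder than the extraction real services use. The page fetcher is passed in as a function so the logic can be tried without touching live sites.

```python
import urllib.request
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import Callable


class ParagraphExtractor(HTMLParser):
    """Collects the text of every <p> element on a page.

    Real expanders use smarter heuristics to find the article body;
    grabbing all paragraphs is a deliberately crude stand-in (assumption).
    """

    def __init__(self) -> None:
        super().__init__()
        self.in_paragraph = False
        self.buffer: list[str] = []
        self.paragraphs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self.in_paragraph:
            self.in_paragraph = False
            text = "".join(self.buffer).strip()
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        # Text inside inline tags such as <a> or <em> is kept as well.
        if self.in_paragraph:
            self.buffer.append(data)


def extract_paragraphs(page_html: str) -> str:
    """Return the page's paragraph text, separated by blank lines."""
    parser = ParagraphExtractor()
    parser.feed(page_html)
    parser.close()
    return "\n\n".join(parser.paragraphs)


def http_fetch(url: str) -> str:
    """Download a page over HTTP (hypothetical helper; assumes UTF-8)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def expand_feed(feed_xml: str, fetch: Callable[[str], str] = http_fetch) -> str:
    """Turn a partial RSS 2.0 feed into a full one (hypothetical sketch)."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if not link:
            continue  # nothing to follow for this item
        # Steps 1 and 2: follow the link and pull the content off the page.
        full_text = extract_paragraphs(fetch(link))
        # Step 3: replace the truncated description with the full text.
        description = item.find("description")
        if description is None:
            description = ET.SubElement(item, "description")
        description.text = full_text
    return ET.tostring(root, encoding="unicode")
```

Calling `expand_feed(feed_text)` with the default `http_fetch` would download each linked page; passing something like `fetch=saved_pages.__getitem__` runs the same steps against pages held in a dictionary.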
Traditionally, spammers have performed feed expansion on their own servers, reading the RSS feed and doing the scraping themselves. However, several tools have now been made available to the public, including FeedEx and FeedExpander, which provide this service free of charge to anyone willing to paste in a short RSS feed.
In short, there’s no need for a potential spammer to set up any software of their own. They can simply paste your partial feed into a feed expander and scrape the full feed it produces, no work or expense required.
Though the services don’t work on all sites, in particular those with unusual formats, they do work on most. They aren’t perfect, but they are already more than reliable enough for spammers, as evidenced by the spammers I’ve seen using these and other services.
Is Feed Expansion Legal?
To help with this article, I reached out to both of the sites mentioned above. I heard back only from FeedEx, where Nikolay, the service’s creator, told me that his site respects robots.txt and doesn’t scrape content where robots are barred.
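Honoring robots.txt, as Nikolay describes, comes down to checking each article URL against the site’s rules before fetching it. A minimal Python sketch using the standard library’s `urllib.robotparser` module follows; the crawler name `FeedExpanderBot` is hypothetical, since FeedEx’s actual user-agent string isn’t stated here, and the rules are parsed from text passed in directly rather than downloaded. Site owners can run the same check to see whether their own robots.txt would bar a given crawler from their posts.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def robots_url_for(page_url: str) -> str:
    """robots.txt always lives at the root of the host."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def allowed_by_robots(robots_txt: str, user_agent: str, page_url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch page_url.

    A live crawler would instead call set_url(robots_url_for(page_url))
    and read() to download the file before checking.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


# Hypothetical crawler name; a real service would send its own user-agent string.
BOT = "FeedExpanderBot"

rules = """
User-agent: *
Disallow: /private/
"""

print(robots_url_for("https://example.com/blog/post"))  # https://example.com/robots.txt
print(allowed_by_robots(rules, BOT, "https://example.com/blog/post"))     # True
print(allowed_by_robots(rules, BOT, "https://example.com/private/post"))  # False
```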
However, it’s unclear whether respecting robots.txt is enough to make these services completely legal. While robots.txt works for search engines, search engines don’t redistribute the content, and distribution is one of the rights that copyright protects. Furthermore, any implied-license argument for this kind of use would be weak at best, since a webmaster who publishes a truncated feed has indicated pretty clearly that they don’t want their content distributed that way.
While these services could mitigate this by truncating their outgoing feeds (as Google did with its service to produce feeds from any page), that would defeat the purpose of the service.
In short, these services take content from other websites, copy it and post it on another page (remember, an RSS feed is fundamentally a specially formatted webpage). This is, quite frankly, the very definition of online copyright infringement. However, if these services are used by non-spammers, the rightsholder is unlikely to know or care.
However, this covers only the copyright issues. As discussed previously, scraping a site raises a variety of other issues, including trespass to chattels, Computer Fraud and Abuse Act violations and more. (Note: The PDF originally linked here is offline and I’m working to find a replacement; in the meantime, see this article as well.)
All of this combines to paint a pretty bleak legal picture, yet these services soldier on.
As mentioned above, I reached out to both sites before writing this article, but only FeedEx responded. (I will update this article should I hear back from FeedExpander.)
When told that some webmasters are upset about his service, Nikolay responded that complaints may not be as numerous as some would expect, saying that, “During all years of feedex.net presence, I have received just 2 complaints. And something around 100 of improvements requests.”
Nikolay also stated that, even though his site has a DMCA policy, he handled those complaints with a simple email exchange, blocking his bots from accessing the feeds in question.
When asked why he created the service, Nikolay said that he built it first for himself: he wanted a more mobile-friendly way to view websites and was tired of sites with partial feeds and of alternatives such as Readability, especially on his tablet and phone.
Still, Nikolay did acknowledge that spammers have used his service and that he has no means of stopping them: “I know that some spammers using my service bad way. At the moment I have no automated methods to ban them all and I cannot do that manually. So, I ban only those feeds, for which I have received complaint.”
FeedExpander, in its FAQ, says something similar, calling itself a “double edged sword” that is used both for legitimate purposes and for scraping.
Both sites say they designed their services for legitimate uses only and that misuse is an unintended side effect of that purpose.
Whether or not you believe that, it’s clear that these services and others like them are here to stay, and that the legal issues are, for the most part, purely hypothetical.
As I mentioned in my article earlier this month, there are ways you can fight back. These include linking to your own posts regularly, including footers in your posts and breaking content apart.
However, my goal with this post is not to flood these services with removal requests. I really don’t think that would do much good. While I believe they will block your feeds (if they are even using them), they are only two services, and most spammers still prefer to use their own technology rather than rely on a third party.
What’s important to note is that, as a protection measure, truncated RSS feeds are an almost complete waste. The only way they help protect your content is by stopping those too lazy to use a feed expander. Given that there are so many sites with full RSS feeds out there, a lazy scraper has plenty of easier targets, so that protection might be worth something, but a motivated scraper can easily get your content anyway.
There is, however, one other benefit: time. If you truncate your feed and someone passes it through one of these services, there will be a delay before your content appears on their site. According to Nikolay, that delay can be between 5 and 10 hours if your feed isn’t popular with the service. That, hopefully, will be plenty of time for Google to recognize your site as the original and treat the scrapers as the spammers they are.
Still, there are better ways to track and prevent reuse of your content. All you have to do is plan in advance and not treat truncated feeds as a silver bullet against the problem.
This is why my goal isn’t so much to take these services to task as to highlight that they exist. The spammers already know about them, and you should too. Their existence doesn’t change much on the front of content theft and spamming, but hopefully highlighting them will raise awareness of a practice that has been around for many, many years.