Blog scraping and spamming have been issues for Webmasters for years. RSS scrapers, errant feed readers and “aggregators” that simply republish content blindly have troubled writers for almost as long as blogging has been popular. Commenters, however, have typically been spared from this kind of republishing.
That is, until now.
A new service called BackType is aiming to create a search engine for comments but, along the way, is scraping and republishing full comments from people who post to various sites, including this one.
This has already created some heartburn for Webmasters and has forced BackType to post to their blog to defend their actions. However, to many, their explanations have not been enough and it seems that, though some are enamored with the service, others are upset about what the service is doing.
It is very hard to quickly explain what BackType is. It is first and foremost a comment search engine that spiders the Web looking for comments and then organizes them by person. The organization is done by a combination of the name of the commenter, Jonathan Bailey in my case, and the URL they use as their home page.
This enables you to search for comments based upon either the name of the person leaving it or the keywords within the comment itself. This makes BackType useful for tracking comments left by people or comments on a particular topic.
But while the service itself has a great deal of potential for usefulness, it also republishes the full comments on its results pages. Furthermore, those results pages are indexable by other search engines, including Google, and that opens the door for duplicate content issues.
Worse still, there is currently no way for a commenter to opt out of the system. At this time, the only way you could get your comments removed is to file a DMCA notice with their host. For now, however, I do not recommend such a solution.
Webmasters, if they want, can use robots.txt to ban the crawler from their site. However, this solution does not help the vast majority of blogs, as they are on freely-hosted services that do not allow editing of the robots.txt file. The end result is that, for the majority of commenters whose work has appeared on BackType, there is no way to get that work removed if they want it out.
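For Webmasters who can edit their robots.txt, here is a minimal sketch of how such a rule could be checked before relying on it. It uses Python's standard `urllib.robotparser`. The user-agent token `BackType` and the `example.com` URL are assumptions made for illustration; the crawler's real token would need to be confirmed from its documentation or from server logs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "BackType" is an assumed user-agent token,
# not a confirmed one. The second group leaves all other crawlers alone.
ROBOTS_TXT = """\
User-agent: BackType
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com stands in for your own site.
blocked = is_allowed(ROBOTS_TXT, "BackType", "https://example.com/post/1")
others = is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/post/1")
print("BackType allowed:", blocked)
print("Googlebot allowed:", others)
```

Keep in mind that robots.txt is only a request. It works only if the crawler chooses to honor it, and it does nothing for comments that have already been collected.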
Comparisons To Google
In interviews, including an email response to me, Chris Golda, one of the co-founders of BackType, has repeatedly compared his service to Google, saying in one interview that:
“It’s very similar to what Google does, or as you suggest, Techmeme. I can find all of my comments on Google — it’s just that they’re on page 423843 and seemingly irrelevant. That’s because Google is organizing the world’s information. BackType organizes the information in comments.”
There are indeed many similarities between BackType and Google including the fact that both use spiders to crawl and index content, not RSS feeds.
However, there are several critical differences between Google and BackType, especially as it pertains to how BackType deals with the content it indexes.
- BackType Displays the Full Comment: Google is very careful to truncate all of the pages it indexes when displaying its results. BackType, on the other hand, displays the full comment on its own pages, which it in turn opens up to other search engines. Google does offer a cache of the pages but that is a backup system, not the primary results page.
- No Effective Opt-Out: Google offers multiple ways to opt out of its index, including meta tags, robots.txt and even a removal request form. BackType, on the other hand, offers only one, robots.txt, which is not even accessible to most bloggers.
- Nofollowing Outbound Links: Though BackType is happy to allow search engines to index its results pages, when it links back to the original site, the link is almost always nofollowed, meaning that no search engine recognition is passed on to the original site. Google does not do this.
- No DMCA Support: As a host and as a search engine, BackType has obligations under the DMCA. As of this writing, they have not designated an agent on their site nor have they designated one with the United States Copyright Office. Not only does this put them at risk of losing their safe harbor, but it also makes things difficult for commenters who want information removed from BackType.
- Indexable Search Pages: Google and most other search engines use their robots.txt files to keep other search engines from indexing their results pages. BackType, however, intentionally leaves its results pages open to indexing, ensuring that they are listed, ranked and competing with the original sites.
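Two of the points above, nofollowed links and indexable pages, can be checked on any page by looking at its markup. The following is a rough sketch using Python's standard `html.parser`. The class name `IndexingAudit` and the sample page are invented for illustration and do not reflect BackType's actual markup.

```python
from html.parser import HTMLParser


class IndexingAudit(HTMLParser):
    """Collects robots meta directives and sorts links by nofollow status."""

    def __init__(self):
        super().__init__()
        self.robots_directives = []
        self.followed_links = []
        self.nofollowed_links = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            # e.g. <meta name="robots" content="noindex, follow">
            content = attr.get("content") or ""
            self.robots_directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]
        elif tag == "a" and attr.get("href"):
            # rel holds space-separated tokens; nofollow may be one of several.
            rel_tokens = (attr.get("rel") or "").lower().split()
            if "nofollow" in rel_tokens:
                self.nofollowed_links.append(attr["href"])
            else:
                self.followed_links.append(attr["href"])


# Made-up sample page: a noindex meta tag and one nofollowed link.
SAMPLE_PAGE = """
<html><head><meta name="robots" content="noindex, follow"></head>
<body>
<a href="https://example.com/original-post" rel="nofollow">Original post</a>
<a href="https://example.com/about">About</a>
</body></html>
"""

audit = IndexingAudit()
audit.feed(SAMPLE_PAGE)
print("robots directives:", audit.robots_directives)
print("nofollowed links:", audit.nofollowed_links)
print("followed links:", audit.followed_links)
```

A results page that carried a `noindex` directive and linked back to the original post without `nofollow` would address both of those complaints.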
In short, though there are natural comparisons between Google and BackType, the latter has not shown the care and concern for respecting the work of commenters that Google has.
Sadly, this puts an anchor around a service that I want to love very badly.
Unlike most of the sites that I see scraping and republishing content, BackType actually has a very significant legitimate use apart from its duplicated content.
Organizing comments is a noble goal and there are many reasons people like me, who track and respond to comments regularly, should be excited about the potential.
It is my belief after talking with Golda via email that this slight was not intentional. Though they are resolute that they are doing nothing wrong, the simple fact is that their decision to wholesale republish content is a controversial one and it is a distraction from the real objectives of the service.
There is no need for BackType to display the full comment. They can provide their service just as well, if not better, with truncated comments and without nofollowing their links back to the original sites. I seriously fail to see how thousands of pages of duplicate content are worthwhile to their business model, especially considering the backlash that comes with them.
Instead of talking about how useful a service BackType is or can be, many are debating the spam/scraping issue that comes with posting full comments.
It seems, to me at least, to be a waste of energy for the company.
With the Web shifting and evolving to new models, the issue of who has the right to publish what content, and where, is going to be a difficult one. However, it is going to be important for writers and artists to speak up as companies, often innocently, cross lines in the pursuit of creating the next cool product.
I don’t believe at this time that BackType is one of the bad guys. They seem determined to create a great tool and may have overlooked a few issues. That is understandable and forgivable.
However, without changes to the service, a lot of commenters and bloggers will be very upset, especially as BackType rises in the search results.
Without those changes, it seems likely to me that the controversy has only just begun.
- Cruel to be Kind: A great overview of the issue.
- StartupNorth: A positive review of the service.
- Library Clips: An overview of how BackType works.
- What are your thoughts on BackType?
- Should I block BackType from indexing this site?
- What changes, if any, would you recommend the service make?