According to Attributor, the system “works by allowing publishers to register and digitally fingerprint each piece of content” and then checking for uses of that content that violate previously determined rules for use. The system can then take action against any misuse of the content, including requesting a link back or filing a takedown notice.
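Attributor has not published the details of its fingerprinting technology, but a common technique behind this kind of duplicate detection is w-shingling: hash every overlapping run of w words in a document, then compare the resulting sets of hashes to estimate how much text two pages share. The sketch below is purely illustrative (the function names and the w=5 window are assumptions, not Attributor's actual method):

```python
# Illustrative sketch of text fingerprinting via w-shingling.
# This is NOT Attributor's published algorithm -- just a minimal
# example of the general near-duplicate-detection technique.
import hashlib


def fingerprint(text, w=5):
    """Return a set of hashes, one per overlapping w-word shingle."""
    words = text.lower().split()
    shingles = set()
    for i in range(len(words) - w + 1):
        shingle = " ".join(words[i:i + w])
        shingles.add(hashlib.sha1(shingle.encode("utf-8")).hexdigest())
    return shingles


def similarity(fp_a, fp_b):
    """Jaccard similarity between two fingerprints, from 0.0 to 1.0."""
    if not fp_a or not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


original = "the quick brown fox jumps over the lazy dog near the river bank"
partial_copy = "the quick brown fox jumps over the lazy dog near the old mill"
unrelated = "completely different text about something else entirely goes here"

# A page that copies most of the original scores high; an unrelated page scores 0.
print(similarity(fingerprint(original), fingerprint(partial_copy)))
print(similarity(fingerprint(original), fingerprint(unrelated)))
```

In a real system the fingerprints would be stored in an index so that newly crawled pages can be checked against every registered work at once, rather than compared pairwise as above.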
The enterprise version, which is the only one Attributor is currently taking customers for, costs tens of thousands of dollars per year, according to TechCrunch. A version of the service aimed at individuals and bloggers should be available sometime in 2008.
Also according to the TechCrunch article, Attributor is currently indexing over 100 million pages per day and has already indexed over 15 billion. The current system works only with text, but an image detection system is in early beta and audio/video detection is also planned.
Clearly, Attributor could be a game-changer in this industry, and it will be a product worth following as it grows.
In a separate announcement, Attributor unveiled the results of an analysis the company did on the copying and pasting of song lyrics.
The analysis found that song lyrics are copied widely on lyric sites, fan sites, and blogs, and that very few of those copies are officially licensed. What was surprising was that the only official source studied, Yahoo! Music, ranked behind all of the unofficial copies in Google and behind almost all of them in Yahoo's own search engine.
Other tests done by Attributor, including one involving the recipe site Epicurious, have found similar results. In that study, over 10,000 copies were found, and over half of them ranked higher in search engines than the originals. In addition, over half of the copying sites ran ads on the pages, and 60% failed to link back to the source.
Though these results are not likely to be a surprise to anyone who reads this site regularly, it is an embarrassment for Google, which has repeatedly said that it can distinguish between copies and originals. In a blog post in December 2006, they said, “Though annoying, it’s highly unlikely that such sites (scrapers) can negatively impact your site’s presence in Google.”
Apparently Google's accuracy in this area leaves a great deal to be desired. This may explain some of Attributor's harsh words for Google, which it called "unacceptable to publishers", and why some feel that an AdSense lawsuit is on the horizon.
Attributor has the potential to reshape this field. It is already making an impression at the enterprise level and has signed on many prominent customers, including both the AP and Reuters, but it remains to be seen what impact it will have on individuals and bloggers.
Since the single-user version of Attributor is not slated for release until sometime in 2008, we will simply have to wait and see.
In the meantime, we are forced to choose between paying top dollar for an enterprise or academic solution and cobbling together a hodgepodge of limited but free services. Though it is possible to track and monitor one's content very well using the currently available means, doing so grows more and more time-consuming as the Internet, along with the misuse of content, grows by the minute.
The value of a system like Attributor lies not just in its ability to detect more reuse of content, but in organizing that information in a way that is maintainable and offering solutions that are practical. If Attributor can do that, then real progress will be made in protecting content on the Web.
Regardless, this is a company to watch.
Disclaimer: I am a consultant for Attributor.