Hachette Book Group, a New York-based book publisher, announced a new partnership with content tracking service Attributor that will allow the company to track uses of its content on the Web.
This is yet another major client for Attributor, which added Time Warner to its video monitoring service in April, but the deal also marks the debut of a new Attributor feature: PDF detection.
Attributor’s service can now detect duplication in PDF files using a hybrid of its existing text matching service, which flags likely matches, and a file matching system, which detects copies. The system works even if the content was not originally published in PDF format, that is, even when the person offering the unlicensed copies converted the file to PDF themselves. It also works with many different file-sharing sites, sometimes referred to as “cyberlocker” sites.
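Attributor has not published how its hybrid works, so the following is only a hypothetical sketch of the general two-stage idea described above, not Attributor's method. It assumes the text has already been extracted from each PDF, uses word shingles with Jaccard similarity as a stand-in for "text matching", and uses a SHA-256 digest of the raw file bytes as a stand-in for "file matching". The function names and the 0.5 threshold are illustrative assumptions.

```python
import hashlib
import re


def shingles(text: str, k: int = 5) -> set:
    """Split text into overlapping k-word sequences (hypothetical text-matching stand-in)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Share of shingles two documents have in common."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def file_digest(data: bytes) -> str:
    """Fingerprint of the raw file bytes (hypothetical file-matching stand-in)."""
    return hashlib.sha256(data).hexdigest()


def classify_candidate(ref_text: str, ref_bytes: bytes,
                       cand_text: str, cand_bytes: bytes,
                       threshold: float = 0.5) -> str:
    # Stage 1: text matching decides whether the candidate is a likely match at all.
    if jaccard(shingles(ref_text), shingles(cand_text)) < threshold:
        return "no match"
    # Stage 2: file matching separates byte-identical copies from reworked files,
    # e.g. a document someone converted to PDF themselves.
    if file_digest(cand_bytes) == file_digest(ref_bytes):
        return "exact copy"
    return "likely match"
```

Because a file that someone converted to PDF has different bytes but largely the same words, it lands in the "likely match" bucket rather than "exact copy", which is why the text stage matters for content that was never published as a PDF.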
This is a big announcement because it makes the Attributor system useful to a whole new range of audiences. Where it was once aimed at publishers of text articles, such as blogs and news organizations, it can now also help book publishers such as Hachette, research firms and others that routinely publish in PDF format or have their content repurposed into PDFs.
Though Google and other search engines have been able to find matching text within PDFs, Attributor looks at the PDF file itself, much as it does with its image search, which lets it go beyond simple text matching and look for other similarities. How effective this matching is remains to be seen, and it may be the subject of a future test of mine, but it is an interesting feature and one that is likely to be compelling for many publishers.
In the end, if you publish a lot of files in PDF format or work with a type of content that is often reformatted into PDFs, you may want to give Attributor a second look.
Disclosure: I have consulted with Attributor.