Attributor Launches PDF Support

By Jonathan Bailey • Jun 17th, 2009 • Category: Articles, News, Products


Hachette Book Group, a New York-based book publisher, announced a new partnership (doc file) with content tracking service Attributor that will allow the company to track uses of its content on the Web.

Though this is yet another major client for Attributor, which already added Time Warner to its video monitoring service in April, it also marks the addition of a new feature for Attributor, PDF detection.

Attributor’s service can now detect duplication in PDF files using a hybrid of its existing text matching service, which is used to detect likely matches, and a file matching system to detect copies. The system works even if the content was not originally uploaded in PDF format, meaning the person offering the unlicensed copies converted the file to PDF, and works with many different file sharing sites, sometimes referred to as “cyberlockers.”
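For readers curious what a two-stage approach like this might look like, here is a rough sketch in Python. It is purely illustrative and is not Attributor's method, which the company has not disclosed. The function names, the word-shingle comparison and the 0.5 threshold are all assumptions of mine, and a SHA-256 hash stands in for the file matching stage, which in a real system would presumably be more tolerant of format conversion than an exact byte comparison.

```python
import hashlib


def shingles(text, size=3):
    """Return the set of overlapping word n-grams in text (hypothetical helper)."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def text_similarity(a, b):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def classify(original_text, original_bytes, found_text, found_bytes, threshold=0.5):
    """Stage 1: cheap text screen. Stage 2: byte-level check on whatever survives."""
    # Assumed threshold; anything below it is dropped before the file comparison.
    if text_similarity(original_text, found_text) < threshold:
        return "no match"
    # Stand-in for file matching: identical hashes mean a byte-for-byte copy.
    if hashlib.sha256(original_bytes).digest() == hashlib.sha256(found_bytes).digest():
        return "exact copy"
    return "likely match"
```

The idea is simply that the inexpensive text comparison narrows the field, so the costlier file-level comparison only runs on candidates that already look similar.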

This is a big announcement because it makes the Attributor system useful to a whole new range of audiences. Where it was once targeted at publishers of text articles, such as blogs and news organizations, it can now also help book publishers such as Hachette, research firms and others that routinely publish in PDF format or have their content repurposed into PDF format.

Though Google and other search engines have been able to find matching text within PDFs, Attributor looks at the PDF file itself, similar to what it does with its image search, enabling it to go beyond simple text matching and look for other similarities. How effective this matching is remains to be seen, and may be the subject of a future test for me, but it is an interesting feature and one that is likely to be compelling for many publishers.

In the end, if you publish a lot of files to PDF format or work in a type of content that is often reformatted into PDFs, you may want to give Attributor a second look.

Attributor’s Brochure (PDF)

Disclosure: I have consulted with Attributor.

Short URL to this Post: http://copybyte.com/z/6

Jonathan Bailey is The Webmaster and author of Plagiarism Today, which he founded in 2005 as a way to help Webmasters going through content theft problems get accurate information and stay up to date on the rapidly-changing field. He is also a consultant to Webmasters and companies to help them devise practical content protection strategies and develop good copyright policies.

  • Hi Jonathan. We have been crawling, indexing and comparing PDF documents for some time. This is hardly new technology, unless of course, Attributor is using an OCR scan in real time to compare unlike PDF documents. Can you shed light on whether this is unique or simply a press release? I want to enlighten our development team if they have overlooked a new opportunity.
  • My understanding in talking with them was that they are using file detection technology as well, similar to what they do with their video search, looking at the raw data of the file. They're keeping the technology close to the vest on this one (I did not help in any way with the video or PDF products) but they claimed to use their text matching to get an idea of files to scan further and then use the file matching technology to find more exact matches.

    Does that help?