Attributor Launches PDF Support


Hachette Book Group, a New York-based book publisher, announced a new partnership (doc file) with content tracking service Attributor that will allow the company to track uses of its content on the Web.

This is yet another major client for Attributor, which already added Time Warner to its video monitoring service in April, but it also marks the addition of a new feature for Attributor: PDF detection.

Attributor’s service can now detect duplication in PDF files using a hybrid of its existing text matching service, which is used to detect likely matches, and a file matching system to detect copies. The system works even if the content was not originally in PDF format, meaning that the person offering the unlicensed copies converted the file to PDF, and it works with many different file sharing sites, sometimes referred to as “cyberlockers”.
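Attributor has not published how its matching actually works, so what follows is only a rough sketch of the general two-stage idea: a cheap text comparison to shortlist likely matches, then a file-level comparison to confirm copies. Everything in it is an assumption made for illustration, including the function names, the word-shingle (Jaccard) text comparison, the 0.5 threshold and the use of a SHA-256 hash as a stand-in for file matching. It also assumes the text has already been extracted from each PDF.

```python
import hashlib


def shingles(text, size=3):
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def text_similarity(a, b):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def file_fingerprint(data):
    """SHA-256 digest of the raw file bytes (a stand-in for file matching)."""
    return hashlib.sha256(data).hexdigest()


def classify(original_text, original_bytes,
             candidate_text, candidate_bytes, threshold=0.5):
    """Stage 1 shortlists by text; stage 2 confirms copies by file bytes."""
    # Stage 1 (hypothetical): skip candidates whose text barely overlaps.
    if text_similarity(original_text, candidate_text) < threshold:
        return "no match"
    # Stage 2 (hypothetical): identical bytes mean a byte-for-byte copy.
    if file_fingerprint(original_bytes) == file_fingerprint(candidate_bytes):
        return "exact copy"
    return "likely match"
```

A plain hash only catches byte-for-byte copies, so a real system that looks at the raw data of a file would presumably need something more tolerant. The point here is only the division of labor between the two stages.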

This is a big announcement because it makes the Attributor system useful for a whole new range of audiences. Where once it was targeted at publishers of text articles, such as blogs and news organizations, it is now also available to help book publishers such as Hachette, research firms and others that routinely publish in PDF format or have their content repurposed into PDF format.

Though Google and other search engines have been able to find matching text within PDFs, Attributor looks at the PDF file itself, similar to what it does with its image search, enabling it to go beyond simple text matching and look for other similarities. How effective this matching is remains to be seen, and it may be the subject of a future test for me, but it is an interesting feature and one that is likely to be compelling for many publishers.

In the end, if you publish a lot of files in PDF format or work with a type of content that is often reformatted into PDFs, you may want to give Attributor a second look.

Attributor’s Brochure (PDF)

Disclosure: I have consulted with Attributor.


Hi Jonathan. We have been crawling, indexing and comparing PDF documents for some time. This is hardly new technology, unless of course, Attributor is using an OCR scan in real time to compare unlike PDF documents. Can you shed light on whether this is unique or simply a press release? I want to enlighten our development team if they have overlooked a new opportunity.

My understanding in talking with them was that they are using file detection technology as well, similar to what they do with their video search, looking at the raw data of the file. They're keeping the technology close to the vest on this one (I did not help in any way with the video or PDF products) but they claimed to use their text matching to get an idea of files to scan further and then use the file matching technology to find more exact matches.

Does that help?
