Watermarking vs. Fingerprinting: A War in Terminology

Copyright law terminology is notoriously dense. Throw in technology and you have a kludge of confusing terms, misnomers and techno-speak.

Navigating these waters can be tough and, as the latest episode of the Copyright 2.0 Show indicated, even the brightest in the field can be tripped up.

In honor of that confusion, I’ve decided to take a close look at two terms that have been giving headaches to many in the field, myself included: digital watermarking and digital fingerprinting.

Though the two terms sound very much the same and serve the same end goal many times, they are actually very different in the way that they operate and the protections they provide.

Confusing the two terms is very easy, for obvious reasons, but distinguishing between them is very important.

Watermarking

If Wikipedia is to be believed, then digital watermarking “is a technique which allows an individual to add hidden copyright notices or other verification messages to digital audio, video, or image signals and documents.”

The important element to remember is that, when watermarking something, you are adding information to an existing product. That information could be visible, such as the marks applied by Visual Watermark, or invisible, such as the marks applied by Digimarc.

Either way, the effect is the same. You are taking an existing work and embedding new information into it in order to make the work more easily identified as yours. This identification can be done by a third party, by you or by an automated tool.

It is that act of altering the content that is crucial to remember because, as you will see, it is what distinguishes watermarking from fingerprinting.
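To make the idea of "adding information to an existing work" concrete, here is a minimal sketch of one classic invisible-watermark approach: hiding a short message in the least-significant bit of each pixel value. This is only an illustration under my own assumptions. The function names are hypothetical, the "image" is a stand-in list of grayscale values, and it is not a description of how Digimarc or any other commercial product works.

```python
def embed_watermark(pixels: bytes, message: bytes) -> bytes:
    """Hide the message in the least-significant bit of each pixel byte."""
    # Break the message into individual bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("message is too long for this image")
    marked = bytearray(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit of the pixel, then set it to the message bit.
        marked[i] = (marked[i] & 0xFE) | bit
    return bytes(marked)


def extract_watermark(pixels: bytes, length: int) -> bytes:
    """Read back `length` bytes from the least-significant bits."""
    out = bytearray()
    for start in range(0, length * 8, 8):
        value = 0
        for byte in pixels[start:start + 8]:
            value = (value << 1) | (byte & 1)
        out.append(value)
    return bytes(out)


image = bytes(range(256))  # stand-in for 256 grayscale pixel values
marked = embed_watermark(image, b"(c) Me")

print(extract_watermark(marked, 6))                    # b'(c) Me'
print(max(abs(a - b) for a, b in zip(image, marked)))  # 1: no pixel changes by more than one
```

The marked copy looks the same to the eye, but the work itself has been changed. A mark this simple would not survive resizing or recompression, which is part of why real products are far more elaborate.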

Fingerprinting

Fingerprinting has at least two definitions when it comes to protecting content. The first deals with taking each copy of your content and making it unique to the person who receives it. This way, if the work is shared, you know exactly which person spread the work initially.

A variation of this technique is used by the CopyFeed plugin, which embeds the IP address of the feed reader into every entry. Thus, if the feed is scraped and reposted, the person doing the scraping can be identified and blocked.
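The per-recipient idea can be sketched in a few lines. The example below is my own illustration, not CopyFeed's code: instead of embedding the reader's IP address directly, it embeds a short tag derived from it with a secret key, so only the publisher can tie the tag back to a reader. The names `SECRET_KEY`, `recipient_tag`, `personalize` and `trace`, along with the sample addresses, are all hypothetical.

```python
import hashlib
import hmac
from typing import Optional, Sequence

SECRET_KEY = b"replace-with-a-private-key"  # hypothetical; known only to the publisher


def recipient_tag(recipient_id: str) -> str:
    """Derive a short tag that only the publisher can tie back to a recipient."""
    digest = hmac.new(SECRET_KEY, recipient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def personalize(entry: str, recipient_id: str) -> str:
    """Return a copy of the entry with the recipient's tag appended."""
    return f"{entry}\n[ref:{recipient_tag(recipient_id)}]"


def trace(leaked_text: str, known_recipients: Sequence[str]) -> Optional[str]:
    """Return the recipient whose tag appears in the leaked text, if any."""
    for recipient_id in known_recipients:
        if recipient_tag(recipient_id) in leaked_text:
            return recipient_id
    return None


readers = ["203.0.113.7", "198.51.100.24"]  # example feed-reader addresses
copy_for_second = personalize("Today's post body...", readers[1])

print(trace(copy_for_second, readers))  # 198.51.100.24
```

Note that this only works if the scraper leaves the tag in place. Anyone who strips the extra line defeats it.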

However, the more common definition deals with a technique, as Wikipedia puts it, “in which sophisticated software identifies, extracts and then compresses characteristic components… enabling that video [or other content] to be immediately and uniquely identified by its resultant ‘fingerprint’”.

In short, what fingerprinting usually does is take the content, use some kind of software to convert it into a unique number or string of characters and then use that string to match it against other content out there.

It basically does to the content what a fingerprint scanner does to your fingerprint. The principles are very much the same.

The simplest and best-known form of fingerprinting is SHA hashing. This is how downloaders verify that they received the complete file and how Numly matches content against its database.

However, hashing can be thwarted easily by the alteration of just one character or byte. Since that changes the hash for the entire content, it will not return a match even if everything else is the same.
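That weakness is easy to see for yourself. The short sketch below uses Python's standard `hashlib` module with SHA-256 (one member of the SHA family, chosen here only as an example) and compares an exact copy against a copy with a single letter changed. The sample sentences and the `sha256_fingerprint` helper are mine.

```python
import hashlib


def sha256_fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


original = b"The quick brown fox jumps over the lazy dog."
exact_copy = b"The quick brown fox jumps over the lazy dog."
altered = b"The quick brown fox jumps over the lazy cog."  # one byte changed

# An exact copy produces the same fingerprint.
print(sha256_fingerprint(original) == sha256_fingerprint(exact_copy))  # True

# A single changed byte produces a completely different fingerprint.
print(sha256_fingerprint(original) == sha256_fingerprint(altered))     # False
```

Notice that nothing was added to the original bytes at any point. The fingerprint is computed from the work and stored somewhere else.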

More advanced fingerprinting technology, such as that employed by major media corporations and under construction for YouTube via the “Claim Your Content” site, is much more robust to alterations and can detect snippets of material as short as a few seconds. These fingerprints are available for video, audio, image and even textual works. However, they are less commonly used with text due to other readily-available search methods.
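I can't show how the commercial audio and video systems work, but the general idea of a fingerprint that survives small changes can be sketched for text. The example below swaps in a well-known technique called shingling: it hashes every overlapping run of four words and compares two works by how many of those hashes they share. Treat it as a rough sketch under my own assumptions. The function names, the four-word window and the sample sentences are all hypothetical choices.

```python
import hashlib
import re
from typing import Set


def shingle_fingerprint(text: str, k: int = 4) -> Set[str]:
    """Hash every overlapping run of k words into a set of short strings."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return set()
    count = max(len(words) - k + 1, 1)
    shingles = (" ".join(words[i:i + k]) for i in range(count))
    return {hashlib.sha1(s.encode("utf-8")).hexdigest()[:12] for s in shingles}


def similarity(a: Set[str], b: Set[str]) -> float:
    """Share of hashed shingles the two fingerprints have in common (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


original = ("Watermarking adds information to a work, while fingerprinting "
            "derives an identifier from the work itself.")
altered = ("Watermarking adds information to a work, while fingerprinting "
           "derives one identifier from the work itself.")
unrelated = "Perhaps another day we can sort out the other confusing names."

fp = shingle_fingerprint(original)
print(similarity(fp, shingle_fingerprint(altered)))    # well above zero despite the edit
print(similarity(fp, shingle_fingerprint(unrelated)))  # no shared shingles
```

A plain hash would treat the altered sentence as a total mismatch. Here the one-word change only knocks out the handful of shingles that contain it, so the two versions still score as closely related.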

In short, digital fingerprinting is something that you do to an original work that does not modify it, but makes it easily searchable by compressing the content into a string which is both unique to it and can easily be compared against other works.

If you’re confused, well, things are about to get even worse.

Exceptions to the Rule

Right now, the rule seems simple enough. Watermarking is anything you actively embed into your content to aid in detection or identification, while fingerprinting is something you do to your work, using software, to make it easily searchable and comparable to other works.

However, watermarking is a strange term in that it is usually applied only to multimedia works. You can watermark a movie, an image, an audio file or even a downloadable document, but the term is almost never used to describe something done to clear text.

For example, the Digital Fingerprint Plugin actually works more like a watermarking plugin. It embeds a unique string into the RSS feed, one selected by the user, to make the work more easily identifiable. It fits the description of a watermark almost perfectly but calling it the “Digital Watermarking” plugin would have created a great deal of confusion.

The reason is that watermarking is usually associated with some kind of visual or audible change in the work. It might be invisible to the naked eye, but it is supposed to be a modification to the look or sound of the work. At the very least, it is a modification to a file, such as a Word document. For whatever reason, embedding a new line of text or adding a signature is called many things, but not watermarking.

The term fingerprinting has been expanded to encompass just about anything done to text to make it uniquely identifiable including traditional fingerprinting and some of what we might call watermarking. This has been going on since long before the Digital Fingerprint Plugin and is likely to carry on deep into the future.

That is, unless linguists suddenly decide it is possible to “watermark” a textual work.

Conclusions

When you look at these two terms and how they overlap and intertwine, it is easy to see how and why there is so much confusion regarding them. Even this lengthy article is not by any means a final report on the differences between the two and I’m already looking forward to the discussion in the comments.

But if you need a simple way to remember the two, try this. If I needed to prove that your thumb belonged to you, I could either A) tattoo a unique symbol onto it, which would be a form of watermarking, or B) put your thumb up against a scanner and let it analyze the lines and curves of your thumb, making sure they were unique to you. That would be, quite predictably, fingerprinting.

Though that analogy isn’t perfect by any stretch, at least it brings some clarity to the table.

Now, if we could only deal with the dozens of other misnomers and confusing names, we might actually be able to have a serious conversation on these topics.

Perhaps another day.
