Poisoning the AI Well


Earlier this week, researchers from the University of Chicago offered a preview of an upcoming paper that they have submitted for peer review.

As first reported by Melissa Heikkilä at MIT Technology Review, the researchers claim to have found a way to “poison” image AI training data, causing models to become confused and generate incorrect or unrealistic images.

The new system is named Nightshade and works by altering the pixels in an image in such a way that humans can’t detect the changes, but an AI system that ingests the image as training data can easily be misled by them.
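
To make the idea a little more concrete, here is a minimal, heavily simplified sketch of what an “invisible” poisoning perturbation can look like: the image’s pixels are nudged so that a feature extractor sees something closer to an unrelated “anchor” concept, while every change is clamped to a tiny per-pixel budget so humans notice nothing. This is not the algorithm from the Nightshade paper; the encoder, loss, step count and budget below are all placeholder assumptions.

```python
# Illustrative sketch only (NOT Nightshade's actual method): nudge an image's
# pixels toward the feature-space representation of an unrelated "anchor" image,
# while clamping every change to a small per-pixel budget so it stays invisible.
# Assumes PyTorch and torchvision; the encoder and parameters are placeholders.
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF
from PIL import Image

encoder = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
encoder.fc = torch.nn.Identity()   # use penultimate features as an "embedding"
encoder.eval()

def poison(original_path, anchor_path, budget=8 / 255, steps=100, lr=0.01):
    """Return a copy of the original whose features resemble the anchor,
    with per-pixel changes limited to `budget` (L-infinity)."""
    orig = TF.to_tensor(Image.open(original_path).convert("RGB")).unsqueeze(0)
    anchor = TF.to_tensor(Image.open(anchor_path).convert("RGB")).unsqueeze(0)

    with torch.no_grad():
        target_feat = encoder(anchor)          # features of the unrelated concept

    delta = torch.zeros_like(orig, requires_grad=True)   # the hidden perturbation
    opt = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        poisoned = (orig + delta).clamp(0, 1)
        loss = torch.nn.functional.mse_loss(encoder(poisoned), target_feat)
        loss.backward()
        opt.step()
        # Keep the change imperceptible: clamp to the pixel budget each step.
        with torch.no_grad():
            delta.clamp_(-budget, budget)

    return (orig + delta).clamp(0, 1).detach()
```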

In their testing, they fed Stable Diffusion, a popular image-generating AI, some 50 poisoned images pertaining to dogs. They then prompted it to produce images of dogs and noticed that the generated images looked wrong, some with too many limbs or inaccurate faces.

When they increased to 300 poisoned images, Stable Diffusion began to generate images of dogs that looked like cats and other animals. 

This is actually the second anti-AI tool that this team has created. In February, they announced the launch of Glaze, a similar tool designed to “mask” the style of the artist. The team now plans to integrate Nightshade into Glaze, creating a single tool that allows artists and photographers to protect their work.

According to the researchers, the tools work due to a security vulnerability that is common across all generative AI models. It’s unclear if and how AI companies will respond to this. 

However, if this sounds familiar, it’s because it isn’t the first proposal to poison images against AI use. In fact, the broader idea of protecting images by hiding information in their content goes back at least 15 years.

Nightshade vs. Previous Poisoning Attacks

Back in March, researchers from the Massachusetts Institute of Technology announced a new program they developed called Photoguard.

The idea behind Photoguard was very similar to that of Nightshade and Glaze: it adds “noise” to the image that makes it very difficult for AI systems to parse the content.

Though very similar in approach, Photoguard had a different end goal in mind. Rather than poisoning the AI well, it was an attempt to make the image unusable for deepfakes and other targeted manipulations.

The system’s main drawback, other than being a static defense against an adaptive attack, was that it didn’t scale. The Photoguard process was very computationally intensive and, though that’s not much of a barrier for one image or a few dozen, protecting thousands or millions of images became a problem.

That appears to be a problem for Nightshade as well. According to the paper, it takes 94 seconds to generate a poisoned image on a high-end AI graphics card. This means that protecting 100 images would take over two and a half hours of computational time on a good system.
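
The arithmetic behind that estimate is straightforward, and extending it shows how quickly the cost grows for larger collections. The 94-second figure comes from the reporting on the paper; the batch sizes below are just illustrative.

```python
# Back-of-the-envelope scaling of Nightshade's reported per-image cost.
SECONDS_PER_IMAGE = 94  # reported time on a high-end AI graphics card

for n_images in (100, 1_000, 10_000):
    hours = n_images * SECONDS_PER_IMAGE / 3600
    print(f"{n_images:>6} images ≈ {hours:.1f} GPU-hours")

# 100 images ≈ 2.6 GPU-hours; 1,000 ≈ 26.1; 10,000 ≈ 261.1
```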

But that may not matter much, because of the different approach Nightshade takes: it isn’t about protecting a specific image, but about poisoning the AI training well.

As their study suggests, there’s a notable degradation in generated image quality after just 50 poisoned images, with 300 images completely altering the targeted terms. Even more interesting, the poisoning bled through to concepts related to the targeted terms, so poisoned “Dog” images might also impact queries for “Puppy” or “Husky”.

The researchers did note that AI companies may be able to find ways to detect the watermark and either remove it or exclude the affected files, but such a system would have to be remarkably efficient to process millions of images and catch the poisoned ones.

In short, it’s an interesting approach that points to a major weakness in generative AI, one that they appear to have found a way to exploit.

Machine (Un)Readable Watermarks

The idea of an image watermark meant for machines is not a new one. Back in August 2010, we took a look at SignMyImage, a company that placed invisible watermarks in images so that copies of them could be more easily detected online.

By that time, companies like Digimarc had been using similar systems for years, helping professional photographers and photo agencies find copies of their images online.

This type of watermarking is still in broad use today. However, it is primarily used to track licensing and validate image integrity, not simply to find images. That’s because image detection largely moved to fingerprinting, which can find duplicate images without requiring a watermark.
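
For readers unfamiliar with the distinction, fingerprinting derives a compact signature from the image content itself, so nothing has to be embedded in the file beforehand. Below is a minimal sketch using a simple “average hash”; production systems use far more robust perceptual hashes, and the filenames are hypothetical.

```python
# Minimal sketch of fingerprinting via a simple "average hash": the image itself
# is reduced to a short signature, so duplicates can be matched without any
# embedded watermark. Real-world systems use more robust perceptual hashes.
from PIL import Image

def average_hash(path, hash_size=8):
    """Shrink, grayscale, and threshold the image into a 64-bit fingerprint."""
    img = Image.open(path).convert("L").resize((hash_size, hash_size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = "".join("1" if p > mean else "0" for p in pixels)
    return int(bits, 2)

def hamming_distance(a, b):
    """Number of differing bits; small distances indicate near-duplicates."""
    return bin(a ^ b).count("1")

# Usage (hypothetical filenames): two resized or re-compressed copies of the
# same photo should produce hashes only a few bits apart.
# d = hamming_distance(average_hash("original.jpg"), average_hash("copy.jpg"))
```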

But that points to what has changed with AI. Previously, invisible image watermarking was about making images easier to locate and giving machines new information. Now, the goal of systems like Photoguard, Glaze and Nightshade is to make it more difficult for machines to read the image at all, or to intentionally mislead them.

So, while we’re still hiding information for machines to find, that relationship has gone from cooperative to adversarial in just a few short years.

Bottom Line

Historically, watermarking solutions, regardless of the problem they intend to solve, have all struggled in the same way: reaching critical mass.

Most people, including professional photographers and artists, don’t take any action to protect their images before uploading them. Getting enough people to participate to have an impact has always been a challenge.

To that end, Nightshade still shares some of that problem. Even as AI has prompted many artists to reflect on how their work is used, few take active steps to protect it. Furthermore, the long render time and the indirect form of protection may hinder broad adoption.

That said, it may not matter. The study seems to indicate that the critical mass for this approach may be extraordinarily low. Just a few thousand, or a few tens of thousands, of poisoned images could have a broad impact on generative AI systems.

Essentially, Nightshade is fighting an asymmetric battle. Yes, it will take a fair amount of time and resources to poison even a handful of images, but it would take far more to check billions of images for poisoning or to correct the images.

Even if companies do find an easy way to detect Nightshade, it would still serve as a very clear “opt out”, and it could raise serious legal questions about using the image. 

This is especially true if Nightshade’s watermark is considered copyright management information and the AI removes that information. That would likely be a violation of the Digital Millennium Copyright Act (DMCA) in its own right.

In the end, what makes Nightshade interesting isn’t the concept of invisible watermarks, but how it exposes a vulnerability in how the AIs are trained. It still has the same problems as other systems, but its vector of attack means that those problems may not matter as much.

Essentially, it opens up a second front in the war against AI. While the legal war will continue to rage for some time to come, this opens up a potential battle line on the tech side, giving creators a new weapon to fight back with.
