Why The New York Times AI Case is Different
Normally, my first post of the year is both a look back at the year that was and a look ahead to the year to come.
However, this year, you can simply reread last year’s post, where I talked about how AI was both the biggest copyright story and the biggest plagiarism story of 2022 and would be in 2023.
Simply change the dates to 2023 and 2024, respectively, and the story remains roughly the same.
Proof of this is the fact that 2023 saw a significant Supreme Court decision: a ruling in favor of photographer Lynn Goldsmith in her case against the Andy Warhol Foundation.
The case will have major implications for fair use moving forward. However, many, including myself, immediately framed the case in the context of AI. Even a game-changing Supreme Court copyright decision, something that happens once a decade or less, didn’t take attention away from AI.
However, possibly the most important copyright case against AI wasn’t filed until the last week of the year. That was when the New York Times filed a lawsuit against both OpenAI and Microsoft, alleging that they engaged in copyright infringement by training their AI systems on Times content without a license.
In a year where creators filed dozens of lawsuits against AI companies, The New York Times case is unique. Simply put, the Times makes arguments that we have not seen used against AI companies and, by most accounts, has a much stronger chance of success than the other lawsuits filed by authors, comedians and so forth.
That’s because the New York Times case frames AI in a very different light. It focuses less on the technology itself and more on the nuts and bolts of copyright. That, in turn, makes it a very dangerous case for AI companies, and one that could be particularly damaging for OpenAI.
The Pattern So Far
To understand why the New York Times case is different, we first must look at the cases that led up to it.
2023 was nothing if not a parade of copyright infringement lawsuits filed against AI companies. We’ve seen lawsuits filed by photographers, artists, authors, comedians, music publishers and more. However, despite the different litigants, the cases largely make the same arguments.
- That AI Companies Used Their Work for Training
- That AI Output Was an Unlicensed Derivative of Their Work
This has led to largely the same two counterarguments to those allegations. According to AI companies:
- That the Use of Internet Content to Train an AI is a Fair Use
- That the Output is Not Similar Enough to Be Infringing
In the case involving Sarah Silverman, a judge dismissed the second argument, ruling that the plaintiffs hadn’t proved that the AI’s outputs infringed their work. Though the plaintiffs have since refiled the claim, the dismissal was a warning sign that success on that front would be difficult.
However, oddly, it’s that second claim where the New York Times case is the strongest, as The New York Times makes a simple but elegant case that ChatGPT’s output goes well beyond restating facts and information and engages in verbatim copying.
Artificially Unintelligent Verbatim Copying
One of the things that makes The New York Times lawsuit different from those that came before is that it highlights multiple occasions where ChatGPT didn’t just copy information or ideas from an article but copied the article wholesale in its response.
One example focused on a 2012 New York Times series about how technology companies had changed the global economy. With what the lawsuit claims was “minimal prompting,” ChatGPT regurgitated nine paragraphs of text verbatim.
Another example, a 2019 article about the New York taxi industry, saw similar copying with only minimal rewriting.
These were just two examples in the filing, with several other articles being similarly copied.
What’s clear from these examples is that ChatGPT does not always produce original work. The New York Times makes a strong argument that ChatGPT’s output is infringing when it’s given certain prompts.
To what extent that does or doesn’t make OpenAI liable is for the courts to decide. However, there is little doubt that The New York Times has established that ChatGPT does copy their work, and does so in a potentially infringing way.
That, as we’ve seen, is a hurdle that other cases have struggled to clear, and it already makes this case significantly stronger. However, it’s not the only reason The New York Times is in a much better position.
Fair Use and Fair Markets
The New York Times also directly targets potential fair use claims with a very simple argument: there is a market not only for licensing New York Times content, but also for licensing news content for AI training.
With the former, they point to their ongoing relationship with the Copyright Clearance Center. There, users pay “several thousand dollars” to host a New York Times article for up to a year on a commercial website.
However, the more damning argument that The New York Times makes highlights OpenAI’s own actions.
In July, OpenAI signed a deal with the Associated Press to license the use of the AP’s news stories. However, The New York Times also claims that OpenAI approached them about a similar license. The two sides were unable to reach an agreement, yet, according to The Times, OpenAI continues to use their content.
This severely damages any fair use case that OpenAI may have. As we discussed back in May, the arguments that AI companies make largely hinge on whether their use is transformative. However, after the Goldsmith ruling, much of the emphasis shifted away from transformativeness and onto the other factors, including the harm to the original work’s market.
The New York Times has not only established that there is an active market for their work, but that there is an active market for licensing news content for training AI services. This is a market that OpenAI is participating in.
By licensing some creators’ works while continuing to use the works of direct analogs without a license, OpenAI may have hurt its fair use case. Judges and juries are going to ask, “If it’s a clear fair use, why did they pay for a license?” Furthermore, this bolsters the plaintiff’s argument that using the work in an unlicensed way is harming the market for it.
This puts The New York Times in an incredibly strong position, and one that should have OpenAI, Microsoft and other AI companies concerned.
Bottom Line
To be clear, this article only looks at two aspects of the case. Interestingly, The New York Times also alleges trademark dilution because ChatGPT often attributes things to the paper that it did not say or do.
The company also claims that OpenAI removed copyright management information (CMI), in direct violation of the Digital Millennium Copyright Act (DMCA). In total, The New York Times is seeking both unspecified damages and the destruction of large language models trained on their material without permission.
All that said, the New York Times still doesn’t have a guaranteed victory, nor are the plaintiffs in other cases facing certain defeat. Courts are inherently unpredictable, and many of the legal arguments in all AI-related cases are untested. This is going to be an important year for copyright-related rulings involving AI systems.
However, there isn’t much doubt that The New York Times is in a uniquely strong position. While that could result in legal victory, it could also motivate OpenAI and Microsoft to seek a settlement.
After all, it behooves AI companies to litigate weaker cases in an attempt to establish a more favorable precedent. Given the number of cases that they face, they certainly have plenty to choose from.