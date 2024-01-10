

OpenAI’s response to The New York Times’s lawsuit reveals several key points of tension between the news industry and gen AI firms and their backers.

In particular, OpenAI’s response shows the limited, albeit critical, efforts the company has made to collaborate with publishers; the challenges of focusing too heavily on regurgitation; the disputed limits of fair use; and the false logic of an opt-out option, according to interviews with publishing executives and gen AI experts.

“This litigation is not going to be resolved any time soon, but when it does it will shape the landscape and future of publishing,” said media analyst Matthew Scott Goldstein. “The big question is whether other publishers should join, sue individually or wait out this case.”

Limited, nascent collaboration

In its statement, OpenAI emphasizes how it collaborates with news organizations like the American Journalism Project and News Media Alliance.

News outlets have welcomed these conversations as they represent a healthier approach than the alternative. But these dialogues have been limited, nascent and, at least in the case of The New York Times, unappealing.

For instance, despite training on publisher data for years, OpenAI has struck only two partnerships (the AP and Axel Springer). This limited sample size runs counter to its claims of supporting the broader news ecosystem, according to Neil Katz, the founder of EyeLevel.AI.

“They’re helping news organizations? Who and how?” Katz said.

Smaller publishers, particularly ones with less bargaining power and political clout, also deserve to be compensated for their data, said Raptive chief executive Michael Sanchez. So far, no efforts to negotiate with independent creators have materialized.

And for many organizations, like The Times, the licensing offers have proven uninspiring. OpenAI has reportedly floated sums of between $1 million and $5 million to some publishers in exchange for the right to train on their data.

While these payments are unlikely to cover the production cost of most news publishers’ content, they could still prove alluring to cash-strapped media outlets, said media analyst Mauricio Cabrera.

“Rather than engaging in protracted battles that they might not win, media outlets may opt for less-than-perfect agreements that provide them with much-needed financial support,” Cabrera said.

Regurgitation is a red herring

Both The Times’ lawsuit and the response from OpenAI addressed the issue of memorization and regurgitation, which occur when a large language model (LLM) generates a response that replicates its source material verbatim.

The Times spotlighted the issue because it most clearly resembles plagiarism and offers clear proof that its reporting has been ingested by an LLM, according to the lawsuit. In response, OpenAI claimed these instances are rare errors.

But the issue, while potentially helpful for grounding a legal argument, is ultimately a distraction from the primary debate over fair use, sources said. In vowing to eliminate regurgitation, OpenAI means only that it will improve how ChatGPT presents information, not how it ingests it.

“When OpenAI says regurgitation is a bug it hopes to fix, that doesn’t mean it will clean up its sources,” Katz said. “It means it will hide them better.”

The fallacy of opting out

OpenAI also claims that publishers can shield their data from its LLMs by choosing to opt out, which The Times itself did in August 2023.

But the assertion is misleading, said Sanchez. Even if a publisher opts out now, LLMs have trained on its data for years, long before media companies had such an option.

Further, some gen AI products, such as Google’s Search Generative Experience, use the same technology to crawl websites as they do to index them, said Storyflow founder David Caswell. So if a publisher opts out of SGE, it risks jeopardizing its search visibility.

Media companies face a similar challenge with Microsoft’s Copilot product: If they opt out, they will not surface in search results.

“An opt-out after the model has been trained is no opt-out at all,” Sanchez said.

The contested limits of fair use

Nearly every legal case against LLMs centers on the concept of fair use, which is integral to copyright law.

In its statement, OpenAI claims that training is fair use, but The Times and others contest that claim.

At their core, LLMs do exactly what humans do, reading and learning from publicly available content, said David Caswell, founder of the strategic gen AI firm Storyflow.

Ultimately, legal experts are split on the issue, meaning it will likely be decided in a court of law.