Google’s release of large language model Gemini this week has publishers troubled about its potential to further impact revenue dynamics in an industry already grappling with volatility.

In a press call with reporters, questions were raised about datasets used to train Gemini, with Google executives declining to share specific details, including whether any were licensed from third parties, according to TechCrunch. But that’s just one of several questions industry execs have.

“I’m not aware of any publishers being approached about using their content for this,” said Danielle Coffey, president and CEO of nonprofit trade association News/Media Alliance. “I’m curious whether [Google] will allow news publishers the ability to monetize their content through traffic, which in reality is crumbs anyway because of its ad-tech tax, or if Google will continue to take all the revenue.”

News publishers, already starved of revenue, have argued over the past year about the impact of generative artificial intelligence tools on search, with concerns mounting that these tools could affect up to 66% of their traffic and a huge chunk of revenue.

Identifying the exact data fed into AI models is nearly impossible, given that many entities have not publicly disclosed the specifics of their data sources. However, in order to effectively train and power these products, tech companies need precise, contemporaneous data—which news publishers are uniquely positioned to produce. In October, News/Media Alliance published research indicating that developers prioritize articles over generic online content for training AI tools.

“If Bard answers all questions with perfect efficiency, in an extreme scenario, it could put publishers completely out of business,” said Myles Younger, head of innovation and insights at digital education provider U of Digital. “Ultimately, this would make the web much less useful, and that would hurt Google.”

Publishers’ dilemma

Publishers raised concerns after Google CEO Sundar Pichai announced plans during the press call to integrate Gemini into Google’s search engine, ad products and Chrome browser sometime next year.

“If that’s the case, I don’t foresee publishers opting out,” said Coffey. “There’s not really a choice whether to be included in this new product.”

In September, Google let publishers opt out of having their data used to train Google’s AI models such as Bard.

Publishers are also vexed over how little attribution AI tools give to their sources of information, which often comes from expensive journalistic efforts. Complicating matters is the legal concept of “fair use,” which permits limited use of copyrighted content without explicit permission under certain conditions.

But what makes Google a bigger threat, compared with rivals such as OpenAI, is that the tech giant is a primary gateway to the internet, making its application uniquely concerning, said a content strategy leader at a major publisher, speaking anonymously.

“As Google roadmaps products such as Bard and SGE [Search Generative Experience] to play an increasingly central role in general search and content discovery, a publisher’s SEO [search engine optimization] efforts will increasingly work against them,” the exec said. “If you block the crawlers, then you risk the possibility of blinding Google to your content, and then no one sees it in the first place. Damned if you do, damned if you don’t.”

An increase in official deals

The industry expects a notable increase in official agreements between Google and publishers, similar to the one between the Associated Press and OpenAI, especially where high-quality data such as that held by news publishers is involved, said Katie Gardner, partner at law firm Gunderson Dettmer.

Data licensing agreements between content providers and content distributors hinge on several parameters, such as the value and timeliness of the data and the potential revenue offered to publishers.

“The data owner will want to maintain as much control as they can and want to preserve optionality in how to monetize that data, whether it’s licensing to other foundational models, or building their internal models,” Gardner said. “Ultimately, it just comes down to the dollars.”