A cutting-edge machine learning system that generates fake news so convincing that its creators originally deemed it too dangerous to fully release to the public is now available for anyone to use.
The research organization OpenAI published the full code for the text-generation program, called GPT-2, this week. It said it had found “no strong evidence” that the limited version it released in February, or the progressively more advanced ones it has added in the months since, had been misused for the type of fake news or spam operations that had been feared at the time.
Trained on a massive dataset spanning text from more than 8 million websites, GPT-2 is able to create a coherent, realistic-sounding continuation of a text input as short as one or two sentences. Some researchers expect it and other similarly advanced models to have a transformative effect on the field of natural language AI research because they can serve as a backbone for other text applications, like chatbots, autocomplete predictors and even creative writing aids.
Nevertheless, OpenAI continues to warn of the system’s potential for more nefarious uses, like large-scale propaganda. In a report released alongside the code, research partners at the Middlebury Institute of International Studies’ Center on Terrorism, Extremism, and Counterterrorism demonstrated how GPT-2 might be used to create content in support of four ideologies: white supremacy, jihadist Islamism, Marxism and anarchism.
Some experts had also worried that the tech might be used to mass-generate spam blogs to game search engines. But Rowan Zellers, a researcher at the Allen Institute for Artificial Intelligence who co-created a tool to detect AI-generated fake news, said he hasn’t seen any evidence of this problem yet either.
“My thought is that right now, the controllability isn’t there yet for real world adversaries to use the technology,” he said in an email, “which is great because it means there’s more time to study it safely.”
It only takes a few minutes of idle experimentation with the bot, which is publicly available to try, to see why potential abusers might find it lacking. Much of its output has a surreal emptiness: it matches the style and syntax of the source material, but the content is nonsensical. It sometimes forgets or confuses its initial premises over the course of meandering paragraphs.
Below, for instance, is one of the program’s suggestions for text to follow the first couple of sentences of this article:
“While some of our work suggests that GPT-2 has indeed been misused, there is no clear indication that it has been used for anything nefarious,” wrote OpenAI researcher Elie Bursztein and a team of independent researchers. “Nonetheless, we consider it prudent to release the program as-is, as a matter of transparency, so that the research community may more fully evaluate it and its implications.”
The fabricated quote sounds almost real enough at a passing glance; the second sentence is even a fairly accurate summary of OpenAI’s motivations, and Elie Bursztein is indeed a computer engineer. But the first sentence is false, and Bursztein, who leads Google’s anti-abuse research team, does not work at OpenAI, nor has he said this.
At other times, though, the bot is eerily lucid. In another try, it generated this slightly clunky but not inaccurate passage:
To help it do this, the program uses a combination of machine learning and natural language processing techniques, like text analytics. “Our algorithms are capable of generating all the syntactic structure of the original text in real time, and are then capable of generating a coherent sentence,” the OpenAI release says.
Yet another attempt took a meta turn as the bot tried to imitate itself:
For instance, the system’s outputs include phrases like: “This is a news article that will be posted soon. This is a real news article from CNN.” It was first tested on a limited number of fake news samples and is available for anyone to download and experiment with.
More abundant examples of its work can be found in a subreddit populated entirely by more than 100 GPT-2-powered bots that imitate real-life subreddits. This week, Microsoft also released a version of GPT-2 trained on tens of millions of Reddit posts to power conversational chatbots.
While most of the applications that put GPT-2 to practical or commercially viable use remain in their earliest experimental stages, OpenAI researchers expect more business cases to emerge as users gain finer control over the technology’s output.
“The diversity of GPT-2’s early applications gives us confidence that releasing larger model sizes will enable further benefits,” the researchers wrote in the paper. “Further improvements on models and interfaces will likely yield further scientific, creative and commercial applications.”