Research group OpenAI set the artificial intelligence community abuzz when it released a new paper on the latest version of its cutting-edge language generation system, GPT-3. The model was trained on a dataset more than 100 times larger than the already record-breaking amount of text that informed the previous version, GPT-2.
While OpenAI has yet to make the code behind GPT-3 publicly available as it did with GPT-2, the results of experiments outlined in the study show big improvements in the AI’s ability to generate realistic-sounding news articles and other text. The model has 175 billion parameters to GPT-2’s 1.5 billion and was trained on nearly a trillion words in total, scraped from around the internet. The whole training process cost about $12 million.
The announcement of GPT-2 last January led to a flurry of worrying headlines about its potential to be misused to create passable fake news or spam on a large scale. One agency even set up a fake blog created entirely by AI to demonstrate what that kind of abuse might look like.
Because of those fears, OpenAI opted to release the system itself in progressively bigger chunks, finally releasing the full-size version in the fall after researchers determined that the fake-news-pocalypse had not come to pass after all. Instead, GPT-2 spurred a host of creative projects, from an AI-based Dungeons and Dragons-style role-playing game with a cult following to a host of parody social media accounts and Adweek’s Super Bowl bot. Because of its unpredictability, the system has yet to see much wide-scale commercial adoption, though various companies have begun experimenting with harnessing the tech for chatbots.
Still, the latest GPT-3 paper once again points to threats such as “misinformation, spam, phishing, abuse of legal and governmental processes, fraudulent academic essay writing and social engineering pretexting.”
It’s not clear what the implications might be if and when GPT-3 is released, with a scale more than 100 times that of its predecessor. When its output was tested on a group of around 80 test subjects, fake news articles produced by the full-sized version were able to fool them about half the time. The system also scores highly on a number of benchmarks used to assess the sophistication of natural language processing software, ranging from its ability to complete sentences to retrieving correct answers to basic questions.
Perhaps its most notable improvement is its versatility. Models such as GPT-2 must be “fine-tuned,” or trained on another, smaller data set, in order to nail a particular style; Adweek’s ad concept generating version, for instance, was trained on nearly 5,000 descriptions. Meanwhile, GPT-3 can master different imitations with only a few examples. Researchers found that the AI was able to write a passable poem in the style of Wallace Stevens having only been prompted with a few paragraphs of his actual prose. In some cases, GPT-3 could outperform fine-tuned models after having seen only a single example.
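To illustrate the distinction, few-shot use of a language model boils down to packing a handful of worked examples into the prompt itself, rather than retraining the model. The sketch below is purely illustrative; the prompt format, example ad concepts and helper function are assumptions, not OpenAI’s actual interface.

```python
# A minimal sketch of few-shot prompting: the task is inferred from
# in-context examples rather than from fine-tuning on thousands of samples.
# All names and examples here are hypothetical.

def build_few_shot_prompt(examples, query):
    """Concatenate input/output pairs ahead of a new query so the model
    can infer the task from context alone."""
    parts = [f"Input: {inp}\nOutput: {out}" for inp, out in examples]
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

examples = [
    ("A sneaker that feels like walking on clouds",
     "CloudStep: comfort in every stride."),
    ("Coffee that brews in ten seconds",
     "BrewFast: your morning, reclaimed."),
]
prompt = build_few_shot_prompt(examples, "A phone case that survives any drop")
print(prompt)
```

The fine-tuning route would instead update the model’s weights on a task-specific data set; the few-shot route leaves the weights untouched and relies on the model generalizing from the prompt.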
This achievement is the latest breakthrough built on a natural-language processing architecture called the transformer, which has spurred a boom of research in the subfield. Pioneered by Google in 2017, transformers allow for a wide array of language generation tasks through a base model that already understands the basic mechanics of language, thanks to hours of training on a massive training set. Some researchers think that transformers could lead to another AI boom around chatbots and text generation in the same way that a massive data set called ImageNet paved the way for the current AI boom in computer vision starting around 2012.
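The mechanical heart of the transformer is an operation called scaled dot-product attention, in which every token weighs every other token when building its representation. A bare-bones NumPy sketch of that single operation (a simplification of the full architecture, with made-up dimensions) looks like this:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core transformer step: each position attends to all others,
    weighting the values V by query-key similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))  # 4 tokens, 8-dimensional queries
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8): one updated vector per token
```

A full model like GPT-3 stacks dozens of layers of this operation (96, per the paper) with learned projections, which is what makes training so expensive at scale.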
But researchers also seem to be pushing the outer limits of what these models can achieve in some ways with GPT-3. In one paragraph, they discuss potentially diminishing marginal returns from training on so much data. Like GPT-2, GPT-3 is prone to repetition, passages of meandering nonsense and a tendency to lose the thread of a subject over the course of longer outputs.
Researchers also took the notable step of testing various biases ingrained in the system, perhaps influenced by a growing movement around algorithmic accountability. They found that, like many other AI systems trained on human-created data, GPT-3 was prone to reflect certain deep-seated societal ills. References to Black people carried a consistently more negative sentiment than references to other races, and the model was more likely to use male identifiers, especially in the context of performing various occupations.
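The general shape of such a bias probe is simple: generate completions for prompts that vary only a demographic term, then compare the sentiment of the outputs. The sketch below is a stand-in to show the method; the toy generator, the word lists and the scoring are all hypothetical, not the instruments OpenAI used.

```python
# Illustrative sketch of a demographic-sentiment probe.
# The "model," sentiment lexicon, and template are all stand-ins.

POSITIVE = {"kind", "brilliant", "warm"}
NEGATIVE = {"lazy", "dangerous", "cold"}

def lexicon_sentiment(text):
    """Crude sentiment score: positive-word count minus negative-word count."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def probe_bias(generate, template, groups):
    """Score model completions for each demographic substitution."""
    return {g: lexicon_sentiment(generate(template.format(group=g)))
            for g in groups}

# A toy stand-in "model" so the sketch runs end to end.
def toy_generate(prompt):
    return prompt + " they were kind and warm"

scores = probe_bias(toy_generate,
                    "The {group} person was described as",
                    ["first", "second"])
print(scores)
```

A real study would average over many sampled completions per prompt and use a far more robust sentiment model, but the structure of the comparison is the same.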
Blind spots like these, as well as the still very much unpredictable nature of the output, will likely limit the commercial viability of even the most sophisticated of these language models for the time being. Greg Cross, co-founder of AI avatar startup Soul Machines, believes it will be at least a half decade before the technology reaches a point where it will be safe enough for businesses to adopt en masse. In the meantime, though, the breakthroughs are already having a marked effect on creativity.