Experimental New AI Can Autocomplete Images With the Same Technology as Predictive Text

Research marks a significant step in understanding how machines process visuals and language

Image GPT guesses the bottom halves of images like movie posters, memes and photos. (Image: OpenAI)

Just as the artificial intelligence in your iPhone can guess the next word you might type in a message, the same technology can predict the bottom half of an image simply from scanning the top half.

That was the finding of a new experiment from research group OpenAI, which trained a version of its hyper-sophisticated text generator, GPT-2, on millions of images to show that it could generate coherent patterns of pixels in the same way it generates sentences.

Researchers demonstrated the results by feeding the system the top halves of images like movie posters, sports photos and popular memes. The system was then able to generate a bottom half based on the patterns of pixels it saw. While the results didn’t always match the original image, the output was usually photorealistic and blended seamlessly with the top half it was given.
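The core idea, treating pixels like words in a sentence and predicting each one from what came before, can be illustrated with a toy next-pixel predictor. The sketch below is a drastic simplification of OpenAI's transformer: it just counts which pixel value tends to follow which (like predictive text counting word pairs), then autocompletes the "bottom half" of a flattened image. All names and data here are illustrative.

```python
from collections import Counter, defaultdict

def train_bigram(sequences):
    """Count how often each pixel value follows another (predictive text over pixels)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def complete(prefix, counts, length):
    """Autoregressively extend a pixel sequence, always picking the likeliest next value."""
    seq = list(prefix)
    for _ in range(length):
        nxt = counts.get(seq[-1])
        seq.append(nxt.most_common(1)[0][0] if nxt else seq[-1])
    return seq

# A tiny "image": a flattened 4x4 grid with a repeating stripe pattern.
training_images = [[0, 1, 2, 3] * 4, [0, 1, 2, 3] * 4]
model = train_bigram(training_images)

top_half = [0, 1, 2, 3, 0, 1, 2, 3]   # first 8 pixels are given
full = complete(top_half, model, 8)   # the model fills in the remaining 8
print(full)  # → [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
```

Image GPT works on the same principle, but with a large transformer estimating the probability of each next pixel from the entire preceding sequence rather than a single lookup table.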

Image GPT's various autocomplete generations based on the top half of a real image on the far right.

The research could help scientists better understand the parallels between how machines understand and generate language versus visual data. While a machine learning architecture called the transformer has spurred a research boom in natural language processing over the past couple of years, the approach has not proven as successful on tasks like image classification and generation, the researchers write in the introduction to their paper.

“Our work aims to understand and bridge this gap,” they said.

AI image generation technology has also made strides in recent years—even spawning a flourishing experimental art scene—but the model of machine learning that has provided the foundation for that research is fundamentally different from the one OpenAI uses in this paper. Most image-producing AI relies on a model called a Generative Adversarial Network (GAN), which learns to produce varied imitations of a style of image once trained on a large dataset of similar visuals.
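The adversarial setup works differently from next-pixel prediction: a generator tries to produce convincing fakes while a discriminator tries to tell fakes from real data, and each improves by playing against the other. A miniature sketch of that loop, shrunk to single numbers instead of images and with hand-derived gradients (no model OpenAI uses; all values here are illustrative), looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def real_batch(n):
    """'Real' data: numbers clustered around 4, the style the GAN must imitate."""
    return rng.normal(4.0, 0.5, n)

# Generator g(z) = a*z + b and discriminator d(x) = sigmoid(w*x + c),
# each just a pair of scalars standing in for a deep network.
a, b = 1.0, 0.0
w, c = 0.1, 0.0
lr = 0.05

for step in range(3000):
    z = rng.normal(0, 1, 32)
    x_real, x_fake = real_batch(32), a * z + b

    # Discriminator update: push d(real) toward 1 and d(fake) toward 0.
    d_real, d_fake = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
    w -= lr * np.mean(-(1 - d_real) * x_real + d_fake * x_fake)
    c -= lr * np.mean(-(1 - d_real) + d_fake)

    # Generator update: push d(fake) toward 1, i.e. fool the discriminator.
    d_fake = sigmoid(w * (a * z + b) + c)
    grad_g = -(1 - d_fake) * w
    a -= lr * np.mean(grad_g * z)
    b -= lr * np.mean(grad_g)

# Mean of generated samples should land near the real mean of 4 after training.
print(round(float(np.mean(a * rng.normal(0, 1, 1000) + b)), 2))
```

Image GPT sidesteps this two-player game entirely: it optimizes a single next-pixel prediction objective, the same one GPT-2 uses for words.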

OpenAI’s model, dubbed Image GPT, is based instead on a version of GPT-2, which can generate realistic-sounding copy based on a text prompt. OpenAI made waves when it announced GPT-2 early last year and declined to release the code all at once for fear that it would supercharge the mass production of fake news and spam. Instead, the model has been more often used for novelty applications, like a text-based adventure game, parody Twitter accounts and Adweek’s own Super Bowl Bot.

OpenAI recently released an even larger form of that program called GPT-3, trained on a dataset more than 100 times bigger than the previous iteration, and plans to use it as the backbone of its first commercial product.

Patrick Kulp is an emerging tech reporter at Adweek. @patrickkulp | patrick.kulp@adweek.com