Microsoft Warns Its Advanced Chatbot Might Say Offensive Things

Software giant releases blueprint for DialoGPT, its Reddit-trained AI

DialoGPT was trained on more than 147 million Reddit posts and replies created between 2015 and 2017.
Getty Images

Microsoft researchers have released a chatbot version of a cutting-edge text generator, trained on more than 100 million Reddit posts, albeit with a disclaimer in case things get offensive.

The open-source blueprint, DialoGPT, builds on GPT-2, a separate program released earlier this year that marked a breakthrough in language-based artificial intelligence. GPT-2 can generate text with unprecedented realism, and it can serve as a foundation for more tailored programs like Microsoft’s chatbot.

As one of the early attempts to channel GPT-2’s unpredictable technology into a chatbot, the Microsoft project includes a precaution: developers must write their own code to translate the model’s output data into readable text.

“The conversational text might be different from any large text corpus that the previous works have been using, in that it is less formal, sometimes trollish and generally much more noisy,” the researchers wrote in an accompanying paper. “Responses generated using this model may exhibit a propensity to express agreement with propositions that are unethical, biased or offensive—or the reverse, disagreeing with otherwise ethical statements.”

Despite the wildcard potential, some researchers think models like GPT-2 could supercharge advances in the type of machine learning that can understand and produce natural language in much the same way that analogous models for image recognition set the scene for an ongoing boom in computer vision AI around 2012.

Just as GPT-2 can serve as a backbone for countless more specialized text-generation tools, Microsoft invites developers to use DialoGPT as a base for training conversational programs fine-tuned on more tailored datasets.

“The package consists of a distributed training pipeline and several pre-trained models that can be fine-tuned to obtain a conversation model on a moderately sized customized dataset in few hours,” the researchers write.

Most developers have thus far waded into the artificial text future with caution, however. Research org OpenAI, the creator of GPT-2, also initially declined to release the full version of the software out of fear that it would be used for mass-producing fake news. (OpenAI, which is backed in part by a $1 billion investment from Microsoft, finally relented this week.)

Microsoft was also chastened by its 2016 Tay debacle, in which the Twitter bot made headlines for spewing racist and otherwise offensive tweets after interacting with human users.

This time around, the Microsoft researchers took steps to censor swear words and derogatory slurs and to exclude subreddits devoted to inappropriate or offensive topics when training DialoGPT on more than 147 million Reddit posts and replies from between 2015 and 2017. Even so, the team couldn’t guarantee clean enough output to release the full code.

What the researchers can reasonably promise, the paper claims, is colloquial back-and-forth comparable to human responses in a single-turn conversational Turing test. The researchers tested it on everything from open-ended philosophical queries (“How much freedom should people have?”) to common-sense questions (“Which is bigger, the sun or the moon?”).

“The right amount of freedom is the freedom to do as you please, as long as you don’t hurt people or property,” the bot said in response to the freedom question.

When asked to explain “the meaning of a good life,” the chatbot said, “I think it’s that we’re all connected to our past lives, and the meaning of life is to live the way you want to and how you want to.”

Microsoft’s researchers aren’t the first to tap GPT-2 for conversational uses. Developers have also built entire Reddit subcommunities populated solely by GPT-2 bots that imitate real-life subreddits. But much work remains before these programs can be considered reasonably brand-safe.

“In the future, we will investigate how to detect and control toxic generation,” the researchers write in conclusion, “and prevent the model from generating egregious responses.”
