Twitter’s New Policy on Deepfakes Leans More Toward Labeling and Providing Context

Outright removal of synthetic and manipulated media is limited to the threat of harm

Starting March 5, Twitter will begin adding labels like this. (Image: Twitter)

Removing content containing synthetic and manipulated media, widely referred to as deepfakes, from Twitter altogether is still reserved for the most extreme cases, but people will soon start seeing more warning labels over blurred thumbnails and more explanation and clarification, the social network said Tuesday.

Twitter head of site integrity Yoel Roth and group product manager Ashita Achuthan revealed the social network’s new policies in a blog post, following the company’s release of potential rules last November and subsequent intake of public feedback.

The social network received over 6,500 responses from people worldwide, and it also consulted with civil society and academic experts.

More than 70% of respondents said taking no action on altered media was unacceptable, but warning labels placated nearly nine out of 10, and roughly the same percentage said warning people who were about to tweet altered media was acceptable.

However, just 55% were in favor of removing all altered media, with many bringing up free expression and censorship. That number rose to more than 90% in cases where it was clear that the content was intended to cause certain types of harm.

Over three-quarters of people who provided feedback were in favor of enforcement action against accounts that shared synthetic and manipulated media.

Roth and Achuthan provided extensive details about Twitter’s new policies.

They said that when considering whether media is synthetic or manipulated, factors involved in the decision include whether it was substantially edited to fundamentally alter its composition, sequence, timing or framing.

Twitter will also look for visual or auditory information that has been added or removed—such as new frames, overdubbed audio or modified subtitles—as well as whether it depicts a real person.

The best-known deepfake is likely the doctored video of Speaker of the House Nancy Pelosi (D-Calif.) that went viral last May, in which her speech was slowed down to make her appear intoxicated.

Roth said during a press call that the Pelosi video would qualify for labeling under Twitter’s new policy, adding, “Since the video is significantly and deceptively altered, we would label it under this policy. Depending on what the tweet sharing that video says, we might choose to remove specific tweets.”

The social network’s next step is to determine whether the media was shared in a deceptive manner. As an example, it said that falsely claiming the content depicts reality could result in confusion or misunderstanding and indicate a deliberate intent to deceive.

Roth and Achuthan said Twitter will assess context, including text and metadata accompanying the media, information on the profile of the person sharing it, and websites linked in that profile or in the specific tweet.

During the press call, Twitter vice president of trust and safety Del Harvey stressed that the focus was on the media itself, regardless of who was responsible for sharing it, adding, “Our goal is really to provide people with more context around certain types of media they come across on Twitter and to ensure that they’re able to make informed decisions around what they’re seeing.”

For the content to be removed altogether, Twitter must determine that it is likely to cause harm, offering as examples: threats to the physical safety of a person or group; risk of mass violence or widespread civil unrest; and threats to the privacy or ability of a person or group to freely express themselves and participate in civic events, such as stalking, unwanted or obsessive attention, targeted content with tropes or epithets or content aimed at silencing or intimidating someone, as well as voter suppression.


Now, on to the action: Starting March 5, Twitter will begin adding labels to tweets containing content that violates the policies above, as well as showing warning labels to people about to like or retweet those tweets.

The visibility of those tweets will be limited to prevent them from being recommended, and additional explanations and clarifications will be provided, where applicable, such as directing people to landing pages with more context.

Roth said during the press call that selective editing would be covered under the new policy. He admitted that one of the biggest challenges will be determining whether a piece of content is satire, adding, “We need to try and get as much context as we can about the interactions on Twitter and, a lot of times, we’re sort of an outside party to a conversation that’s happening on our service.”

Roth and Achuthan concluded in the blog post, “This will be a challenge, and we will make errors along the way—we appreciate the patience. However, we’re committed to doing this right. Updating our rules in public and with democratic participation will continue to be core to our approach. We’re working to serve the public conversation, and doing our work openly and with the people who use our service.”

David Cohen is editor of Adweek’s Social Pro Daily.