Report: How Social Spam Distorts Data Insights

Marketers are constantly looking for new insights from social data, but social media accounts overrun with spam can result in "dirty data."

Social media is riddled with spam. Up-and-coming networks attract spam as they grow quickly, and older networks have to deal with ever more sophisticated bots. A new report from Networked Insights examines how spam and bots distort the insights brands try to gain from social media.

According to the report, nine percent of all users tweeting in English are non-consumers, and these accounts produce 15 percent of all tweets. Networked Insights defines non-consumers as “social bots, celebrities, brand handles and inactive accounts.”

Because of this non-consumer content, much of the social data that data scientists collect is ‘dirty.’ The New York Times reports that data scientists spend 50 to 80 percent of their time just cleaning up data before it can be analyzed. Weeding out spam and other false data points slows down the process and makes it harder to draw real insights from data sets.

Networked Insights counts coupon postings, product listings, contests and giveaways as social spam; combined, these make up nearly six percent of social posts. Adult content makes up less than three percent of posts, and general spam such as gibberish makes up a little more than one percent.

Different networks have varying levels of social spam. Nearly 30 percent of forum posts are social spam, nearly 20 percent of blog posts and comments are spam, and more than nine percent of tweets are social spam.

Many brands are overrun by this spam. Social spam accounts for 95 percent of the conversation around Rite Aid and Elizabeth Arden, and 81 percent of the conversation around Visa. This kind of negative atmosphere could erode trust in these brands.

Very little of this social spam comes from real consumers. Social bots generate 53 percent of the content, verified and brand accounts contribute 23 percent, and accounts that Twitter has suspended, cancelled or disabled contribute 11 percent.

Networked Insights used the food and beverage vertical to analyze the effect of removing spam from the conversation. Before spam removal, the clustered data was dominated by beer, pizza, coffee, cake and adult content. After all the spam was removed (14 percent of all posts), more nuanced conversations began to emerge.

These more nuanced conversations included topics such as vegan eating and ethnic fast food. The implication is that brands relying on a dirty data set could end up targeting the wrong audiences and misinterpreting what their audience really cares about.

Dirty data could also distort industry benchmarking. For instance, it could be hard to compare two brands that operate in the same vertical but receive vastly different amounts of spam.

Networked Insights suggests removing spam from your data sets before trying to analyze what consumers are talking about. Doing so gives your brand a clearer understanding of your customers’ interests, and the more granular conversations that emerge could present new opportunities for your business.

Image courtesy of Shutterstock.