Sentiment Analysis: When Machines Can Beat Humans

Guest blogger Dr. Taras Zagibalov of Brandwatch addresses criticisms leveled at automatic sentiment analysis used for social media monitoring. Humans, he posits, often struggle to determine the sentiment of a piece of text because, just like the machine, they do not have the relevant knowledge available. Have recent developments in machine analysis improved the process compared with human analysis? And what of machine analysis's own inherent limitations? One perspective, after the jump.

Dr. Taras Zagibalov holds a doctorate in Informatics and heads the natural language processing research team at social media monitoring company Brandwatch. His work centers on improving real-time sentiment analysis to deliver accurate and up-to-date information for brands.

It’s not hard to find criticism of automatic sentiment analysis. Many of the most persuasive examples focus on illustrating how poor machines are at understanding emotions expressed through the complexities of human language. In some ways, they’re right.

Why?

Because these expressions are often only fully comprehensible with additional contextual and background information – information that may not be available to the machine.

As humans, we might like to think we're better than machines in this sense: that it's always easy for us to decide accurately what is positive, what is negative and what is neutral. However, this isn't always the case. Humans often struggle to determine the sentiment of a piece of text because, just like the machine, they do not have the relevant knowledge available. When two humans have different perspectives, different knowledge bases, different life experiences and different frames of reference, their analyses of the same text can vary greatly.

Obstacles to both humans and machines

Here are some examples of issues that face humans, as well as machines:

“It has grown by 10%”

Is this good or bad? The answer depends, firstly, on what “it” is (income or unemployment, for instance) and, secondly, on what we know about growth in that context: is 10% a good or bad amount to grow by? Is growth a good thing at all? Ambiguities like this are not rare; it is extremely common for a piece of text to require expertise or knowledge that is not widely possessed before it can be analysed accurately.
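To make the idea concrete, here is a minimal sketch of target-dependent polarity, where the sentiment of a growth figure flips sign depending on what “it” refers to. The targets and scores are invented for illustration and are not drawn from any real system:

```python
# Invented example: the polarity of growth depends on its target.
# A rise in something desirable is good news; a rise in something
# undesirable is bad news, so the sign flips with the target.
TARGET_POLARITY = {"income": +1, "sales": +1, "unemployment": -1, "churn": -1}

def growth_sentiment(target: str, percent: float) -> float:
    """Score a growth statement: magnitude from the percentage,
    sign from whether growth of this target is desirable."""
    return TARGET_POLARITY.get(target, 0) * percent

print(growth_sentiment("income", 10))        #  10 -> good news
print(growth_sentiment("unemployment", 10))  # -10 -> bad news
```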

“The delivery was good”

An academic study showed that, in the context of eBay user feedback, the word ‘good’ is in fact a slight indicator of negativity. Someone without much online selling experience may conclude that the review above is positive, while the same review may upset a seasoned eBay seller. Similarly, for an ultra-luxury brand, ‘good’ might not be good enough.
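One common way to cope with this is a domain-specific lexicon, in which the same word carries a different weight in each domain. The sketch below uses invented domains and scores purely for illustration; in practice such weights are learned from labelled in-domain data:

```python
# Invented example: the same word scores differently per domain.
DOMAIN_LEXICONS = {
    "movie_reviews": {"good": 0.4, "great": 0.8, "bad": -0.6},
    # In marketplace feedback, "good" can be faint praise that
    # correlates with mildly negative experiences.
    "ebay_feedback": {"good": -0.1, "great": 0.7, "bad": -0.8},
}

def score(text: str, domain: str) -> float:
    """Sum the domain-specific scores of the words in `text`."""
    lexicon = DOMAIN_LEXICONS[domain]
    return sum(lexicon.get(word, 0.0) for word in text.lower().split())

print(score("The delivery was good", "movie_reviews"))  #  0.4 -> positive
print(score("The delivery was good", "ebay_feedback"))  # -0.1 -> slightly negative
```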

“The price has dropped, it’s really cheap now”

A final example to illustrate the perspective-dependent nature of any sentiment analysis: the above may be good news for those interested in buying the product, but shareholders of the company selling it will be less pleased about the statement's implications.

Examples like the above are often mistakenly cited solely as obstacles to automatic sentiment analysis, when really they are just as applicable to human analysis. To perform accurately and judge statements like these correctly, humans need to be fully informed of the related context, background, standards and so on – machines are no different.

Human-specific issues: time, boredom, concentration

So there are many cases in which humans may find it difficult to agree on the sentiment of a text, and the situation is further exacerbated when they are required to make quick decisions while processing large amounts of data. Though we might not like to admit it, humans get tired, bored and annoyed by the work they do. Humans are not “designed” for monotonous work. We are only good at consistent and accurate evaluation of textual content for limited sessions and in the right environment.

Our studies of inter-annotator agreement have shown how difficult it is to get a high level of agreement between even two humans (yes, just two). I have witnessed less than 30% agreement on sentiment annotation of a not-very-large dataset annotated by trained, educated native speakers. I also remember a triple-annotated corpus used in an academic workshop in which one of the annotators seemed simply to press the same button every time. Perhaps they were bored or unable to concentrate, or perhaps they were trying to complete the task as quickly as possible; the manual nature of human analysis means time is a particularly significant factor. Not only does the analysis take a long time, but pressure to hit targets and complete tasks within certain timeframes can seriously affect accuracy when work is rushed.
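For readers unfamiliar with how agreement is quantified, the sketch below computes the two standard measures, raw agreement and Cohen's kappa (agreement corrected for chance), over invented labels. The figures quoted above come from real annotation projects; the data here does not:

```python
from collections import Counter

def raw_agreement(a, b):
    """Proportion of items the two annotators label identically."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    p_o = raw_agreement(a, b)
    n = len(a)
    counts_a, counts_b = Counter(a), Counter(b)
    # Expected chance agreement, from each annotator's label distribution.
    p_e = sum(counts_a[label] * counts_b[label]
              for label in set(a) | set(b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Invented sentiment labels for ten documents.
ann1 = ["pos", "neg", "neu", "pos", "neg", "neu", "pos", "pos", "neg", "neu"]
ann2 = ["pos", "neu", "neu", "neg", "neg", "pos", "pos", "neu", "neg", "pos"]

print(raw_agreement(ann1, ann2))  # 0.5 -- agreement on only half the items
print(cohens_kappa(ann1, ann2))   # ~0.24 -- weak once chance is discounted
```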

Where machines can help

What about machines? We know they never get bored, tired or lose interest in the job. And when it comes to time, the near-instantaneous processing of machines adds a dimension that is entirely unattainable with human analysis alone. But still, they aren't much use if they continuously produce inaccurate analysis. Can they understand whether 10% growth is good or bad? Can they understand that “good” is actually “bad” in a certain context? Actually, yes they can. The secret of effective automatic sentiment analysis lies in understanding its danger areas: domain-dependency and time-dependency.
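As a rough sketch of what handling those two danger areas can look like in practice (the domains, dates and lexicon scores below are all invented for illustration), a system can keep separate models per domain and per training period, and route each document to the freshest matching model so that drift in language use does not silently degrade accuracy:

```python
import datetime

# Invented example: lexicons keyed by (domain, training month).
# Language drifts over time, so real systems retrain regularly;
# time-dependency is exactly this drift.
MODELS = {
    ("telecoms", "2013-01"): {"unlimited": 0.6, "throttled": -0.7},
    ("telecoms", "2013-06"): {"unlimited": 0.2, "throttled": -0.8},
    ("fashion", "2013-06"): {"cheap": -0.5, "classic": 0.6},
}

def pick_model(domain: str, when: datetime.date) -> dict:
    """Return the freshest lexicon for `domain` trained on or before `when`."""
    month = when.strftime("%Y-%m")
    candidates = [
        (trained, lexicon)
        for (d, trained), lexicon in MODELS.items()
        if d == domain and trained <= month
    ]
    return max(candidates, key=lambda c: c[0])[1]

print(pick_model("telecoms", datetime.date(2013, 7, 1)))
# -> the June 2013 telecoms lexicon
```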