January 23, 2018
Erik Cambria is deeply involved in natural language processing (NLP), an aspect of the emerging field of AI. By utilizing NLP, machines will be able to more accurately decode the meanings behind sentences. Previously, Cambria completed his PhD in collaboration with MIT and the University of Stirling. Today, he is an assistant professor at Nanyang Technological University and the founder of SenticNet, through which he researches, promotes, and provides concept-level sentiment analysis. His latest co-authored paper is titled “Discovering Conceptual Primitives for Sentiment Analysis by Means of Context Embeddings.”
Business and marketing executives could be forgiven for misunderstanding his research. Cambria’s work is domain-specific and intricately detailed. However, his unique approach to sentiment analysis has clear implications for companies that are seeking to better understand consumers by applying the latest innovations. There is a massive amount of online data pointing to consumer opinions and preferences about specific brands and products, but companies can’t harvest those business insights reliably until AI gets to the point where it can truly understand a sentence. I spoke with Cambria so that we could hash out his theories and findings and communicate the takeaways to that business audience.
According to Cambria, there are three trouble points with machine learning that are being overlooked, to our own detriment and, possibly, peril:
1. Dependency: machine learning requires a lot of training data and is domain-dependent.
2. Consistency: different training or tweaking leads to different results.
3. Transparency: the reasoning process is uninterpretable (black-box algorithms).
If these issues aren’t addressed within the field, the AI-powered products put out in the private sector could prompt bad decision-making by marketers, executives, and stakeholders.
“People are excited about the wrong things and worried about the wrong things,” says Cambria. Although data analysis techniques are becoming increasingly sophisticated and useful, Cambria points out that we are still light-years away from a truly intelligent machine, due in part to the fact that we do not even fully understand human intelligence.
Cambria says that people have been scaring the public by making them feel like the Terminator is coming. He counters, “In fact, these are just powerful tools that can learn by examples, but they don’t have consciousness, they don’t have common sense. So, that’s not what we should be scared about. We should be scared about the ethical implications of these systems that make decisions for us without us knowing how that classification was made.”
In the abstract of his latest paper, he notes that AI has gained new vigor and prominence in numerous research fields, but it’s been constrained in natural language understanding.
Cambria writes, “In this work, we couple sub-symbolic and symbolic AI to automatically discover conceptual primitives from text and link them to commonsense concepts and named entities in a new three-level knowledge representation for sentiment analysis.”
Borrowing a metaphor from Doctor Strange in the Marvel Universe, Cambria refers to sub-symbolic AI as “black magic” and says that symbolic AI is “white magic” because it allows us to interpret results and reasoning processes. “I believe the future of AI resides in using those kinds of magic together as the Ancient One would do,” he says, amusingly.
Sub-symbolic AI can easily be scaled up, but it’s more difficult to interpret and control. It’s also considered to be robust against noise, and it has driven the recent, rapid progress in pattern recognition.
Cambria says, “I call it ‘black magic’ because it’s extremely powerful. You don’t need to do any feature extraction, any feature engineering. You just feed this monster with examples and then you have something that can make decisions for you. So it’s extremely powerful but you don’t have control over it.”
He continues, “White magic is the old school AI where, for example, you have your knowledge representation and can trace the reasoning of the semantic network.” However, he points out that people first attempted symbolic AI many decades ago and lost faith in it. It was difficult and expensive to construct and maintain these semantic networks. Cambria explains the process as follows: “You start from an idea or a model of how you think reality is or how you think decisions should be taken. And then you build a knowledge representation based on that.”
Referring to his recent co-authored paper, he explains, “The problem of symbolic representation in the past is that you had to manually build it and, in this case, we used machine learning to build the graphs.”
Cambria says that the knowledge base built with this new method is much bigger and can generalize to catch more concepts in natural language sentences, pulling valuable consumer insights from tweets, reviews, and blogs. Cambria can also see how the algorithm made its decisions by tracing its steps and seeing why a certain sentence was associated with a specific polarity.
“The final goal of this work is to decide whether a piece of text is positive or negative,” says Cambria. This capability is relevant for companies conducting market research. It can also be utilized in political campaigning.
Cambria explains, “This is something we, as people, can do very well: when we read a tweet or a news article, we are immediately able to understand if it’s positive or negative, if it’s sarcastic or not, but machines are not good at that. Past approaches to the problem didn’t take into account a lot of things, including anaphora resolution, sarcasm detection, and metaphor understanding. Many sentiment analysis companies today still use very primitive algorithms.”
Cambria says that some of these algorithms might mistakenly conclude that a sentence is positive because it contains the word “happy.” He argues that this method simply isn’t good enough. It fails to take key factors into account, such as motivation and the linguistic intricacies of expression. It leads to incorrect conclusions. Worse yet, people can’t double-check the algorithm’s work to verify accuracy.
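To make the failure concrete, here is a minimal Python sketch of the kind of keyword-counting classifier described above. It is an illustration only: the word lists, the function name keyword_polarity, and the sample sentence are assumptions made for this example, not any vendor's actual algorithm.

```python
# Hypothetical word lists, for illustration only.
POSITIVE_WORDS = {"happy", "nice", "great"}
NEGATIVE_WORDS = {"sad", "bad", "awful"}


def keyword_polarity(text):
    """Label text by counting positive and negative keywords, ignoring context."""
    tokens = [t.strip(".,!?'\"").lower() for t in text.split()]
    score = sum(t in POSITIVE_WORDS for t in tokens) - sum(
        t in NEGATIVE_WORDS for t in tokens
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# The negation "not" is invisible to the keyword counter,
# so a complaint is labeled as praise.
print(keyword_polarity("I am not happy with this phone"))
```

The classifier sees the word “happy” and nothing else, which is exactly the shortcut Cambria warns against.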
He explains, “An example that I often use is that if you have a sentence like ‘this phone is nice, but expensive,’ from a machine learning point of view, it’s exactly the same as saying ‘this phone is expensive but nice.’ I’m using exactly the same kind of words, but I’m swapping the order. And this is important from a sentiment analysis point of view. In the first case, ‘nice but expensive,’ I’m not going to buy it. But if I say ‘expensive but nice,’ I’m saying yeah it’s expensive but I may want to make the effort to buy the phone.”
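The phone example can be sketched in a few lines of Python. The code below shows that a bag-of-words view treats both sentences as identical, and then applies a toy heuristic in which the clause after “but” decides the polarity. The two-word lexicon and the function names are assumptions made for this illustration; the “but” rule is a deliberately simple stand-in and is not the method from Cambria’s paper.

```python
import re
from collections import Counter

# Hypothetical two-word lexicon, for illustration only.
LEXICON = {"nice": 1, "expensive": -1}


def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Word counts with all ordering information discarded."""
    return Counter(tokenize(text))


def label(score):
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def bag_polarity(text):
    """Order-blind polarity: sum lexicon scores over every word."""
    return label(sum(LEXICON.get(t, 0) for t in tokenize(text)))


def but_aware_polarity(text):
    """Toy order-aware rule: only the words after 'but' are scored."""
    tokens = tokenize(text)
    if "but" in tokens:
        tokens = tokens[tokens.index("but") + 1:]
    return label(sum(LEXICON.get(t, 0) for t in tokens))


first = "This phone is nice, but expensive"
second = "This phone is expensive but nice"

print(bag_of_words(first) == bag_of_words(second))        # same bag of words
print(bag_polarity(first), bag_polarity(second))          # no difference seen
print(but_aware_polarity(first), but_aware_polarity(second))
```

Under these assumptions, the order-blind scorer cannot tell the two sentences apart, while the order-aware rule labels the first as negative and the second as positive, matching the reading Cambria gives.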
Using the approach proposed by Cambria et al., companies can understand the meaning of their data and use it for more responsible, effective decision-making. By applying better sentiment analysis tools, companies can quickly assess the popularity of their products and can even drill down to a finer granularity to determine which specific product features are liked or disliked.
As the field of NLP progresses, the reliability of AI-powered business tools will also improve. However, the field can be misleading, because executives sometimes think of language understanding as a single problem. In reality, many different things happen within human cognition when language is communicated and processed.
“The truth is that nobody knows how the human brain processes language. This is actually the problem,” says Cambria. “We know the hardware of the brain, but we don’t know what is the operating system, how the software works. So, the only thing that we can do is to break down the problem into smaller problems and then bring an ensemble of all those together.”
Cambria’s conclusion is that we need to combine symbolic and sub-symbolic AI to arrive at more accurate insights.
“And what I’m saying is that we should leverage both, to emulate the way the human brain works because that’s what we do. Some things we do really bottom-up because we learn from examples by our parents, by our friends, by looking at the world. But in some other cases, we need someone to tell us how things work and then we go from there,” he muses.
Directing his advice towards the business community, Cambria contends, “As an executive, you shouldn’t rely blindly on a machine learning algorithm. You have a lot of examples from your previous customer transactions. You should use machine learning to extract some common rules from those examples and then create a model. This model can be a semantic net or an ontology, or even a rule-based system, which you can use for transparent decision-making.”
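As a rough illustration of that advice, the Python sketch below learns a small, readable rule set from past examples and then reports the reason behind each decision. Everything here is an assumption made for the example: the transaction records are invented, and the learner is a very simple one-attribute rule learner (in the style of the classic OneR approach), chosen only because its output is easy to inspect. It is not Cambria’s method.

```python
from collections import Counter, defaultdict

# Hypothetical past customer transactions: (features, outcome).
TRANSACTIONS = [
    ({"channel": "web", "discount": "yes"}, "bought"),
    ({"channel": "web", "discount": "no"}, "left"),
    ({"channel": "store", "discount": "yes"}, "bought"),
    ({"channel": "store", "discount": "no"}, "left"),
    ({"channel": "web", "discount": "yes"}, "bought"),
    ({"channel": "store", "discount": "no"}, "bought"),
]


def learn_one_rule(examples):
    """Pick the attribute whose value -> majority-outcome rules make the fewest errors."""
    best = None
    for attr in examples[0][0]:
        counts = defaultdict(Counter)
        for features, outcome in examples:
            counts[features[attr]][outcome] += 1
        # For each attribute value, predict the most frequent outcome.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(
            outcome != rules[features[attr]] for features, outcome in examples
        )
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best


attr, rules, errors = learn_one_rule(TRANSACTIONS)


def decide(features):
    """Return the predicted outcome together with a human-readable reason."""
    value = features[attr]
    return rules[value], f"because {attr} = {value}"


print(attr, rules, errors)
print(decide({"channel": "web", "discount": "no"}))
```

The learned model is just a lookup table of rules, so anyone reviewing a decision can see which attribute drove it and how many of the past examples the rules get wrong.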