
Can Context Extraction replace Sentiment Analysis?

Sentiment analysis is hard. Most systems on the market reach only around 55-65% accuracy on unseen data, even though they might be 85%+ accurate in their cross-validations.

A couple of reasons why creating a generic sentiment analyser is tough:

- Texts vary too much across domains, so the same words can carry different meanings

- Sarcasm and phrase combinations are hard to identify; for example, 'not bad' is not equal to 'not' AND 'bad'
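A minimal sketch of the 'not bad' problem, assuming a toy polarity lexicon (the words, scores and function names here are illustrative, not taken from any real library). Scoring each word independently treats 'not bad' as negative, while flipping the polarity of the word that follows a negator gives a positive score:

```python
# Toy polarity lexicon; the words and scores are illustrative assumptions.
LEXICON = {"good": 1.0, "bad": -1.0, "great": 2.0, "terrible": -2.0}
NEGATORS = {"not", "never", "no"}


def bag_of_words_score(text):
    """Sum word polarities independently, ignoring word order."""
    return sum(LEXICON.get(tok, 0.0) for tok in text.lower().split())


def negation_aware_score(text):
    """Flip the polarity of the word that directly follows a negator."""
    score = 0.0
    negate = False
    for tok in text.lower().split():
        if tok in NEGATORS:
            negate = True
            continue
        polarity = LEXICON.get(tok, 0.0)
        score += -polarity if negate else polarity
        negate = False  # the negation only applies to the next word
    return score


print(bag_of_words_score("not bad"))    # 'not' contributes 0, 'bad' contributes -1
print(negation_aware_score("not bad"))  # 'bad' is flipped by the preceding 'not'
```

This only handles a negator directly before a lexicon word; sarcasm and longer-range negation need far more than a rule like this.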

At this juncture, it's important to realize that sentiment analysis is critical for any system monitoring customer reviews or social media posts. Hardly had the business world caught up with sentence-level sentiment analysis when we began moving to aspect-level sentiment analysis - more directed and granular, which adds to the complexity. The question is this - can we do something to augment our sentiment analysis?

For the past few months, I have been using context and relationship extraction to augment sentiment analysis. I treat them as important meta-information, to use as learning features and/or as additional information for my customers.

I use four important kinds of 'context' to characterize a target sentence:

- Entities, like a location, name or person

- Keyphrases

- Relationships

- Topic/Concept


To help extract this information, I have created and modified my own generic lexical parser, relationship extractor and topic model. I use the DBpedia API to extract entities.

Let me demonstrate with an example:

Sentence:

Pakistan’s army Chief General Raheel Sharif has said his troops are ready to tackle any long or short misadventure by the “enemy”

Entities:

['Chief General', 'Raheel Sharif', 'Pakistan']

Keyphrases:

['pakistan', 'chief general raheel sharif', 'troops', 'short misadventure', 'enemy']

Relationships:

(Sharif, said), (army, ready), (army, tackle)

Topic/Concept:

Unrests & War 
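As a rough sketch, the four kinds of context for the example sentence could be held in one record and flattened into feature tokens for a model. The `SentenceContext` class, its field names and the `ENT=`/`KP=`/`REL=`/`TOPIC=` prefixes are assumptions made for illustration; the values are the ones listed above, and the extraction step itself is not shown:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SentenceContext:
    """Hypothetical container for the four kinds of context of one sentence."""
    sentence: str
    entities: List[str] = field(default_factory=list)
    keyphrases: List[str] = field(default_factory=list)
    relationships: List[Tuple[str, str]] = field(default_factory=list)
    topic: str = ""

    def to_feature_tokens(self) -> List[str]:
        """Flatten the context into prefixed tokens usable as model features."""
        tokens = [f"ENT={e.lower()}" for e in self.entities]
        tokens += [f"KP={k.lower()}" for k in self.keyphrases]
        tokens += [f"REL={s.lower()}_{v.lower()}" for s, v in self.relationships]
        if self.topic:
            tokens.append(f"TOPIC={self.topic.lower()}")
        return tokens


# Values copied from the worked example; they are assumed to come from the extractors.
ctx = SentenceContext(
    sentence=("Pakistan’s army Chief General Raheel Sharif has said his troops are "
              "ready to tackle any long or short misadventure by the “enemy”"),
    entities=["Chief General", "Raheel Sharif", "Pakistan"],
    keyphrases=["pakistan", "chief general raheel sharif", "troops",
                "short misadventure", "enemy"],
    relationships=[("Sharif", "said"), ("army", "ready"), ("army", "tackle")],
    topic="Unrests & War",
)

tokens = ctx.to_feature_tokens()
print(tokens)
```

Prefixing each token with its context type keeps, say, the entity 'pakistan' distinct from the keyphrase 'pakistan' when the tokens are fed to a bag-of-features model.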

Does context before sentiment make sense?

Ideally, the above sentence denotes a negative sentiment. While some APIs might identify it correctly, some may still end up tagging it neutral/positive. However, if you build a model that takes this context as input, or at the very least provide the context as additional information to the user (as tooltips or in an export), this can augment sentiment analysis in a big way.

- Some topics will almost always have a negative sentiment.

- Troubled entities will generate negative news.

- Keyphrases do a good job of pinpointing intent when combined with relationships
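One way to read the points above is as a validation layer: a prediction that disagrees with a strong topic prior gets flagged for review rather than silently accepted. The sketch below assumes a hand-made prior table and a hypothetical `validate_sentiment` helper; the numbers are invented for illustration, not measured:

```python
# Hypothetical priors: assumed share of sentences on a topic that are negative.
TOPIC_NEGATIVE_PRIOR = {"unrests & war": 0.9, "sports": 0.4}


def validate_sentiment(predicted_label, topic, threshold=0.8):
    """Flag a prediction that disagrees with a strongly negative topic prior.

    Returns "review" when the topic prior is at or above the threshold but the
    model did not predict "negative", "accept" when there is no conflict, and
    "no_prior" when the topic is unknown.
    """
    prior = TOPIC_NEGATIVE_PRIOR.get(topic.lower())
    if prior is None:
        return "no_prior"
    if prior >= threshold and predicted_label != "negative":
        return "review"
    return "accept"


print(validate_sentiment("positive", "Unrests & War"))
print(validate_sentiment("negative", "Unrests & War"))
```

The check only flags a disagreement; it does not overwrite the model's label, since a topic prior is itself an assumption that can be wrong for a given sentence.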

With the ferocity at which new data is generated every day, it's either not always useful to train sentiment models on standard datasets (like IMDb or polarity datasets), or too expensive to create your own training sets.

For some, these assumptions may seem naive - but they have worked more than once across multiple NLP projects I have worked on - either increasing model accuracy or acting as a validation layer for analysis. Intuitively, adding contextual information to your corpus makes sense.

I would be very interested to know your thoughts on my assumptions and overall process flow. Context Analysis can augment, if not replace, sentiment analysis.

Your thoughts?


Comment by Jim on April 12, 2018 at 2:26am

Would you have a worked out example to share?

Comment by Earl Cox on September 10, 2015 at 10:26pm

I agree with you. But I try to find higher level conceptual categories that can guide my sentiment analysis and generate positive, neutral, or negative rankings based on the relationships among concepts. How do I find these concepts? The old fashioned way. And a way overlooked in today's frenetic rush to data mining and supervised learning.

I couple my sentiment analysis to the Roget's thesaurus. There is a free 1911 version although you have to write your own parser (not an easy task). Even the 1911 version contains a huge set of concepts, sub-concepts and super-concepts. Then you can let a client's SME group extend the thesaurus. By combining a thesaurus with sentiment analysis it is possible to compare tokens in the NLP tree parse to higher semantic concepts. Hence you can let the syntax and grammar graph use the thesaurus to automatically generate topics and semantic concepts with a wider and often more "intuitive" coverage.

Comment by Earl Cox on September 10, 2015 at 7:56am

I fail to see how your example sentence "ideally denotes a negative sentiment." In the context of tactical or strategic military readiness it denotes a positive sentiment. The mere fact that the topic is war does not make it a negative sentiment. In any case, a broader and less biased or emotionally charged classification might be military, homeland security, defense posture, preparedness, etc.

In my experience in building a wide spectrum of sentiment analysis systems in areas such as politics, sports, energy, and target marketing I have yet to find topics that almost always have a negative sentiment. Even "he killed the man with two shots to the head" can be a positive sentiment if the shooter was saving the lives of his family, if a soldier killed a suicide bomber, if a kidnapper was about to kill a hostage. This is a violent action, to be sure, but the sentiment, in terms of the contextual objective function, can be positive.

Comment by Manas Ranjan Kar on September 10, 2015 at 7:50am

Thanks for your comment Earl.

What you pointed out is exactly my point - Sentiment Analysis is too subjective and fraught with bias for it to be deadly accurate on unseen data - everyone has an individual view. For example, AlchemyAPI, TextBlob and my model tagged it as negative, positive and negative respectively.

Ideally, I use these additional features as feature inputs for my model, to reduce errors and take 'external' help to arrive at a sentiment. Topic classification is a different problem altogether; you can define broad/granular categories, provided you have enough training data to validate a model.
