Sentiment analysis is hard. Most systems on the market clock in at around 55-65% accuracy on unseen data, even though they might report 85%+ accuracy in cross-validation.
A couple of reasons why creating a generic sentiment analyser is tough:
- There is too much variation in text across domains, so the same words can carry different meanings
- Sarcasm and phrase combinations are hard to handle: 'not bad' is not equal to 'not' AND 'bad' (see the sketch below)
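Here is a minimal sketch of the negation problem, using NLTK's VADER as a stand-in scorer (not my own parser) against a naive word-level lexicon sum:

```python
# Minimal sketch: naive word-level lexicon scoring vs. a negation-aware scorer.
# VADER stands in for any negation-aware model; run the download once.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download('vader_lexicon', quiet=True)

toy_lexicon = {'bad': -1.0, 'good': 1.0}   # toy scores, for illustration only

def naive_score(text):
    """Sum word scores in isolation -- treats 'not bad' as 'not' + 'bad'."""
    return sum(toy_lexicon.get(w, 0.0) for w in text.lower().split())

sia = SentimentIntensityAnalyzer()

for phrase in ['bad', 'not bad']:
    print(phrase, naive_score(phrase), sia.polarity_scores(phrase)['compound'])

# The naive sum gives both phrases the same negative score, while VADER's
# negation handling pushes 'not bad' towards the positive/neutral side.
```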
At this juncture, it's important to realize that sentiment analysis is critical for any system monitoring customer reviews or social media posts. Hardly had the business world caught up with sentence-level sentiment analysis when we started moving to aspect-level sentiment analysis - more directed and granular, and adding to the complexity. The question is this - can we do something to augment our sentiment analysis?
For the past few months, I have been using context and relationship extraction to augment sentiment analysis. I treat them as important meta-information, used either as learning features or as augmented information for my customers.
I use four important 'contexts' to identify a target sentence:
- Entities such as a location, name or person
- Keyphrases
- Relationships
- Topic/Concept
To help extract this information, I have created and modified my own generic lexical parser, relationship extractor and topic model. I use the DBpedia API to extract entities.
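As a rough sketch of that entity step, the public DBpedia Spotlight annotate endpoint can be queried directly; the confidence threshold below is just an illustrative default, not what my pipeline actually uses:

```python
# Sketch: entity extraction via the public DBpedia Spotlight annotate endpoint.
import requests

def dbpedia_entities(text, confidence=0.5):
    resp = requests.get(
        'https://api.dbpedia-spotlight.org/en/annotate',
        params={'text': text, 'confidence': confidence},
        headers={'Accept': 'application/json'},
        timeout=10,
    )
    resp.raise_for_status()
    resources = resp.json().get('Resources', [])
    # Each resource carries the surface form found in the text plus a DBpedia URI.
    return [(r['@surfaceForm'], r['@URI']) for r in resources]

sentence = ("Pakistan's army Chief General Raheel Sharif has said his troops are "
            "ready to tackle any long or short misadventure by the \"enemy\"")
print(dbpedia_entities(sentence))
```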
Let me demonstrate with an example:
Sentence:
Pakistan’s army Chief General Raheel Sharif has said his troops are ready to tackle any long or short misadventure by the “enemy”
Entities:
['Chief General', 'Raheel Sharif', 'Pakistan']
Keyphrases:
['pakistan', 'chief general raheel sharif', 'troops', 'short misadventure', 'enemy']
Relationships:
(Sharif,said), (army, ready), (army,tackle)
Topic/Concept:
Unrests & War
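My relationship extractor is custom, but a rough approximation of those (subject, verb) pairs can be pulled from a dependency parse; the sketch below uses spaCy purely for illustration:

```python
# Sketch: approximate (subject, verb) relationship pairs from a dependency parse.
# spaCy stands in for my custom relationship extractor; needs en_core_web_sm.
import spacy

nlp = spacy.load('en_core_web_sm')

def subject_verb_pairs(text):
    doc = nlp(text)
    pairs = []
    for token in doc:
        # pair nominal subjects (active or passive) with the word they attach to
        if token.dep_ in ('nsubj', 'nsubjpass'):
            pairs.append((token.text, token.head.lemma_))
    return pairs

sentence = ("Pakistan's army Chief General Raheel Sharif has said his troops are "
            "ready to tackle any long or short misadventure by the \"enemy\"")
print(subject_verb_pairs(sentence))
# Expect pairs roughly like ('Sharif', 'say') and ('troops', 'ready'),
# close to the hand-curated relationships listed above.
```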
Does context before sentiment make sense?
Ideally, the above sentence denotes a negative sentiment. While some APIs might identify it correctly, some may still end up tagging it as neutral or positive. However, if you build a model that feeds in your context, or at the very least provide it as additional information to the user (as tooltips or in exports), this can augment sentiment analysis in a big way; a minimal sketch of feeding context into a model follows the list below.
- Some topics will almost always have a negative sentiment.
- Troubled entities will generate negative news.
- Keyphrases do a good job of pinpointing intent when combined with relationships.
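Here is a minimal sketch of that feature union, with toy context feature names and a toy two-example training set standing in for a real corpus:

```python
# Sketch: feeding context (topic, entities, relationships) into a sentiment model
# alongside the raw text. Feature names and the tiny training set are illustrative.
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import FunctionTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

samples = [
    {'text': 'troops ready to tackle any misadventure by the enemy',
     'context': {'topic': 'unrest_war', 'n_entities': 3, 'negative_topic': 1}},
    {'text': 'the new phone camera is excellent in low light',
     'context': {'topic': 'product_review', 'n_entities': 1, 'negative_topic': 0}},
]
labels = ['negative', 'positive']

text_branch = Pipeline([
    ('pick', FunctionTransformer(lambda rows: [r['text'] for r in rows])),
    ('tfidf', TfidfVectorizer()),
])
context_branch = Pipeline([
    ('pick', FunctionTransformer(lambda rows: [r['context'] for r in rows])),
    ('dict', DictVectorizer()),
])

model = Pipeline([
    ('features', FeatureUnion([('text', text_branch), ('context', context_branch)])),
    ('clf', LogisticRegression()),
])
model.fit(samples, labels)
print(model.predict(samples))
```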
With the velocity at which new data is generated every day, it is often either not useful to keep training sentiment models on standard datasets (like IMDb or the polarity datasets), or too expensive to create your own training sets.
For some, these assumptions may seem naive, but they have worked more than once across multiple NLP projects I have worked on, either increasing model accuracy or acting as a validation layer for the analysis. Intuitively, adding contextual information to your corpus makes sense.
I would be very interested to know your thoughts on my assumptions and overall process flow. Context Analysis can augment, if not replace, sentiment analysis.
Your thoughts?
Comment
Would you have a worked out example to share?
I agree with you. But I try to find higher-level conceptual categories that can guide my sentiment analysis and generate positive, neutral, or negative rankings based on the relationships among concepts. How do I find these concepts? The old-fashioned way. And a way overlooked in today's frenetic rush to data mining and supervised learning.
I couple my sentiment analysis to Roget's thesaurus. There is a free 1911 version, although you have to write your own parser (not an easy task). Even the 1911 version contains a huge set of concepts, sub-concepts and super-concepts. Then you can let a client's SME group extend the thesaurus. By combining a thesaurus with sentiment analysis it is possible to compare tokens in the NLP parse tree to higher semantic concepts. Hence you can let the syntax and grammar graph use the thesaurus to automatically generate topics and semantic concepts with wider and often more "intuitive" coverage.
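A bare-bones sketch of that token-to-concept lookup, with a toy concept table standing in for the parsed thesaurus, might look like this:

```python
# Bare-bones sketch: map parsed tokens to thesaurus-style sub- and super-concepts,
# then score a sentence at the concept level. The tables are toy stand-ins for a
# parsed Roget's thesaurus, not the real thing.
concept_table = {
    'troops':       ('combatant', 'warfare'),
    'misadventure': ('adversity', 'warfare'),
    'enemy':        ('enemy',     'warfare'),
    'camera':       ('optical_instrument', 'communication'),
}
concept_polarity = {'warfare': -1, 'adversity': -1, 'communication': 0}

def concept_score(tokens):
    score, hits = 0, []
    for tok in tokens:
        if tok in concept_table:
            sub, sup = concept_table[tok]
            hits.append((tok, sub, sup))
            score += concept_polarity.get(sup, 0)
    return score, hits

print(concept_score(['troops', 'ready', 'tackle', 'misadventure', 'enemy']))
# A negative concept-level score, driven by the 'warfare' super-concept, can guide
# or sanity-check the token-level sentiment ranking.
```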
Thanks for your comment, Earl.
What you pointed out is exactly my point - sentiment analysis is too subjective and fraught with bias to be dead accurate on unseen data - everyone has an individual view. For example, AlchemyAPI, TextBlob and my model tagged the example sentence as negative, positive and negative respectively.
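For reference, the TextBlob check is a one-liner; mapping its polarity to a label with a zero threshold is a simplification on my part:

```python
# Quick TextBlob check on the example sentence; the zero-threshold label mapping
# is a simplification for illustration.
from textblob import TextBlob

sentence = ("Pakistan's army Chief General Raheel Sharif has said his troops are "
            "ready to tackle any long or short misadventure by the \"enemy\"")
polarity = TextBlob(sentence).sentiment.polarity   # ranges from -1.0 to 1.0
label = 'positive' if polarity > 0 else 'negative' if polarity < 0 else 'neutral'
print(polarity, label)
```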
Ideally, I use these contextual signals as feature inputs for my model, to reduce errors and take 'external' help in arriving at a sentiment. Topic classification is a different problem altogether: you can define broad or granular categories, provided you have enough training data to validate a model.