Sentiment analysis is hard. Most systems on the market reach only around 55-65% accuracy on unseen data, even though they may score 85%+ in their own cross-validation.
A couple of reasons why building a generic sentiment analyser is tough:
- Text varies too much across domains, so the same words can carry different meanings
- Sarcasm and combinations of words are hard to identify: for example, 'not bad' does not mean 'not' AND 'bad'
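
To make the 'not bad' point concrete, here is a minimal sketch in Python. The lexicon, the negator list and both function names are made up for illustration, and the negation rule is deliberately naive; it is not how any particular product works. It only shows why scoring words independently gets such phrases wrong.

```python
# Hypothetical toy lexicon: the words and scores are illustrative assumptions.
LEXICON = {"good": 1, "great": 1, "bad": -1, "terrible": -1}
# Hypothetical list of negation words.
NEGATORS = {"not", "never", "no"}


def bag_of_words_score(text):
    """Sum per-word scores, ignoring word order and negation."""
    return sum(LEXICON.get(tok, 0) for tok in text.lower().split())


def negation_aware_score(text):
    """Flip the score of the word that directly follows a negator."""
    score = 0
    negate = False
    for tok in text.lower().split():
        if tok in NEGATORS:
            negate = True  # remember the negation for the next word
            continue
        value = LEXICON.get(tok, 0)
        score += -value if negate else value
        negate = False  # the negation only reaches one word ahead
    return score


print(bag_of_words_score("not bad"))    # -1: treated as 'not' AND 'bad'
print(negation_aware_score("not bad"))  # 1: treated as one positive phrase
```

Even the second version breaks as soon as the negator is not right next to the sentiment word, and it does nothing for sarcasm, which is part of why a generic analyser is so hard to get right.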