#TextAnalytics concepts can be used to deal with credibility issues in the mainstream media

This article was written by Ramsundar Lakshminarayanan.

Mainstream media’s credibility at an all-time low

The credibility of the media has taken a beating in recent years. The elections of #Modi and #Trump and the #Brexit vote are widely believed to be a testament to that. These events have helped expose the misinformation and propaganda that occupy a pivotal space in mainstream media. It is a discourse culture that is devoid of facts and pregnant with bias.

It is an unfortunate reality.

It is in this context that I used existing concepts in #TextAnalytics, such as #TopicModel and #SentimentAnalysis, to objectively assess media content and give potential readers pertinent information in an easy-to-understand visual manner.

Imagine having advance information that describes a talk show or an article well enough to help you decide whether to sit through the show or read the entire article. I explored ways to glean that “relevant information” in an unbiased manner.

With nascent knowledge of #TextAnalytics, I hunted for a suitable hypothesis to work on.


Mani Shankar Aiyar is a former Indian diplomat and politician, widely perceived to be the one who catapulted Indian Prime Minister Narendra Modi’s 2014 election campaign with disparaging remarks about Modi’s social status as a child tea vendor. Since Modi’s election in 2014, I found Aiyar’s opinion pieces on ndtv.com to be extremely critical of Modi, negative in tone, condescending in tenor and, notwithstanding his party’s drubbing in the 2014 elections, rich in temerity. I lost interest in his pieces and stopped reading them in the fall of 2014.

His articles gave me the perfect hypothesis to start with, and Trump’s victory gave me the perfect trigger to test it.

Topic Modeling and Sentiment Analysis

A total of 155 articles published by the author on ndtv.com between Jan ’14 and Mar ’17 were used for the analysis. While not very large, the corpus still offered a decent sample size to test my hypothesis. Here are my findings.

– Findings 1: What topics does he usually write about?

The following key topics were detected automatically using a popular topic modeling technique. The five topics (Modi, Gandhi, India, Pakistan and Time) aligned well with my personal knowledge of the author’s interests.
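The article does not name the topic modeling technique that was used. As a minimal sketch, assuming Latent Dirichlet Allocation (LDA) fitted by collapsed Gibbs sampling, automatic topic detection can look like the following. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy documents are all assumptions made for illustration; they are not the author's data or code.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; returns one word Counter per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    # Count tables: topics per document, words per topic, tokens per topic
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Start from a random topic assignment for every token
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalised conditional probability of each topic for this token
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                # Resample the topic and put the token back into the counts
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word

# Hypothetical, already-tokenized documents (not taken from the real corpus)
docs = [
    "modi government election modi bjp".split(),
    "pakistan india border talks pakistan".split(),
    "gandhi congress party gandhi election".split(),
    "india pakistan peace talks border".split(),
]
topics = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topics):
    print(k, [w for w, c in counts.most_common(3) if c > 0])
```

Each topic is summarized by its most heavily weighted words, and a short label such as "Modi" or "Pakistan" is then typically chosen from the top word of each topic. On a real corpus, stop-word removal and a tested library implementation would be more appropriate than this sketch.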

– Findings 2: Top Subjects he focused on

The top 10 subjects he wrote about were Modi, Pakistan, India, Gandhi, Govern (government/governance), PM, Will (tendency to assert and question), BJP, Jaitley and China. He wrote about Modi in 74 of the 155 articles between January 2014 and March 2017. The list of subjects matches my personal knowledge of the author’s interests.
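A figure such as "74 of the 155 articles" is a document frequency: the number of articles that mention a term at least once. A minimal sketch of that count is shown below, assuming simple lowercase tokenization and a tiny illustrative stop-word list. The article texts are made up, and the stemming implied by "Govern" is not reproduced here.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real analysis would use a fuller one
STOPWORDS = {"the", "a", "of", "in", "and", "to", "is", "on", "with"}

def document_frequency(articles):
    """For each term, count the number of articles that mention it at least once."""
    df = Counter()
    for text in articles:
        # A set ensures each term is counted once per article
        terms = set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS
        df.update(terms)
    return df

# Hypothetical article texts (not taken from the real corpus)
articles = [
    "Modi spoke about Pakistan and India.",
    "The BJP government and Modi.",
    "India and Pakistan held talks.",
]
df = document_frequency(articles)
print(df.most_common(5))
```

Ranking the terms by this count and keeping the first ten gives a "top subjects" list of the kind described above.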

– Findings 3: What sentiments did he exhibit?

More than 50% of the articles exhibited a negative tone. This was discovered using open source lexicons that classify words as positive, negative or neutral in sentiment. Barely 25% of his articles exhibited positive sentiment, and a fraction were neutral. Here again, the findings align well with my personal experience of reading his articles earlier.
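Lexicon-based sentiment scoring of this kind can be sketched as follows. The article does not say which lexicons were used, so the two word sets below are tiny stand-ins invented for illustration, and the rule "positive hits minus negative hits" is one simple assumption about how word-level labels are rolled up to an article-level tone.

```python
import re

# Tiny stand-in lexicon; a real analysis would load a published open source lexicon
POSITIVE = {"good", "great", "success", "hope", "progress"}
NEGATIVE = {"bad", "failure", "crisis", "disaster", "corrupt"}

def article_sentiment(text):
    """Label an article by the balance of positive and negative lexicon words."""
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hypothetical article texts (not taken from the real corpus)
articles = [
    "The policy was a failure and a disaster.",
    "A great success that brings hope.",
    "Parliament met on Tuesday.",
]
labels = [article_sentiment(a) for a in articles]
print(labels)
```

Counting the labels over the whole corpus then gives the share of negative, positive and neutral articles.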

– Findings 4: What Emotion did he express?

The articles exhibited all of the following emotions: anger, anticipation, disgust, fear, joy, sadness, surprise and trust, as shown in the accompanying visual. This was discovered using open source lexicons that classify words into emotions. Related back to sentiment, these emotions aggregate predominantly into a negative tenor. This, again, aligns well with my personal experience of reading the author’s articles earlier.
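An emotion lexicon differs from a sentiment lexicon in that one word can map to several emotions at once. The eight emotions listed match those used by word-emotion lexicons such as the NRC Emotion Lexicon, though the article does not say which lexicon was used. The mapping below is a hypothetical four-word stand-in, not real lexicon data.

```python
import re
from collections import Counter

# Hypothetical word-to-emotions mapping; real lexicons cover thousands of words
EMOTION_LEXICON = {
    "attack": {"anger", "fear"},
    "betray": {"anger", "disgust", "sadness"},
    "promise": {"anticipation", "joy", "trust"},
    "shock": {"fear", "surprise"},
}

def emotion_counts(text):
    """Tally every emotion associated with each lexicon word found in the text."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        counts.update(EMOTION_LEXICON.get(word, ()))
    return counts

# Hypothetical sentence (not taken from the real corpus)
counts = emotion_counts("The promise ended in shock and an attack.")
print(counts.most_common())
```

Summing these tallies across all articles gives the kind of per-emotion totals that a bar chart of emotions would display.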

– Findings 5: What more can be described about his articles?

Top keywords were derived for each article using a popular information retrieval technique (#TF-IDF) to describe the content better. This gives a further understanding of the content without having to read the entire article. See the accompanying example.
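TF-IDF scores a word highly when it is frequent within one article but rare across the corpus, so a word that appears in every article scores zero. Below is a minimal sketch, assuming the plain form tf × log(N / df) with no smoothing. The function name `top_keywords` and the toy texts are assumptions for illustration, and library implementations often use slightly different weighting.

```python
import math
import re
from collections import Counter

def top_keywords(articles, k=2):
    """Return the k highest-scoring TF-IDF terms for each article."""
    tokenized = [re.findall(r"[a-z]+", a.lower()) for a in articles]
    n = len(tokenized)
    # Document frequency: number of articles containing each term
    df = Counter(t for doc in tokenized for t in set(doc))
    results = []
    for doc in tokenized:
        tf = Counter(doc)
        # Term frequency within the article times inverse document frequency
        scores = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        results.append(sorted(scores, key=scores.get, reverse=True)[:k])
    return results

# Hypothetical article texts (not taken from the real corpus)
articles = [
    "modi budget budget tax",
    "modi pakistan border talks",
    "modi cricket cricket final",
]
keywords = top_keywords(articles)
print(keywords)
```

Because "modi" occurs in all three toy articles, its inverse document frequency is log(3/3) = 0, and the keywords that surface are the ones that distinguish each article from the rest.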

Summary of the Analysis

With these findings, it was possible for me to gain a much better understanding of his articles fairly quickly.

Automatic topic detection and open source lexicons provided neutrality and transparency to the analysis, while a popular information retrieval technique provided legitimacy.


To read the whole article, with illustrations, click here.
