It has become clear over the last few months that mainstream media outlets on both sides are stretching the truth, and sometimes even reporting fake stories first published in outlets such as The Onion. The famous picture below (comparing the number of people attending Trump's inauguration with the number attending Obama's eight years earlier) raises interesting questions.
I am not saying that the images are fake; rather, I want to point out the kind of questions everyone, and especially every data scientist, should ask before coming to a conclusion.
- Were the pictures taken at the same point in each event, or was one of them (or both) taken before or after the event?
- Are these pictures corroborated by images from other sources?
- Was it rainy or cold in the picture with fewer people, and if so, what kind of impact can the weather have on these events?
- Did more people view the event on TV or online than in the past, thus explaining the lack of visitors in the right picture?
- Is there a general decline in interest in these events, meaning that the next inauguration won't attract more people? Were there also significantly fewer people 12 or 16 years ago, meaning that Obama 2009 was an exception?
- Is "more people" a good thing? Was the 2009 crowd (those present in the picture) more president-friendly back then, or is it the opposite?
- What about the statistics of public transportation, flights to DC, hotel room bookings? Are they consistent with a drop in the size of the crowd?
- Was it more difficult to attend the event this time due to security reasons?
- Is the data corroborated by a survey (done properly) asking random people if they attended the event in 2009, and if they attended the event in 2017? Such a survey should factor in the fact that some who attended the 2009 event are now dead.
- Is there a bias in the sense that conservatives are less interested than liberals in attending these events? This is easy to check by comparing with previous elections. Or maybe neither conservatives nor liberals identify with the new president this time.
- Are Trump's supporters less numerous in the DC area (compared with Obama), meaning that many would have had to travel long distances and purchase a plane ticket to attend the event? If that is the case, we would see a drop in the crowd in DC, but not a drop in people watching the event on TV or remotely.
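Several of the questions above (public transportation, flights, hotel bookings) amount to checking whether independent indicators move in the same direction as the pictures suggest. Here is a minimal sketch of that check in Python. The `Indicator` class, the `corroboration` function, the 10% threshold, and all the numbers are hypothetical placeholders chosen for illustration, not real figures for either inauguration.

```python
from dataclasses import dataclass


@dataclass
class Indicator:
    name: str
    earlier: float  # value measured for the earlier event
    later: float    # value measured for the later event


def relative_change(ind: Indicator) -> float:
    """Relative change from the earlier event to the later one."""
    return (ind.later - ind.earlier) / ind.earlier


def corroboration(indicators, min_drop=0.10):
    """Return each indicator's relative change and the share of
    indicators that dropped by at least `min_drop` (an arbitrary,
    assumed threshold)."""
    changes = {ind.name: relative_change(ind) for ind in indicators}
    supporting = [name for name, c in changes.items() if c <= -min_drop]
    return changes, len(supporting) / len(indicators)


# Placeholder values only (indexed to 100 for the earlier event);
# replace them with real, sourced statistics before drawing conclusions.
example = [
    Indicator("transit_rides", 100.0, 60.0),
    Indicator("hotel_bookings", 100.0, 95.0),
    Indicator("flights_to_dc", 100.0, 80.0),
]

changes, share = corroboration(example)
for name, change in changes.items():
    print(f"{name}: {change:+.0%}")
print(f"Share of indicators consistent with a drop: {share:.0%}")
```

If most independent indicators agree with the pictures, the story gains credibility; if they disagree, that is a reason to look more closely at how and when the pictures were taken.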
It would be easy to dig into most of these questions and get a reasonably accurate answer, either confirming the veracity of this story or not. We haven't done the research, but we would be happy to hear from people who have. What is at stake here is not whether Trump drew a smaller crowd, but whether we can still trust what we read in the news.
Bottom line: when confronted with data, this is how any data scientist should react - asking legitimate questions - in case there is a doubt regarding the trustworthiness of the source.
The rise of fake news
Just out of curiosity, I did some research to find out how popular the term "fake news" has become recently. The chart below, showing Google queries for "fake news", speaks volumes.
Google trends for the keyword 'fake news'
Click here for an interactive version of this chart. I also believe that data science, using AI (artificial intelligence) techniques to automatically process news stories and investigate their context, will be able to identify news that is real, unbiased, and accurate.