There Are Lies, Damn Lies, And Journalism

It has become clear over the last few months that mainstream media on both sides are stretching the truth, if not reporting fake stories first published in outlets such as The Onion. The famous picture below (comparing the number of people attending Trump's inauguration with the number attending Obama's eight years earlier) raises interesting questions.

I am not saying that the images are fake, but rather, I point to the kind of questions everyone should ask herself, especially every data scientist, before coming to a conclusion.

  • Were the pictures taken at the same point in each event, or was one of them (or both) taken before or after the event?
  • Are these pictures corroborated by images from other sources?
  • Was it rainy/cold in the picture with fewer people, and if yes, what kind of impact can the weather have on these events?
  • Did more people watch the event on TV or online than in the past, thus explaining the smaller crowd in the right picture?
  • Is there a general decline in interest in these events, meaning that the next inauguration won't attract more people? Were there also significantly fewer people 12 or 16 years ago, meaning that Obama 2009 was an exception?
  • Is "more people" a good thing? Was the 2009 crowd (those present in the picture) more president-friendly back then, or is it the opposite?
  • What about the statistics of public transportation, flights to DC, hotel room bookings? Are they consistent with a drop in the size of the crowd?
  • Was it more difficult to attend the event this time due to security reasons?
  • Is the data corroborated by a survey (done properly) asking random people if they attended the event in 2009, and if they attended the event in 2017? Such a survey should factor in the fact that some who attended the 2009 event are now dead.
  • Is there a bias in the sense that conservatives are less interested than liberals in attending these events? This is easy to check by comparing with previous elections. Or maybe neither conservatives nor liberals identify with the new president this time.
  • Are Trump's supporters less numerous in the DC area (compared with Obama), meaning that many would have had to travel long distances and purchase a plane ticket to attend the event? If that is the case, we would see a drop in the crowd in DC, but not a drop in people watching the event on TV or remotely.

It would be easy to dig into most of these questions and get a rather accurate answer, either confirming the veracity of this story or not. We haven't done the research, but we would be happy to hear from people who did. What is at stake here is not whether Trump drew a smaller crowd, but whether we can still trust what we read in the news.

Bottom line: when confronted with data, this is how any data scientist should react whenever there is doubt about the trustworthiness of the source: by asking legitimate questions.

The rise of fake news

Just out of curiosity, I did some research to find out how the term "fake news" has become popular recently. The chart below, showing Google queries for "fake news", speaks volumes.

Google trends for the keyword 'fake news'

I also believe that data science, using AI (artificial intelligence) techniques to automatically process news stories and investigate their context, will be able to identify news that is real, unbiased, and accurate.

Comment by Ed williams on February 5, 2017 at 4:44am

I would add two more questions:

  • Is the data given the right data; does it actually measure what is being sought or claimed?
  • Is the data given all the relevant data that is available; has it had any degree of "cherry picking"?

As we all (should) know, getting the data is often much more difficult than analyzing it. See http://www.zerohedge.com/news/2017-01-27/more-fake-news-media-contr... for the results of those 2 questions in this case.

We must always remember when trying to verify anything where anyone has a personal stake that "half the truth is often a great lie" (Franklin), so we must be especially careful of all sources and every analysis.  Further, since most cases of this sort are not set up as rigorous experiments, there will be gaps where assumptions will have to be made, and those too can be the stuff of great lies.  So always think through the assumptions made, especially if they are not explicitly stated.

Comment by David McDuffee on January 26, 2017 at 1:49pm

What's interesting to me is the question Vincent Granville didn't deem interesting enough to ask: Do the pictures accurately reflect the number of people in attendance at each event?

Each of the questions he did pose seems slanted toward one goal: explaining away the obvious discrepancy in attendance as due primarily to something other than the relative popularity of the Presidents being inaugurated.

Instead of answering the series of questions which he confesses he has not sought answers for, let me pose one question which I can answer: Were President Obama's favorable ratings and President Trump's favorable ratings consistent with the different crowd sizes which are obvious in the picture? YES. According to a Gallup poll published the week of the inauguration, Trump's pre-inauguration rating was about half what Obama's was.

I concede that there are reputable and less-reputable news sources on all points of the political spectrum, but President Trump has declared war on journalists who are consistently more credible than he is. When a reputable reporter tweets a statement which is not true (e.g., "MLK's bust was removed from the Oval Office") it's corrected as soon as possible. When Donald Trump tweets a statement which is not true (e.g., "I would have won the popular vote, but for millions of illegal votes") it's repeated ad infinitum, and facts be damned.

So when someone says, "Both sides do it," the question everyone should be asking herself is, "Do both sides do it in the same way and to the same degree, or is one side really trying harder to undermine and conceal the truth?"

Comment by Thomas Orth on January 26, 2017 at 11:47am

Of all the possible examples to illustrate the rise in fake news... a photo published by PBS, taken 10 minutes before the swearing-in, is a very odd choice. Apart from the most extremist folks, PBS is not seen as a "fake" news organization. Good critical thinking questions follow.

Comment by Alma Ionescu on January 26, 2017 at 10:56am

Do the pictures correlate with a survey showing Trump's pre-inauguration favorables at 40% (historically low) compared to Obama's 78% (highest since 1993 when the poll started)? http://www.gallup.com/poll/201977/trump-pre-inauguration-favorables... 

Yes.

Was television also a thing during Obama?

Yes

Are the marches against Trump's policies friendly (see question 6)?

No.

In fact, only 570 people moved for the inauguration compared to 3x that number in the march https://www.nytimes.com/interactive/2017/01/22/us/politics/womens-m... 

Actually this last article has a lot of fun facts, including the times when the pics were taken (45 minutes before speech), charts with density areas during Trump vs Obama vs march. The evidence is overwhelming.

Comment by Ben Dutta on January 24, 2017 at 5:57pm

@Vincent, I think you can add one more question to the list - were they dressed in white head to toe making them invisible against the white, grass-cover background? Not entirely off topic, here is a real news article from here in Australia on how data science might have played a part in getting Donald Trump elected.

http://www.smh.com.au/federal-politics/political-news/cambridge-ana... 

Comment by Nina Chaichi on January 24, 2017 at 3:02pm

As much as I don't like the approach of mainstream media on most issues, I found the title of your article a little bit harsh. If I recall correctly, both pictures were taken during the speech (allegedly peak time). And since it is obvious that in-person turnout in 2017 was lower than in 2009, I see no problem in reporting the fact as it is. If your objection is to the analysis of why in-person turnout was lower, though, I agree that all your points, along with some others, need to be considered before jumping to conclusions.

Comment by Douglas A Dame on January 24, 2017 at 1:54pm

The CNN article that put those two pictures side-by-side clearly indicated the provenance of the pictures, including their timing: the Trump inauguration photo was taken during his speech, which should have been the peak audience, and the time is not known for the Obama event. There are not going to be many decent (corroborating) overhead shots because of FAA flight restrictions during this event.

Many of the questions or explanations of WHY attendance may have been different are entirely secondary to the main question of WHETHER the crowds were of different sizes, to the extent it is reasonably possible to assess that. (And many of those questions have also been discussed and answered, including the weather, security arrangements, and the political leanings of DC residents.) 

Personally I don't attach much practical significance at all to the answer ... counting the votes was important, and we did that ... but the photographic evidence that the crowd sizes at these two events were very different is manifestly clear at face value, and not worth debating as a serious question.

An important part of being an applied data scientist is knowing when to accept the evidence (or not), and move on.

Comment by Harold Henson on January 24, 2017 at 1:37pm

It is correct that the two pictures by themselves could be misleading. An innocent reader would naturally assume that they were looking at two pics taken when the crowds were at their peak. However, every other line of evidence that I have seen, other than those suggested by the Trump team, suggests that fewer people really did turn out.

Comment by Richard Gola on January 24, 2017 at 1:09pm

Before we assume these images present an accurate representation of reality, I think we must recall that all types of data must be validated. And who is the validating source?

If I may offer a perspective on imagery: this would be a great opportunity to explore imagery analysis. Surely there is metadata associated with these images that could be verified or explored in depth, such as timestamps, geo-location references, etc.

Another aspect of imagery analysis is the perception of color. For example, I can identify what appear to be white structures in one image, which may or may not have a significant number of people inside. How would I find that answer?

To go even further, we could explore down to the pixel and create a model defining crowd size. 
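
As a rough illustration of what such a pixel-level model might look like, here is a minimal Python sketch. It assumes a grayscale image in which the ground cover is bright and people are dark, and it multiplies the covered fraction by a viewing area and an average crowd density. The brightness threshold, the area, the density, and the toy image are all hypothetical values chosen for illustration, not measurements from either photograph.

```python
def covered_fraction(gray, ground_threshold=200):
    """Fraction of pixels darker than the threshold, treated as 'occupied'.

    gray is a list of rows of 0-255 brightness values. The threshold is an
    assumption: it presumes bright ground cover and darker people.
    """
    total = sum(len(row) for row in gray)
    occupied = sum(1 for row in gray for px in row if px < ground_threshold)
    return occupied / total


def estimate_crowd(gray, area_m2, people_per_m2, ground_threshold=200):
    """Covered fraction times viewing area times an assumed average density."""
    return covered_fraction(gray, ground_threshold) * area_m2 * people_per_m2


# Hypothetical 4x4 toy image: high values are bare ground, low values are people.
toy = [
    [250, 250, 90, 80],
    [250, 100, 95, 85],
    [240, 245, 70, 60],
    [250, 250, 250, 75],
]

print(covered_fraction(toy))         # share of the image treated as occupied
print(estimate_crowd(toy, 1000, 2))  # hypothetical area (m^2) and density
```

A single brightness threshold would be fooled by shadows, camera angle, and structures such as the white tents mentioned above, so any real estimate would need per-region densities and a perspective correction.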

Comment by Vincent Granville on January 24, 2017 at 11:40am

Wondering if the spike in 'fake news' queries on Google is real or artificial. It could have been inflated by someone running a botnet that generates tons of fake (automated) queries about 'fake news', undetected by Google algorithms, maybe someone who wants to make people believe that most news is fake. That said, there is ample evidence that a lot of unverified or biased information has been released by the media. It usually does not come as fake news though, but rather as distorted information and misleading charts. See how to lie with visualizations for instance.
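
Before asking whether a spike is organic, it helps to quantify how far it departs from the baseline. The Python sketch below flags unusually high points in a series using a robust z-score based on the median and the median absolute deviation. The weekly values are hypothetical, not actual Google Trends data, and the 3.5 cutoff is a common rule of thumb rather than anything specific to search data. Flagging a spike says nothing about its cause; telling bot traffic from real interest would require other evidence, such as the geographic or temporal pattern of the queries.

```python
from statistics import median


def robust_z_scores(values):
    """Robust z-scores: 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; the series is too flat for this check")
    return [0.6745 * (v - med) / mad for v in values]


def flag_spikes(values, cutoff=3.5):
    """Indices whose robust z-score exceeds the cutoff (upward spikes only)."""
    return [i for i, z in enumerate(robust_z_scores(values)) if z > cutoff]


# Hypothetical weekly search-interest values (illustration only).
weekly = [3, 4, 3, 5, 4, 3, 4, 60, 100, 55]
print(flag_spikes(weekly))  # indices of the weeks flagged as spikes
```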
