Recently we have read a lot about fake news, alternative facts, and journalistic lies. Companies like Facebook develop data science algorithms to detect such postings, based among other things on crowdsourcing (collective intelligence).

But can the data scientist, with her inquisitive mind and strong sense of numbers and probabilities, use her brain to assess how true a piece of information is? I am talking here about using fuzzy logic, and human rather than artificial intelligence, to determine the probabilities.

Here is a recent, popular example: the Firefall in Yosemite National Park (California), pictured below.

It is supposed to be a rare natural event occurring only in February under certain conditions, according to National Geographic. But there was also a famous artificial firefall that took place a few miles away each year until 1968, and it was man-made (people pushing glowing embers off the top of the cliff).

As a data scientist, my first reaction is to estimate the probability that the natural firefall is indeed genuine.

- Probability for two such unrelated events (man-made and natural) to occur so close to each other, assuming both occurred: Maybe 1/10,000?
- Probability for the two events to look so similar: Maybe 1/10?
- Probability for any one such event to occur anywhere in a waterfall: Maybe 1/1,000,000? (Who has ever seen a natural firefall?)
- Could this phenomenon be replicated in a laboratory or simulated on a computer? What is the probability that the sun creates that kind of glow on water?

Of course, this is balanced by the fact that journalists would report such an event only if it is extremely rare, a one-in-a-million chance. That puts the odds of being real, according to my very wild guess, at 1,000,000 / (10,000 x 10 x 1,000,000) = 1 / 100,000.
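The arithmetic can be checked with a few lines of Python. This is only a sketch that multiplies the guesses exactly as stated above, treating them as independent factors; the variable names are illustrative, and the numbers remain wild guesses rather than measured quantities.

```python
from fractions import Fraction

# Rough guesses taken from the list above (not measured values)
p_proximity = Fraction(1, 10_000)      # two unrelated events occurring so close together
p_similarity = Fraction(1, 10)         # the two events looking so similar
p_occurrence = Fraction(1, 1_000_000)  # such an event occurring at any waterfall

# Journalists report an event only if it is about a one-in-a-million rarity,
# which is modeled here as a multiplicative correction factor.
reporting_factor = 1_000_000

# Multiply everything together, assuming the factors are independent
odds_real = reporting_factor * p_proximity * p_similarity * p_occurrence

print(odds_real)  # 1/100000
```

Using exact fractions avoids floating-point rounding, so the result can be compared directly with the 1 / 100,000 figure.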

My figures look more like an answer to a Microsoft job interview question. But they lead to an interesting question: What if the truth is somewhere in between? What if the picture truly features a genuine, natural event, but the colors were altered, or maybe it was once again a man-made event that the journalist was unaware of? How do you assign a probability to each of the following possibilities?

- The picture is real, unaltered
- The picture is real, colors are exaggerated
- The picture is real and unaltered, but the explanation is incorrect, or maybe totally wrong
- This is fake, maybe a picture created using some software, or some other artifact, inspired by the old man-made firefall, and created on purpose to go viral

Just food for thought. I don't have an answer, as I haven't spent enough time investigating this. But for about 30% of the news that I read in reputable outlets (and 95% of Facebook content), I ask myself the same question.

© 2020 Data Science Central ®
