Recently we have read a lot about fake news, alternative facts and lies in journalism. Companies like Facebook develop data science algorithms to detect such posts, based among other things on crowdsourcing (collective intelligence).
But can the data scientist, with her inquisitive mind and strong sense of numbers and probabilities, use her brain to assess how true a piece of information is? I am talking here about fuzzy logic, and about using human rather than artificial intelligence to determine the probabilities.
Here is a recent, popular example: the Firefall in Yosemite National Park (California), pictured below.
It is supposed to be a rare natural event occurring only in February under certain conditions, according to National Geographic. But there was also a famous, artificial firefall that took place a few miles away each year until 1968, and it was man-made (people throwing embers in the water atop the cliff).
As a data scientist, my first reaction is to assign a probability that the natural firefall is indeed genuine.
Of course, this is balanced by the fact that journalists would report such an event only if it is extremely rare - a one-in-a-million chance. That puts the odds of it being real, according to my very wild guess, at 1,000,000 / (10,000 x 10 x 1,000,000) = 1 / 100,000.
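For readers who want to check the arithmetic, here is a minimal sketch in Python. It only reproduces the figures quoted above. The text does not say what the 10,000 and 10 factors stand for, so the variable names are placeholders of my own, not definitions.

```python
from fractions import Fraction

# Figures copied verbatim from the wild guess above.
# The meaning of each factor is not stated in the text,
# so these names are placeholders, not definitions.
numerator = 1_000_000
denominator_factors = [10_000, 10, 1_000_000]

# Multiply the denominator factors together.
denominator = 1
for factor in denominator_factors:
    denominator *= factor

# Fraction keeps the result exact and reduces it automatically.
odds_real = Fraction(numerator, denominator)

print(odds_real)         # 1/100000
print(float(odds_real))  # the same value as a decimal
```

Using exact fractions rather than floats makes it easy to see that the quoted result, 1 / 100,000, does follow from the quoted factors, whatever one thinks of the factors themselves.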
My figures look more like an answer to a Microsoft job interview question. But they lead to an interesting question: What if the truth is somewhere in between? What if the picture truly features a genuine, natural event, but the colors were altered, or maybe it was once again a man-made event that the journalist was unaware of? How do you assign a probability to the fact that
Just food for thought. I don't have an answer, as I haven't spent enough time investigating this. But for about 30% of the news that I read in reputable outlets (and 95% of Facebook content) I am asking myself the same question.