A recent article in TechCrunch describes issues at Twitter and Facebook: algorithms unable to detect fake news or hate speech. I wrote about how machine learning could be improved, and about what can make implementations under-perform, or not perform at all. And a colleague shared with me an article about how Facebook really sucks at machine learning.
Reading all this, you might think that machine learning simply does not work, at least not as advertised. Here I argue that this is not the case: I explain what the real issues might be and why, in short, machine learning might not be the culprit.
(Image: The New Yorker making fun of undetected fake news.)
The issues seem to arise in situations that are not critical, such as a badly targeted ad, a racist tweet that goes undetected, or a piece of fake news that goes viral. You don't hear stories about planes crashing because of poor autopilot systems powered by faulty machine learning algorithms.
So I classified machine learning (ML) implementations into four categories:
- Implementations that work well: for instance, self-driving cars and automated piloting (planes).
- Implementations that work for a while: for instance, high-frequency trading, which relies too heavily on automation.
- Implementations that work more or less: for instance, Google search, ad targeting (by top companies), home price or weather forecasts, and fraud detection.
- Implementations that do not work: for instance, spell check (absolutely atrocious for multilingual people), fake news detection, fake review detection, and detection of illegal tweets.
I believe most implementations fall into the third category. Of course, we only hear about the fourth category, just as the news tells you about people who die, not about people who are born.
The fake news issue
I am not even sure that fake news detection is failing. Sure, plenty of fake news runs wild on Facebook, Google, and everywhere else. But it does generate traffic, and thus dollars, at least in the short term. There are two factors at play here:
- Politicians and others place fake news into automated news feed systems, which I call news feed hijacking. If they use machine learning algorithms to avoid detection, and they beat Facebook, then it is not a failure of machine learning; it shows that the fraudsters have better machine learning tools.
- Facebook must choose between too many false positives (a genuine piece of news flagged as fake by mistake) and too many false negatives (a piece of fake news that goes undetected). Because false negatives are associated with increased revenue, the algorithm might favor them.
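The false positive versus false negative trade-off can be made concrete with a cost-weighted decision threshold. The sketch below assumes a hypothetical classifier that assigns each post a "fakeness" score, plus made-up labels and error costs; the function names `confusion_at` and `best_threshold` are illustrative and do not describe Facebook's actual system. It picks the threshold that minimizes the total cost of both error types.

```python
# Toy illustration: choosing a flagging threshold by weighing false positives
# against false negatives. All scores, labels and costs are hypothetical.

# Hypothetical "fakeness" scores from some classifier, with ground truth
# (1 = fake news, 0 = genuine news).
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1, 1]


def confusion_at(scores, labels, threshold):
    """Count (false positives, false negatives) when every item with
    score >= threshold is flagged as fake."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn


def best_threshold(scores, labels, cost_fp, cost_fn):
    """Return (threshold, total_cost) minimizing cost_fp*FP + cost_fn*FN.
    Candidates are the observed scores plus infinity (flag nothing);
    ties keep the lowest threshold."""
    best = None
    for t in sorted(set(scores)) + [float("inf")]:
        fp, fn = confusion_at(scores, labels, t)
        cost = cost_fp * fp + cost_fn * fn
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


# Equal costs for both kinds of error.
print(best_threshold(scores, labels, cost_fp=1, cost_fn=1))    # (0.4, 1)
# Missed fake news made "cheap", e.g. because it still generates traffic.
print(best_threshold(scores, labels, cost_fp=1, cost_fn=0.2))  # (0.7, 0.2)
```

With the same model, making false negatives cheaper moves the chosen threshold from 0.4 to 0.7, so fewer posts get flagged and more fake news gets through. Nothing about the algorithm changed; only the business cost did.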
But maybe the biggest challenge here is how to define fake news in the first place. If it is not properly defined, it cannot be identified, and it is indeed a very fuzzy concept.
When machine learning is used as a scapegoat
- Internal business politics at Facebook, resulting in great algorithms not being used, or used improperly.
- Algorithms and business rules (embedded in algorithmic systems) that are not revisited as needed, or that are left to unqualified people for maintenance (for instance, software engineers not working with data scientists).
- Teams not collaborating effectively (e.g., data scientists vs. software engineers vs. business people).
- Algorithms tested and prototyped on small data (say, 1% of all ads), thus missing a lot of what happens at full scale.
- Critics only see the bad stuff, not the good stuff; yet overall, these “flawed” algorithms produce enough value for shareholders.
- Even in my article where I criticize some Facebook algorithms, I still consider Facebook the best advertising platform for us, and I use it.
- Some of this might be dictated by top executives. Most of what I see on Facebook is one-sided (politically speaking), as if there were a political agenda and Facebook were trying to influence people. It could also be caused by Bay Area software engineers whose algorithms favor posts or ads they tend to agree with, with or without executives knowing about it.
- Even in the example where I criticize Facebook’s machine learning technology for being unable to recognize pictures containing text, the system seems to ignore its own verdict: despite threatening messages saying my ads would not run because they were (erroneously) flagged as containing pictures with embedded text, my ads are delivered without any problems.
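The point about prototyping on 1% of all ads can be put in numbers. A minimal sketch, assuming the prototype keeps each ad independently with probability 1% (Bernoulli sampling) and that some problematic pattern, say a new kind of abusive ad, appears in k ads of the full data; the function names are hypothetical.

```python
# Back-of-the-envelope check: how likely is a small prototype sample to miss
# a rare pattern entirely? Assumes each ad is kept independently with
# probability sample_rate (Bernoulli sampling).


def expected_seen(k, sample_rate=0.01):
    """Expected number of the k occurrences of a pattern that land in the sample."""
    return k * sample_rate


def prob_pattern_missed(k, sample_rate=0.01):
    """Probability that none of the k occurrences lands in the sample."""
    return (1.0 - sample_rate) ** k


for k in (10, 100, 300):
    print(f"pattern in {k} ads: expect {expected_seen(k):.1f} in sample, "
          f"P(missed entirely) = {prob_pattern_missed(k):.3f}")
```

Under these assumptions, a pattern that appears in 100 ads is expected to show up only once in the 1% sample, and there is roughly a 37% chance it does not show up at all, so a model prototyped on that sample never gets to learn it.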