In this article, using a few examples and solutions, I show that the "best" algorithm is often not what data scientists or management think it is. As a result, misfit algorithms are implemented far too often. Not that they are bad or simplistic; on the contrary, they are usually too complicated, but their biggest drawback is that they do not address the key problems. Sometimes they lack robustness, sometimes they are not properly maintained (for instance, they rely on outdated lookup tables), sometimes they are unstable (they rely on a multi-million-rule system), sometimes the data is inaccurate or not properly filtered, and sometimes they are based on poor metrics that are easy for a third party seeking an advantage to manipulate (for instance, click counts are easy to fake). The solution usually consists of choosing a different approach and a very different, simpler algorithm - or no algorithm at all in some cases.
Five Case Studies
Here I provide a few examples, as well as an easy, low-cost, robust fix in each case.
Many times, the problem is caused by data scientists lacking business understanding (they use generic techniques and lack domain expertise), combined with management lacking a basic understanding of analytical, automated, optimized (semi-intelligent, self-learning) decision systems based on data processing. The solution consists of educating both groups, or using hybrid data scientists (I sometimes call them business scientists) who might not design the most sophisticated algorithms, but instead the most efficient ones given the problems at stake - even if that sometimes means creating ad-hoc solutions. This may result in simpler, less costly, more robust, more adaptive, easier-to-maintain, and, generally speaking, better-suited solutions.
More about the Facebook ad processing system
The first four cases have been discussed in various articles highlighted above, so here I focus on the last example: the image recognition algorithm used by Facebook to detect whether an image contains text, in order to assess ad relevancy - best illustrated in Figure 1. This algorithm controls, to a large extent, the cost and relevancy associated with a specific ad. I will also briefly discuss a related algorithm used by Facebook that also needs significant improvement. I offer solutions both for Facebook (to boost its revenue while also boosting ROI for advertisers), and for advertisers facing this problem, assuming Facebook sticks with its faulty algorithms.
Figure 1: Facebook image recognition algorithm thinks the above image contains text!
Solution for advertisers
For each article that you want to promote on Facebook, start with a small budget, perhaps as small as $10 spread over 7 days, and target a specific audience. Do this for dozens of articles each day, adding new articles all the time. Regularly check articles that have exhausted their ad spend, and boost (that is, add more dollars of ad spend to) those that perform well. Performance is measured as the number of clicks per dollar of ad spend. All of this can probably be automated using an API, as sketched below.
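Here is a minimal sketch of this "start small, boost the winners" routine. The ad-platform calls (fetch_exhausted_ads, add_budget) and the threshold values are hypothetical placeholders, not real Facebook Marketing API endpoints; replace them with whatever integration or reporting export you actually use.

```python
# Sketch: boost articles whose test campaigns performed well, measured as
# clicks per dollar of ad spend. All platform calls below are stubs.

TEST_BUDGET = 10.0            # dollars per article, spread over 7 days
BOOST_AMOUNT = 50.0           # extra spend added to a well-performing article (example value)
MIN_CLICKS_PER_DOLLAR = 2.0   # "performs well" threshold (example value)

def fetch_exhausted_ads():
    """Stub: return ads that have spent their full test budget,
    with total spend and total clicks for each."""
    return [
        {"id": "article-101", "spend": 10.0, "clicks": 35},
        {"id": "article-102", "spend": 10.0, "clicks": 4},
    ]

def add_budget(ad_id, amount):
    """Stub: push extra budget to the ad platform."""
    print(f"Boosting {ad_id} by ${amount:.2f}")

def boost_winners():
    for ad in fetch_exhausted_ads():
        clicks_per_dollar = ad["clicks"] / ad["spend"]
        if clicks_per_dollar >= MIN_CLICKS_PER_DOLLAR:
            add_budget(ad["id"], BOOST_AMOUNT)

boost_winners()   # in practice, run this on a daily schedule
```

In this example, only article-101 (3.5 clicks per dollar) gets boosted; article-102 (0.4 clicks per dollar) is left to expire.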
Solution for Facebook
Eliminate the algorithm that is supposed to detect text in images associated with ads. Instead, focus on click-through rate (CTR), like other advertising platforms (Google, Twitter). Correctly measure impressions and clicks to eliminate non-human traffic, and compute an accurate CTR.
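To make the metric concrete, here is a small sketch of computing CTR only on traffic judged to be human. The bot flag used below is purely illustrative; real invalid-traffic detection is far more involved.

```python
# Sketch: CTR = human clicks / human impressions, after filtering out
# events flagged as non-human. The "bot_flag" field is a placeholder.

def filtered_ctr(impressions, clicks):
    human_impressions = [e for e in impressions if not e.get("bot_flag", False)]
    human_clicks = [e for e in clicks if not e.get("bot_flag", False)]
    if not human_impressions:
        return 0.0
    return len(human_clicks) / len(human_impressions)

# Example: 2 of 1,000 impressions and 1 of 10 clicks are flagged as bots.
impressions = [{"bot_flag": i < 2} for i in range(1000)]
clicks = [{"bot_flag": i < 1} for i in range(10)]
print(f"CTR = {filtered_ctr(impressions, clicks):.3%}")   # 9 clicks / 998 impressions
```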
Predicting reach based on ad spend
Facebook provides statistics to help you predict the reach for a specific budget (ad spend) and audience, but again, the algorithm producing this forecast is faulty, especially when you try to "add budget" prior to submitting your ad. Google also provides forecasts that, in my experience, are significantly off. I believe the problem is that for small buckets of traffic, the strength of this forecast is very weak. While confidence intervals are provided, they are essentially meaningless. The solution is either to report the strength of the forecast (I call it predictive power), or not to provide a forecast at all: the advertiser can use the solution offered in the previous section to optimize her ad spend. And if Facebook or Google really want to provide confidence intervals for their forecasts, they should consider this model-free technique that does not rely on the normal distribution: it is especially suited to small buckets of data with arbitrary, chaotic behavior.
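As a stand-in illustration of a model-free, distribution-free interval (not the specific technique linked above), here is a percentile bootstrap on clicks per day for a small, noisy test bucket. The sample values are made up for the example.

```python
# Sketch: percentile bootstrap confidence interval, no normality assumption.
import random

def bootstrap_ci(sample, stat, n_resamples=5000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for stat(sample)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        resample = [rng.choice(sample) for _ in sample]   # sample with replacement
        estimates.append(stat(resample))
    estimates.sort()
    lo = estimates[int((alpha / 2) * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples)]
    return lo, hi

# Small, chaotic bucket: daily clicks from a $10 test campaign over 7 days.
daily_clicks = [3, 0, 12, 1, 7, 0, 5]
mean = lambda xs: sum(xs) / len(xs)
print(bootstrap_ci(daily_clicks, mean))   # lower and upper bounds, in clicks/day
```

The wide interval this produces is itself useful information: it tells the advertiser how little predictive power the forecast actually has.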
Comments
I know of an IoT vendor that has a face recognizer as part of a fancy dashcam that you put into commercial vehicles. The task of the face recognizer is to decide which driver is at the wheel, out of all the drivers you hired and gave badges to. In other words, given face X, decide which of 50 drivers it is most likely to be.
The implementers used an open-source detection app that was designed, presumably for police/intelligence use, to match faces against a database of, say, 1 million people, with an obviously unavoidable failure rate of 1-2%. This failure rate is deterministic, too: it always fails on the same 1%.
So, at every commercial truck fleet of any significant size, there's always that one guy who sits down and the recognizer says, "Nope, I don't know him" even though it only has 50 faces to check against!
It's always important to solve the right problem.
"the problem is caused by data scientists lacking business understanding (they use generic techniques, and lack domain expertise) combined with management lacking basic understanding of analytical, automated, optimized (semi-intelligent, self-learning) decision systems based on data processing. The solution consists of educating both groups",
OR -- also exploiting Executable English to bridge the gap. It has a simple, familiar mental model for self-serve writing of BI apps that explain their results. It is live online at executable-english.com; shared use is free, and there are no commercials.