I've seen a few companies unhappy with vendors charging expensive monthly fees to provide data-science-based scoring technology: for detecting fraud, placing keyword bids on AdWords, and other purposes.
I've started a project in which, so far, I've been reverse-engineering an algorithm from such a vendor. The hope is that the client can implement this algorithm internally and get rid of the expensive vendor.
I coined the term "reverse data science" as an analogy with "reverse engineering".
Do you believe that it might be possible to make a living out of it, either as a consultancy, or by developing software that performs automated reverse engineering and provides as output an algorithm equivalent to the (unknown, secret) one being reverse-engineered? What about publishing the results of some experiments, for instance an article about "here's how Google detects click fraud" based on discoveries arising from reverse engineering?
Related article: New start-up ideas for data scientists
You have to be a little careful here. Contracts may include language prohibiting the client from trying to reverse-engineer the scores.
What about this scenario:
If the vendor truly has a terrific solution that is worth every dollar the client spends on it, why would the vendor be scared about reverse engineering? You'd think that the vendor is smart enough to have developed technology to prevent any successful reverse engineering from happening, right?
I agree with the responders who have raised concerns about any effort that is explicitly intended or labeled as "reverse engineering" of a proprietary solution from a vendor under contract. Dangerous waters there.
OTOH, an in-house effort to develop a NEW predictive methodology, to be used in a Champion vs Challenger kind of shoot-out to try to prove or disprove the continuing (purported) superior performance of the current vendor solution, and therefore whether the perceived premium price is justified on a cost/benefit basis, is a reasonable business practice that the current vendor may not enjoy but could hardly prevent. And you would WANT them to know that their "value" is constantly under review, in comparison to in-house or market alternatives.
Bear in mind with these kinds of things: part of what the client often gets is the benefit of the vendor's proprietary data assets, upon which the predictions in question are based/trained. And the vendor's product (and price) includes ongoing maintenance of the training data and predictive models, so that they (hopefully) reflect dynamic changes in the markets, consumer behavior, etc. Depending on the source, nature, size and complexity of the input data, a vendor with dozens or hundreds of customers may benefit from significant economies of scale that a DIY-er might not. In other words, the vendor's pricing may not be at all extreme when considering the complete cost of production (or substitutes).
The task here isn't just closely replicating the predictions (and possibly the predictive model) currently used, on a single point-in-time basis; it's creating an alternative production process that will support the needs on an ongoing basis and evolve as the things being modeled, and the world around them, change. The predictive model itself is obviously very important, but it's only one of the necessary components in delivering the "product."
But location of the input data is usually going to be a key consideration in these kinds of Build vs Buy decisions. Already having the data tends to make in-sourcing much more practical.
Sounds like a gimmick?
Why not just build a model that gets as good or better classification or regression error scores?
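A rough sketch of what such a comparison (the Champion vs Challenger shoot-out mentioned earlier in the thread) could look like: score the same labeled holdout set with both the vendor's model and the in-house one, then compare a ranking metric such as AUC. The labels and both score lists below are made-up illustration data, not results from any real vendor.

```python
from itertools import product

def auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

# Hypothetical holdout set: 1 = confirmed fraud, 0 = legitimate.
labels = [1, 0, 1, 0, 0, 1, 0, 1]
vendor_scores = [0.9, 0.2, 0.6, 0.4, 0.7, 0.8, 0.1, 0.5]   # champion (assumed data)
inhouse_scores = [0.8, 0.3, 0.7, 0.2, 0.4, 0.9, 0.1, 0.6]  # challenger (assumed data)

vendor_auc = auc(labels, vendor_scores)
inhouse_auc = auc(labels, inhouse_scores)
print(f"vendor AUC={vendor_auc:.3f}  in-house AUC={inhouse_auc:.3f}")
```

In practice the holdout set would need to be large, recent, and untouched by either model's training for the comparison to mean much.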
Not everyone interested in reverse engineering or reverse data science (RDS) is motivated by direct financial incentives. What about
In many cases, good data sets can be downloaded or harvested on the web using public sources, without using client data that could be subject to legal issues.
There is a whole industry devoted to reverse engineering Google: the SEO industry. Not sure how successful they are in practice, but they are very successful in marketing to business owners who know less than the consultants.
If you have the same data as the vendor, there is no reason why you can't come up with a similar-performing algorithm, but it may be hard to have the same data in most applications.
There is nothing wrong with reverse engineering. All these supposed legal prohibitions are just scare tactics. Point to a single successful lawsuit to prove otherwise.
Either way, if we take some results of someone's algorithm that are visible in the outside world and make them features of our own algorithm, we achieve very similar results.
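A minimal sketch of learning from a black box's visible outputs, under strong assumptions: here the observed scores are used as training targets for a surrogate model (using them as input features is a close variant). The function `vendor_score`, its two inputs, and its coefficients are hypothetical stand-ins; a real vendor score is rarely this linear, and in practice only its outputs would be observable.

```python
import random

def vendor_score(clicks, conversions):
    """Hypothetical black box; in practice only its outputs are visible."""
    return 0.5 + 0.02 * clicks - 0.3 * conversions

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    xs = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, targets)) for i in range(k)]
    return solve(xtx, xty)

random.seed(0)
# Probe the black box on inputs we control and record what it returns.
inputs = [(random.randint(0, 100), random.randint(0, 10)) for _ in range(200)]
observed = [vendor_score(c, v) for c, v in inputs]

# weights = [intercept, weight on clicks, weight on conversions]
weights = fit_linear(inputs, observed)
print([round(w, 4) for w in weights])
```

Because the stand-in is exactly linear and noise-free, the fit recovers its coefficients almost perfectly; with a real scoring system, the surrogate would only approximate the behavior on the region of inputs that was probed.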
I was surfing the web while researching a data problem and happened to find this. Very few companies apply data science by starting from a true definition of the business problem and then solving it with data science techniques. This looks promising; do check it out.