Managing the tricky balance between data pooling and data retention with predictive platforms

Machine Learning can only be fully harnessed for process automation when there is enough representative data from the field it is applied to, says Jean-Cyril Schütterlé, VP Product and Data Science at Sidetrade.

Spam detection, for example, is most effective when the learning algorithm has been trained on relevant examples. The steering system of a self-driving car is the same story: it won’t function properly until it has learned to recognise other vehicles and road signage from traffic imaging. Similarly, a healthcare diagnostic support tool depends on matching against a body of medical images, and an automatic translation tool needs to draw on a body of existing translated texts.

The same principle applies to Machine Learning in customer relationship management. Many companies have understood this and created the role of chief data officer to break down the data silos within their organisations.

Performance, then, is inextricably linked to the availability of representative data. So is it now time to start breaking down the walls between companies and the outside world?

Predictive platforms and the data challenge

The development of SaaS predictive platforms (or DaaS, Data-as-a-Service) has brought this issue into sharp focus. These platforms apply Machine Learning to a range of purposes: uncovering sales opportunities, optimising sales processes, financial modelling and planning, detecting payment defaults and forecasting payment delays. Each platform stores the data entrusted to it by clients in a shared data centre, so it is technically easy to pool that data, widen the range of examples the algorithms learn from and deliver better-quality results.

Let’s take the example of a sales lead generation platform. What would happen if the CRM data of all the companies using the software were pooled? Each company would gain a broader understanding of client needs and buying behaviour than any single data source could provide. But here’s the rub: the companies using the platform may well be competitors. That’s why, unsurprisingly, many are reluctant to have their data used to give their rivals a competitive edge.

Data pooling versus data retention

So, this is the dilemma. Nobody’s keen to share data, even though the advantages of shared access are obvious. That leaves today’s decision-makers between a rock and a hard place. Should they curb their use of this performance-boosting technology? Or should they allow third parties to reap the benefits of their own internally generated data? It’s tricky because, as with Pandora’s box, once the data is out there, it can’t be taken back.

There’s no free ride in all this. Each organisation must weigh up the costs and benefits on a case-by-case basis, then make the difficult choice between data retention and data pooling based on its development and deployment needs.