Data readiness strategies of AI Start-ups

Last week, at an event on AI, I asked the panel how investors evaluate the data readiness of AI start-ups. This subject is close to my work and my teaching: I teach a course on Implementing Enterprise AI and also teach Data Science for IoT at the University of Oxford. Below are my perspectives.

Professor Neil Lawrence has proposed the concept of Data Readiness Levels. The highest level of data readiness describes data that is ready to answer a specific question or make predictions, for example: "Can we use this data to prove the efficacy of a drug?"

In many cases, start-ups do not have data that is useful for making predictions. This is especially true of AI start-ups.

Much of today's AI is based on Deep Learning algorithms. Deep Learning learns features automatically from data, and doing so typically requires a lot of data. More specifically, we need a lot of labelled data to train the layers of a deep network.

Many start-ups and companies do not have this data, and hence may not be able to solve the problem they set out to solve. One could therefore argue that most AI start-ups are not actually data ready.

I believe there are various ways to address this problem:

Data readiness strategies

  1. Unsupervised learning, e.g. autoencoders, which can learn a compressed structure similar to PCA (for example, autoencoders applied to image processing)
  2. Semi-supervised learning: combining large amounts of unlabelled data with small amounts of labelled data, as explained in a good paper by Yoshua Bengio
  3. Newer solutions such as Nanonets
  4. Synthetic data strategies
  5. Free or publicly available data to train the model initially
  6. Model zoos (collections of pre-trained models that can be reused and fine-tuned)
  7. With less data, one would run a mix of Deep Learning and classical machine learning algorithms, so feature selection and transformation strategies would apply
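
To make strategy 2 more concrete, here is a minimal sketch of self-training, one simple form of semi-supervised learning; it is not necessarily the method in Bengio's paper. It assumes a toy 2D dataset, a nearest-centroid classifier swapped in for a deep network, and hypothetical "normal"/"faulty" labels such as an IoT sensor project might use. Unlabelled points are pseudo-labelled only when the classifier is confident (the gap between the nearest and second-nearest class distance exceeds an assumed threshold `min_margin`), and the model is refit on the enlarged labelled set.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Labelled = List[Tuple[Point, str]]


def centroids(labelled: Labelled) -> Dict[str, Point]:
    """Compute the mean point of each class in the labelled set."""
    sums: Dict[str, List[float]] = {}
    for (x, y), label in labelled:
        s = sums.setdefault(label, [0.0, 0.0, 0.0])  # [sum_x, sum_y, count]
        s[0] += x
        s[1] += y
        s[2] += 1
    return {label: (s[0] / s[2], s[1] / s[2]) for label, s in sums.items()}


def predict(point: Point, cents: Dict[str, Point]) -> Tuple[str, float]:
    """Return the nearest class and a confidence margin
    (second-nearest distance minus nearest distance)."""
    dists = sorted((math.dist(point, c), label) for label, c in cents.items())
    best_dist, best_label = dists[0]
    margin = dists[1][0] - best_dist if len(dists) > 1 else float("inf")
    return best_label, margin


def self_train(labelled: Labelled, unlabelled: List[Point],
               min_margin: float = 1.0, max_rounds: int = 10):
    """Hypothetical self-training loop: pseudo-label confident points, refit, repeat."""
    labelled = list(labelled)
    pool = list(unlabelled)
    for _ in range(max_rounds):
        cents = centroids(labelled)
        confident, remaining = [], []
        for p in pool:
            label, margin = predict(p, cents)
            (confident if margin >= min_margin else remaining).append((p, label))
        if not confident:  # nothing new is confident enough, so stop
            break
        labelled.extend(confident)
        pool = [p for p, _ in remaining]
    return labelled, pool


# Usage: two labelled examples plus five unlabelled readings (toy data).
labelled = [((0.0, 0.0), "normal"), ((10.0, 10.0), "faulty")]
unlabelled = [(1.0, 0.5), (0.5, 1.5), (9.0, 9.5), (10.5, 8.0), (5.0, 5.0)]
final_labelled, still_unlabelled = self_train(labelled, unlabelled)
```

With these toy points, the four readings near the two labelled examples get pseudo-labels, while the ambiguous point (5.0, 5.0) stays unlabelled because it sits roughly halfway between the classes. In practice the same loop would wrap a real model, and the confidence threshold trades off how much unlabelled data is used against the risk of reinforcing wrong labels.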

My overall impression is:

AI is a very new field, and there is a competitive advantage for first movers. Thus, many companies are adopting variants of the above strategies and will move forward even when they initially have limited data. By the same token, however, companies must have a clear set of data readiness strategies in place when they approach investors. I discuss these ideas in the Implementing Enterprise AI course.