Home » Uncategorized

A methodology for solving problems with DataScience for Internet of Things – Part Two

Many thanks for the retweets and feedback on Part one of this blog 

A methodology for solving problems with DataScience for Internet of…

Here is Part Two

Here we extend the discussion and also suggest a practical (and open) way to create a way forward

To recap, lets keep in mind the big picture and after considering Streaming in the previous section, let us consider more techniques like Edge Processing etc


Edge Processing

Many vendors like Cisco and Intel are proponents of Edge Processing  (also  called  Edge  computing).  The  main  idea behind Edge Computing is to push processing away from the core and towards the Edge of the network. For IoT, that means pushing processing towards the sensors or a gateway. This enables data to be initially processed at the Edge device possibly enabling smaller datasets sent to the core. Devices at the Edge may not be continuously connected to the network. Hence, these devices may need a copy of the master data/reference data for processing in an offline format. Edge devices may also include other features like:

•    Apply rules and workflow against that data

•    Take action as needed

•    Filter and cleanse the data

•    Store local data for local use

•    Enhance security

•    Provide governance admin controls

IoT analytics techniques applied at the Data Lake

Data Lakes

The concept of a Data Lake is similar to that of a Data warehouse or a Data Mart. In this context, we see a Data Lake as a repository for data from different IoT sources. A Data Lake is driven by the Hadoop platform. This means, Data in a Data lake is preserved in its raw format. Unlike a Data Warehouse, Data in a Data Lake is not pre-categorised. From an analytics perspective, Data Lakes are relevant in the following ways:

  • We could monitor the stream of data arriving in the lake for specific events or could co-relate different streams. Both of these tasks use Complex event processing (CEP). CEP could also apply to Data when it is stored in the lake to extract broad, historical perspectives.
  • Similarly, Deep learning and other techniques could be applied to IoT datasets in the Data Lake when the Data  is ‘at rest’. We describe these below.

ETL (Extract Transform and Load)

Companies like Pentaho are applying ETL techniques to IoT data

Deep learning

Some deep learning techniques could apply to IoT datasets. If you consider images and video as sensor data, then we could apply various convolutional neural network techniques to this data.

It gets more interesting when we consider RNNs(Recurrent Neural Networks)  and Reinforcement learning. For example – Reinforcement learning and time series – Brandon Rohrer How to turn your house robot into a robot – Answering the challenge – a new reinforcement learning robot

Over time, we will see far more complex options – for example for Self driving cars  and the use of Recurrent neural networks (mobile…

Some more interesting links for Deep Learning and IoT:


Systems level optimization and process level optimization for IoT is another complex area where we are doing work. Some links for this


Visualization is necessary for analytics in general and IoT analytics is no exception

Here are some links

NOSQL databases

NoSQL databases today offer a great way to implement IoT analytics. For instance,

Apache Cassandra for IoT

MongoDB and IoT tutorial

Other  IoT analytic techniques

In this section, I list some IoT  technologies where we could implement analytics

A Methodology to solve Data Science for IoT problems

We started off with the question: Which points could you apply analytics to the IoT ecosystem and what are the implications? But behind this work is a broader question:  Could we formulate a methodology to solve Data Science for IoT problems?  I am exploring this question as part of my teaching both online and at Oxford University along with Jean-Jacques Bernard.

Here is more on our thinking:

  • CRISP-DM is a Data mining process methodology used in analytics.  More on CRISP-DM HERE and HERE(pdf documents).
  • From a business perspective (top down),we can extend CRISP-DM to incorporate the understanding of the IoT domain i.e. add domain specific features.  This includes understanding the business impact, handling high volumes of IoT data, understanding the nature of Data coming from various IoT devices etc
  • From an implementation perspective(bottom up),  once we have an understanding of the Data and the business processes, for each IoT vertical : We first find the analytics (what is being measured, optimized etc). Then find the data needed for those analytics. Then we provide examples of that implementation using code. Extending CRISP-DM to an implementation methodology, we could have Process(workflow), templates,  code, use cases, Data etc
  • For implementation in R, we are looking to initially use Open source R and Spark and the  h2o.ai  API
  • Make the methodology practical by considering high volume IoT data problems, Project management methodologies for IoT,  IoT analytics best practices etc
  • Most importantly, we will be Open  and Open Sourced
  • There are some parallels with this thinking with Big Data Business maturity model index and to Systems thinking – my favourite systems thinking text is An Introduction to General Systems Thinking – Gerald M. Weinberg
  • When used in teaching for my course, it has parallels to the Concept-context pedagogy (pdf)  where the concepts are tied to the practise in terms of Projects which take center stage


We started off with the question:  At which points could you apply analytics to the IoT ecosystem and what are the implications? And extended this to a broader question:  Could we formulate a methodology to solve Data Science for IoT problems?  The above is comprehensive but not absolute. For example, you can implement deep learning algorithms on mobile devices (Qualcomm snapdragon machine learning development kit for mobile mo….  So, even as I write it, I can think of exceptions!

This article is part of my forthcoming book on Data Science for IoT and also the courses I teach

Welcome your comments.  Please email me at ajit.jaokar at futuretext.com  – Email me also for a pdf version if you are interested. If you want to be a part of my course please see the testimonials at Data Science for Internet of Things Course.  

Finally, I will syndicate more sections of the book on Data Science Central. Stay tuned!