Many thanks for the retweets and feedback on Part one of this blog
A methodology for solving problems with DataScience for Internet of…
Here is Part Two, in which we extend the discussion and suggest a practical (and open) way forward.
To recap, let's keep the big picture in mind. Having considered Streaming in the previous section, let us now consider further techniques such as Edge Processing.
Many vendors, such as Cisco and Intel, are proponents of Edge Processing (also called Edge Computing). The main idea is to push processing away from the core and towards the edge of the network. For IoT, that means pushing processing towards the sensors or a gateway. Data can then be processed initially at the edge device, which may mean that smaller datasets are sent to the core. Devices at the edge may not be continuously connected to the network, so they may need a local copy of the master data or reference data in order to process data offline. Edge devices may also provide other capabilities, such as:
• Apply rules and workflow against that data
• Take action as needed
• Filter and cleanse the data
• Store local data for local use
• Enhance security
• Provide governance admin controls
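To make the list above concrete, here is a minimal sketch in Python of what a gateway at the edge might do: cleanse readings, store them locally, apply a simple rule, and send only a summary to the core. The names `Reading` and `EdgeGateway`, the valid range and the alert threshold are all hypothetical, chosen for illustration; they do not come from any vendor's product.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, List, Optional, Tuple


@dataclass
class Reading:
    sensor_id: str
    value: Optional[float]  # None models a dropped or corrupt sample


@dataclass
class EdgeGateway:
    """Hypothetical gateway: all names and thresholds are assumptions."""
    valid_range: Tuple[float, float]      # bounds used to cleanse the data
    alert_threshold: float                # rule: act when a value exceeds this
    on_alert: Callable[[Reading], None]   # action to take locally
    local_store: List[Reading] = field(default_factory=list)

    def ingest(self, reading: Reading) -> bool:
        """Filter and cleanse one reading; return True if it was kept."""
        if reading.value is None:
            return False
        low, high = self.valid_range
        if not (low <= reading.value <= high):
            return False
        self.local_store.append(reading)   # store local data for local use
        if reading.value > self.alert_threshold:
            self.on_alert(reading)         # take action as needed
        return True

    def summary_for_core(self) -> dict:
        """Reduce the local data to a small summary to send to the core."""
        values = [r.value for r in self.local_store]
        if not values:
            return {"count": 0}
        return {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }


alerts: List[Reading] = []
gateway = EdgeGateway(valid_range=(-40.0, 125.0), alert_threshold=80.0,
                      on_alert=alerts.append)
for value in [21.5, None, 22.0, 999.0, 85.0]:
    gateway.ingest(Reading("temp-1", value))
summary = gateway.summary_for_core()
```

Here five raw samples are reduced to one small summary, and the alert is raised locally without a round trip to the core. A real gateway would also need to handle security, governance and reference data, which this sketch leaves out.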
IoT analytics techniques applied at the Data Lake
The concept of a Data Lake is similar to that of a Data Warehouse or a Data Mart. In this context, we see a Data Lake as a repository for data from different IoT sources. A Data Lake is typically built on the Hadoop platform, which means that data in a Data Lake is preserved in its raw format. Unlike in a Data Warehouse, data in a Data Lake is not pre-categorised. From an analytics perspective, Data Lakes are relevant in the following ways:
- We could monitor the stream of data arriving in the lake for specific events, or we could correlate different streams. Both of these tasks use Complex Event Processing (CEP). CEP could also be applied to data once it is stored in the lake, to extract broad, historical perspectives.
- Similarly, Deep learning and other techniques could be applied to IoT datasets in the Data Lake when the Data is ‘at rest’. We describe these below.
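As an illustration of correlating two streams, the sketch below shows one very simple CEP-style rule in plain Python: flag a device when a high vibration reading follows a high temperature reading within a time window. The `Event` type, the stream names and all thresholds are assumptions made for this example. A production system would normally use a dedicated CEP or stream-processing engine rather than hand-written code like this.

```python
from collections import deque
from typing import Deque, Iterable, List, NamedTuple, Tuple


class Event(NamedTuple):
    timestamp: float   # seconds
    stream: str        # "temperature" or "vibration" (assumed stream names)
    device_id: str
    value: float


def detect_overheat_with_vibration(
    events: Iterable[Event],
    temp_limit: float = 80.0,   # assumed threshold
    vib_limit: float = 5.0,     # assumed threshold
    window_s: float = 10.0,     # assumed correlation window
) -> List[Tuple[Event, Event]]:
    """Return (temperature_event, vibration_event) pairs where a high
    vibration reading follows a high temperature reading on the same
    device within window_s seconds."""
    recent_hot: Deque[Event] = deque()
    matches: List[Tuple[Event, Event]] = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        # Expire hot-temperature events that have fallen out of the window.
        while recent_hot and ev.timestamp - recent_hot[0].timestamp > window_s:
            recent_hot.popleft()
        if ev.stream == "temperature" and ev.value > temp_limit:
            recent_hot.append(ev)
        elif ev.stream == "vibration" and ev.value > vib_limit:
            for hot in recent_hot:
                if hot.device_id == ev.device_id:
                    matches.append((hot, ev))
    return matches


sample = [
    Event(0.0, "temperature", "pump-1", 85.0),
    Event(4.0, "vibration", "pump-1", 6.2),    # within the window: a match
    Event(20.0, "vibration", "pump-1", 7.0),   # too late: no match
    Event(21.0, "temperature", "pump-2", 90.0),
    Event(22.0, "vibration", "pump-1", 6.0),   # different device: no match
]
matches = detect_overheat_with_vibration(sample)
```

The same function works on a live stream (events arriving in time order) and on data at rest in the lake, since it only needs events sorted by timestamp.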
ETL (Extract Transform and Load)
Companies like Pentaho are applying ETL techniques to IoT data.
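For readers new to ETL, the three steps can be sketched with nothing more than the Python standard library. This is a generic illustration and not how Pentaho or any other product works: the CSV layout, the column names and the `readings` table are all made up for the example.

```python
import csv
import io
import sqlite3

# Made-up raw sensor export; the second row has a missing temperature.
RAW_CSV = """device_id,timestamp,temp_f
sensor-a,2016-08-01T10:00:00,72.5
sensor-a,2016-08-01T10:01:00,
sensor-b,2016-08-01T10:00:00,98.6
"""


def extract(text):
    """Extract: parse the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop malformed rows and convert Fahrenheit to Celsius."""
    cleaned = []
    for row in rows:
        try:
            temp_f = float(row["temp_f"])
        except (TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric temperature
        temp_c = round((temp_f - 32.0) * 5.0 / 9.0, 2)
        cleaned.append((row["device_id"], row["timestamp"], temp_c))
    return cleaned


def load(conn, records):
    """Load: write the cleaned records into a (hypothetical) readings table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(device_id TEXT, timestamp TEXT, temp_c REAL)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
```

With IoT data, the transform step is usually where most of the effort goes, because devices report in different units, formats and sampling rates.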
Some deep learning techniques could apply to IoT datasets. If you consider images and video as sensor data, then we could apply various convolutional neural network techniques to this data.
It gets more interesting when we consider RNNs (Recurrent Neural Networks) and Reinforcement learning. For example: Reinforcement learning and time series; Brandon Rohrer, How to turn your house robot into a robot; Answering the challenge: a new reinforcement learning robot.
Over time, we will see far more complex options, for example for self-driving cars and the use of Recurrent neural networks (mobile…
Some more interesting links for Deep Learning and IoT:
- Noisy Time Series Prediction using a Recurrent Neural Network and Grammatical Inference – https://clgiles.ist.psu.edu/papers/MLJ-finance.pdf
- Time series forecasting with recurrent neural networks http://www.neural-forecasting-competition.com/downloads/NN3/methods…
- and an article by Sibhanjan Das and me on Deep Learning and IoT with H2O @kdnuggets
Systems-level optimization and process-level optimization for IoT are another complex area where we are doing work. Some links for this
Visualization is necessary for analytics in general and IoT analytics is no exception
Here are some links
NoSQL databases today offer a great way to implement IoT analytics. For instance,
Other IoT analytic techniques
In this section, I list some IoT technologies where we could implement analytics:
- Spatial analytics for IoT, for example from ESRI
- Implementation of IoT analytics on DSPs
- Augmented reality, for example in conjunction with Pokemon Go
- GPUs: there is some extensive work done by Nvidia in this space, especially in relation to Deep learning
- I am also following Apache NiFi with great interest, with features like bidirectional flow of data, perimeter of control changes and data provenance
A Methodology to solve Data Science for IoT problems
We started off with the question: At which points could you apply analytics to the IoT ecosystem, and what are the implications? But behind this work is a broader question: Could we formulate a methodology to solve Data Science for IoT problems? I am exploring this question as part of my teaching, both online and at Oxford University, along with Jean-Jacques Bernard.
Here is more on our thinking:
- CRISP-DM is a Data mining process methodology used in analytics. More on CRISP-DM HERE and HERE (pdf documents).
- From a business perspective (top down), we can extend CRISP-DM to incorporate an understanding of the IoT domain, i.e. add domain-specific features. This includes understanding the business impact, handling high volumes of IoT data, understanding the nature of the data coming from various IoT devices, etc.
- From an implementation perspective (bottom up), once we have an understanding of the data and the business processes, we proceed as follows for each IoT vertical: we first find the analytics (what is being measured, optimized, etc.), then find the data needed for those analytics, and then provide examples of that implementation using code. Extending CRISP-DM to an implementation methodology, we could have process (workflow), templates, code, use cases, data, etc.
- For implementation in R, we are looking to initially use Open source R and Spark and the h2o.ai API
- Make the methodology practical by considering high volume IoT data problems, Project management methodologies for IoT, IoT analytics best practices etc
- Most importantly, we will be Open and Open Sourced
- There are some parallels between this thinking and both the Big Data Business maturity model index and Systems thinking. My favourite systems thinking text is An Introduction to General Systems Thinking by Gerald M. Weinberg
- When used in teaching for my course, it has parallels to the Concept-context pedagogy (pdf), where the concepts are tied to practice through Projects, which take center stage
We started off with the question: At which points could you apply analytics to the IoT ecosystem, and what are the implications? We then extended this to a broader question: Could we formulate a methodology to solve Data Science for IoT problems? The above is comprehensive but not exhaustive. For example, you can implement deep learning algorithms on mobile devices (Qualcomm snapdragon machine learning development kit for mobile mo…). So, even as I write it, I can think of exceptions!
This article is part of my forthcoming book on Data Science for IoT and also the courses I teach
I welcome your comments. Please email me at ajit.jaokar at futuretext.com. Email me also for a pdf version if you are interested. If you want to be a part of my course, please see the testimonials at Data Science for Internet of Things Course.
Finally, I will syndicate more sections of the book on Data Science Central. Stay tuned!