Many thanks for the retweets and feedback on Part one of this blog

A methodology for solving problems with DataScience for Internet of...

Here is Part Two

Here we extend the discussion and also suggest a practical (and open) way forward.

To recap, let's keep the big picture in mind. Having considered Streaming in the previous section, let us now consider further techniques such as Edge Processing.

Many vendors like Cisco and Intel are proponents of Edge Processing (also called Edge Computing). The main idea behind Edge Computing is to push processing away from the core and towards the Edge of the network. For IoT, that means pushing processing towards the sensors or a gateway. Data can then be processed initially at the Edge device, so that smaller datasets may be sent to the core. Devices at the Edge may not be continuously connected to the network. Hence, these devices may need an offline copy of the master data/reference data for processing. Edge devices may also include other features like:

• Apply rules and workflow against that data

• Take action as needed

• Filter and cleanse the data

• Store local data for local use

• Enhance security

• Provide governance admin controls
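To make the idea concrete, here is a minimal sketch of what an Edge device might do with a window of readings before sending anything to the core. The thresholds, the `REFERENCE` dictionary and the function names are hypothetical and chosen only for illustration; a real gateway would use its vendor's SDK and its own rules.

```python
from statistics import mean

# Hypothetical reference data held locally so the device can work offline.
REFERENCE = {"temp_min": -40.0, "temp_max": 85.0, "alert_above": 60.0}

def cleanse(readings, ref=REFERENCE):
    """Drop missing values and readings outside the plausible sensor range."""
    return [r for r in readings
            if r is not None and ref["temp_min"] <= r <= ref["temp_max"]]

def summarize(readings, ref=REFERENCE):
    """Reduce a window of raw readings to one small record for the core."""
    clean = cleanse(readings, ref)
    if not clean:
        return None  # nothing worth sending for this window
    return {
        "count": len(clean),
        "mean": round(mean(clean), 2),
        "max": max(clean),
        "alert": max(clean) > ref["alert_above"],  # rule applied locally
    }

# Six raw readings, including a missing value and an out-of-range spike.
window = [21.5, 22.0, None, 999.0, 61.0, 22.5]
summary = summarize(window)
print(summary)
```

Only the summary record, rather than every raw reading, would then travel to the core.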

The concept of a Data Lake is similar to that of a Data Warehouse or a Data Mart. In this context, we see a Data Lake as a repository for data from different IoT sources. A Data Lake is typically built on the Hadoop platform, which means data in the lake is preserved in its raw format. Unlike a Data Warehouse, data in a Data Lake is not pre-categorised. From an analytics perspective, Data Lakes are relevant in the following ways:

- We could monitor the stream of data arriving in the lake for specific events, or we could correlate different streams. Both of these tasks use Complex Event Processing (CEP). CEP could also be applied to data stored in the lake to extract broad, historical perspectives.
- Similarly, Deep learning and other techniques could be applied to IoT datasets in the Data Lake when the data is ‘at rest’. We describe these below.
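As a rough illustration of correlating streams, the sketch below matches two event types for the same device inside a sliding time window. It is a toy in plain Python, not the API of any CEP engine; the event names (`temp_high`, `smoke`), the tuple layout and the 10 second window are assumptions made for the example.

```python
from collections import deque

def correlate(events, window_s=10):
    """Yield (device, t_first, t_second) when a 'temp_high' and a 'smoke'
    event for the same device occur within window_s seconds of each other.

    events: time-ordered iterable of (timestamp_seconds, device_id, kind).
    """
    recent = deque()  # events still inside the sliding window
    for ts, device, kind in events:
        # Expire events that have fallen out of the window.
        while recent and ts - recent[0][0] > window_s:
            recent.popleft()
        for old_ts, old_device, old_kind in recent:
            if old_device == device and {old_kind, kind} == {"temp_high", "smoke"}:
                yield (device, old_ts, ts)
        recent.append((ts, device, kind))

stream = [
    (0, "a", "temp_high"),
    (4, "b", "smoke"),
    (7, "a", "smoke"),       # pairs with the t=0 event for device a
    (30, "b", "temp_high"),  # too late to pair with the smoke at t=4
]
matches = list(correlate(stream))
print(matches)
```

The same function could be run over historical events read back from the lake, since it only needs a time-ordered sequence.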

Companies like Pentaho are applying ETL techniques to IoT data

Some deep learning techniques could apply to IoT datasets. If you consider images and video as sensor data, then we could apply various convolutional neural network techniques to this data.
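The core operation inside a convolutional layer can be sketched in a few lines of plain Python. This is only an illustration: the 4x4 "frame" and the vertical-edge kernel are made up, and in a real network the kernel weights are learned and a framework would be used instead of nested lists.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Element-wise rectifier: negative responses become zero."""
    return [[max(0, v) for v in row] for row in feature_map]

# Toy 4x4 "camera frame": dark on the left, bright on the right.
frame = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hand-picked vertical-edge kernel (an assumption for the example).
kernel = [
    [-1, 1],
    [-1, 1],
]
features = relu(conv2d(frame, kernel))
print(features)
```

The feature map responds only in the middle column, where the dark-to-bright edge sits.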

It gets more interesting when we consider RNNs (Recurrent Neural Networks) and Reinforcement learning. For example – Reinforcement learning and time series – Brandon Rohrer How to turn your house robot into a robot – Answering the challenge – a new reinforcement learning robot

Over time, we will see far more complex options – for example for Self driving cars and the use of Recurrent neural networks (mobile...
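To show why recurrent networks suit sensor time series, here is a minimal sketch of the forward pass of a one-unit Elman-style RNN. The weights and readings are arbitrary values chosen for illustration, and no training is shown; the point is only that each hidden state depends on the current reading and on everything seen before it.

```python
import math

def rnn_forward(series, w_in, w_rec, bias, h0=0.0):
    """Run a one-unit recurrent cell over a series; return all hidden states.

    h_t = tanh(w_in * x_t + w_rec * h_{t-1} + bias)
    """
    h = h0
    states = []
    for x in series:
        h = math.tanh(w_in * x + w_rec * h + bias)
        states.append(h)
    return states

# Arbitrary example readings and weights (assumptions, not learned values).
readings = [0.1, 0.4, 0.9, 0.3]
states = rnn_forward(readings, w_in=0.8, w_rec=0.5, bias=0.0)
print(states)
```

Feeding the last reading on its own gives a different state than feeding it after the earlier readings, which is the memory effect that makes RNNs useful for time series.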

Some more interesting links for Deep Learning and IoT:

- http://stats.stackexchange.com/questions/8000/proper-way-of-using-r...
- Noisy Time Series Prediction using a Recurrent Neural Network and Grammatical Inference – https://clgiles.ist.psu.edu/papers/MLJ-finance.pdf
- https://www.quora.com/Can-recurrent-neural-networks-with-LSTM-be-us...
- Time series forecasting with recurrent neural networks – http://www.neural-forecasting-competition.com/downloads/NN3/methods...
- and an article by Sibhanjan Das and me on Deep Learning and IoT with H2O @kdnuggets

System-level and process-level optimization for IoT is another complex area where we are doing work. Some links on this:

- http://blog.tsia.com/blog/how-iot-process-optimization-can-improve-...
- http://www.intel.co.uk/content/dam/www/public/us/en/documents/white...

Visualization is necessary for analytics in general and IoT analytics is no exception

Here are some links

NoSQL databases today offer a great way to implement IoT analytics. For instance,

In this section, I list some IoT technologies where we could implement analytics

- Spatial analytics for IoT – e.g. from ESRI
- Implementation of IoT analytics on DSPs
- Augmented reality – e.g. in conjunction with Pokemon Go
- GPUs – Nvidia has done extensive work in this space, especially in relation to Deep learning
- I am also following Apache NiFi with great interest, with features like bidirectional flow of data, perimeter of control changes and data provenance

We started off with the question: At which points could you apply analytics to the IoT ecosystem and what are the implications? But behind this work is a broader question: **Could we formulate a methodology to solve Data Science for IoT problems?** I am exploring this question as part of my teaching both online and at Oxford University along with Jean-Jacques Bernard.

Here is more on our thinking:

- CRISP-DM is a Data mining process methodology used in analytics. More on CRISP-DM HERE and HERE (pdf documents).
- From a business perspective (top down), we can extend CRISP-DM to incorporate the understanding of the IoT domain, i.e. add domain specific features. This includes understanding the business impact, handling high volumes of IoT data, understanding the nature of Data coming from various IoT devices etc.
- From an implementation perspective (bottom up), once we have an understanding of the Data and the business processes, for each IoT vertical: we first find the analytics (what is being measured, optimized etc). Then we find the data needed for those analytics. Then we provide examples of that implementation using code.
- **Extending CRISP-DM to an implementation methodology**, we could have Process (workflow), templates, code, use cases, Data etc.
- For implementation in R, we are looking to initially use Open source R and Spark and the h2o.ai API.
- Make the methodology practical by considering **high volume IoT data problems, Project management methodologies for IoT, IoT analytics best practices** etc.
- Most importantly, we will be **Open and Open Sourced**.
- There are some **parallels** between this thinking and the Big Data Business maturity model index and Systems thinking – my favourite systems thinking text is An Introduction to General Systems Thinking – Gerald M. Weinberg.
- When used in teaching for my course, it has parallels to the Concept-context pedagogy (pdf), where the concepts are tied to the practice in terms of Projects which take center stage.
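One possible way to picture the extension is as a checklist keyed on the six standard CRISP-DM phases. The sketch below is only an illustration: the `Phase` class is hypothetical, and the placement of each IoT-specific item under a particular phase is an assumption, not a finished part of the methodology.

```python
from dataclasses import dataclass, field

@dataclass
class Phase:
    name: str
    iot_extensions: list = field(default_factory=list)

# The six standard CRISP-DM phases. The extensions paraphrase points from
# the list above; which phase each one belongs to is an assumption.
METHODOLOGY = [
    Phase("Business Understanding", ["business impact for the IoT vertical"]),
    Phase("Data Understanding", ["nature of data from various IoT devices"]),
    Phase("Data Preparation", ["handling high volumes of IoT data"]),
    Phase("Modeling", ["analytics for the vertical: what is measured or optimized"]),
    Phase("Evaluation"),
    Phase("Deployment", ["templates, code and use cases"]),
]

def checklist(phases):
    """Render the phases and their IoT-specific additions as plain text."""
    lines = []
    for number, phase in enumerate(phases, start=1):
        lines.append(f"{number}. {phase.name}")
        lines.extend(f"   - {item}" for item in phase.iot_extensions)
    return "\n".join(lines)

print(checklist(METHODOLOGY))
```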

We started off with the question: At which points could you apply analytics to the IoT ecosystem and what are the implications? We then extended this to a broader question: Could we formulate a methodology to solve Data Science for IoT problems? The above is comprehensive but not absolute. For example, you can implement deep learning algorithms on mobile devices (Qualcomm snapdragon machine learning development kit for mobile mo...). So, even as I write it, I can think of exceptions!

This article is part of my forthcoming book on Data Science for IoT and also the courses I teach

**Welcome your comments. Please email me at ajit.jaokar at futuretext.com - Email me also for a pdf version if you are interested.** If you want to be a part of my course please see the testimonials at Data Science for Internet of Things Course.

Finally, I will syndicate more sections of the book on Data Science Central. Stay tuned!

