Having looked at the fundamentals in the first blog, the natural next step is to understand the various strategies for "attacking" the data and making it reveal useful information. However, there is one step we must take before that: understand the "enemy", i.e., the problem at hand and the data available.
The Tree of the Data Shinobi:
The tree below is an attempt to categorize the most commonly occurring problems and to list commonly used techniques under each. Not every problem will fall neatly under one of these branches, but the tree is intended as a good starting point for understanding the nature of the "enemy" and then developing a suitable strategy.
TS - Time Series
AI/ANN - Artificial Intelligence/Artificial Neural Networks
The 'Nature of Data' can also sometimes be a factor in choosing the technique of attack, and is therefore connected with a dotted line in the tree.
There are two important points related to the tree diagram above:
1. Since the area covered by the tree above is vast, some techniques common to multiple branches might be listed under only one. The purpose of the tree is to provide high-level guidance on the techniques, not to classify every technique with great accuracy.
2. Many newer techniques may not yet appear in the tree. I plan to update it as we go along, as shoots emerge for new leaves or even new branches.