With loads of content and hype about Data Science, Analytics and Big Data coming up every day, I felt compelled to share my journey and experiences in exploring this space. Here, I intend to begin a series of posts on the subject, stripping it down to its core (as I understand it) and then building upon it.

While these subjects could be approached from many directions, I present here the path that I found most convenient to tread. I would like to call it the path of the 'Data Shinobi' (another word for 'Ninja'), where you learn the techniques just well enough for the task at hand, unlike a seasoned 'Data Scientist' who has mastered most of them over many years. Note that when I mention the 'Shinobi', I mean the training that shinobi underwent rather than their expertise and mastery of skills.

__To begin with, here's a list of 5 fundamental facts about Data Science which I discovered early in my journey:__

**#1: Data Cleaning is almost always a pain** - You may have an amazing approach and models ready to get your solution up and running, but in most cases you have to wait for what feels like ages to get your data ready. Usually you could spend about 70-90% of your time reshaping your data before it can be processed.
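
To give a feel for where that time goes, here is a minimal sketch of a cleaning pass using only the Python standard library. The records, field names and the rule of dropping rows without a numeric `spend` are all assumptions invented for illustration; real data usually needs many more such rules.

```python
# Hypothetical raw records, invented for illustration only.
raw_rows = [
    {"customer": "  Alice ", "city": "NEW YORK", "spend": "1,200.50"},
    {"customer": "bob", "city": "new york ", "spend": "300"},
    {"customer": "  Alice ", "city": "NEW YORK", "spend": "1,200.50"},  # duplicate
    {"customer": "Carol", "city": "Boston", "spend": "n/a"},            # missing value
]


def clean_row(row):
    """Return a normalized copy of one record, or None if it is unusable."""
    spend_text = row["spend"].replace(",", "").strip()
    try:
        spend = float(spend_text)
    except ValueError:
        return None  # assumption: rows without a numeric spend are dropped
    return {
        "customer": row["customer"].strip().title(),
        "city": row["city"].strip().title(),
        "spend": spend,
    }


def clean(rows):
    """Normalize every row, then drop unusable rows and duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        fixed = clean_row(row)
        if fixed is None:
            continue
        key = (fixed["customer"], fixed["city"], fixed["spend"])
        if key in seen:
            continue  # same record after normalization
        seen.add(key)
        cleaned.append(fixed)
    return cleaned


cleaned_rows = clean(raw_rows)
for row in cleaned_rows:
    print(row)
```

Even in this toy case, four raw rows need three separate rules (whitespace and case, number formats, duplicates and gaps) before a model could use them.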

**#2: Great visualizations sometimes overshadow amazing models** - 'A picture is worth a 1000 words' still works here. So a great model, or a rebuilt data architecture that speeds up analytics, needs to be 1000 times more amazing if it isn't backed up by a stunning visualization. I believe visualizations need more effort from the other half of the brain, and sometimes people who have no idea about quantitative modeling can suggest or build really beautiful visualizations that make you think, "It makes perfect sense when you look at it like that."

**#3: There are 2 fundamentally different approaches in Data Science** - One, and probably the older, is **Statistical Learning**, which has evolved from traditional statistics and works on error minimization through generalization techniques. The other is **Machine Learning**, which is probably the way a computer scientist would approach a problem. Machine learning techniques minimize error by iteratively training the 'machine' to make better predictions. The line between the two is getting thinner as they borrow concepts from each other from time to time.
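
One way to picture the contrast, as a simplified sketch rather than a definition of either field: fit the same straight line twice, once with the closed-form least-squares formulas from classical statistics and once by repeatedly nudging the parameters downhill (gradient descent). The toy data, the learning rate and the number of steps are assumptions chosen for illustration.

```python
# Hypothetical toy data, roughly y = 2x + 1 with a little noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]


def fit_closed_form(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, solved directly."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Minimize the same mean squared error by repeated small updates."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        grad_slope = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_intercept = 2 * sum(errors) / n
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


exact = fit_closed_form(xs, ys)
trained = fit_gradient_descent(xs, ys)
print("closed form:      ", exact)
print("gradient descent: ", trained)
```

Both routes minimize the same squared error and land on essentially the same line, which is a small reminder of how thin the line between the two approaches can be.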

**#4: Not all problems can be boiled down to the textbook techniques anymore** - Traditionally, techniques or approaches were classified into buckets such as Regression, Classification, Clustering and so on (this is almost always the 1st chapter in most books on Data Science). However, some techniques, such as LDA (Latent Dirichlet Allocation) for Text Analytics, might lie outside these buckets. Similarly, SVD (singular value decomposition) and ALS (alternating least squares), used for recommenders, are quite different techniques.
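
To show why a recommender technique does not fit those buckets neatly, here is a deliberately tiny sketch of the ALS idea: approximate each rating as a user factor times an item factor, and alternate between solving for one set of factors while the other is held fixed. It is a rank-one, unregularized simplification on invented ratings, not how a production recommender or any particular library implements it.

```python
# Hypothetical ratings (rows = users, columns = items); None marks an unrated item.
ratings = [
    [4.0, 2.0, 5.0],
    [2.0, 1.0, None],
    [4.0, 2.0, 5.0],
]


def als_rank_one(ratings, sweeps=100):
    """Fit ratings[i][j] ~ user[i] * item[j] using only the observed entries.

    Assumes every user and every item has at least one observed rating.
    """
    n_users, n_items = len(ratings), len(ratings[0])
    user = [1.0] * n_users
    item = [1.0] * n_items
    for _ in range(sweeps):
        # Item factors fixed: each user factor is a one-variable least-squares fit.
        for i in range(n_users):
            seen = [j for j in range(n_items) if ratings[i][j] is not None]
            user[i] = (sum(ratings[i][j] * item[j] for j in seen)
                       / sum(item[j] ** 2 for j in seen))
        # User factors fixed: update each item factor the same way.
        for j in range(n_items):
            seen = [i for i in range(n_users) if ratings[i][j] is not None]
            item[j] = (sum(ratings[i][j] * user[i] for i in seen)
                       / sum(user[i] ** 2 for i in seen))
    return user, item


user, item = als_rank_one(ratings)
predicted = user[1] * item[2]  # fill in the rating the second user never gave
print(round(predicted, 3))
```

The output is a guess for a missing cell rather than a class label, a regression line or a cluster, which is why such methods sit awkwardly in the traditional buckets.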

**#5: There are always the unknowns from new fields** - Even if you master some techniques and have a good grip on R/SAS, you will always face new challenges, which is what makes Data Science interesting. Additionally, there might be a whole field where a domain expert knows a lot more than you do. For example, Supply Chain experts understand a lot about optimization and forecasting techniques. It's hard for a Data Scientist without Supply Chain knowledge to apply statistical learning techniques to problems in this field. In the end, continuous improvement (or 'Kaizen') is what works well here too.

© 2019 Data Science Central ® Powered by
